---
base_model:
- mistralai/Mixtral-8x7B-Instruct-v0.1
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
language:
- fr
- it
- de
- es
- en
---
# Mixtral-8x7B-Instruct-v0.1-upscaled
This is a frankenmerge of [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1), created by interleaving overlapping slices of the model's own layers using [mergekit](https://github.com/cg123/mergekit).
## Benchmark
The [mt-bench](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) scores for this model and the original model are as follows:
**1-turn**
|Model|Size|Coding|Extraction|Humanities|Math|Reasoning|Roleplay|STEM|Writing|avg_score|
|---|---|---|---|---|---|---|---|---|---|---|
| Mixtral-8x7B-Instruct-v0.1 | 8x7B | 5.3 | **8.5** | **9.9** | **6.8** | 6.0 | 9.1 | 9.55 | 8.9 | 8.00625 |
| This model | around 8x12B? | **6.3** | 8.4 | **9.9** | 5.4 | **7.7** | **9.2** | **9.75** | **9.8** | **8.30625** |
![mt-bench-1turn](./mt-bench-1turn.png)
**2-turn**
|Model|Size|Coding|Extraction|Humanities|Math|Reasoning|Roleplay|STEM|Writing|avg_score|
|---|---|---|---|---|---|---|---|---|---|---|
| Mixtral-8x7B-Instruct-v0.1 | 8x7B | 4.1 | **8.4** | 9.8 | **4.7** | **5.6** | 9.0 | **9.2** | **9.5** | **7.5375** |
| This model | around 8x12B? | **4.2** | 7.4 | **9.9** | 4.0 | 5.2 | **9.5** | 8.7 | 8.0 | 7.1125 |
![mt-bench-2turn](./mt-bench-2turn.png)
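As a quick sanity check on the tables above, the following sketch recomputes `avg_score` as the plain mean of the eight category scores and reports the difference between the two models per turn. It assumes `avg_score` is an unweighted average; the dictionary layout and variable names are illustrative only.

```python
from statistics import mean

# Category scores copied from the tables above, in column order:
# Coding, Extraction, Humanities, Math, Reasoning, Roleplay, STEM, Writing
scores = {
    "1-turn": {
        "base": [5.3, 8.5, 9.9, 6.8, 6.0, 9.1, 9.55, 8.9],
        "upscaled": [6.3, 8.4, 9.9, 5.4, 7.7, 9.2, 9.75, 9.8],
    },
    "2-turn": {
        "base": [4.1, 8.4, 9.8, 4.7, 5.6, 9.0, 9.2, 9.5],
        "upscaled": [4.2, 7.4, 9.9, 4.0, 5.2, 9.5, 8.7, 8.0],
    },
}

# Assumption: avg_score is the unweighted mean over the eight categories.
averages = {
    turn: {name: mean(vals) for name, vals in models.items()}
    for turn, models in scores.items()
}

# Positive delta means the upscaled model scored higher than the base model.
deltas = {
    turn: avgs["upscaled"] - avgs["base"] for turn, avgs in averages.items()
}

for turn in scores:
    print(turn, averages[turn], f"delta={deltas[turn]:+.4f}")
```

Under that assumption the recomputed means match the `avg_score` column: the upscaled model is ahead on the first turn and behind on the second.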
## Merge Details
### Merge Method
This model was merged using the passthrough merge method.
### Models Merged
The following models were included in the merge:
* mistralai/Mixtral-8x7B-Instruct-v0.1
### Configuration
The following YAML configuration was used to produce this model:
```yaml
merge_method: passthrough
slices:
- sources:
  - model: mistralai/Mixtral-8x7B-Instruct-v0.1
    layer_range: [0, 8]
- sources:
  - model: mistralai/Mixtral-8x7B-Instruct-v0.1
    layer_range: [4, 12]
- sources:
  - model: mistralai/Mixtral-8x7B-Instruct-v0.1
    layer_range: [8, 16]
- sources:
  - model: mistralai/Mixtral-8x7B-Instruct-v0.1
    layer_range: [12, 20]
- sources:
  - model: mistralai/Mixtral-8x7B-Instruct-v0.1
    layer_range: [16, 24]
- sources:
  - model: mistralai/Mixtral-8x7B-Instruct-v0.1
    layer_range: [20, 28]
- sources:
  - model: mistralai/Mixtral-8x7B-Instruct-v0.1
    layer_range: [24, 32]
dtype: bfloat16
tokenizer_source: base
```
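To make the effect of the configuration concrete, the sketch below counts the layers in the merged stack. It assumes each `layer_range` is half-open (`[start, end)`), which is consistent with the last slice ending at 32 on a 32-layer base model, and that passthrough concatenates the slices in the order listed. The variable names are illustrative.

```python
# Slices copied from the configuration above as (start, end) pairs.
# Assumption: the end index is exclusive, so [0, 8] means layers 0..7.
slices = [(0, 8), (4, 12), (8, 16), (12, 20), (16, 24), (20, 28), (24, 32)]

BASE_LAYERS = 32  # decoder layers in Mixtral-8x7B-Instruct-v0.1

# Passthrough stacks the slices in order, so the merged model's layer list
# is the concatenation of the source layer indices from every slice.
merged_layers = [i for start, end in slices for i in range(start, end)]

total_layers = len(merged_layers)
duplicated_layers = total_layers - len(set(merged_layers))
depth_ratio = total_layers / BASE_LAYERS

print(f"merged layers:     {total_layers}")
print(f"duplicated layers: {duplicated_layers}")
print(f"depth ratio:       {depth_ratio:.2f}x")
```

The merged stack has 56 layers, 1.75 times the depth of the base model. Scaling the nominal 7B per-expert figure by that ratio gives roughly 12B, which lines up with the "around 8x12B?" size in the benchmark tables; this is a rough estimate, not a measured parameter count, since components outside the decoder layers (such as the embeddings) are not duplicated.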