---
base_model:
- mistralai/Mixtral-8x7B-Instruct-v0.1
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
language:
- fr
- it
- de
- es
- en
---
# Mixtral-8x7B-Instruct-v0.1-upscaled

This is a frankenmerge of [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1), created by interleaving overlapping slices of its own layers using [mergekit](https://github.com/cg123/mergekit).
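
The result loads like any other `transformers` causal LM. A minimal usage sketch, assuming a placeholder repository id (substitute this model's actual Hub path):

```python
# Minimal loading/inference sketch. The repo id below is a hypothetical
# placeholder; replace it with this model's actual Hub path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "REPO_OWNER/Mixtral-8x7B-Instruct-v0.1-upscaled"  # hypothetical

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the merge was produced in bfloat16
    device_map="auto",
)

messages = [{"role": "user", "content": "Bonjour! What can you do?"}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```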

## Benchmark
The [MT-Bench](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) scores for this model and the original model are as follows:

**1-turn**
|Model|Size|Coding|Extraction|Humanities|Math|Reasoning|Roleplay|STEM|Writing|avg_score|
|---|---|---|---|---|---|---|---|---|---|---|
| Mixtral-8x7B-Instruct-v0.1 | 8x7B | 5.3 | **8.5** | **9.9** | **6.8** | 6.0 | 9.1 | 9.55 | 8.9 | 8.00625 |
| This model  | around 8x12B? | **6.3** | 8.4 | **9.9** | 5.4 | **7.7** | **9.2** | **9.75** | **9.8** | **8.30625** |
![mt-bench-1turn](./mt-bench-1turn.png)

**2-turn**
|Model|Size|Coding|Extraction|Humanities|Math|Reasoning|Roleplay|STEM|Writing|avg_score|
|---|---|---|---|---|---|---|---|---|---|---|
| Mixtral-8x7B-Instruct-v0.1 | 8x7B | 4.1 | **8.4** | 9.8 | **4.7** | **5.6** | 9.0 | **9.2** | **9.5** | **7.5375** |
| This model  | around 8x12B? | **4.2** | 7.4 | **9.9** | 4.0 | 5.2 | **9.5** | 8.7 | 8.0 | 7.1125 |
![mt-bench-2turn](./mt-bench-2turn.png)

## Merge Details
### Merge Method

This model was merged using the passthrough merge method, which copies the selected layer ranges verbatim and stacks them in sequence without blending any weights.

### Models Merged

The following models were included in the merge:
* mistralai/Mixtral-8x7B-Instruct-v0.1

### Configuration

The following YAML configuration was used to produce this model:

```yaml
merge_method: passthrough
slices:
  - sources:
      - model: mistralai/Mixtral-8x7B-Instruct-v0.1
        layer_range: [0, 8]
  - sources:
      - model: mistralai/Mixtral-8x7B-Instruct-v0.1
        layer_range: [4, 12]
  - sources:
      - model: mistralai/Mixtral-8x7B-Instruct-v0.1
        layer_range: [8, 16]
  - sources:
      - model: mistralai/Mixtral-8x7B-Instruct-v0.1
        layer_range: [12, 20]
  - sources:
      - model: mistralai/Mixtral-8x7B-Instruct-v0.1
        layer_range: [16, 24]
  - sources:
      - model: mistralai/Mixtral-8x7B-Instruct-v0.1
        layer_range: [20, 28]
  - sources:
      - model: mistralai/Mixtral-8x7B-Instruct-v0.1
        layer_range: [24, 32]
dtype: bfloat16
tokenizer_source: base
```
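
Each of the seven slices spans 8 layers and overlaps its neighbor by 4, so the merged model has 56 layers instead of the base model's 32; scaling 7B per expert by 56/32 ≈ 1.75 is where the rough "8x12B" estimate above comes from. To reproduce the merge, save the YAML above as `config.yaml` and run it through mergekit, for example via the `mergekit-yaml` CLI. A sketch using mergekit's Python API, following the entry points documented in its README (exact option names may vary between versions):

```python
# Reproduction sketch via mergekit's Python API; entry points follow the
# mergekit README, but treat exact option names as assumptions.
import torch
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yaml", "r", encoding="utf-8") as fp:  # the YAML above
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./Mixtral-8x7B-Instruct-v0.1-upscaled",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # GPU speeds up tensor copying
        copy_tokenizer=True,             # matches `tokenizer_source: base`
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```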