---
license: apache-2.0
---

Zero-shot results when using [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) as the teacher model and [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) as the initialization model.

| Model          | [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | [Llama3.2-Mamba-3B-distill](https://huggingface.co/JunxiongWang/Llama3.2-Mamba-3B-distill)       | [Llama3.2-Mamba-3B-dpo](https://huggingface.co/JunxiongWang/Llama3.2-Mamba-3B-dpo)       | [Llama3.2-Mamba2-3B-distill](https://huggingface.co/JunxiongWang/Llama3.2-Mamba2-3B-distill)       | [Llama3.2-Mamba2-3B-dpo](https://huggingface.co/JunxiongWang/Llama3.2-Mamba2-3B-dpo)       |
|---------------|---------------------------------------------------------------------------------|-----------------------------------|-----------------------------------|-----------------------------------|-----------------------------------|
| Initialization Model | N/A                                                                             | Llama-3.2-3B-Instruct             | Llama-3.2-3B-Instruct             | Llama-3.2-3B-Instruct             | Llama-3.2-3B-Instruct             |
| Teacher Model | N/A                                                                             | Llama-3.1-70B-Instruct             | Llama-3.1-70B-Instruct             | Llama-3.1-70B-Instruct             | Llama-3.1-70B-Instruct             |
| arc_challenge       | 0.459    | 0.4838   | 0.5265   | 0.4667   | 0.541    |
| arc_easy            | 0.7407   | 0.7765   | 0.7997   | 0.7668   | 0.8026   |
| hellaswag           | 0.7043   | 0.7037   | 0.7256   | 0.6913   | 0.7445   |
| mmlu                | 0.6043   | 0.5448   | 0.5509   | 0.5312   | 0.5247   |
| openbookqa          | 0.36     | 0.394    | 0.416    | 0.388    | 0.424    |
| piqa                | 0.7568   | 0.7731   | 0.7731   | 0.7601   | 0.7769   |
| pubmedqa            | 0.696    | 0.664    | 0.7      | 0.638    | 0.654    |
| race                | 0.4067   | 0.4029   | 0.4364   | 0.3981   | 0.4344   |
| winogrande          | 0.6748   | 0.6732   | 0.674    | 0.6606   | 0.6732   |
| truthfulqa          | 0.3801   | 0.4202   | 0.4853   | 0.3478   | 0.5028   |
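
The exact evaluation setup is not stated in this card; as a rough, non-authoritative sketch, zero-shot numbers on these tasks are commonly produced with EleutherAI's lm-evaluation-harness. The snippet below shows one way to score the baseline Llama-3.2-3B-Instruct on the same task list (the distilled Mamba checkpoints require the authors' custom hybrid-model code to load, so the plain `hf` backend shown here does not apply to them; task and metric names, e.g. the exact TruthfulQA variant, are assumptions).

```
# Illustrative sketch only: assumes lm-evaluation-harness (lm_eval >= 0.4) is installed
# and that these task names match the benchmarks reported in the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # plain Transformers backend; covers the baseline model only
    model_args="pretrained=meta-llama/Llama-3.2-3B-Instruct,dtype=bfloat16",
    tasks=[
        "arc_challenge", "arc_easy", "hellaswag", "mmlu", "openbookqa",
        "piqa", "pubmedqa", "race", "winogrande", "truthfulqa_mc2",
    ],
    num_fewshot=0,  # zero-shot, as in the table
    batch_size=8,
)

# Each entry in results["results"] is a dict of metrics (acc, acc_norm, ...) per task.
for task, metrics in results["results"].items():
    print(task, metrics)
```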


If you use these models, please cite:

```
@article{junxiongdaniele2024mambainllama,
  title   = {The Mamba in the Llama: Distilling and Accelerating Hybrid Models},
  author  = {Junxiong Wang and Daniele Paliotta and Avner May and Alexander M. Rush and Tri Dao},
  journal = {arXiv preprint arXiv:2408.15237},
  year    = {2024}
}
```