---
license: apache-2.0
---

Zero-shot results using [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) as the teacher model and [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) as the initialization model:

| Model | [Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | [Llama3.2-Mamba-3B-distill](https://huggingface.co/JunxiongWang/Llama3.2-Mamba-3B-distill) | [Llama3.2-Mamba-3B-dpo](https://huggingface.co/JunxiongWang/Llama3.2-Mamba-3B-dpo) | [Llama3.2-Mamba2-3B-distill](https://huggingface.co/JunxiongWang/Llama3.2-Mamba2-3B-distill) | [Llama3.2-Mamba2-3B-dpo](https://huggingface.co/JunxiongWang/Llama3.2-Mamba2-3B-dpo) |
|---------------|---------------------------------------------------------------------------------|-----------------------------------|-----------------------------------|-----------------------------------|-----------------------------------|
| Initialization Model | N/A | Llama-3.2-3B-Instruct | Llama-3.2-3B-Instruct | Llama-3.2-3B-Instruct | Llama-3.2-3B-Instruct |
| Teacher Model | N/A | Llama-3.1-70B-Instruct | Llama-3.1-70B-Instruct | Llama-3.1-70B-Instruct | Llama-3.1-70B-Instruct |
| arc_challenge | 0.459 | 0.4838 | 0.5265 | 0.4667 | 0.541 |
| arc_easy | 0.7407 | 0.7765 | 0.7997 | 0.7668 | 0.8026 |
| hellaswag | 0.7043 | 0.7037 | 0.7256 | 0.6913 | 0.7445 |
| mmlu | 0.6043 | 0.5448 | 0.5509 | 0.5312 | 0.5247 |
| openbookqa | 0.36 | 0.394 | 0.416 | 0.388 | 0.424 |
| piqa | 0.7568 | 0.7731 | 0.7731 | 0.7601 | 0.7769 |
| pubmedqa | 0.696 | 0.664 | 0.7 | 0.638 | 0.654 |
| race | 0.4067 | 0.4029 | 0.4364 | 0.3981 | 0.4344 |
| winogrande | 0.6748 | 0.6732 | 0.674 | 0.6606 | 0.6732 |
| truthfulqa | 0.3801 | 0.4202 | 0.4853 | 0.3478 | 0.5028 |
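
The task names above follow EleutherAI's lm-evaluation-harness. As a rough reproduction sketch (not the authors' exact setup: the harness version, task configs, and `hf`-backend compatibility are all assumptions), a zero-shot run could look like this:

```python
# Hypothetical zero-shot evaluation sketch using EleutherAI's lm-evaluation-harness
# (pip install lm-eval). ASSUMES the checkpoint runs under the harness's "hf"
# backend; the hybrid Mamba layers may instead require the authors' modeling code.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=JunxiongWang/Llama3.2-Mamba2-3B-dpo,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "hellaswag", "winogrande"],  # subset of the table
    num_fewshot=0,  # zero-shot, as in the table above
    batch_size=8,
)
print(results["results"])
```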
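
For generation, here is a minimal usage sketch. It assumes the checkpoint loads through the standard `transformers` interface; the released hybrids may instead need the loading utilities from the code release accompanying the paper cited below.

```python
# Minimal generation sketch. ASSUMES AutoModelForCausalLM can load the checkpoint
# directly; if not, use the loading code released with the paper cited below.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "JunxiongWang/Llama3.2-Mamba2-3B-dpo"  # any checkpoint from the table

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# These models are distilled from instruction-tuned Llama, so chat formatting applies.
messages = [{"role": "user", "content": "Explain state-space models in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```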

If you use these models, please cite:

```
@article{junxiongdaniele2024mambainllama,
  title   = {The Mamba in the Llama: Distilling and Accelerating Hybrid Models},
  author  = {Junxiong Wang and Daniele Paliotta and Avner May and Alexander M. Rush and Tri Dao},
  journal = {arXiv preprint arXiv:2408.15237},
  year    = {2024}
}
```