---
tags:
- vllm
- sparsity
pipeline_tag: text-generation
license: llama3.1
base_model: neuralmagic/Sparse-Llama-3.1-8B-2of4
datasets:
- openai/gsm8k
language:
- en
metrics:
- accuracy
---

# Sparse-Llama-3.1-8B-gsm8k-2of4

## Model Overview
- **Model Architecture:** Llama-3.1-8B
  - **Input:** Text
  - **Output:** Text
- **Model Optimizations:**
  - **Sparsity:** 2:4
- **Release Date:** 11/21/2024
- **Version:** 1.0
- **License(s):** [llama3.1](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/blob/main/LICENSE)
- **Model Developers:** Neural Magic

This model is specialized in grade-school math and was obtained by fine-tuning the 2:4 sparse [Sparse-Llama-3.1-8B-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4) on the [GSM8k](https://huggingface.co/datasets/openai/gsm8k) dataset.
It achieves 66.9% 0-shot accuracy on the GSM8k test set, compared to 66.3% for the fine-tuned dense model [Llama-3.1-8B-gsm8k](https://huggingface.co/neuralmagic/Llama-3.1-8B-gsm8k), demonstrating over **100% accuracy recovery**.
In contrast, the pretrained [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) achieves 50.7% 5-shot accuracy and the sparse foundational [Sparse-Llama-3.1-8B-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4) model achieves 56.3% 5-shot accuracy.


### Model Optimizations

This model inherits the optimizations of its parent, [Sparse-Llama-3.1-8B-2of4](https://huggingface.co/neuralmagic/Sparse-Llama-3.1-8B-2of4).
Namely, all linear operators within transformer blocks were pruned to the 2:4 sparsity pattern: in each group of four weights, two are retained while two are pruned.
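
To make the pattern concrete, the sketch below (a standalone PyTorch snippet added here for illustration; it is not part of this repository) checks whether a weight tensor satisfies 2:4 sparsity along its last dimension:

```python
import torch

def satisfies_2of4(weight: torch.Tensor) -> bool:
    """Return True if every contiguous group of 4 weights has at most 2 nonzeros."""
    groups = weight.reshape(-1, 4)                 # split weights into groups of 4
    nonzeros_per_group = (groups != 0).sum(dim=1)  # count retained weights per group
    return bool((nonzeros_per_group <= 2).all())

# Illustrative 2:4-sparse row: 2 of every 4 weights are retained.
w = torch.tensor([[0.0, 1.3, 0.0, -0.7, 0.5, 0.0, 0.0, 2.1]])
print(satisfies_2of4(w))  # True
```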


## Deployment with vLLM

This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend. vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
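
For example, a minimal offline-generation sketch with vLLM's Python API might look like the following (the prompt and sampling settings are illustrative choices, not part of this repository):

```python
from vllm import LLM, SamplingParams

# Load the checkpoint; this assumes a vLLM build with 2:4 sparse-weight support.
llm = LLM(model="neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4")

# Greedy decoding is a reasonable default for math word problems.
sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

prompts = ["Question: A baker bakes 7 trays of 12 muffins each. How many muffins is that? Answer:"]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```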


## Evaluation

This model was evaluated with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
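
A 0-shot GSM8k run could be reproduced with something like the sketch below; the harness's Python entry point and arguments are assumptions here and may differ across versions, so consult the harness documentation:

```python
import lm_eval

# Hypothetical invocation; check the flags of your installed harness version.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4",
    tasks=["gsm8k"],
    num_fewshot=0,
)
print(results["results"]["gsm8k"])
```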

### Accuracy
#### GSM8k Benchmark
<table>
  <tr>
    <td><strong>Metric</strong></td>
    <td style="text-align: center"><strong>Llama-3.1-8B<br>(5-shot)</strong></td>
    <td style="text-align: center"><strong>Sparse-Llama-3.1-8B-2of4<br>(5-shot)</strong></td>
    <td style="text-align: center"><strong>Llama-3.1-8B-gsm8k<br>(0-shot)</strong></td>
    <td style="text-align: center"><strong>Sparse-Llama-3.1-8B-gsm8k<br>(0-shot)</strong></td>
  </tr>
  <tr>
    <td>Accuracy</td>
    <td style="text-align: center">50.7%</td>
    <td style="text-align: center">56.3%</td>
    <td style="text-align: center">66.3%</td>
    <td style="text-align: center">66.9%</td>
  </tr>
</table>