alexmarques committed on
Commit 8302444 · verified · 1 Parent(s): fcf51db

Update README.md

Files changed (1)
  1. README.md +20 -18
README.md CHANGED
@@ -32,7 +32,7 @@ base_model: meta-llama/Meta-Llama-3.1-405B-Instruct
  - **Model Developers:** Neural Magic
 
  Quantized version of [Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct).
- It achieves scores within 1% of the scores of the unquantized model for MMLU, ARC-Challenge, GSM-8k, Hellaswag, Winogrande and TruthfulQA.
 
  ### Model Optimizations
 
@@ -145,6 +145,8 @@ The model was evaluated on MMLU, ARC-Challenge, GSM-8K, Hellaswag, Winogrande an
  Evaluation was conducted using the Neural Magic fork of [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct) (branch llama_3.1_instruct) and the [vLLM](https://docs.vllm.ai/en/stable/) engine.
  This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challenge and GSM-8K that match the prompting style of [Meta-Llama-3.1-Instruct-evals](https://huggingface.co/datasets/meta-llama/Meta-Llama-3.1-405B-Instruct-evals).
 
  ### Accuracy
 
  #### Open LLM Leaderboard evaluation scores
@@ -162,11 +164,11 @@ This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challen
  <tr>
  <td>MMLU (5-shot)
  </td>
- <td>87.41
  </td>
- <td>87.47
  </td>
- <td>100.1%
  </td>
  </tr>
  <tr>
@@ -174,9 +176,9 @@ This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challen
  </td>
  <td>88.26
  </td>
- <td>88.23
  </td>
- <td>100.0%
  </td>
  </tr>
  <tr>
@@ -184,9 +186,9 @@ This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challen
  </td>
  <td>94.97
  </td>
- <td>94.88
  </td>
- <td>99.9%
  </td>
  </tr>
  <tr>
@@ -196,7 +198,7 @@ This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challen
  </td>
  <td>96.13
  </td>
- <td>99.7%
  </td>
  </tr>
  <tr>
@@ -204,7 +206,7 @@ This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challen
  </td>
  <td>88.33
  </td>
- <td>88.50
  </td>
  <td>100.2%
  </td>
@@ -214,9 +216,9 @@ This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challen
  </td>
  <td>87.21
  </td>
- <td>87.61
  </td>
- <td>100.5%
  </td>
  </tr>
  <tr>
@@ -224,7 +226,7 @@ This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challen
  </td>
  <td>64.64
  </td>
- <td>65.42
  </td>
  <td>101.2%
  </td>
@@ -234,7 +236,7 @@ This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challen
  </td>
  <td><strong>86.75</strong>
  </td>
- <td><strong>86.89</strong>
  </td>
  <td><strong>100.2%</strong>
  </td>
@@ -249,7 +251,7 @@ The results were obtained using the following commands:
  ```
  lm_eval \
  --model vllm \
- --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a16",dtype=auto,add_bos_token=True,max_model_len=3850,max_gen_toks=10,enable_chunked_prefill=True,tensor_parallel_size=8 \
  --tasks mmlu_llama_3.1_instruct \
  --fewshot_as_multiturn \
  --apply_chat_template \
@@ -261,7 +263,7 @@ lm_eval \
  ```
  lm_eval \
  --model vllm \
- --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a16",dtype=auto,add_bos_token=True,max_model_len=4064,max_gen_toks=1024,enable_chunked_prefill=True,tensor_parallel_size=8 \
  --tasks mmlu_cot_0shot_llama_3.1_instruct \
  --apply_chat_template \
  --num_fewshot 0 \
@@ -272,7 +274,7 @@ lm_eval \
  ```
  lm_eval \
  --model vllm \
- --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a16",dtype=auto,add_bos_token=True,max_model_len=3940,max_gen_toks=100,enable_chunked_prefill=True,tensor_parallel_size=8 \
  --tasks arc_challenge_llama_3.1_instruct \
  --apply_chat_template \
  --num_fewshot 0 \
@@ -283,7 +285,7 @@ lm_eval \
  ```
  lm_eval \
  --model vllm \
- --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a16",dtype=auto,add_bos_token=True,max_model_len=4096,max_gen_toks=1024,enable_chunked_prefill=True,tensor_parallel_size=8 \
  --tasks gsm8k_cot_llama_3.1_instruct \
  --fewshot_as_multiturn \
  --apply_chat_template \
 
  - **Model Developers:** Neural Magic
 
  Quantized version of [Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct).
+ It achieves scores within 0.3% of the scores of the unquantized model for MMLU, ARC-Challenge, GSM-8k, Hellaswag, Winogrande and TruthfulQA.
 
  ### Model Optimizations
 
 
  Evaluation was conducted using the Neural Magic fork of [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct) (branch llama_3.1_instruct) and the [vLLM](https://docs.vllm.ai/en/stable/) engine.
  This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challenge and GSM-8K that match the prompting style of [Meta-Llama-3.1-Instruct-evals](https://huggingface.co/datasets/meta-llama/Meta-Llama-3.1-405B-Instruct-evals).
 
+ **Note:** Results have been updated after Meta modified the chat template.
+
  ### Accuracy
 
  #### Open LLM Leaderboard evaluation scores
 
  <tr>
  <td>MMLU (5-shot)
  </td>
+ <td>87.38
  </td>
+ <td>87.59
  </td>
+ <td>100.2%
  </td>
  </tr>
  <tr>
 
  </td>
  <td>88.26
  </td>
+ <td>88.19
  </td>
+ <td>99.9%
  </td>
  </tr>
  <tr>
 
  </td>
  <td>94.97
  </td>
+ <td>94.80
  </td>
+ <td>99.8%
  </td>
  </tr>
  <tr>
 
  </td>
  <td>96.13
  </td>
+ <td>100.8%
  </td>
  </tr>
  <tr>
 
  </td>
  <td>88.33
  </td>
+ <td>88.52
  </td>
  <td>100.2%
  </td>
 
  </td>
  <td>87.21
  </td>
+ <td>87.92
  </td>
+ <td>100.8%
  </td>
  </tr>
  <tr>
 
  </td>
  <td>64.64
  </td>
+ <td>65.41
  </td>
  <td>101.2%
  </td>
 
  </td>
  <td><strong>86.75</strong>
  </td>
+ <td><strong>86.94</strong>
  </td>
  <td><strong>100.2%</strong>
  </td>
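The recovery column is the quantized score divided by the unquantized score, expressed as a percentage. A minimal bash sketch of that arithmetic follows. It assumes the first score cell in each row is the unquantized baseline and the second is the quantized model (the column headers are not part of this diff, but this reading reproduces the recovery values shown); `recovery` is a hypothetical helper name, not something from the README.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the README): recovery = quantized / baseline, as a percentage.
# Assumption: argument 1 is the unquantized baseline score, argument 2 the quantized score.
recovery() {
  # LC_ALL=C keeps "." as the decimal separator regardless of the user's locale.
  LC_ALL=C awk -v base="$1" -v quant="$2" 'BEGIN { printf "%.1f%%\n", 100 * quant / base }'
}

# Rows of the updated table where both scores are visible in this diff.
recovery 87.38 87.59   # MMLU (5-shot): 100.2%
recovery 88.26 88.19   # 99.9%
recovery 94.97 94.80   # 99.8%
recovery 88.33 88.52   # 100.2%
recovery 87.21 87.92   # 100.8%
recovery 64.64 65.41   # 101.2%
recovery 86.75 86.94   # bold summary row: 100.2%
```

Each call rounds to one decimal place, matching the precision used in the table.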
 
  ```
  lm_eval \
  --model vllm \
+ --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a16",dtype=auto,max_model_len=3850,max_gen_toks=10,enable_chunked_prefill=True,tensor_parallel_size=8 \
  --tasks mmlu_llama_3.1_instruct \
  --fewshot_as_multiturn \
  --apply_chat_template \
 
  ```
  lm_eval \
  --model vllm \
+ --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a16",dtype=auto,max_model_len=4064,max_gen_toks=1024,enable_chunked_prefill=True,tensor_parallel_size=8 \
  --tasks mmlu_cot_0shot_llama_3.1_instruct \
  --apply_chat_template \
  --num_fewshot 0 \
 
  ```
  lm_eval \
  --model vllm \
+ --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a16",dtype=auto,max_model_len=3940,max_gen_toks=100,enable_chunked_prefill=True,tensor_parallel_size=8 \
  --tasks arc_challenge_llama_3.1_instruct \
  --apply_chat_template \
  --num_fewshot 0 \
 
  ```
  lm_eval \
  --model vllm \
+ --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a16",dtype=auto,max_model_len=4096,max_gen_toks=1024,enable_chunked_prefill=True,tensor_parallel_size=8 \
  --tasks gsm8k_cot_llama_3.1_instruct \
  --fewshot_as_multiturn \
  --apply_chat_template \
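This commit drops `add_bos_token=True` from every `--model_args` string, leaving `max_model_len` and `max_gen_toks` as the only per-task differences. The bash sketch below rebuilds the updated value; `model_args` is a hypothetical helper, not part of lm-evaluation-harness. Because the shell removes the double quotes around the `pretrained` value before `lm_eval` receives it, the helper emits the string without them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the --model_args value used by the updated commands.
# Usage: model_args MAX_MODEL_LEN MAX_GEN_TOKS
model_args() {
  local max_model_len="$1" max_gen_toks="$2"
  printf 'pretrained=%s,dtype=auto,max_model_len=%s,max_gen_toks=%s,enable_chunked_prefill=True,tensor_parallel_size=8\n' \
    "neuralmagic/Meta-Llama-3.1-405B-Instruct-quantized.w8a16" "$max_model_len" "$max_gen_toks"
}

# Per-task limits taken from the four commands in this diff.
model_args 3850 10     # mmlu_llama_3.1_instruct
model_args 4064 1024   # mmlu_cot_0shot_llama_3.1_instruct
model_args 3940 100    # arc_challenge_llama_3.1_instruct
model_args 4096 1024   # gsm8k_cot_llama_3.1_instruct

# Example (the remaining flags are as in the commands above, which this diff truncates):
# lm_eval --model vllm --model_args "$(model_args 3850 10)" --tasks mmlu_llama_3.1_instruct ...
```

Keeping the shared arguments in one place makes it harder for a single command to drift out of sync, as could happen when a flag such as `add_bos_token` is removed by hand.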