Commit 70b518b by lvkaokao · 1 Parent(s): 7a05c8a

update metric from llm leaderboard.

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -11,12 +11,12 @@ Neural-chat-7b-v3 was trained between September and October, 2023.
 
  ## Evaluation
 
- We use the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/master) to measure the metrics that are adopted by [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
 
- | Model | Average ⬆️| ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️| TruthfulQA (MC) (0-s) ⬆️ |
- | --- | --- | --- | --- | --- | --- |
- |[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 62.4 | 59.58 | 83.31 | 64.16 | 42.15 |
- | **Ours** | **67.82** | 67.41 | 82.63 | 61.69 | 59.57 |
+ We submit our model to [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), and the model performance has improved significantly, as seen from the average metric over the leaderboard's 7 tasks.
 
+ | Model | Average ⬆️| ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️| TruthfulQA (MC) (0-s) ⬆️ | Winogrande (5-s) | GSM8K (5-s) | DROP (3-s) |
+ | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+ |[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 50.32 | 59.58 | 83.31 | 64.16 | 42.15 | 78.37 | 18.12 | 6.14 |
+ | **Ours** | **57.31** | 67.15 | 83.29 | 62.26 | 58.77 | 78.06 | 1.21 | 50.43 |
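
As a quick sanity check on the Average column, the sketch below recomputes the mean of the seven task scores in the **Ours** row of the updated table. It assumes the leaderboard average is a plain unweighted arithmetic mean; the names `ours_scores` and `leaderboard_average` are illustrative and not part of the model card or the leaderboard code.

```python
# Per-task scores copied from the "Ours" row of the updated table.
ours_scores = {
    "ARC (25-s)": 67.15,
    "HellaSwag (10-s)": 83.29,
    "MMLU (5-s)": 62.26,
    "TruthfulQA (MC) (0-s)": 58.77,
    "Winogrande (5-s)": 78.06,
    "GSM8K (5-s)": 1.21,
    "DROP (3-s)": 50.43,
}


def leaderboard_average(scores):
    """Unweighted mean of per-task scores (assumed averaging rule)."""
    return sum(scores.values()) / len(scores)


ours_average = leaderboard_average(ours_scores)
print(f"{ours_average:.2f}")  # prints 57.31
```

Under that assumption the result matches the 57.31 reported in the table.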
 
 
  ## Training procedure