Intel/neural-chat-7b-v3

lvkaokao committed on Nov 14, 2023

Commit 70b518b · 1 Parent(s): 7a05c8a

update metric from llm leaderboard.

Files changed (1)

README.md +5 -5


@@ -11,12 +11,12 @@ Neural-chat-7b-v3 was trained between September and October, 2023.
 ## Evaluation
-We use the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/master) to measure the metrics that are adopted by [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
-| Model | Average ⬆️| ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️| TruthfulQA (MC) (0-s) ⬆️ |
-| --- | --- | --- | --- | --- | --- |
-|[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 62.4 | 59.58  | 83.31  | 64.16  | 42.15 |
-| **Ours** | **67.82** | 67.41 | 82.63 | 61.69  | 59.57 |
+We submitted our model to the [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), and its performance improves significantly over mistralai/Mistral-7B-v0.1, as shown by the leaderboard's average metric across 7 tasks.
+| Model | Average ⬆️| ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️| TruthfulQA (MC) (0-s) ⬆️ | Winogrande (5-s) | GSM8K (5-s) | DROP (3-s) |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- |
+|[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 50.32 | 59.58  | 83.31  | 64.16  | 42.15 | 78.37 | 18.12 | 6.14 |
+| **Ours** | **57.31** | 67.15 | 83.29 | 62.26  | 58.77 | 78.06 | 1.21 | 50.43 |
 ## Training procedure
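
As a sanity check on the updated table, the "Average" column can be recomputed as the plain arithmetic mean of the seven per-task scores. The sketch below assumes that this is how the leaderboard average is formed, which the commit does not state. The scores are copied from the added rows, and "Ours" is taken to be Intel/neural-chat-7b-v3, the repository this commit belongs to. Under that assumption the recomputed value for the "Ours" row matches the listed 57.31, while the Mistral-7B-v0.1 row comes out at about 50.26 rather than the listed 50.32. The commit gives no reason for that small gap.

```python
# Per-task scores copied from the added ("+") rows of the updated table.
# Assumption: the "Average" column is the unweighted mean of these 7 tasks.
TASKS = ["ARC", "HellaSwag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K", "DROP"]

SCORES = {
    "mistralai/Mistral-7B-v0.1": [59.58, 83.31, 64.16, 42.15, 78.37, 18.12, 6.14],
    "Intel/neural-chat-7b-v3": [67.15, 83.29, 62.26, 58.77, 78.06, 1.21, 50.43],
}


def average(scores):
    """Return the unweighted mean of a list of task scores."""
    return sum(scores) / len(scores)


AVERAGES = {model: average(scores) for model, scores in SCORES.items()}

if __name__ == "__main__":
    for model, avg in AVERAGES.items():
        print(f"{model}: mean over {len(TASKS)} tasks = {avg:.2f}")
```

The per-task columns are also worth reading on their own: in the added rows the GSM8K score drops from 18.12 to 1.21 while DROP rises from 6.14 to 50.43, so the average gain is not uniform across tasks.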