giraffe176
committed on
Update README.md
README.md
CHANGED
@@ -196,6 +196,8 @@ dtype: bfloat16
 
 
 ### Table of Benchmarks
+
+## Open LLM Leaderboard
 | | MT-Bench | EQ-Bench v2.1 | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
 |---------------------------------------------------------|---------------------------------------------|---------------------------------------------------------------------------------|---------|-------|-----------|-------|------------|------------|-------|
 | giraffe176/WestMaid_HermesMonarchv0.1 | 8.021875 | 77.19 (3 Shot, ooba) | 72.62 | 70.22 | 87.42 | 64.31 | 61.99 | 82.16 | 69.6 |
@@ -205,4 +207,11 @@ dtype: bfloat16
 | NeverSleep/Noromaid-7B-0.4-DPO | | | 59.08 | 62.29 | 84.32 | 63.2 | 42.28 | 76.95 | 25.47 |
 | claude-v1 | 7.900000 | 76.83 | | | | | | | |
 | gpt-3.5-turbo | 7.943750 | 71.74 | | | | | | | |
-| | [(Paper)](https://arxiv.org/abs/2306.05685) | [(Paper)](https://arxiv.org/abs/2312.06281) [Leaderboard](https://eqbench.com/) | | | | | | | |
+| | [(Paper)](https://arxiv.org/abs/2306.05685) | [(Paper)](https://arxiv.org/abs/2312.06281) [Leaderboard](https://eqbench.com/) | | | | | | | |
+
+## Yet Another LLM Leaderboard benchmarks
+
+| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
+|------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
+|[WestMaid_HermesMonarchv0.1](https://huggingface.co/giraffe176/WestMaid_HermesMonarchv0.1)| 45.34| 76.33| 61.99| 46.02| 57.42|
+
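
The Average columns in the benchmark tables above appear to be the plain arithmetic mean of the individual benchmark scores in the same row. The README does not state this, so treat it as an assumption. The sketch below only checks it against the numbers listed for WestMaid_HermesMonarchv0.1. The `mean_score` helper and the dictionary names are illustrative and not part of the README.

```python
from statistics import mean

# Scores copied from the WestMaid_HermesMonarchv0.1 rows in the tables above.
open_llm = {
    "ARC": 70.22,
    "HellaSwag": 87.42,
    "MMLU": 64.31,
    "TruthfulQA": 61.99,
    "Winogrande": 82.16,
    "GSM8K": 69.6,
}
yall = {
    "AGIEval": 45.34,
    "GPT4All": 76.33,
    "TruthfulQA": 61.99,
    "Bigbench": 46.02,
}


def mean_score(scores):
    """Hypothetical helper: arithmetic mean of the scores, rounded to two decimals."""
    return round(mean(scores.values()), 2)


print(mean_score(open_llm))  # 72.62, matching the Open LLM Leaderboard "Average" cell
print(mean_score(yall))      # 57.42, matching the YALL "Average" cell
```

Both results match the Average values reported in the tables, which is consistent with an unweighted mean over the listed benchmarks.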