Update README.md
README.md
CHANGED
```diff
@@ -36,16 +36,16 @@ print(output["generated_text"])
 
 ## Evals
 
-LM Eval Harness results (local
-
-<iframe src="https://wandb.ai/ggbetz/argunauts-training/reports/DebateLabKIT-Llama-3-1-Argunaut-1-8B-SFT--VmlldzoxMDc2ODAwOQ" style="border:none;height:1024px;width:100%">
+LM Eval Harness results (local completions/vllm): [wandb report](https://api.wandb.ai/links/ggbetz/3bwr0ou6)
 
 Pinning `Llama-3.1-Argunaut-1-8B-SFT` against top-performing LLama-8B models from [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/):
 
-|Model|BBH|MATH|GPQA|
+|Model|BBH|MATH|GPQA|MMLU Pro|
 |:--------|:---:|:---:|:---:|:---:|
+| **Llama-3.1-Argunaut-1-8B-SFT** | 44.6% | 9.0% | 32.1% | 34.5% |
+| meta-llama/Meta-Llama-3.1-8B-Instruct | 29.9% | 19.3% | 2.6% | 30.7% |
+| arcee-ai/Llama-3.1-SuperNova-Lite | 31.6% | 17.4% | 7.5% | 32.0% |
+| allenai/Llama-3.1-Tulu-3-8B-SFT | 13.9% | 11.4% | 3.7% | 20.1% |
 
 ## SFT dataset mixture
```
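The "local completions/vllm" results referenced in the diff could be reproduced with LM Eval Harness's vllm backend. The sketch below is an assumption about the invocation, not taken from this commit: the task names (Open LLM Leaderboard v2 tasks in lm-eval), `model_args`, and the wandb project name are all hypothetical.

```shell
# Hypothetical re-run of the evals above with LM Eval Harness + vllm.
# Task names, model_args, and wandb project are assumptions, not from this commit.
pip install "lm-eval[vllm]"

lm_eval \
  --model vllm \
  --model_args pretrained=DebateLabKIT/Llama-3.1-Argunaut-1-8B-SFT,dtype=bfloat16 \
  --tasks leaderboard_bbh,leaderboard_math_hard,leaderboard_gpqa,leaderboard_mmlu_pro \
  --batch_size auto \
  --wandb_args project=argunauts-training
```

Few-shot counts and prompt formatting defaults differ across harness versions, so scores may not match the table exactly.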