Update README.md
README.md CHANGED
@@ -36,10 +36,20 @@ print(output["generated_text"])
 
 ## Evals
 
+LM Eval Harness results (local-completions/vllm):
+
+<iframe src="https://wandb.ai/ggbetz/argunauts-training/reports/DebateLabKIT-Llama-3-1-Argunaut-1-8B-SFT--VmlldzoxMDc2ODAwOQ" style="border:none;height:1024px;width:100%"></iframe>
+
+Comparing `Llama-3.1-Argunaut-1-8B-SFT` against top-performing Llama-8B models from the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/):
+
+|Model|BBH|MATH|GPQA|MMLU Pro|
+|:--------|:--:|:--:|:--:|:--:|
+|**Llama-3.1-Argunaut-1-8B-SFT**|44.6|9.0|32.1|34.5|
+
 
 ## SFT dataset mixture
 
-
+|Dataset|Weight (examples)|Weight (tokens)|
 |:------|:----:|:----:|
 |DebateLabKIT/deepa2-conversations|25%|49%|
 |DebateLabKIT/deep-argmap-conversations|25%|18%|
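
For context on the added Evals section: the numbers come from LM Eval Harness runs against a local vllm/completions backend. Below is a minimal reproduction sketch using the harness's Python API; the `leaderboard_*` task names and the `model_args` string are assumptions (the Open LLM Leaderboard v2 tasks), not taken from this diff, and the actual configuration lives in the linked W&B report.

```python
# Hypothetical reproduction sketch, not the project's own eval script.
# Task names and model_args are assumptions; adjust to the real setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",  # or "local-completions" to query a running OpenAI-compatible server
    model_args="pretrained=DebateLabKIT/Llama-3.1-Argunaut-1-8B-SFT,dtype=auto",
    tasks=["leaderboard_bbh", "leaderboard_math_hard",
           "leaderboard_gpqa", "leaderboard_mmlu_pro"],
    batch_size="auto",
)
print(results["results"])  # per-task metrics (BBH, MATH, GPQA, MMLU Pro)
```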
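On the new mixture table: the two weight columns differ because the mixture is reported both by sampled examples and by resulting tokens; the two datasets have equal example shares (25% each) but very different token shares (49% vs. 18%), i.e. deepa2 conversations are much longer on average. A minimal sketch of an example-weighted mixture with `datasets.interleave_datasets` follows; the `train` split is an assumption, and the two visible weights are renormalized because the rest of the mixture lies outside this hunk.

```python
# Illustrative only: the README states the mixture weights, not the tooling used.
from datasets import interleave_datasets, load_dataset

deepa2 = load_dataset("DebateLabKIT/deepa2-conversations", split="train")
argmap = load_dataset("DebateLabKIT/deep-argmap-conversations", split="train")

mixture = interleave_datasets(
    [deepa2, argmap],
    probabilities=[0.5, 0.5],  # each is 25% of the full mixture; renormalized here
    seed=42,
    stopping_strategy="all_exhausted",  # keep sampling until every dataset is used up
)
```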
|