ggbetz committed on
Commit c91dc28 · verified · 1 Parent(s): 16954e7

Update README.md

  1. README.md +11 -1
README.md CHANGED
@@ -36,10 +36,20 @@ print(output["generated_text"])
 
 ## Evals
 
+LM Eval Harness results (local completions/vllm):
+
+<iframe src="https://wandb.ai/ggbetz/argunauts-training/reports/DebateLabKIT-Llama-3-1-Argunaut-1-8B-SFT--VmlldzoxMDc2ODAwOQ" style="border:none;height:1024px;width:100%"></iframe>
+
+Comparing `Llama-3.1-Argunaut-1-8B-SFT` with top-performing Llama-8B models from the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/):
+
+|Model|BBH|MATH|GPQA|MMLU Pro|
+|:--------|:--:|:--:|:--:|:--:|
+|**Llama-3.1-Argunaut-1-8B-SFT**|44.6|9.0|32.1|34.5|
+
 
 ## SFT dataset mixture
 
-|dataset|weight (examples)| weight (tokens)|
+|Dataset|Weight (examples)|Weight (tokens)|
 |:------|:----:|:----:|
 |DebateLabKIT/deepa2-conversations|25%|49%|
 |DebateLabKIT/deep-argmap-conversations|25%|18%|