Update README.md
README.md CHANGED
@@ -36,10 +36,20 @@ print(output["generated_text"])
 
 ## Evals
 
+LM Eval Harness results (local-completions/vllm):
+
+<iframe src="https://wandb.ai/ggbetz/argunauts-training/reports/DebateLabKIT-Llama-3-1-Argunaut-1-8B-SFT--VmlldzoxMDc2ODAwOQ" style="border:none;height:1024px;width:100%"></iframe>
+
+Comparing `Llama-3.1-Argunaut-1-8B-SFT` against top-performing Llama-8B models from the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/):
+
+|Model|BBH|MATH|GPQA|MMLU Pro|
+|:--------|:--:|:--:|:--:|:--:|
+|**Llama-3.1-Argunaut-1-8B-SFT**|44.6|9.0|32.1|34.5|
+
 
 ## SFT dataset mixture
 
-
+|Dataset|Weight (examples)|Weight (tokens)|
 |:------|:----:|:----:|
 |DebateLabKIT/deepa2-conversations|25%|49%|
 |DebateLabKIT/deep-argmap-conversations|25%|18%|
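
For context on the added Evals section: the numbers come from LM Eval Harness runs against a local vllm/completions backend. Below is a minimal reproduction sketch using the harness's Python API; the `leaderboard_*` task names and the `model_args` string are assumptions (the Open LLM Leaderboard v2 tasks), not taken from this diff, and the actual configuration lives in the linked W&B report.

```python
# Hypothetical reproduction sketch, not the project's own eval script.
# Task names and model_args are assumptions; adjust to the real setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",  # or "local-completions" to query a running OpenAI-compatible server
    model_args="pretrained=DebateLabKIT/Llama-3.1-Argunaut-1-8B-SFT,dtype=auto",
    tasks=["leaderboard_bbh", "leaderboard_math_hard",
           "leaderboard_gpqa", "leaderboard_mmlu_pro"],
    batch_size="auto",
)
print(results["results"])  # per-task metrics (BBH, MATH, GPQA, MMLU Pro)
```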
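On the new mixture table: the two weight columns differ because the mixture is reported both by sampled examples and by resulting tokens; the two datasets have equal example shares (25% each) but very different token shares (49% vs. 18%), i.e. deepa2 conversations are much longer on average. A minimal sketch of an example-weighted mixture with `datasets.interleave_datasets` follows; the `train` split is an assumption, and the two visible weights are renormalized because the rest of the mixture lies outside this hunk.

```python
# Illustrative only: the README states the mixture weights, not the tooling used.
from datasets import interleave_datasets, load_dataset

deepa2 = load_dataset("DebateLabKIT/deepa2-conversations", split="train")
argmap = load_dataset("DebateLabKIT/deep-argmap-conversations", split="train")

mixture = interleave_datasets(
    [deepa2, argmap],
    probabilities=[0.5, 0.5],  # each is 25% of the full mixture; renormalized here
    seed=42,
    stopping_strategy="all_exhausted",  # keep sampling until every dataset is used up
)
```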
|