T145 committed
Commit: e814e6a
1 parent: e9d7396

Adding Evaluation Results


This is an automated PR created with [this space](https://huggingface.co/spaces/T145/open-llm-leaderboard-results-to-modelcard)!

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

Please report any issues here: https://huggingface.co/spaces/T145/open-llm-leaderboard-results-to-modelcard/discussions

Files changed (1): README.md +114 -0
README.md CHANGED
@@ -15,6 +15,105 @@ tags:
 - argument-mapping
 - trl
 - sft
+model-index:
+- name: Llama-3.1-Argunaut-1-8B-SFT
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: wis-k/instruction-following-eval
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 55.19
+      name: averaged accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DebateLabKIT%2FLlama-3.1-Argunaut-1-8B-SFT
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: SaylorTwift/bbh
+      split: test
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 27.19
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DebateLabKIT%2FLlama-3.1-Argunaut-1-8B-SFT
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: lighteval/MATH-Hard
+      split: test
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 11.18
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DebateLabKIT%2FLlama-3.1-Argunaut-1-8B-SFT
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 4.47
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DebateLabKIT%2FLlama-3.1-Argunaut-1-8B-SFT
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 15.85
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DebateLabKIT%2FLlama-3.1-Argunaut-1-8B-SFT
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 27.47
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DebateLabKIT%2FLlama-3.1-Argunaut-1-8B-SFT
+      name: Open LLM Leaderboard
 ---
 
 
@@ -101,3 +200,18 @@ This work wouldn't be possible without all the **great contributions from the op
 - @cognitivecomputations for sharing [spectrum](https://github.com/cognitivecomputations/spectrum/tree/main)
 
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/DebateLabKIT__Llama-3.1-Argunaut-1-8B-SFT-details)!
+Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=DebateLabKIT%2FLlama-3.1-Argunaut-1-8B-SFT&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+| Metric             | Value (%) |
+|--------------------|----------:|
+| **Average**        |     23.56 |
+| IFEval (0-Shot)    |     55.19 |
+| BBH (3-Shot)       |     27.19 |
+| MATH Lvl 5 (4-Shot)|     11.18 |
+| GPQA (0-shot)      |      4.47 |
+| MuSR (0-shot)      |     15.85 |
+| MMLU-PRO (5-shot)  |     27.47 |
+
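
Note: once this PR is merged, the `model-index` block above is parsed by the Hub as machine-readable model card metadata. A minimal sketch of reading it back with the `huggingface_hub` library (not part of this PR; assumes the PR has been merged so the metadata is live on the Hub):

```python
# Minimal sketch: read back the model-index metadata this PR adds.
# Assumes the PR is merged, so the repo's README.md carries the block above.
from huggingface_hub import ModelCard

card = ModelCard.load("DebateLabKIT/Llama-3.1-Argunaut-1-8B-SFT")
metadata = card.data.to_dict()

# Walk the structure: model entry -> results -> metrics.
for model_entry in metadata.get("model-index", []):
    for result in model_entry.get("results", []):
        dataset_name = result["dataset"]["name"]
        for metric in result.get("metrics", []):
            print(f"{dataset_name}: {metric['type']} = {metric['value']}")
```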
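
The **Average** row in the table above is the plain arithmetic mean of the six benchmark scores. A quick check, with the values copied from the table:

```python
# Quick check: the Average row equals the mean of the six benchmark scores.
scores = [55.19, 27.19, 11.18, 4.47, 15.85, 27.47]
print(f"Average: {sum(scores) / len(scores):.2f}")  # -> Average: 23.56
```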