Adding Evaluation Results

#1
by DreadPoor - opened
Files changed (1)
  1. README.md +114 -1
README.md CHANGED
@@ -1,5 +1,104 @@
 ---
 license: apache-2.0
+model-index:
+- name: Howdy-8B-LINEAR
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: wis-k/instruction-following-eval
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 73.78
+      name: averaged accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DreadPoor%2FHowdy-8B-LINEAR
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: SaylorTwift/bbh
+      split: test
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 34.23
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DreadPoor%2FHowdy-8B-LINEAR
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: lighteval/MATH-Hard
+      split: test
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 17.37
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DreadPoor%2FHowdy-8B-LINEAR
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 8.61
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DreadPoor%2FHowdy-8B-LINEAR
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 12.32
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DreadPoor%2FHowdy-8B-LINEAR
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 31.18
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DreadPoor%2FHowdy-8B-LINEAR
+      name: Open LLM Leaderboard
 ---
 # merge
 
@@ -39,4 +138,18 @@ merge_method: linear
 normalize: false
 int8_mask: true
 dtype: bfloat16
-```
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/DreadPoor__Howdy-8B-LINEAR-details)!
+Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=DreadPoor%2FHowdy-8B-LINEAR&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+| Metric             |Value (%)|
+|--------------------|--------:|
+|**Average**         |    29.58|
+|IFEval (0-Shot)     |    73.78|
+|BBH (3-Shot)        |    34.23|
+|MATH Lvl 5 (4-Shot) |    17.37|
+|GPQA (0-shot)       |     8.61|
+|MuSR (0-shot)       |    12.32|
+|MMLU-PRO (5-shot)   |    31.18|
+
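For reference, the `model-index` block added above is machine-readable model-card metadata, and the **Average** row of the table is just the unweighted mean of the six benchmark scores. Below is a minimal sketch of how a downstream consumer might re-read those values; it assumes this PR has been merged into `DreadPoor/Howdy-8B-LINEAR` and that `huggingface_hub` and `pyyaml` are installed.

```python
# Minimal sketch: re-read the model-index metadata added in this PR and
# re-derive the table's "Average" row. Assumes the PR is merged into
# DreadPoor/Howdy-8B-LINEAR; adjust the repo id for a different card.
import yaml
from huggingface_hub import hf_hub_download

readme_path = hf_hub_download("DreadPoor/Howdy-8B-LINEAR", "README.md")
# The model-index lives in the YAML front matter between the first two "---" markers.
front_matter = open(readme_path, encoding="utf-8").read().split("---")[1]
model_index = yaml.safe_load(front_matter)["model-index"]

scores = []
for result in model_index[0]["results"]:
    name = result["dataset"]["name"]        # e.g. "IFEval (0-Shot)"
    value = result["metrics"][0]["value"]   # e.g. 73.78
    scores.append(value)
    print(f"{name:<20} {value:>6.2f}")

# Unweighted mean, matching the leaderboard's Average column:
# (73.78 + 34.23 + 17.37 + 8.61 + 12.32 + 31.18) / 6 ≈ 29.58
print(f"{'Average':<20} {sum(scores) / len(scores):>6.2f}")
```

Running this should print the six per-benchmark values from the table above followed by an average of roughly 29.58.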