leaderboard-pr-bot committed on
Commit
b132e2b
·
verified ·
1 Parent(s): 251a6e5

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +120 -5
README.md CHANGED
@@ -1,22 +1,124 @@
  ---
  language:
  - en
- library_name: transformers
  license: apache-2.0
  tags:
  - gpt
  - llm
  - large language model
  - h2o-llmstudio
- thumbnail: >-
-   https://h2o.ai/etc.clientlibs/h2o/clientlibs/clientlib-site/resources/images/favicon.ico
  datasets:
  - Open-Orca/OpenOrca
  - OpenAssistant/oasst2
  - HuggingFaceH4/ultrachat_200k
  - meta-math/MetaMathQA
  widget:
-   - text: "<|prompt|>Why is drinking water so healthy?</s><|answer|>"
  ---
  # Model Card
  ## Summary
@@ -134,4 +236,17 @@ Please read this disclaimer carefully before using the large language model prov
  - Reporting Issues: If you encounter any biased, offensive, or otherwise inappropriate content generated by the large language model, please report it to the repository maintainers through the provided channels. Your feedback will help improve the model and mitigate potential issues.
  - Changes to this Disclaimer: The developers of this repository reserve the right to modify or update this disclaimer at any time without prior notice. It is the user's responsibility to periodically review the disclaimer to stay informed about any changes.
  
- By using the large language model provided in this repository, you agree to accept and comply with the terms and conditions outlined in this disclaimer. If you do not agree with any part of this disclaimer, you should refrain from using the model and any content generated by it.
  ---
  language:
  - en
  license: apache-2.0
+ library_name: transformers
  tags:
  - gpt
  - llm
  - large language model
  - h2o-llmstudio
  datasets:
  - Open-Orca/OpenOrca
  - OpenAssistant/oasst2
  - HuggingFaceH4/ultrachat_200k
  - meta-math/MetaMathQA
+ thumbnail: https://h2o.ai/etc.clientlibs/h2o/clientlibs/clientlib-site/resources/images/favicon.ico
  widget:
+ - text: <|prompt|>Why is drinking water so healthy?</s><|answer|>
+ model-index:
+ - name: h2o-danube-1.8b-sft
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 40.19
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=h2oai/h2o-danube-1.8b-sft
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 67.34
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=h2oai/h2o-danube-1.8b-sft
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 33.75
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=h2oai/h2o-danube-1.8b-sft
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 40.29
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=h2oai/h2o-danube-1.8b-sft
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 65.43
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=h2oai/h2o-danube-1.8b-sft
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 15.09
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=h2oai/h2o-danube-1.8b-sft
+       name: Open LLM Leaderboard
  ---
  # Model Card
  ## Summary
 
  - Reporting Issues: If you encounter any biased, offensive, or otherwise inappropriate content generated by the large language model, please report it to the repository maintainers through the provided channels. Your feedback will help improve the model and mitigate potential issues.
  - Changes to this Disclaimer: The developers of this repository reserve the right to modify or update this disclaimer at any time without prior notice. It is the user's responsibility to periodically review the disclaimer to stay informed about any changes.
  
+ By using the large language model provided in this repository, you agree to accept and comply with the terms and conditions outlined in this disclaimer. If you do not agree with any part of this disclaimer, you should refrain from using the model and any content generated by it.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_h2oai__h2o-danube-1.8b-sft)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |43.68|
+ |AI2 Reasoning Challenge (25-Shot)|40.19|
+ |HellaSwag (10-Shot)              |67.34|
+ |MMLU (5-Shot)                    |33.75|
+ |TruthfulQA (0-shot)              |40.29|
+ |Winogrande (5-shot)              |65.43|
+ |GSM8k (5-shot)                   |15.09|
+
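
The `Avg.` row in the table above appears to be the unweighted mean of the six benchmark scores. A minimal Python sketch to check this, assuming simple averaging and rounding to two decimals (the names `SCORES` and `average_score` are illustrative and not part of the PR):

```python
# Scores copied from the results table in this PR (values in percent).
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": 40.19,
    "HellaSwag (10-Shot)": 67.34,
    "MMLU (5-Shot)": 33.75,
    "TruthfulQA (0-shot)": 40.29,
    "Winogrande (5-shot)": 65.43,
    "GSM8k (5-shot)": 15.09,
}


def average_score(scores: dict) -> float:
    """Return the unweighted mean of the scores, rounded to two decimals."""
    return round(sum(scores.values()) / len(scores), 2)


if __name__ == "__main__":
    # Assumption: the leaderboard average is a plain arithmetic mean.
    print(average_score(SCORES))
```

Under that assumption the result agrees with the reported `Avg.` value of 43.68.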