Adding Evaluation Results #2
opened by leaderboard-pr-bot

README.md CHANGED
@@ -1,15 +1,118 @@
 ---
-license: cc-by-4.0
-datasets:
-- Open-Orca/OpenOrca
-- Intel/orca_dpo_pairs
 language:
 - en
+license: cc-by-4.0
 tags:
 - xDAN-AI
 - OpenOrca
 - DPO
 - Self-Think
+datasets:
+- Open-Orca/OpenOrca
+- Intel/orca_dpo_pairs
+model-index:
+- name: xDAN-L1-Chat-RL-v1
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 66.3
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=xDAN-AI/xDAN-L1-Chat-RL-v1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 85.81
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=xDAN-AI/xDAN-L1-Chat-RL-v1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 63.21
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=xDAN-AI/xDAN-L1-Chat-RL-v1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 56.7
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=xDAN-AI/xDAN-L1-Chat-RL-v1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 78.85
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=xDAN-AI/xDAN-L1-Chat-RL-v1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 59.44
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=xDAN-AI/xDAN-L1-Chat-RL-v1
+      name: Open LLM Leaderboard
 ---
 
 <div style="display: flex; justify-content: center; align-items: center">
@@ -130,4 +233,17 @@ and you carefully consider each step before providing answers.
 We employ rigorous data compliance validation algorithms throughout the training of our language model to ensure the highest level of compliance. However, due to the intricate nature of data and the wide range of potential usage scenarios for the model, we cannot guarantee that it will consistently produce accurate and sensible outputs. Users should be aware of the possibility of the model generating problematic results. Our organization disclaims any responsibility for risks or issues arising from misuse, improper guidance, unlawful usage, misinformation, or subsequent concerns regarding data security.
 
 ## About xDAN-AI
-xDAN-AI represents the forefront of Silicon-Based Life Factory technology. For comprehensive information and deeper insights into our cutting-edge technology and offerings, please visit our website: https://www.xdan.ai.
+xDAN-AI represents the forefront of Silicon-Based Life Factory technology. For comprehensive information and deeper insights into our cutting-edge technology and offerings, please visit our website: https://www.xdan.ai.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_xDAN-AI__xDAN-L1-Chat-RL-v1)
+
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |68.38|
+|AI2 Reasoning Challenge (25-Shot)|66.30|
+|HellaSwag (10-Shot)              |85.81|
+|MMLU (5-Shot)                    |63.21|
+|TruthfulQA (0-shot)              |56.70|
+|Winogrande (5-shot)              |78.85|
+|GSM8k (5-shot)                   |59.44|
+
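The `Avg.` row in the results table added by this diff can be sanity-checked against the six per-benchmark values. The sketch below assumes the reported figure is the unweighted arithmetic mean of those six scores, reported to two decimals; the diff itself does not state how it was computed. The names `scores`, `mean_score`, and `reported_avg` are hypothetical and used only for illustration.

```python
# Per-benchmark scores copied from the results table in this diff.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 66.30,
    "HellaSwag (10-Shot)": 85.81,
    "MMLU (5-Shot)": 63.21,
    "TruthfulQA (0-shot)": 56.70,
    "Winogrande (5-shot)": 78.85,
    "GSM8k (5-shot)": 59.44,
}


def mean_score(values):
    """Return the unweighted arithmetic mean of the given scores."""
    values = list(values)
    return sum(values) / len(values)


# "Avg." value as reported in the table.
reported_avg = 68.38

computed_avg = mean_score(scores.values())

# Assumption: the reported figure is the mean at two-decimal precision,
# so a difference below 0.01 counts as agreement.
matches = abs(computed_avg - reported_avg) < 0.01

print(f"computed={computed_avg:.4f} reported={reported_avg} matches={matches}")
```

Under that assumption the computed mean agrees with the reported `Avg.` to within two-decimal precision, which suggests the table and the `model-index` values are consistent with each other.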