lmz committed on
Commit f6cabe8 · verified · 1 Parent(s): d7376cc

Upload README.md

Files changed (1): README.md (+199, -0)
---
library_name: transformers
license: cc-by-4.0
language:
- en
- fr
- de
- it
- pt
- es
pipeline_tag: text-generation
---

# Model Card for Helium-1 Preview

<!-- Provide a quick summary of what the model is/does. -->

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

Helium-1 preview is a lightweight language model with 2B parameters, targeting edge and mobile devices.
It supports the following languages: English, French, German, Italian, Portuguese, and Spanish.

- **Developed by:** Kyutai
- **Model type:** Large Language Model
- **Language(s) (NLP):** English, French, German, Italian, Portuguese, Spanish
- **License:** CC-BY 4.0

<!-- ### Model Sources [optional]

Provide the basic links for the model.

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed] -->

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

The intended use of the Helium model is research and development of natural language processing systems, including but not limited to language generation and understanding.
The model can be used in English, French, German, Italian, Portuguese and Spanish.
For most downstream use cases, the model should be aligned with supervised fine-tuning, RLHF or related methods.

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

The model should not be used in languages other than those it was trained on.
The model is not intended to be used for any malicious or illegal activities of any kind.
The model was not fine-tuned to follow instructions, and thus should not be used as such.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Helium-1 preview is a base language model, which was not aligned to human preferences.
As such, the model can generate incorrect, biased, harmful or generally unhelpful content.
Thus, the model should not be used for downstream applications without further alignment, evaluation and mitigation of risks.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
from transformers import pipeline

model_id = "kyutai/helium-1-preview"

# Load the text-generation pipeline in bfloat16 and place the weights
# automatically on the available device(s).
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The pipeline returns a list of dicts with a "generated_text" field.
text = pipe("Hello, today is a great day to")
print(text[0]["generated_text"])
```
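
The same checkpoint can also be loaded directly with `AutoModelForCausalLM` when you want explicit control over tokenization and decoding. The snippet below is a minimal sketch rather than an official recipe from the model authors; the sampling settings (`max_new_tokens`, `temperature`, `top_p`) are illustrative values, not recommendations.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kyutai/helium-1-preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Tokenize the prompt and move it to the same device as the model weights.
inputs = tokenizer("Hello, today is a great day to", return_tensors="pt").to(model.device)

# Sample a continuation. Helium-1 preview is a base model (not instruction-tuned),
# so prompts should be phrased as text to be continued.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```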

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

Helium-1 preview was trained on a mix of data including: Wikipedia, Stack Exchange, open-access scientific articles (from peS2o) and Common Crawl.

<!--#### Training Hyperparameters

- **Training regime:** [More Information Needed] -->

<!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

The model was evaluated on MMLU, TriviaQA, NaturalQuestions, ARC Easy & Challenge, Open Book QA, Common Sense QA,
Physical Interaction QA, Social Interaction QA, HellaSwag, WinoGrande, Multilingual Knowledge QA, FLORES 200.

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

We report accuracy on MMLU, ARC, OBQA, CSQA, PIQA, SIQA, HellaSwag, WinoGrande.
We report exact match on TriviaQA, NQ and MKQA.
We report BLEU on FLORES.
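
The exact evaluation harness is not specified here, so the following is only an illustrative sketch of how these metrics are typically computed: accuracy as the fraction of correctly chosen options, and exact match as the fraction of predictions that equal a reference answer after light normalization. The `normalize` rules below are assumptions, not a description of the official setup.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and collapse whitespace (illustrative only)."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of examples where the predicted option matches the gold option."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def exact_match(predictions: list[str], references: list[list[str]]) -> float:
    """Fraction of predictions that match any accepted gold answer after normalization."""
    hits = sum(
        normalize(p) in {normalize(r) for r in refs}
        for p, refs in zip(predictions, references)
    )
    return hits / len(references)


# Toy usage:
print(accuracy(["A", "C", "B"], ["A", "B", "B"]))             # 0.666...
print(exact_match(["Paris!"], [["paris", "Paris, France"]]))  # 1.0
```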

#### English Results

| Benchmark | Helium-1 Preview | HF SmolLM2 (1.7B) | Gemma-2 (2.6B) | Llama-3.2 (3B) | Qwen2.5 (1.5B) |
|--------------|:------:|:------:|:------:|:------:|:------:|
| MMLU | 51.2 | 50.4 | 53.1 | 56.6 | 61.0 |
| NQ | 17.3 | 15.1 | 17.7 | 22.0 | 13.1 |
| TQA | 47.9 | 45.4 | 49.9 | 53.6 | 35.9 |
| ARC E | 80.9 | 81.8 | 81.1 | 84.6 | 89.7 |
| ARC C | 62.7 | 64.7 | 66.0 | 69.0 | 77.2 |
| OBQA | 63.8 | 61.4 | 64.6 | 68.4 | 73.8 |
| CSQA | 65.6 | 59.0 | 64.4 | 65.4 | 72.4 |
| PIQA | 77.4 | 77.7 | 79.8 | 78.9 | 76.0 |
| SIQA | 64.4 | 57.5 | 61.9 | 63.8 | 68.7 |
| HS | 69.7 | 73.2 | 74.7 | 76.9 | 67.5 |
| WG | 66.5 | 65.6 | 71.2 | 72.0 | 64.8 |
| Average | 60.7 | 59.3 | 62.2 | 64.7 | 63.6 |

#### Multilingual Results

| Language | Benchmark | Helium-1 Preview | HF SmolLM2 (1.7B) | Gemma-2 (2.6B) | Llama-3.2 (3B) | Qwen2.5 (1.5B) |
|-----|--------------|:------:|:------:|:------:|:------:|:------:|
| German | MMLU | 45.6 | 35.3 | 45.0 | 47.5 | 49.5 |
| | ARC C | 56.7 | 38.4 | 54.7 | 58.3 | 60.2 |
| | HS | 53.5 | 33.9 | 53.4 | 53.7 | 42.8 |
| | MKQA | 16.1 | 7.1 | 18.9 | 20.2 | 10.4 |
| | FLORES | 33.9 | 12.2 | 30.7 | 28.2 | 20.8 |
| Spanish | MMLU | 46.5 | 38.9 | 46.2 | 49.6 | 52.8 |
| | ARC C | 58.3 | 43.2 | 58.8 | 60.0 | 68.1 |
| | HS | 58.6 | 40.8 | 60.5 | 61.1 | 51.4 |
| | MKQA | 16.0 | 7.9 | 18.5 | 20.6 | 10.6 |
| | FLORES | 25.7 | 15.0 | 25.7 | 23.7 | 20.4 |
| French | MMLU | 46.0 | 37.7 | 45.7 | 48.8 | 51.9 |
| | ARC C | 57.9 | 40.6 | 57.5 | 60.1 | 67.4 |
| | HS | 59.0 | 41.1 | 60.4 | 59.6 | 51.2 |
| | MKQA | 16.8 | 8.4 | 18.4 | 19.6 | 9.7 |
| | FLORES | 44.3 | 20.0 | 43.3 | 39.3 | 31.2 |
| Italian | MMLU | 46.1 | 36.3 | 45.6 | 48.8 | 50.5 |
| | ARC C | 57.4 | 39.1 | 53.9 | 60.1 | 64.6 |
| | HS | 55.2 | 37.7 | 56.2 | 56.8 | 46.8 |
| | MKQA | 15.3 | 6.3 | 18.0 | 19.0 | 9.9 |
| | FLORES | 25.8 | 10.4 | 25.2 | 23.8 | 16.4 |
| Portuguese | MMLU | 46.2 | 37.7 | 45.6 | 49.2 | 53.0 |
| | ARC C | 56.8 | 40.6 | 57.0 | 62.1 | 66.6 |
| | HS | 57.3 | 41.0 | 58.7 | 59.1 | 50.9 |
| | MKQA | 14.7 | 6.6 | 16.9 | 19.1 | 9.2 |
| | FLORES | 43.0 | 20.0 | 43.6 | 40.5 | 33.0 |
| | Average | 42.1 | 27.8 | 42.3 | 43.6 | 40.0 |

## Technical Specifications

### Model Architecture and Objective

| Hyperparameter | Value |
|--------------|:------:|
| Layers | 24 |
| Heads | 20 |
| Model dimension | 2560 |
| MLP dimension | 7040 |
| Context size | 4096 |
| RoPE theta | 100,000 |
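
These hyperparameters can usually be cross-checked against the checkpoint's configuration file. The sketch below assumes the model ships a standard `transformers` config with the usual attribute names (`num_hidden_layers`, `hidden_size`, and so on); those names are an assumption for this architecture, so missing fields are reported rather than raising.

```python
from transformers import AutoConfig

# Inspect the published hyperparameters from the checkpoint config.
# The attribute names below follow common transformers conventions and are
# assumptions, not confirmed names for the Helium-1 config.
config = AutoConfig.from_pretrained("kyutai/helium-1-preview")

fields = {
    "num_hidden_layers": "Layers (24)",
    "num_attention_heads": "Heads (20)",
    "hidden_size": "Model dimension (2560)",
    "intermediate_size": "MLP dimension (7040)",
    "max_position_embeddings": "Context size (4096)",
    "rope_theta": "RoPE theta (100,000)",
}
for name, label in fields.items():
    print(f"{label}: {getattr(config, name, 'not present in this config')}")
```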

#### Hardware

The model was trained on 128 NVIDIA H100 Tensor Core GPUs.

#### Software

The model was trained using JAX.

## Citation

Blog post: https://kyutai.org/2025/01/13/helium-release.html