Update README.md
README.md
CHANGED
---
license: gemma
language:
- it
- en
base_model:
- VAGOsolutions/SauerkrautLM-gemma-2-9b-it
pipeline_tag: text-generation
library_name: transformers
datasets:
- mii-llm/argilla-math-preferences-it
- ruggsea/wsdm2024-cot-dataset
- anakin87/evol-dpo-ita-reranked
- mlabonne/orpo-dpo-mix-40k
---

<h1>Gemma 2 9B Neogenesis ITA</h1>

<img src="https://github.com/anakin87/gemma-neogenesis/blob/main/images/gemma_neogenesis_9b.jpeg?raw=true" width="450px">

Fine-tuned version of [VAGOsolutions/SauerkrautLM-gemma-2-9b-it](https://huggingface.co/VAGOsolutions/SauerkrautLM-gemma-2-9b-it), optimized for better performance in Italian.

- Compact yet capable model with 9.24 billion parameters
- Supports 8k context length
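
Both figures can be checked directly from the released checkpoint. A minimal sketch (it downloads the full weights; the attribute names follow the standard Transformers Gemma 2 configuration):

```python
# Sanity-check the parameter count and context length (downloads the model weights).
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "anakin87/gemma-2-9b-neogenesis-ita"

config = AutoConfig.from_pretrained(model_id)
print(config.max_position_embeddings)  # 8192 -> 8k context length

model = AutoModelForCausalLM.from_pretrained(model_id)
print(f"{sum(p.numel() for p in model.parameters()) / 1e9:.2f}B parameters")  # ~9.24
```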

# 🎮 Usage

**Text generation with Transformers**

```python
import torch
from transformers import pipeline

model_id = "anakin87/gemma-2-9b-neogenesis-ita"

# Load the model in bfloat16 on the GPU
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

# Italian prompt: "What is compound interest? Explain it simply and clearly."
messages = [{"role": "user", "content": "Cos'è l'interesse composto? Spiega in maniera semplice e chiara."}]
outputs = pipe(messages, max_new_tokens=500)

# The generated conversation holds the user turn followed by the assistant reply
print(outputs[0]["generated_text"][1]["content"])
```

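For GPUs with limited memory, the model can also be loaded in 4-bit. The snippet below is a sketch rather than an officially tested configuration: it assumes the `bitsandbytes` package is installed and relies on the standard Transformers quantization and chat-template APIs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "anakin87/gemma-2-9b-neogenesis-ita"

# 4-bit NF4 quantization to reduce VRAM usage
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

messages = [{"role": "user", "content": "Chi era Dante Alighieri?"}]  # "Who was Dante Alighieri?"
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=500)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```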

# 🏆 Evaluation Results

The model was submitted to and evaluated on the [Open Ita LLM Leaderboard](https://huggingface.co/spaces/mii-llm/open_ita_llm_leaderboard), the most popular leaderboard for Italian language models.

| Model | MMLU_IT | ARC_IT | HELLASWAG_IT | Average |
|-------|---------|--------|--------------|---------|
| google/gemma-2-9b-it | 65.67 | 55.60 | 68.95 | 63.41 |
| VAGOsolutions/SauerkrautLM-gemma-2-9b-it | 65.76 | **61.25** | 72.10 | 66.37 |
| **anakin87/gemma-2-9b-neogenesis-ita** | **65.82** | **61.25** | **73.29** | **66.79** |

These results establish this model as a strong 9B model for Italian, outperforming 13-14B models and even surpassing some in the 30-70B range.
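
The leaderboard relies on the Language Model Evaluation Harness. A rough sketch of a comparable local run is shown below; the task names and few-shot settings are assumptions and may not match the leaderboard's exact configuration.

```python
# Sketch of a local evaluation with lm-evaluation-harness (pip install lm-eval).
# Task names and few-shot settings are assumptions, not the leaderboard's exact setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=anakin87/gemma-2-9b-neogenesis-ita,dtype=bfloat16",
    tasks=["m_mmlu_it", "arc_it", "hellaswag_it"],
    num_fewshot=5,
    batch_size=4,
)
print(results["results"])
```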

# 🔧 Training details

The model was fine-tuned with [Hugging Face TRL](https://huggingface.co/docs/trl/index), applying Direct Preference Optimization (DPO).

I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623). The idea is to train only the layers of the model with a high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest. Specifically, training focused on the top 20% most informative layers.

Batch size: 16; learning rate: 1e-6; epochs: 1.
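
Below is a minimal sketch of this setup with TRL, not the author's exact training script: the list of trainable modules is a hypothetical placeholder (Spectrum generates the real list of high-SNR modules), the dataset is assumed to already be in the prompt/chosen/rejected format expected by `DPOTrainer`, and the `processing_class` argument assumes a recent TRL release.

```python
# Hedged sketch: Spectrum-style selective freezing + DPO with TRL.
# Not the exact training script; see the Kaggle notebook linked below for the real code.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model_id = "VAGOsolutions/SauerkrautLM-gemma-2-9b-it"
model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Spectrum: freeze everything, then unfreeze only high-SNR modules.
# The patterns below are hypothetical placeholders; Spectrum produces the real list.
trainable_patterns = ["layers.30.mlp.down_proj", "layers.38.self_attn.o_proj"]
for name, param in model.named_parameters():
    param.requires_grad = any(pattern in name for pattern in trainable_patterns)

# One of the preference datasets; assumed to provide prompt/chosen/rejected columns.
train_dataset = load_dataset("anakin87/evol-dpo-ita-reranked", split="train")

training_args = DPOConfig(
    output_dir="gemma-2-9b-neogenesis-ita",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch size 16
    learning_rate=1e-6,
    num_train_epochs=1,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # named `tokenizer` in older TRL versions
)
trainer.train()
```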

The training process took approximately 12 hours on a single NVIDIA A100 GPU (80 GB VRAM).

For the training code, see the DPO section of this [📓 Kaggle notebook](https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond), adapted to use a different base model, different hyperparameters, and no on-policy data.


# 🗃️ Training data

The model was trained primarily on Italian data, with a small portion of English data included.

The following datasets were used for Direct Preference Optimization (a quick inspection sketch follows the list):
- Italian data
  - [mii-llm/argilla-math-preferences-it](https://huggingface.co/datasets/mii-llm/argilla-math-preferences-it)
  - [ruggsea/wsdm2024-cot-dataset](https://huggingface.co/datasets/ruggsea/wsdm2024-cot-dataset)
  - [anakin87/evol-dpo-ita-reranked](https://huggingface.co/datasets/anakin87/evol-dpo-ita-reranked)
- English data
  - [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k)
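
To get a feel for the preference-pair format, one of the Italian datasets can be inspected directly. This is a sketch; the split name is an assumption and the column layout differs across the datasets above.

```python
# Peek at one preference example (column names vary across the datasets listed above).
from datasets import load_dataset

ds = load_dataset("anakin87/evol-dpo-ita-reranked", split="train")
print(ds.column_names)
print(ds[0])
```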

🙏 Thanks to the authors for providing these datasets.


# 🛡️ Safety

While this model was not specifically fine-tuned for safety, its selective training with the Spectrum technique helps preserve certain safety features from the original model.