---
license: gemma
language:
- it
- en
base_model:
- VAGOsolutions/SauerkrautLM-gemma-2-9b-it
pipeline_tag: text-generation
library_name: transformers
datasets:
- mii-llm/argilla-math-preferences-it
- ruggsea/wsdm2024-cot-dataset
- anakin87/evol-dpo-ita-reranked
- mlabonne/orpo-dpo-mix-40k
---

<h1>Gemma 2 9B Neogenesis ITA</h1>

<img src="https://github.com/anakin87/gemma-neogenesis/blob/main/images/gemma_neogenesis_9b.jpeg?raw=true" width="450px">

Fine-tuned version of [VAGOsolutions/SauerkrautLM-gemma-2-9b-it](https://huggingface.co/VAGOsolutions/SauerkrautLM-gemma-2-9b-it) optimized for better performance in Italian.

- 9.24 billion parameters
- Supports 8k context length

*Need a smaller model?* Try [gemma-2-2b-neogenesis-ita](https://huggingface.co/anakin87/gemma-2-2b-neogenesis-ita).

# 🎮 Usage

[💬🇮🇹 Try the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/gemma-2-9b-neogenesis-ita)


**Text generation with Transformers**


```python
import torch
from transformers import pipeline

model_id = "anakin87/gemma-2-9b-neogenesis-ita"

pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

messages = [{"role": "user", "content": "Cos'è l'interesse composto? Spiega in maniera semplice e chiara."}]
outputs = pipe(messages, max_new_tokens=500)

print(outputs[0]["generated_text"][1]["content"])
```
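
The pipeline returns the full chat history, so `outputs[0]["generated_text"][1]["content"]` selects the assistant reply that follows the single user message.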


# 🏆 Evaluation Results

The model was submitted to and evaluated on the [Open Ita LLM Leaderboard](https://huggingface.co/spaces/mii-llm/open_ita_llm_leaderboard), the most popular leaderboard for Italian Language Models.

| Model                                    | MMLU_IT   | ARC_IT    | HELLASWAG_IT | Average   |
|------------------------------------------|-----------|-----------|--------------|-----------|
| google/gemma-2-9b-it                     | 65.67     | 55.60     | 68.95        | 63.41     |
| VAGOsolutions/SauerkrautLM-gemma-2-9b-it | 65.76     | **61.25** | 72.10        | 66.37     |
| **anakin87/gemma-2-9b-neogenesis-ita**   | **65.82** | **61.25** | **73.29**    | **66.79** |

These results establish this model as a strong 9B model for Italian, outperforming 13-14B models and even surpassing some in the 30-70B range.


# 🔧 Training details

The model was fine-tuned with [Hugging Face TRL](https://huggingface.co/docs/trl/index), applying Direct Preference Optimization (DPO).

I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623). The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest. Specifically, training focused on the top 20% most informative layers.
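
A minimal sketch of this selective-freezing idea (not the actual training script; the module names below are hypothetical and would normally come from Spectrum's SNR analysis of the base model):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "VAGOsolutions/SauerkrautLM-gemma-2-9b-it", torch_dtype=torch.bfloat16
)

# Hypothetical high-SNR modules (in practice, roughly the top 20% selected by Spectrum)
trainable_patterns = [
    "model.layers.30.mlp.down_proj",
    "model.layers.35.self_attn.o_proj",
]

# Freeze everything, then train only the selected modules
for name, param in model.named_parameters():
    param.requires_grad = any(pattern in name for pattern in trainable_patterns)
```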

Key hyperparameters: batch size 16; learning rate 1e-6; 1 epoch.
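
Combined with the hyperparameters above, a minimal TRL DPO sketch could look like the following. This is not the exact training script: the dataset mixture, chat templating, and the Spectrum layer selection are omitted, and reaching the effective batch size of 16 via gradient accumulation is an assumption.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "VAGOsolutions/SauerkrautLM-gemma-2-9b-it"
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# One of the preference datasets listed below; assumes prompt/chosen/rejected columns
train_dataset = load_dataset("anakin87/evol-dpo-ita-reranked", split="train")

training_args = DPOConfig(
    output_dir="gemma-2-9b-neogenesis-ita",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # 2 x 8 = effective batch size 16 (assumption)
    learning_rate=1e-6,
    num_train_epochs=1,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,                 # the reference model is created automatically
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL versions
)
trainer.train()
```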

The training process took approximately 12 hours on a single NVIDIA A100 GPU (80GB VRAM).

For the training code, see the DPO section in this [📓 Kaggle notebook](https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond), modified to use a different base model, hyperparameters, and no on-policy data.


# 🗃️ Training data
The model was trained primarily on Italian data, with a small portion of English data included.

For Direct Preference Optimization:
- Italian data
  - [mii-llm/argilla-math-preferences-it](https://huggingface.co/datasets/mii-llm/argilla-math-preferences-it)
  - [ruggsea/wsdm2024-cot-dataset](https://huggingface.co/datasets/ruggsea/wsdm2024-cot-dataset)
  - [anakin87/evol-dpo-ita-reranked](https://huggingface.co/datasets/anakin87/evol-dpo-ita-reranked)
- English data
  - [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k)

🙏 Thanks to the authors for providing these datasets.


# 🛡️ Safety
While this model was not specifically fine-tuned for safety, the selective training performed with Spectrum (most layers remain frozen) helps preserve the safety behavior of the original model.