|
---
license: gemma
language:
- it
- en
base_model:
- VAGOsolutions/SauerkrautLM-gemma-2-9b-it
pipeline_tag: text-generation
library_name: transformers
datasets:
- mii-llm/argilla-math-preferences-it
- ruggsea/wsdm2024-cot-dataset
- anakin87/evol-dpo-ita-reranked
- mlabonne/orpo-dpo-mix-40k
---
|
|
|
<h1>Gemma 2 9B Neogenesis ITA</h1> |
|
|
|
<img src="https://github.com/anakin87/gemma-neogenesis/blob/main/images/gemma_neogenesis_9b.jpeg?raw=true" width="450px"> |
|
|
|
Fine-tuned version of [VAGOsolutions/SauerkrautLM-gemma-2-9b-it](https://huggingface.co/VAGOsolutions/SauerkrautLM-gemma-2-9b-it) optimized for better performance in Italian. |
|
|
|
- 9.24 billion parameters
- 8K context length
|
|
|
*Need a smaller model?* Try [gemma-2-2b-neogenesis-ita](https://huggingface.co/anakin87/gemma-2-2b-neogenesis-ita). |
|
|
|
# 🎮 Usage
|
|
|
[💬🇮🇹 Try the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/gemma-2-9b-neogenesis-ita)
|
|
|
|
|
**Text generation with Transformers** |
|
|
|
|
|
```python
import torch
from transformers import pipeline

model_id = "anakin87/gemma-2-9b-neogenesis-ita"

# Load the model in bfloat16 on a CUDA GPU
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

# "What is compound interest? Explain it in a simple and clear way."
messages = [{"role": "user", "content": "Cos'è l'interesse composto? Spiega in maniera semplice e chiara."}]
outputs = pipe(messages, max_new_tokens=500)

# The pipeline returns the whole conversation; entry 1 is the assistant's reply
print(outputs[0]["generated_text"][1]["content"])
```
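
If you prefer to manage the model and tokenizer directly, the same generation can be done with `AutoModelForCausalLM` and the chat template. A minimal sketch, using the same prompt as above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anakin87/gemma-2-9b-neogenesis-ita"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Cos'è l'interesse composto? Spiega in maniera semplice e chiara."}]

# Apply the chat template and generate
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=500)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```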
|
|
|
|
|
# 📊 Evaluation Results
|
|
|
The model was submitted to and evaluated on the [Open Ita LLM Leaderboard](https://huggingface.co/spaces/mii-llm/open_ita_llm_leaderboard), the most popular leaderboard for Italian language models.
|
|
|
| Model | MMLU_IT | ARC_IT | HELLASWAG_IT | Average |
|-------|---------|--------|--------------|---------|
| google/gemma-2-9b-it | 65.67 | 55.60 | 68.95 | 63.41 |
| VAGOsolutions/SauerkrautLM-gemma-2-9b-it | 65.76 | **61.25** | 72.10 | 66.37 |
| **anakin87/gemma-2-9b-neogenesis-ita** | **65.82** | **61.25** | **73.29** | **66.79** |
|
|
|
These results establish this model as a strong 9B model for Italian: on this leaderboard, it outperforms several 13-14B models and even surpasses some in the 30-70B range.
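
For reference, benchmarks like these can be run locally with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). The sketch below is a rough approximation: the task names (`m_mmlu_it`, `arc_it`, `hellaswag_it`) and settings are assumptions and may not match the leaderboard's exact configuration.

```python
# Rough sketch of a local evaluation; task names and settings are assumptions
# and may differ from the leaderboard's exact setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=anakin87/gemma-2-9b-neogenesis-ita,dtype=bfloat16",
    tasks=["m_mmlu_it", "arc_it", "hellaswag_it"],
)
print(results["results"])
```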
|
|
|
|
|
# 🔧 Training details
|
|
|
The model was fine-tuned using [Hugging Face TRL](https://huggingface.co/docs/trl/index), applying Direct Preference Optimization (DPO).
|
|
|
I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623). The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest. Specifically, training focused on the top 20% most informative layers.
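
To make this concrete, here is a minimal sketch of Spectrum-style freezing. The layer patterns below are purely illustrative: in practice, Spectrum's SNR analysis of the base model produces the actual list of trainable parameters.

```python
# Illustrative sketch of Spectrum-style selective freezing.
# The patterns are hypothetical; Spectrum's SNR analysis yields the real list.
import re
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("VAGOsolutions/SauerkrautLM-gemma-2-9b-it")

# Hypothetical high-SNR modules to keep trainable (~top 20% of layers)
unfrozen_patterns = [
    r"^model\.layers\.(1|7|12|25)\.mlp\.down_proj",
    r"^model\.layers\.(3|9|18|30)\.self_attn\.o_proj",
]

# Train only the parameters matching a high-SNR pattern; freeze everything else
for name, param in model.named_parameters():
    param.requires_grad = any(re.match(p, name) for p in unfrozen_patterns)
```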
|
|
|
Batch size: 16; learning rate: 1e-6; epochs: 1. |
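
Put together, the DPO step looks roughly like the sketch below. This is not the exact training script: it assumes a recent TRL version (where the tokenizer is passed as `processing_class`), a preference dataset already in prompt/chosen/rejected format, and the Spectrum-frozen `model` from the previous snippet.

```python
# Rough sketch of the DPO run with TRL, using the hyperparameters above.
from datasets import load_dataset
from transformers import AutoTokenizer
from trl import DPOConfig, DPOTrainer

tokenizer = AutoTokenizer.from_pretrained("VAGOsolutions/SauerkrautLM-gemma-2-9b-it")
# One of the preference datasets listed below; assumed to have
# prompt/chosen/rejected columns
train_dataset = load_dataset("anakin87/evol-dpo-ita-reranked", split="train")

training_args = DPOConfig(
    output_dir="gemma-2-9b-neogenesis-ita",
    per_device_train_batch_size=16,  # batch size 16
    learning_rate=1e-6,
    num_train_epochs=1,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,  # Spectrum-frozen model from the previous sketch
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```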
|
|
|
The training process took approximately 12 hours on a single NVIDIA A100 GPU (80GB VRAM). |
|
|
|
For the training code, see the DPO section in this [📓 Kaggle notebook](https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond), modified to use a different base model, hyperparameters, and no on-policy data.
|
|
|
|
|
# 🗃️ Training data
|
The model was trained primarily on Italian data, with a small portion of English data included. |
|
|
|
For Direct Preference Optimization:
|
- Italian data |
|
- [mii-llm/argilla-math-preferences-it](https://huggingface.co/datasets/mii-llm/argilla-math-preferences-it) |
|
- [ruggsea/wsdm2024-cot-dataset](https://huggingface.co/datasets/ruggsea/wsdm2024-cot-dataset) |
|
- [anakin87/evol-dpo-ita-reranked](https://huggingface.co/datasets/anakin87/evol-dpo-ita-reranked) |
|
- English data |
|
- [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) |
|
|
|
🙏 Thanks to the authors for providing these datasets.
|
|
|
|
|
# 🛡️ Safety
|
While this model was not specifically fine-tuned for safety, its selective training with the Spectrum technique helps preserve certain safety features from the original model. |