Update README.md
README.md
CHANGED
---
license: gemma
language:
- it
- en
base_model:
- VAGOsolutions/SauerkrautLM-gemma-2-9b-it
pipeline_tag: text-generation
library_name: transformers
datasets:
- mii-llm/argilla-math-preferences-it
- ruggsea/wsdm2024-cot-dataset
- anakin87/evol-dpo-ita-reranked
- mlabonne/orpo-dpo-mix-40k
---

<h1>Gemma 2 9B Neogenesis ITA</h1>

<img src="https://github.com/anakin87/gemma-neogenesis/blob/main/images/gemma_neogenesis_9b.jpeg?raw=true" width="450px">

Fine-tuned version of [VAGOsolutions/SauerkrautLM-gemma-2-9b-it](https://huggingface.co/VAGOsolutions/SauerkrautLM-gemma-2-9b-it), optimized for better performance in Italian.

- Compact yet capable model with 9.24 billion parameters
- Supports 8k context length
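
Both figures can be checked directly from the released checkpoint. A minimal sketch (it downloads the full weights; the attribute names follow the standard Transformers Gemma 2 configuration):

```python
# Sanity-check the parameter count and context length (downloads the model weights).
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "anakin87/gemma-2-9b-neogenesis-ita"

config = AutoConfig.from_pretrained(model_id)
print(config.max_position_embeddings)  # 8192 -> 8k context length

model = AutoModelForCausalLM.from_pretrained(model_id)
print(f"{sum(p.numel() for p in model.parameters()) / 1e9:.2f}B parameters")  # ~9.24
```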

# 🎮 Usage

**Text generation with Transformers**

```python
import torch
from transformers import pipeline

model_id = "anakin87/gemma-2-9b-neogenesis-ita"

# Load the model in bfloat16 on the GPU
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

# Italian prompt: "What is compound interest? Explain it simply and clearly."
messages = [{"role": "user", "content": "Cos'è l'interesse composto? Spiega in maniera semplice e chiara."}]
outputs = pipe(messages, max_new_tokens=500)

# The generated conversation holds the user turn followed by the assistant reply
print(outputs[0]["generated_text"][1]["content"])
```

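For GPUs with limited memory, the model can also be loaded in 4-bit. The snippet below is a sketch rather than an officially tested configuration: it assumes the `bitsandbytes` package is installed and relies on the standard Transformers quantization and chat-template APIs.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "anakin87/gemma-2-9b-neogenesis-ita"

# 4-bit NF4 quantization to reduce VRAM usage
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

messages = [{"role": "user", "content": "Chi era Dante Alighieri?"}]  # "Who was Dante Alighieri?"
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=500)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```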

# 🏆 Evaluation Results

The model was submitted to and evaluated on the [Open Ita LLM Leaderboard](https://huggingface.co/spaces/mii-llm/open_ita_llm_leaderboard), the most popular leaderboard for Italian language models.

| Model | MMLU_IT | ARC_IT | HELLASWAG_IT | Average |
|-------|---------|--------|--------------|---------|
| google/gemma-2-9b-it | 65.67 | 55.60 | 68.95 | 63.41 |
| VAGOsolutions/SauerkrautLM-gemma-2-9b-it | 65.76 | **61.25** | 72.10 | 66.37 |
| **anakin87/gemma-2-9b-neogenesis-ita** | **65.82** | **61.25** | **73.29** | **66.79** |

These results establish this model as a strong 9B model for Italian, outperforming 13-14B models and even surpassing some in the 30-70B range.
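
The leaderboard relies on the Language Model Evaluation Harness. A rough sketch of a comparable local run is shown below; the task names and few-shot settings are assumptions and may not match the leaderboard's exact configuration.

```python
# Sketch of a local evaluation with lm-evaluation-harness (pip install lm-eval).
# Task names and few-shot settings are assumptions, not the leaderboard's exact setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=anakin87/gemma-2-9b-neogenesis-ita,dtype=bfloat16",
    tasks=["m_mmlu_it", "arc_it", "hellaswag_it"],
    num_fewshot=5,
    batch_size=4,
)
print(results["results"])
```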

# 🔧 Training details

The model was fine-tuned with [Hugging Face TRL](https://huggingface.co/docs/trl/index), applying Direct Preference Optimization (DPO).

I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623). The idea is to train only the layers of the model with a high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest. Specifically, training focused on the top 20% most informative layers.

Batch size: 16; learning rate: 1e-6; epochs: 1.
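
Below is a minimal sketch of this setup with TRL, not the author's exact training script: the list of trainable modules is a hypothetical placeholder (Spectrum generates the real list of high-SNR modules), the dataset is assumed to already be in the prompt/chosen/rejected format expected by `DPOTrainer`, and the `processing_class` argument assumes a recent TRL release.

```python
# Hedged sketch: Spectrum-style selective freezing + DPO with TRL.
# Not the exact training script; see the Kaggle notebook linked below for the real code.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model_id = "VAGOsolutions/SauerkrautLM-gemma-2-9b-it"
model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Spectrum: freeze everything, then unfreeze only high-SNR modules.
# The patterns below are hypothetical placeholders; Spectrum produces the real list.
trainable_patterns = ["layers.30.mlp.down_proj", "layers.38.self_attn.o_proj"]
for name, param in model.named_parameters():
    param.requires_grad = any(pattern in name for pattern in trainable_patterns)

# One of the preference datasets; assumed to provide prompt/chosen/rejected columns.
train_dataset = load_dataset("anakin87/evol-dpo-ita-reranked", split="train")

training_args = DPOConfig(
    output_dir="gemma-2-9b-neogenesis-ita",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch size 16
    learning_rate=1e-6,
    num_train_epochs=1,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # named `tokenizer` in older TRL versions
)
trainer.train()
```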

The training process took approximately 12 hours on a single NVIDIA A100 GPU (80 GB VRAM).

For the training code, see the DPO section of this [📓 Kaggle notebook](https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond), adapted to use a different base model, different hyperparameters, and no on-policy data.


# 🗃️ Training data

The model was trained primarily on Italian data, with a small portion of English data included.

The following datasets were used for Direct Preference Optimization (a quick inspection sketch follows the list):
- Italian data
  - [mii-llm/argilla-math-preferences-it](https://huggingface.co/datasets/mii-llm/argilla-math-preferences-it)
  - [ruggsea/wsdm2024-cot-dataset](https://huggingface.co/datasets/ruggsea/wsdm2024-cot-dataset)
  - [anakin87/evol-dpo-ita-reranked](https://huggingface.co/datasets/anakin87/evol-dpo-ita-reranked)
- English data
  - [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k)
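
To get a feel for the preference-pair format, one of the Italian datasets can be inspected directly. This is a sketch; the split name is an assumption and the column layout differs across the datasets above.

```python
# Peek at one preference example (column names vary across the datasets listed above).
from datasets import load_dataset

ds = load_dataset("anakin87/evol-dpo-ita-reranked", split="train")
print(ds.column_names)
print(ds[0])
```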

🙏 Thanks to the authors for providing these datasets.


# 🛡️ Safety

While this model was not specifically fine-tuned for safety, its selective training with the Spectrum technique helps preserve certain safety features from the original model.