---
license: gemma
language:
- it
- en
base_model:
- VAGOsolutions/SauerkrautLM-gemma-2-9b-it
pipeline_tag: text-generation
library_name: transformers
datasets:
- mii-llm/argilla-math-preferences-it
- ruggsea/wsdm2024-cot-dataset
- anakin87/evol-dpo-ita-reranked
- mlabonne/orpo-dpo-mix-40k
---

# Gemma 2 9B Neogenesis ITA

Fine-tuned version of [VAGOsolutions/SauerkrautLM-gemma-2-9b-it](https://huggingface.co/VAGOsolutions/SauerkrautLM-gemma-2-9b-it) optimized for better performance in Italian.

- Good model with 9.24 billion parameters
- Supports 8k context length

*Need a smaller model?* Try [gemma-2-2b-neogenesis-ita](https://huggingface.co/anakin87/gemma-2-2b-neogenesis-ita).

# 🎮 Usage

[💬🇮🇹 Try the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/gemma-2-9b-neogenesis-ita)

**Text generation with Transformers**

```python
import torch
from transformers import pipeline

model_id = "anakin87/gemma-2-9b-neogenesis-ita"

pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

# Italian prompt: "What is compound interest? Explain it simply and clearly."
messages = [{"role": "user", "content": "Cos'è l'interesse composto? Spiega in maniera semplice e chiara."}]
outputs = pipe(messages, max_new_tokens=500)

# generated_text holds the whole chat; index 1 is the assistant reply
print(outputs[0]["generated_text"][1]["content"])
```

# 🏆 Evaluation Results

The model was submitted and evaluated in the [Open Ita LLM Leaderboard](https://huggingface.co/spaces/mii-llm/open_ita_llm_leaderboard), the most popular leaderboard for Italian Language Models.

| Model                                    | MMLU_IT   | ARC_IT    | HELLASWAG_IT | Average   |
|------------------------------------------|-----------|-----------|--------------|-----------|
| google/gemma-2-9b-it                     | 65.67     | 55.6      | 68.95        | 63.41     |
| VAGOsolutions/SauerkrautLM-gemma-2-9b-it | 65.76     | **61.25** | 72.10        | 66.37     |
| **anakin87/gemma-2-9b-neogenesis-ita**   | **65.82** | **61.25** | **73.29**    | **66.79** |

These results establish this model as a strong 9B model for Italian, outperforming 13-14B models and even surpassing some in the 30-70B range.

# 🔧 Training details

The model was fine-tuned using [Hugging Face TRL](https://huggingface.co/docs/trl/index) with Direct Preference Optimization (DPO).

I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623).
The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest. Specifically, training focused on the top 20% most informative layers.

Batch size: 16; learning rate: 1e-6; epochs: 1.

The training process took approximately 12 hours on a single NVIDIA A100 GPU (80GB VRAM).

For the training code, see the DPO section in this [📓 Kaggle notebook](https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond), modified to use a different base model, hyperparameters, and no on-policy data.

# 🗃️ Training data

The model was trained primarily on Italian data, with a small portion of English data included.

For Direct Preference Optimization
- Italian data
  - [mii-llm/argilla-math-preferences-it](https://huggingface.co/datasets/mii-llm/argilla-math-preferences-it)
  - [ruggsea/wsdm2024-cot-dataset](https://huggingface.co/datasets/ruggsea/wsdm2024-cot-dataset)
  - [anakin87/evol-dpo-ita-reranked](https://huggingface.co/datasets/anakin87/evol-dpo-ita-reranked)
- English data
  - [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k)

🙏 Thanks to the authors for providing these datasets.

# 🛡️ Safety

While this model was not specifically fine-tuned for safety, its selective training with the Spectrum technique helps preserve certain safety features from the original model.
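
**Sketch: selecting the top 20% of layers by SNR**

The layer selection described under Training details can be sketched roughly as follows. This is a simplified illustration, not the actual Spectrum implementation or the training code used for this model: the SNR values are made up, the module names are hypothetical, and `select_trainable` is a helper introduced only for this example. It ranks all modules together by SNR, which may differ from how Spectrum groups them.

```python
def select_trainable(snr_by_module: dict[str, float], top_percent: int = 20) -> set[str]:
    """Return the names of the modules whose SNR is in the top `top_percent`."""
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    # Rank module names from highest to lowest SNR
    ranked = sorted(snr_by_module, key=snr_by_module.get, reverse=True)
    # Keep at least one module so that something is always trainable
    keep = max(1, (len(ranked) * top_percent) // 100)
    return set(ranked[:keep])


# Hypothetical SNR scores, for illustration only (not measured values)
snr_scores = {
    "model.layers.0.mlp.down_proj": 1.9,
    "model.layers.0.self_attn.q_proj": 0.4,
    "model.layers.1.mlp.down_proj": 2.7,
    "model.layers.1.self_attn.q_proj": 0.6,
    "model.layers.2.mlp.down_proj": 1.1,
    "model.layers.2.self_attn.q_proj": 3.2,
    "model.layers.3.mlp.down_proj": 0.9,
    "model.layers.3.self_attn.q_proj": 0.3,
    "model.layers.4.mlp.down_proj": 1.5,
    "model.layers.4.self_attn.q_proj": 0.7,
}

trainable = select_trainable(snr_scores, top_percent=20)
frozen = set(snr_scores) - trainable

print("train: ", sorted(trainable))
print("freeze:", sorted(frozen))
```

With a PyTorch model, the resulting names could then be applied by iterating over `model.named_parameters()` and setting `param.requires_grad` to `True` only for parameters whose name starts with one of the selected module names.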