|
---
license: gemma
language:
- it
- en
base_model:
- VAGOsolutions/SauerkrautLM-gemma-2-9b-it
pipeline_tag: text-generation
library_name: transformers
datasets:
- mii-llm/argilla-math-preferences-it
- ruggsea/wsdm2024-cot-dataset
- anakin87/evol-dpo-ita-reranked
- mlabonne/orpo-dpo-mix-40k
---
|
|
|
<h1>Gemma 2 9B Neogenesis ITA</h1> |
|
|
|
<img src="https://github.com/anakin87/gemma-neogenesis/blob/main/images/gemma_neogenesis_9b.jpeg?raw=true" width="450px"> |
|
|
|
Fine-tuned version of [VAGOsolutions/SauerkrautLM-gemma-2-9b-it](https://huggingface.co/VAGOsolutions/SauerkrautLM-gemma-2-9b-it) optimized for better performance in Italian. |
|
|
|
- 9.24 billion parameters
- 8K context length
|
|
|
*Need a smaller model?* Try [gemma-2-2b-neogenesis-ita](https://huggingface.co/anakin87/gemma-2-2b-neogenesis-ita). |
|
|
|
# 🎮 Usage
|
|
|
[💬🇮🇹 Try the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/gemma-2-9b-neogenesis-ita)
|
|
|
|
|
**Text generation with Transformers** |
|
|
|
|
|
```python
import torch
from transformers import pipeline

model_id = "anakin87/gemma-2-9b-neogenesis-ita"

# Load the model in bfloat16 on a CUDA GPU
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

# "What is compound interest? Explain it in a simple and clear way."
messages = [{"role": "user", "content": "Cos'è l'interesse composto? Spiega in maniera semplice e chiara."}]
outputs = pipe(messages, max_new_tokens=500)

# The pipeline returns the whole conversation; entry 1 is the assistant's reply
print(outputs[0]["generated_text"][1]["content"])
```
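
If you prefer to manage the model and tokenizer directly, the same generation can be done with `AutoModelForCausalLM` and the chat template. A minimal sketch, using the same prompt as above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anakin87/gemma-2-9b-neogenesis-ita"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Cos'è l'interesse composto? Spiega in maniera semplice e chiara."}]

# Apply the chat template and generate
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=500)

# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```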
|
|
|
|
|
# 📊 Evaluation Results
|
|
|
The model was submitted to and evaluated on the [Open Ita LLM Leaderboard](https://huggingface.co/spaces/mii-llm/open_ita_llm_leaderboard), the most popular leaderboard for Italian language models.
|
|
|
| Model | MMLU_IT | ARC_IT | HELLASWAG_IT | Average |
|-------|---------|--------|--------------|---------|
| google/gemma-2-9b-it | 65.67 | 55.60 | 68.95 | 63.41 |
| VAGOsolutions/SauerkrautLM-gemma-2-9b-it | 65.76 | **61.25** | 72.10 | 66.37 |
| **anakin87/gemma-2-9b-neogenesis-ita** | **65.82** | **61.25** | **73.29** | **66.79** |
|
|
|
These results establish this model as a strong 9B model for Italian: on this leaderboard, it outperforms several 13-14B models and even surpasses some in the 30-70B range.
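
For reference, benchmarks like these can be run locally with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). The sketch below is a rough approximation: the task names (`m_mmlu_it`, `arc_it`, `hellaswag_it`) and settings are assumptions and may not match the leaderboard's exact configuration.

```python
# Rough sketch of a local evaluation; task names and settings are assumptions
# and may differ from the leaderboard's exact setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=anakin87/gemma-2-9b-neogenesis-ita,dtype=bfloat16",
    tasks=["m_mmlu_it", "arc_it", "hellaswag_it"],
)
print(results["results"])
```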
|
|
|
|
|
# 🔧 Training details
|
|
|
The model was fine-tuned using [Hugging Face TRL](https://huggingface.co/docs/trl/index), applying Direct Preference Optimization (DPO).
|
|
|
I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623). The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest. Specifically, training focused on the top 20% most informative layers.
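
To make this concrete, here is a minimal sketch of Spectrum-style freezing. The layer patterns below are purely illustrative: in practice, Spectrum's SNR analysis of the base model produces the actual list of trainable parameters.

```python
# Illustrative sketch of Spectrum-style selective freezing.
# The patterns are hypothetical; Spectrum's SNR analysis yields the real list.
import re
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("VAGOsolutions/SauerkrautLM-gemma-2-9b-it")

# Hypothetical high-SNR modules to keep trainable (~top 20% of layers)
unfrozen_patterns = [
    r"^model\.layers\.(1|7|12|25)\.mlp\.down_proj",
    r"^model\.layers\.(3|9|18|30)\.self_attn\.o_proj",
]

# Train only the parameters matching a high-SNR pattern; freeze everything else
for name, param in model.named_parameters():
    param.requires_grad = any(re.match(p, name) for p in unfrozen_patterns)
```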
|
|
|
Batch size: 16; learning rate: 1e-6; epochs: 1. |
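
Put together, the DPO step looks roughly like the sketch below. This is not the exact training script: it assumes a recent TRL version (where the tokenizer is passed as `processing_class`), a preference dataset already in prompt/chosen/rejected format, and the Spectrum-frozen `model` from the previous snippet.

```python
# Rough sketch of the DPO run with TRL, using the hyperparameters above.
from datasets import load_dataset
from transformers import AutoTokenizer
from trl import DPOConfig, DPOTrainer

tokenizer = AutoTokenizer.from_pretrained("VAGOsolutions/SauerkrautLM-gemma-2-9b-it")
# One of the preference datasets listed below; assumed to have
# prompt/chosen/rejected columns
train_dataset = load_dataset("anakin87/evol-dpo-ita-reranked", split="train")

training_args = DPOConfig(
    output_dir="gemma-2-9b-neogenesis-ita",
    per_device_train_batch_size=16,  # batch size 16
    learning_rate=1e-6,
    num_train_epochs=1,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,  # Spectrum-frozen model from the previous sketch
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```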
|
|
|
The training process took approximately 12 hours on a single NVIDIA A100 GPU (80GB VRAM). |
|
|
|
For the training code, see the DPO section in this [📓 Kaggle notebook](https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond), modified to use a different base model, hyperparameters, and no on-policy data.
|
|
|
|
|
# 🗃️ Training data
|
The model was trained primarily on Italian data, with a small portion of English data included. |
|
|
|
For Direct Preference Optimization:
|
- Italian data |
|
- [mii-llm/argilla-math-preferences-it](https://huggingface.co/datasets/mii-llm/argilla-math-preferences-it) |
|
- [ruggsea/wsdm2024-cot-dataset](https://huggingface.co/datasets/ruggsea/wsdm2024-cot-dataset) |
|
- [anakin87/evol-dpo-ita-reranked](https://huggingface.co/datasets/anakin87/evol-dpo-ita-reranked) |
|
- English data |
|
- [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) |
|
|
|
🙏 Thanks to the authors for providing these datasets.
|
|
|
|
|
# 🛡️ Safety
|
While this model was not specifically fine-tuned for safety, its selective training with the Spectrum technique helps preserve certain safety features from the original model. |