anakin87 committed (verified) · Commit 69e6696 · Parent: 5c8c851

Update README.md

Files changed (1): README.md (+92 -1)
README.md CHANGED
pipeline_tag: text-generation
library_name: transformers
---

<h1>Gemma 2 2B Neogenesis ITA</h1>

<img src="https://github.com/anakin87/gemma-neogenesis/blob/main/images/gemma_neogenesis_2b.jpeg?raw=true" width="450px">

Fine-tuned version of [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it), optimized for better performance in Italian.

- Small yet powerful model with 2.6 billion parameters
- Supports 8k context length

# Usage

[💬🇮🇹 Try the model on Hugging Face Spaces](https://huggingface.co/spaces/anakin87/gemma-2-2b-neogenesis-ita)

**Text generation with Transformers**

```python
import torch
from transformers import pipeline

model_id = "anakin87/gemma-2-2b-neogenesis-ita"

# Load the model in bfloat16 on GPU
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

# Ask a question in Italian: "What is compound interest? Explain it simply and clearly."
messages = [{"role": "user", "content": "Cos'è l'interesse composto? Spiega in maniera semplice e chiara."}]
outputs = pipe(messages, max_new_tokens=500)

print(outputs[0]["generated_text"][1]["content"])
# Immagina di avere 100 euro e di depositarli in un conto che ti dà un interesse del 5% all'anno....
```
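
If you need lower-level control over generation, the same conversation can be run without the pipeline via the standard Transformers chat-template API. This is a minimal sketch; the sampling settings are illustrative assumptions, not values recommended by this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "anakin87/gemma-2-2b-neogenesis-ita"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Cos'è l'interesse composto? Spiega in maniera semplice e chiara."}]

# Render the conversation with the model's chat template and append the generation prompt
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling settings below are illustrative assumptions
output_ids = model.generate(input_ids, max_new_tokens=500, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```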

For more usage examples and applications, refer to the [📓 Kaggle notebook](https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond).

# Evaluation Results

The model was submitted and evaluated on the [Open Ita LLM Leaderboard](https://huggingface.co/spaces/mii-llm/open_ita_llm_leaderboard), the most popular leaderboard for Italian language models.

| Model | MMLU_IT | ARC_IT | HELLASWAG_IT | Average |
|-----------------------|---------|--------|--------------|---------|
| google/gemma-2-2b-it | 47.65 | 40.03 | 54.69 | 47.46 |
| [anakin87/gemma-2-2b-ita-sft](https://huggingface.co/anakin87/gemma-2-2b-ita-sft) (SFT checkpoint) | 47.77 | **41.15** | 55.66 | 48.19 |
| **anakin87/gemma-2-2b-neogenesis-ita (DPO)** | **48.03** | 40.46 | **56.97** | **48.49** |

Qualitative evaluation across various domains is available [here](https://html-preview.github.io/?url=https://github.com/anakin87/gemma-neogenesis/blob/main/qualitative_evaluation/qualitative_evaluation.html).

# 🔧 Training details

The model was fine-tuned using [Hugging Face TRL](https://huggingface.co/docs/trl/index).

The training involved Instruction Fine-Tuning (SFT) and Direct Preference Optimization (DPO).

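Full training code is in the Kaggle notebook linked below. Purely as an illustration of what a TRL DPO stage looks like, here is a minimal, hypothetical sketch: the dataset name, hyperparameters, and other settings are placeholders, not the configuration actually used for this model.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "anakin87/gemma-2-2b-ita-sft"  # DPO starts from the SFT checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Placeholder: a preference dataset with "prompt", "chosen", "rejected" columns
train_dataset = load_dataset("your-org/your-preference-dataset", split="train")

args = DPOConfig(
    output_dir="gemma-2-2b-neogenesis-ita-dpo",
    beta=0.1,                        # illustrative value
    learning_rate=5e-7,              # illustrative value
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,                  # TRL builds the frozen reference model internally
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,      # named "tokenizer" in older TRL releases
)
trainer.train()
```
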
I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623). The idea is to train only the layers of the model with a high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest.

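In practice, once Spectrum has identified the high-SNR layers, applying it amounts to freezing every parameter except those layers before handing the model to the trainer. A minimal sketch of that step (the layer-name patterns below are hypothetical examples, not the layers actually selected for this model):

```python
import re
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")

# Hypothetical example: name patterns for the high-SNR layers selected by Spectrum
unfrozen_patterns = [
    r"model\.layers\.\d+\.mlp\.down_proj",
    r"model\.layers\.\d+\.self_attn\.o_proj",
]

# ❄️ Freeze everything, then keep gradients only for parameters matching the selected layers
for name, param in model.named_parameters():
    param.requires_grad = any(re.search(p, name) for p in unfrozen_patterns)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable / total:.1%}")
```
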
Training required about 15 hours on a single NVIDIA A6000 GPU (48 GB VRAM).

For comprehensive training details, check out the [📓 Kaggle notebook](https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond).

# Training data

The model was trained primarily on Italian data, with a small portion of English data included.

For Instruction Fine-Tuning:
- Italian data
  - [efederici/capybara-claude-15k-ita](https://huggingface.co/datasets/efederici/capybara-claude-15k-ita)
  - [anakin87/fine-instructions-ita-70k](https://huggingface.co/datasets/anakin87/fine-instructions-ita-70k)

For Direct Preference Optimization:
- Italian data
  - [mii-llm/argilla-math-preferences-it](https://huggingface.co/datasets/mii-llm/argilla-math-preferences-it)
  - [ruggsea/wsdm2024-cot-dataset](https://huggingface.co/datasets/ruggsea/wsdm2024-cot-dataset)
  - [anakin87/evol-dpo-ita-reranked](https://huggingface.co/datasets/anakin87/evol-dpo-ita-reranked)
  - [anakin87/gemma-vs-gemma-preferences](https://huggingface.co/datasets/anakin87/gemma-vs-gemma-preferences)
- English data
  - [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k)

🙏 Thanks to the authors for providing these datasets.

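Purely as a toy illustration (not the actual preprocessing used for this model): preference data for DPO is commonly normalized to a prompt/chosen/rejected layout and mixed across languages with 🤗 Datasets. The rows below are invented examples; the real datasets listed above would be loaded with `load_dataset` and mapped to the same layout.

```python
from datasets import Dataset, concatenate_datasets

# Invented toy rows in the prompt/chosen/rejected layout commonly used for DPO.
italian_prefs = Dataset.from_dict({
    "prompt": ["Spiega l'interesse composto in una frase."],
    "chosen": ["È l'interesse calcolato anche sugli interessi già maturati."],
    "rejected": ["Non lo so."],
})
english_prefs = Dataset.from_dict({
    "prompt": ["Explain compound interest in one sentence."],
    "chosen": ["It is interest earned on both the principal and previously accrued interest."],
    "rejected": ["I don't know."],
})

# Merge the language-specific parts and shuffle them into a single DPO mix
dpo_mix = concatenate_datasets([italian_prefs, english_prefs]).shuffle(seed=42)
print(dpo_mix)
```
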

# Usage limitations

Although the model demonstrates solid Italian fluency and good reasoning capabilities for its small size, it is expected to have limited world knowledge due to its restricted number of parameters. This limitation can be mitigated by pairing it with techniques like Retrieval-Augmented Generation (RAG). Check out the [📓 Kaggle notebook](https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond) for an example.

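As a minimal sketch of the RAG idea (this is not the pipeline from the Kaggle notebook; the documents, embedding model, and retrieval logic here are assumptions), retrieved context can simply be prepended to the user message:

```python
import torch
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

# Toy knowledge base; in a real application these passages come from your own corpus
documents = [
    "La Torre di Pisa fu completata nel 1372 e pende a causa di un cedimento del terreno.",
    "Il Colosseo di Roma fu inaugurato nell'80 d.C. sotto l'imperatore Tito.",
]

# Multilingual embedding model chosen for the sketch (an assumption, not the card's recommendation)
retriever = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
doc_embeddings = retriever.encode(documents, convert_to_tensor=True)

question = "Quando fu inaugurato il Colosseo?"
question_embedding = retriever.encode(question, convert_to_tensor=True)

# Retrieve the most similar document by cosine similarity
best_idx = int(util.cos_sim(question_embedding, doc_embeddings).argmax())
context = documents[best_idx]

generator = pipeline(
    "text-generation",
    model="anakin87/gemma-2-2b-neogenesis-ita",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

# Prepend the retrieved context to the question so the model can ground its answer
prompt = f"Rispondi usando solo il contesto seguente.\n\nContesto: {context}\n\nDomanda: {question}"
outputs = generator([{"role": "user", "content": prompt}], max_new_tokens=200)
print(outputs[0]["generated_text"][1]["content"])
```
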

# Safety

While this model was not specifically fine-tuned for safety, its selective training with the Spectrum technique helps preserve certain safety features of the original model, as emerged in the [qualitative evaluation](https://html-preview.github.io/?url=https://github.com/anakin87/gemma-neogenesis/blob/main/qualitative_evaluation/qualitative_evaluation.html).