Update README.md
README.md CHANGED
@@ -78,11 +78,13 @@ The model was fine-tuned using [Hugging Face TRL](https://huggingface.co/docs/tr

The training involved Instruction Fine Tuning and Direct Preference Optimization.

I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623). The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest.
Specifically, training focused on the top 25% most informative layers.
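
Purely as an illustration of the idea (not the actual training code), here is a minimal sketch of Spectrum-style selective freezing. The model name and the `unfrozen_patterns` list are hypothetical placeholders; in practice the list of high-SNR modules comes from the YAML file that Spectrum's SNR analysis generates.

```python
import re
from transformers import AutoModelForCausalLM

# Hypothetical patterns standing in for the high-SNR modules Spectrum would select
# (in a real run they are read from the YAML file produced by the SNR analysis).
unfrozen_patterns = [
    r"model\.layers\.\d+\.mlp\.down_proj",
    r"model\.layers\.\d+\.self_attn\.o_proj",
]

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b-it")  # placeholder base model

# Freeze everything, then unfreeze only the parameters belonging to the selected modules.
for name, param in model.named_parameters():
    param.requires_grad = any(re.search(p, name) for p in unfrozen_patterns)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,} ({trainable / total:.1%})")
```

The trainer then updates only the unfrozen parameters, while the rest of the network keeps its base-model weights.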

Batch size: 16; learning rate: 5e-6; epochs: 1 for SFT and 1 for DPO.
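
As a rough sketch of how these values might be passed to TRL (not the exact configuration of this run; the output directories and the split of the effective batch size into per-device size × gradient accumulation steps are assumptions):

```python
from trl import SFTConfig, DPOConfig

# Effective batch size 16, assumed here as 2 per device x 8 accumulation steps.
sft_args = SFTConfig(
    output_dir="sft-gemma-ita",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
)

# The DPO stage reuses the same batch size, learning rate, and single epoch.
dpo_args = DPOConfig(
    output_dir="dpo-gemma-ita",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
)
```

These configs would then be handed to `SFTTrainer` and `DPOTrainer`, respectively, together with the model and the training datasets.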

Training required about 15 hours on a single NVIDIA A6000 GPU (48GB VRAM).

For comprehensive training code and details, check out the [📓 Kaggle notebook](https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond).

# 🗃️ Training data
The model was trained primarily on Italian data, with a small portion of English data included.