Update README.md
README.md CHANGED
@@ -78,11 +78,13 @@ The model was fine-tuned using [Hugging Face TRL](https://huggingface.co/docs/tr

The training involved Instruction Fine Tuning and Direct Preference Optimization.

I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623). The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest.
Specifically, training focused on the top 25% most informative layers.
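
Purely as an illustration of the idea (not the actual training code), here is a minimal sketch of Spectrum-style selective freezing. The model name and the `unfrozen_patterns` list are hypothetical placeholders; in practice the list of high-SNR modules comes from the YAML file that Spectrum's SNR analysis generates.

```python
import re
from transformers import AutoModelForCausalLM

# Hypothetical patterns standing in for the high-SNR modules Spectrum would select
# (in a real run they are read from the YAML file produced by the SNR analysis).
unfrozen_patterns = [
    r"model\.layers\.\d+\.mlp\.down_proj",
    r"model\.layers\.\d+\.self_attn\.o_proj",
]

model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b-it")  # placeholder base model

# Freeze everything, then unfreeze only the parameters belonging to the selected modules.
for name, param in model.named_parameters():
    param.requires_grad = any(re.search(p, name) for p in unfrozen_patterns)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} / {total:,} ({trainable / total:.1%})")
```

The trainer then updates only the unfrozen parameters, while the rest of the network keeps its base-model weights.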

Batch size: 16; learning rate: 5e-6; epochs: 1 for SFT and 1 for DPO.
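
As a rough sketch of how these values might be passed to TRL (not the exact configuration of this run; the output directories and the split of the effective batch size into per-device size × gradient accumulation steps are assumptions):

```python
from trl import SFTConfig, DPOConfig

# Effective batch size 16, assumed here as 2 per device x 8 accumulation steps.
sft_args = SFTConfig(
    output_dir="sft-gemma-ita",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
)

# The DPO stage reuses the same batch size, learning rate, and single epoch.
dpo_args = DPOConfig(
    output_dir="dpo-gemma-ita",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
)
```

These configs would then be handed to `SFTTrainer` and `DPOTrainer`, respectively, together with the model and the training datasets.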

Training required about 15 hours on a single NVIDIA A6000 GPU (48GB VRAM).

For comprehensive training code and details, check out the [📓 Kaggle notebook](https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond).

# 🗃️ Training data
The model was trained primarily on Italian data, with a small portion of English data included.