anakin87 committed
Commit 85252ea · verified · Parent: 5b90b4a

Update README.md

Files changed (1): README.md (+4, -2)
README.md CHANGED
@@ -78,11 +78,13 @@ The model was fine-tuned using [Hugging Face TRL](https://huggingface.co/docs/tr
 The training involved Instruction Fine Tuning and Direct Preference Optimization.
 
 I adopted a relatively new technique for parameter-efficient learning: [Spectrum](https://arxiv.org/abs/2406.06623). The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest.
+Specifically, training focused on the top 25% most informative layers.
 
-Training required about 15 hours on a single NVIDIA A6000 GPU (48GB VRAM).
+Batch size: 16; learning rate: 5e-6; epochs: 1 for SFT and 1 for DPO.
 
-For comprehensive training details, check out the [📓 Kaggle notebook](https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond).
+Training required about 15 hours on a single NVIDIA A6000 GPU (48GB VRAM).
 
+For comprehensive training code and details, check out the [📓 Kaggle notebook](https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond).
 
 # 🗃️ Training data
 The model was trained primarily on Italian data, with a small portion of English data included.
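
To make the Spectrum idea in the diff above concrete, here is a minimal sketch of selective fine-tuning with PyTorch/Transformers: freeze every parameter, then unfreeze only the modules flagged as high-SNR. The model id and the pattern list are hypothetical placeholders for illustration; the actual set of trainable layers ("top 25% most informative") comes from Spectrum's SNR analysis, not from this snippet.

```python
# Illustrative sketch only: Spectrum-style selective training.
# The model id and `unfrozen_patterns` are hypothetical placeholders;
# Spectrum derives the real high-SNR layer list from an SNR scan of the model.
import re
import torch
from transformers import AutoModelForCausalLM

model_id = "google/gemma-2-9b-it"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Stand-in patterns for the selected high-SNR layers.
unfrozen_patterns = [r"\.mlp\.down_proj", r"\.self_attn\.o_proj"]

for name, param in model.named_parameters():
    # Train a parameter only if its name matches one of the selected patterns.
    param.requires_grad = any(re.search(p, name) for p in unfrozen_patterns)

n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {n_trainable:,} / {n_total:,}")
```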
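The added hyperparameter line (batch size 16, learning rate 5e-6, one epoch each for SFT and DPO) maps naturally onto TRL's training configs. The snippet below is a sketch assuming a recent TRL version; the output paths are placeholders, the model and dataset wiring is omitted, and the README does not say whether the batch size of 16 is per device or reached via gradient accumulation.

```python
# Hedged sketch: TRL configs mirroring the stated hyperparameters.
# Assumption: batch size 16 is treated as per-device here; the README does not
# specify whether it is per-device or an effective (accumulated) batch size.
from trl import SFTConfig, DPOConfig

sft_args = SFTConfig(
    output_dir="sft-out",              # placeholder
    per_device_train_batch_size=16,    # batch size: 16
    learning_rate=5e-6,                # learning rate: 5e-6
    num_train_epochs=1,                # 1 epoch for SFT
)

dpo_args = DPOConfig(
    output_dir="dpo-out",              # placeholder
    per_device_train_batch_size=16,
    learning_rate=5e-6,
    num_train_epochs=1,                # 1 epoch for DPO
)
```

These config objects would then be passed to TRL's `SFTTrainer` and `DPOTrainer` together with the model and the instruction/preference datasets, as covered in the linked Kaggle notebook.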