About the training details
#5 opened by hiyouga
It's great to see the impressive work on the edge-side model for long inference. We noticed that you might have used LlamaFactory for fine-tuning the models. To provide more clarity, could you please include details about the training framework used in the model card? Thanks!
Btw, it would greatly improve the user experience if a Colab notebook for local deployment were available.
This is my config YAML:
```yaml
### model
model_name_or_path: /home/syx/Qwen2.5-3B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: o1-v2, o1-v3
template: qwen
neat_packing: true
cutoff_len: 16384
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/qwen2-01-qat/full/sft
logging_steps: 1
save_steps: 100
plot_loss: true
overwrite_output_dir: true
```
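The config turns on `neat_packing` with `cutoff_len: 16384`. As we understand LlamaFactory's documentation, this option packs several short samples into one sequence while preventing tokens from attending across sample boundaries. The sketch below only illustrates that idea; it is not LlamaFactory's code. It assumes a first-fit-decreasing packing strategy, and the helper names `pack_sequences`, `segment_ids` and `block_causal_mask` are hypothetical.

```python
from typing import List


def pack_sequences(lengths: List[int], cutoff_len: int) -> List[List[int]]:
    """Group sample indices into bins whose total token count is <= cutoff_len.

    Hypothetical first-fit-decreasing strategy: place the longest samples
    first, each into the first bin with enough room, else open a new bin.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins: List[List[int]] = []
    remaining: List[int] = []  # free token budget left in each bin
    for i in order:
        if lengths[i] > cutoff_len:
            raise ValueError(f"sample {i} ({lengths[i]} tokens) exceeds cutoff_len")
        for b, room in enumerate(remaining):
            if lengths[i] <= room:
                bins[b].append(i)
                remaining[b] -= lengths[i]
                break
        else:
            bins.append([i])
            remaining.append(cutoff_len - lengths[i])
    return bins


def segment_ids(bin_: List[int], lengths: List[int]) -> List[int]:
    """Per-token segment id (1-based) for one packed sequence."""
    ids: List[int] = []
    for seg, i in enumerate(bin_, start=1):
        ids.extend([seg] * lengths[i])
    return ids


def block_causal_mask(seg: List[int]) -> List[List[bool]]:
    """mask[q][k] is True when query token q may attend to key token k.

    Attention is causal (k <= q) AND restricted to the same segment, so
    packed samples cannot see each other ("no cross-attention").
    """
    n = len(seg)
    return [[k <= q and seg[k] == seg[q] for k in range(n)] for q in range(n)]
```

For example, five samples of 9000, 7000, 5000, 3000 and 2000 tokens fit into two 16384-token sequences, `[[0, 1], [2, 3, 4]]`, instead of five padded ones. The block-diagonal causal mask is what keeps this packing from changing what each sample's tokens can attend to.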
I will upload a Colab notebook soon. Thanks for the nice suggestion.
Great! How about adding these training details to the README for better reproducibility?
Thanks for the advice. I will add them to the README.