About the training details

by hiyouga - opened 3 days ago

3 days ago

It's great to see the impressive work on the edge-side model for long inference. We noticed that you might have used LlamaFactory for fine-tuning the models. To provide more clarity, could you please include details about the training framework used in the model card? Thanks!

hiyouga

3 days ago

Btw, it would greatly enhance the user experience if a Colab notebook for local deployment were available 🙌

yixinsong

PowerInfer org 3 days ago

edited 1 day ago

This is my config YAML:

### model
model_name_or_path: /home/syx/Qwen2.5-3B-Instruct

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: o1-v2, o1-v3
template: qwen
neat_packing: true
cutoff_len: 16384
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/qwen2-01-qat/full/sft
logging_steps: 1
save_steps: 100
plot_loss: true
overwrite_output_dir: true
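
For anyone trying to reproduce this, here is a minimal sketch of checking a few fields from the config above and building a launch command. It assumes LlamaFactory's `llamafactory-cli train <yaml>` entry point; the file name `qwen_full_sft.yaml` and both helper functions are hypothetical, and the parser handles only flat `key: value` lines like the ones shown here, not general YAML.

```python
import shlex

# Excerpt of the config posted above (flat key: value pairs only).
CONFIG_TEXT = """\
### method
stage: sft
finetuning_type: full
### dataset
dataset: o1-v2, o1-v3
cutoff_len: 16384
"""


def parse_flat_config(text):
    """Parse flat `key: value` lines, skipping blank lines and `#` comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config


def build_train_command(config_path):
    # Assumption: training is launched via `llamafactory-cli train <yaml>`.
    return ["llamafactory-cli", "train", config_path]


config = parse_flat_config(CONFIG_TEXT)
# The `dataset` field is a comma-separated list of dataset names.
datasets = [name.strip() for name in config["dataset"].split(",")]
# Hypothetical file name; save the full config under any name you like.
command = shlex.join(build_train_command("qwen_full_sft.yaml"))

print(datasets)
print(command)
```

The command is only printed, not executed, so the sketch runs without LlamaFactory installed. The dataset names must still be registered in your own LlamaFactory dataset configuration before training can start.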

yixinsong

PowerInfer org 3 days ago

I will upload a Colab notebook soon. Thanks for the nice suggestion.

hiyouga

1 day ago

Great, how about adding them to the README file for better reproducibility?

yixinsong

PowerInfer org 1 day ago

Thanks for the advice. I will add it to the README.
