---
datasets:
- lorinma/IE_Sharegpt_zh
language:
- zh
pipeline_tag: text-generation
---

An LLM for Chinese Information Extraction.

Based on Baichuan-7B, fully fine-tuned (full-parameter SFT) on 8× A800 GPUs. The goal is to reproduce [zju cama](https://github.com/zjunlp/KnowLM) on top of a strong base model.

The SFT data was expanded:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6413d7be996b2e426f230fb7/-ztfAUGqxUwCdguLsofmy.png)

No evaluation has been run yet; contributions are welcome!

The training codebase comes from [shibing624](https://github.com/shibing624/MedicalGPT).

The training command is as follows:

```
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node 8 ../supervised_finetuning.py \
    --model_type baichuan \
    --model_name_or_path /data/llm/models/Pretrained/Baichuan-7B/ \
    --train_file_dir ../data/finetune/1124_IELLM/ \
    --per_device_train_batch_size 8 \
    --do_train \
    --use_peft False \
    --num_train_epochs 3 \
    --learning_rate 2e-5 \
    --warmup_ratio 0.03 \
    --weight_decay 0. \
    --fp16 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy epoch \
    --save_total_limit 5 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 8 \
    --output_dir ../results/20231124_IELLM \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --torch_dtype float16 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing True \
    --cache_dir ./cache \
    --model_max_length 2048 \
    --deepspeed ../deepspeed_zero_stage2_config.json \
    --template_name baichuan \
    --flash_attn
```

Training metrics:

```
***** train metrics *****
  epoch                    =        3.0
  train_loss               =     0.1012
  train_runtime            = 1 day, 14:16:59.20
  train_samples            =     376031
  train_samples_per_second =      8.185
  train_steps_per_second   =      0.128
```

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6413d7be996b2e426f230fb7/gmtZh9d2HJ5EkxZURtr-J.png)

Test results:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6413d7be996b2e426f230fb7/uARr0XOturW2aKVDRzXQe.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6413d7be996b2e426f230fb7/S6XDGjaXY6E2qpy-nhf0p.png)
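
For reference, below is a minimal inference sketch using 🤗 Transformers. It is not part of the original card: the model path is a placeholder, the Chinese prompt is only illustrative, and the exact prompt format should follow the `baichuan` template used by the MedicalGPT training code.

```python
# Minimal inference sketch (illustrative; not the author's official usage example).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with the actual checkpoint path or Hub repo id of this model.
model_path = "path/to/this/model"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,   # matches the fp16 training setup
    device_map="auto",
    trust_remote_code=True,      # required for Baichuan model code
)

# Illustrative Chinese IE instruction: extract person and organization names.
prompt = "从下面的文本中抽取所有的人名和机构名:\n阿里巴巴集团由马云于1999年在杭州创立。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Print only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```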