Introduction

The LoRA adapter used by Riyuechang/Breeze-7B-PTT-Chat-v1, left unmerged with the base model MediaTek-Research/Breeze-7B-Instruct-v1_0.

Note!!

This LoRA adapter was trained with DoRA, which improves the model's learning efficiency.
The trade-off is a large increase in training and inference time; inference in particular becomes very slow.
It is recommended to merge this LoRA adapter into the base model before running inference.

Hardware

  • Ubuntu 22.04.4 LTS
  • NVIDIA GeForce RTX 3060 12GB

LoRA parameters

r=8,
lora_alpha=32,
lora_dropout=0.1,
task_type="CAUSAL_LM",
target_modules="all-linear",
bias="none",
use_dora=True,
use_rslora=True

Training parameters

per_device_train_batch_size=28,  
gradient_accumulation_steps=1,  
num_train_epochs=3,  
warmup_ratio=0.1,  
learning_rate=2e-5,  
bf16=True,  
save_strategy="steps",  
save_steps=500,  
save_total_limit=10,  
logging_steps=10,  
output_dir=log_output,  
optim="paged_adamw_8bit",  
gradient_checkpointing=True
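These settings correspond to keyword arguments of transformers' `TrainingArguments`; a sketch, where the `output_dir` value is a placeholder for the card's `log_output` variable:

```python
# Sketch of the listed training configuration as TrainingArguments kwargs.
training_kwargs = dict(
    per_device_train_batch_size=28,
    gradient_accumulation_steps=1,
    num_train_epochs=3,
    warmup_ratio=0.1,               # 10% of steps for LR warmup
    learning_rate=2e-5,
    bf16=True,                      # bfloat16 mixed precision
    save_strategy="steps",
    save_steps=500,
    save_total_limit=10,            # keep at most 10 checkpoints
    logging_steps=10,
    output_dir="output",            # placeholder for log_output
    optim="paged_adamw_8bit",       # bitsandbytes paged 8-bit AdamW
    gradient_checkpointing=True,    # trade compute for memory
)
# Usage: TrainingArguments(**training_kwargs), then pass to a Trainer/SFTTrainer.
```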

Results

  • loss: 1.1035