---
license: apache-2.0
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: zephyr-7b
  results: []
---
# zephyr-7b

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unspecified dataset.
It achieves the following results on the evaluation set (see the note on the reward columns after the list):
- Loss: 0.3625
- Rewards/chosen: -150.6127
- Rewards/rejected: -146.3050
- Rewards/accuracies: 0.2421
- Rewards/margins: -4.3077
- Logps/rejected: -14705.8975
- Logps/chosen: -15130.1680
- Logits/rejected: 13.5362
- Logits/chosen: 13.4716
- Use Label: 11165.9844
- Pred Label: 7522.0161
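
Assuming these follow TRL's standard DPO logging, the reward columns are the implicit DPO rewards, i.e. the policy-versus-reference log-probability gap scaled by the DPO temperature β (whose value is not stated on this card), and the margin and accuracy are derived from them:

$$
r_{\text{chosen}} = \beta\bigl(\log \pi_\theta(y_w \mid x) - \log \pi_{\text{ref}}(y_w \mid x)\bigr),\qquad
r_{\text{rejected}} = \beta\bigl(\log \pi_\theta(y_l \mid x) - \log \pi_{\text{ref}}(y_l \mid x)\bigr)
$$

$$
\text{Rewards/margins} = r_{\text{chosen}} - r_{\text{rejected}},\qquad
\text{Rewards/accuracies} = \Pr\bigl[r_{\text{chosen}} > r_{\text{rejected}}\bigr]
$$

For example, the final margin above is -150.6127 - (-146.3050) = -4.3077. The Use Label / Pred Label counters are not standard TRL DPO metrics and presumably come from a customized trainer.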
## Model description
More information needed
## Intended uses & limitations
More information needed
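
Pending more detail, the snippet below is a minimal, untested sketch of how a PEFT adapter like this one is typically loaded on top of the base model. The adapter repo id is a placeholder (the card does not say where the adapter weights are hosted), and the bf16/device settings are assumptions.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "path/or/repo-id/of-this-adapter"  # placeholder: replace with the actual adapter location

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference; adjust to your hardware
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt").to(base_model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```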
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
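
Since the exact training script is not included on this card, the sketch below shows roughly how these hyperparameters would map onto a TRL `DPOTrainer` run with a PEFT/LoRA adapter (TRL of the 0.7 era, matching PEFT 0.7.1 and Transformers 4.38.2). The toy dataset, LoRA settings, `beta`, precision, and logging/eval intervals are assumptions rather than values from this card; the multi-GPU launch (4 devices × batch 4 × 4 accumulation steps = effective batch 64) would be handled externally, e.g. via `accelerate launch`.

```python
import torch
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default

model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Toy preference data; the actual training/evaluation datasets are not named on this card.
pairs = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": ["The capital of France is Paris."],
    "rejected": ["France has no capital."],
})

# Values below mirror the hyperparameter list; Adam betas/epsilon match the Transformers defaults.
training_args = TrainingArguments(
    output_dir="zephyr-7b",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                    # assumption; precision is not stated on the card
    logging_steps=100,
    evaluation_strategy="steps",
    eval_steps=100,
)

# Assumed LoRA settings; the actual adapter ranks/targets are not stated on the card.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

trainer = DPOTrainer(
    model=model,
    ref_model=None,               # with a PEFT adapter, the frozen base weights serve as the reference
    args=training_args,
    beta=0.1,                     # assumption; the DPO beta is not stated on the card
    train_dataset=pairs,
    eval_dataset=pairs,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=1024,
    max_prompt_length=512,
)
trainer.train()
```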
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Use Label | Pred Label |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.6637 | 0.1 | 100 | 0.6642 | -0.0947 | -0.1635 | 0.3254 | 0.0687 | -91.7446 | -78.3734 | -2.0927 | -2.1253 | 1838.9207 | 17.0794 |
| 0.3902 | 0.21 | 200 | 0.3930 | -14.4219 | -14.0352 | 0.2560 | -0.3866 | -1478.9202 | -1511.0870 | 2.8471 | 2.7727 | 3444.6985 | 515.3016 |
| 0.3845 | 0.31 | 300 | 0.3786 | -23.0869 | -24.5685 | 0.2520 | 1.4817 | -2532.2498 | -2377.5872 | 5.4283 | 5.3070 | 4579.4922 | 1484.5079 |
| 0.3477 | 0.42 | 400 | 0.3622 | -111.3259 | -109.5294 | 0.25 | -1.7965 | -11028.3408 | -11201.4893 | 11.6816 | 11.5716 | 5682.4922 | 2485.5081 |
| 0.3468 | 0.52 | 500 | 0.3613 | -144.7782 | -140.7408 | 0.2421 | -4.0373 | -14149.4824 | -14546.7158 | 13.8885 | 13.8347 | 6784.2383 | 3487.7620 |
| 0.33 | 0.63 | 600 | 0.3605 | -143.0167 | -138.8336 | 0.2401 | -4.1831 | -13958.7627 | -14370.5693 | 12.5943 | 12.5399 | 7857.4287 | 4518.5713 |
| 0.3665 | 0.73 | 700 | 0.3614 | -150.1877 | -145.8865 | 0.2421 | -4.3011 | -14664.0518 | -15087.6680 | 13.4024 | 13.3367 | 8936.4287 | 5543.5713 |
| 0.3731 | 0.84 | 800 | 0.3623 | -150.4385 | -146.1303 | 0.2401 | -4.3082 | -14688.4258 | -15112.7539 | 13.5339 | 13.4696 | 10050.3330 | 6533.6665 |
| 0.3696 | 0.94 | 900 | 0.3625 | -150.6127 | -146.3050 | 0.2421 | -4.3077 | -14705.8975 | -15130.1680 | 13.5362 | 13.4716 | 11165.9844 | 7522.0161 |
### Framework versions
- PEFT 0.7.1
- Transformers 4.38.2
- Pytorch 2.1.1+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2