zephyr-7b-dpo-qlora

This model (chanchan7/zephyr-7b-dpo-qlora) is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4880
  • Rewards/chosen: -2.8615
  • Rewards/rejected: -3.9313
  • Rewards/accuracies: 0.7262
  • Rewards/margins: 1.0698
  • Logps/rejected: -626.2534
  • Logps/chosen: -549.3907
  • Logits/rejected: 1.3412
  • Logits/chosen: 0.7713
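
The reward metrics above follow the usual DPO bookkeeping, where each reward is a scaled log-probability ratio between the trained policy and a frozen reference model, and the margin is the chosen reward minus the rejected reward. The sketch below illustrates that relationship for a single preference pair. The function name `dpo_metrics` and the value `beta=0.1` are assumptions for illustration; this card does not state the beta used in training.

```python
import math


def dpo_metrics(policy_chosen_logp, policy_rejected_logp,
                ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO quantities (beta=0.1 is an assumed value, not from this card)."""
    # Implicit rewards: scaled log-ratio of policy vs. reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Sigmoid DPO loss: -log(sigmoid(margin)) == log(1 + exp(-margin)).
    loss = math.log1p(math.exp(-margin))
    # A pair counts as "accurate" when the chosen reward exceeds the rejected one.
    accuracy = 1.0 if margin > 0 else 0.0
    return {
        "rewards/chosen": chosen_reward,
        "rewards/rejected": rejected_reward,
        "rewards/margins": margin,
        "rewards/accuracies": accuracy,
        "loss": loss,
    }


# Consistency check on the reported evaluation means:
# Rewards/margins should equal Rewards/chosen minus Rewards/rejected.
reported_margin = round(-2.8615 - (-3.9313), 4)
print(reported_margin)  # 1.0698
```

The reported loss is an average of per-pair losses, so it cannot be recovered by plugging the mean margin into the loss formula; only the margin identity holds exactly for the averaged values.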

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 12
  • total_eval_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
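
The total batch sizes in the list above follow from the per-device values, and the learning rate follows a cosine schedule with linear warmup. The sketch below reproduces the batch-size arithmetic and gives a minimal version of such a schedule. The helper `cosine_lr` is a hypothetical name, and the total step count passed to it is an assumption, since this card does not state the exact number of optimizer steps.

```python
import math

# Values taken from the hyperparameter list.
train_batch_size = 1
eval_batch_size = 8
num_devices = 3
gradient_accumulation_steps = 4

# Effective batch sizes: gradient accumulation only applies to training.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 12 24


def cosine_lr(step, total_steps, base_lr=5e-6, warmup_ratio=0.1):
    """Linear warmup followed by cosine decay to zero (illustrative sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# total_steps=5100 is an assumed figure used only to demonstrate the shape.
for step in (0, 255, 510, 2805, 5100):
    print(step, cosine_lr(step, total_steps=5100))
```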

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6884 | 0.02 | 100 | 0.6868 | 0.0390 | 0.0284 | 0.6146 | 0.0106 | -230.2779 | -259.3362 | -2.3476 | -2.3366 |
| 0.6654 | 0.04 | 200 | 0.6657 | 0.0334 | -0.0194 | 0.6399 | 0.0528 | -235.0622 | -259.9052 | -2.2635 | -2.2585 |
| 0.6346 | 0.06 | 300 | 0.6431 | -0.2564 | -0.3692 | 0.6533 | 0.1128 | -270.0399 | -288.8787 | -2.2107 | -2.2217 |
| 0.5888 | 0.08 | 400 | 0.6162 | -0.4195 | -0.6312 | 0.6518 | 0.2118 | -296.2420 | -305.1884 | -1.9579 | -1.9905 |
| 0.5806 | 0.1 | 500 | 0.5916 | -1.3171 | -1.6507 | 0.6637 | 0.3337 | -398.1920 | -394.9468 | -0.4990 | -0.5253 |
| 0.6219 | 0.12 | 600 | 0.5753 | -1.1344 | -1.5063 | 0.6503 | 0.3719 | -383.7478 | -376.6808 | 0.0384 | -0.0361 |
| 0.5586 | 0.14 | 700 | 0.5733 | -0.7892 | -1.1878 | 0.6667 | 0.3986 | -351.8957 | -342.1609 | 0.3073 | 0.2473 |
| 0.6123 | 0.16 | 800 | 0.5578 | -1.2731 | -1.7042 | 0.6652 | 0.4311 | -403.5397 | -390.5542 | 1.0809 | 1.0327 |
| 0.555 | 0.18 | 900 | 0.5461 | -1.1941 | -1.8087 | 0.6771 | 0.6146 | -413.9875 | -382.6491 | 1.4158 | 1.3993 |
| 0.4905 | 0.2 | 1000 | 0.5463 | -1.2469 | -1.9528 | 0.6890 | 0.7058 | -428.3945 | -387.9334 | 0.8211 | 0.7732 |
| 0.5214 | 0.22 | 1100 | 0.5356 | -1.2786 | -1.8992 | 0.6979 | 0.6206 | -423.0347 | -391.1008 | 1.3945 | 1.4163 |
| 0.4988 | 0.24 | 1200 | 0.5307 | -1.2179 | -1.9293 | 0.6979 | 0.7115 | -426.0503 | -385.0261 | 1.0273 | 0.9228 |
| 0.5324 | 0.26 | 1300 | 0.5320 | -1.4512 | -2.1779 | 0.7024 | 0.7267 | -450.9060 | -408.3595 | 0.9344 | 0.5917 |
| 0.5286 | 0.27 | 1400 | 0.5193 | -1.3777 | -2.1412 | 0.7039 | 0.7634 | -447.2371 | -401.0145 | 1.1979 | 0.8244 |
| 0.6095 | 0.29 | 1500 | 0.5206 | -1.1730 | -1.8883 | 0.7009 | 0.7153 | -421.9497 | -380.5422 | 0.3598 | -0.0238 |
| 0.5627 | 0.31 | 1600 | 0.5225 | -1.8811 | -2.7733 | 0.6935 | 0.8922 | -510.4463 | -451.3462 | 0.7395 | 0.4147 |
| 0.5222 | 0.33 | 1700 | 0.5210 | -1.1883 | -1.8477 | 0.7143 | 0.6593 | -417.8853 | -382.0739 | -0.0643 | -0.3844 |
| 0.5163 | 0.35 | 1800 | 0.5219 | -1.1780 | -1.9783 | 0.7247 | 0.8003 | -430.9522 | -381.0428 | 1.3000 | 0.9605 |
| 0.511 | 0.37 | 1900 | 0.5214 | -1.8532 | -2.7395 | 0.7188 | 0.8863 | -507.0662 | -448.5622 | 1.3052 | 0.9550 |
| 0.484 | 0.39 | 2000 | 0.5161 | -1.7800 | -2.6182 | 0.7188 | 0.8382 | -494.9370 | -441.2427 | 1.6339 | 1.3132 |
| 0.4863 | 0.41 | 2100 | 0.5183 | -2.7826 | -3.8427 | 0.7158 | 1.0600 | -617.3857 | -541.5035 | 2.3428 | 2.0461 |
| 0.5233 | 0.43 | 2200 | 0.5115 | -1.7702 | -2.6185 | 0.7173 | 0.8483 | -494.9643 | -440.2580 | 0.9791 | 0.5628 |
| 0.5343 | 0.45 | 2300 | 0.5079 | -1.4313 | -2.2210 | 0.7202 | 0.7897 | -455.2213 | -406.3701 | 1.0255 | 0.5469 |
| 0.5251 | 0.47 | 2400 | 0.5088 | -2.7117 | -3.7995 | 0.7173 | 1.0878 | -613.0708 | -534.4126 | 2.1153 | 1.5133 |
| 0.5104 | 0.49 | 2500 | 0.5006 | -2.9970 | -4.0022 | 0.7202 | 1.0052 | -633.3362 | -562.9377 | 2.2889 | 1.7461 |
| 0.429 | 0.51 | 2600 | 0.5238 | -3.6282 | -4.8032 | 0.7143 | 1.1750 | -713.4386 | -626.0600 | 3.6631 | 3.2827 |
| 0.4255 | 0.53 | 2700 | 0.4993 | -2.4946 | -3.5067 | 0.7188 | 1.0121 | -583.7889 | -512.7010 | 2.1920 | 1.6873 |
| 0.4733 | 0.55 | 2800 | 0.4990 | -3.2116 | -4.2800 | 0.7202 | 1.0684 | -661.1174 | -584.3987 | 2.6796 | 2.2111 |
| 0.5394 | 0.57 | 2900 | 0.5040 | -2.9132 | -3.9276 | 0.7158 | 1.0143 | -625.8766 | -554.5653 | 1.7758 | 1.2351 |
| 0.5128 | 0.59 | 3000 | 0.5061 | -2.5974 | -3.5725 | 0.7173 | 0.9750 | -590.3638 | -522.9818 | 2.1284 | 1.6663 |
| 0.5215 | 0.61 | 3100 | 0.4960 | -2.2632 | -3.1876 | 0.7188 | 0.9245 | -551.8787 | -489.5560 | 1.4432 | 0.8594 |
| 0.5023 | 0.63 | 3200 | 0.4999 | -2.8630 | -3.9641 | 0.7128 | 1.1011 | -629.5237 | -549.5392 | 1.9057 | 1.2951 |
| 0.5042 | 0.65 | 3300 | 0.4904 | -2.8448 | -3.8793 | 0.7307 | 1.0345 | -621.0500 | -547.7245 | 1.9776 | 1.4334 |
| 0.498 | 0.67 | 3400 | 0.4879 | -2.8423 | -3.8097 | 0.7321 | 0.9673 | -614.0843 | -547.4754 | 1.4781 | 0.9608 |
| 0.4987 | 0.69 | 3500 | 0.4902 | -2.6926 | -3.7172 | 0.7307 | 1.0246 | -604.8372 | -532.4977 | 1.3819 | 0.8557 |
| 0.5824 | 0.71 | 3600 | 0.4908 | -2.5673 | -3.5933 | 0.7292 | 1.0260 | -592.4445 | -519.9661 | 1.1037 | 0.5336 |
| 0.425 | 0.73 | 3700 | 0.4906 | -2.7666 | -3.8246 | 0.7307 | 1.0580 | -615.5826 | -539.9020 | 1.2903 | 0.7257 |
| 0.4756 | 0.75 | 3800 | 0.4916 | -2.8732 | -3.9598 | 0.7292 | 1.0866 | -629.0961 | -550.5607 | 1.5015 | 0.9387 |
| 0.4597 | 0.77 | 3900 | 0.4896 | -2.8617 | -3.9425 | 0.7277 | 1.0808 | -627.3712 | -549.4086 | 1.3350 | 0.7636 |
| 0.4649 | 0.79 | 4000 | 0.4885 | -2.8682 | -3.9370 | 0.7232 | 1.0688 | -626.8230 | -550.0615 | 1.2903 | 0.7213 |
| 0.4689 | 0.8 | 4100 | 0.4880 | -2.8425 | -3.9060 | 0.7232 | 1.0634 | -623.7166 | -547.4950 | 1.2495 | 0.6763 |
| 0.4275 | 0.82 | 4200 | 0.4877 | -2.8671 | -3.9353 | 0.7232 | 1.0682 | -626.6478 | -549.9532 | 1.3067 | 0.7331 |
| 0.5325 | 0.84 | 4300 | 0.4881 | -2.8855 | -3.9630 | 0.7262 | 1.0775 | -629.4202 | -551.7905 | 1.3795 | 0.8070 |
| 0.532 | 0.86 | 4400 | 0.4881 | -2.8672 | -3.9406 | 0.7277 | 1.0734 | -627.1785 | -549.9610 | 1.3435 | 0.7732 |
| 0.4558 | 0.88 | 4500 | 0.4879 | -2.8560 | -3.9259 | 0.7262 | 1.0699 | -625.7067 | -548.8392 | 1.3411 | 0.7711 |
| 0.5541 | 0.9 | 4600 | 0.4882 | -2.8601 | -3.9295 | 0.7262 | 1.0694 | -626.0704 | -549.2481 | 1.3428 | 0.7729 |
| 0.5743 | 0.92 | 4700 | 0.4879 | -2.8641 | -3.9344 | 0.7262 | 1.0702 | -626.5551 | -549.6526 | 1.3445 | 0.7755 |
| 0.4657 | 0.94 | 4800 | 0.4880 | -2.8626 | -3.9322 | 0.7292 | 1.0696 | -626.3386 | -549.4993 | 1.3437 | 0.7749 |
| 0.5126 | 0.96 | 4900 | 0.4880 | -2.8636 | -3.9339 | 0.7277 | 1.0703 | -626.5126 | -549.6042 | 1.3440 | 0.7748 |
| 0.3967 | 0.98 | 5000 | 0.4880 | -2.8643 | -3.9344 | 0.7262 | 1.0702 | -626.5614 | -549.6658 | 1.3424 | 0.7736 |
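
A log like the one above can be scanned to find the evaluation step with the lowest validation loss or the highest reward accuracy. The sketch below does this over a small excerpt of the rows, copied by hand as `(step, validation_loss, rewards_accuracy)` tuples. The variable names are illustrative, and whether a checkpoint was actually saved at each of these steps is not stated in this card.

```python
# Excerpt of the training log: (step, validation loss, rewards/accuracies).
eval_log = [
    (100, 0.6868, 0.6146),
    (1000, 0.5463, 0.6890),
    (2600, 0.5238, 0.7143),
    (3400, 0.4879, 0.7321),
    (4200, 0.4877, 0.7232),
    (5000, 0.4880, 0.7262),
]

# Evaluation step with the lowest validation loss in the excerpt.
best_by_loss = min(eval_log, key=lambda row: row[1])

# Evaluation step with the highest preference accuracy in the excerpt.
best_by_accuracy = max(eval_log, key=lambda row: row[2])

print("lowest validation loss:", best_by_loss)
print("highest rewards accuracy:", best_by_accuracy)
```

The two criteria pick different steps here, and the differences in validation loss after roughly step 3400 are in the fourth decimal place, so the choice between late evaluation steps is unlikely to matter much on these metrics alone.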

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.2
  • PyTorch 2.2.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2
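
Given the PEFT and Transformers versions listed above, one plausible way to load this adapter is through PEFT's `AutoPeftModelForCausalLM`, which reads the adapter config and fetches the base model it points to. This is a sketch under assumptions rather than documented usage for this model: the repository ID `chanchan7/zephyr-7b-dpo-qlora` is taken from the hosting page, the helper name `load_adapter` is hypothetical, and it is assumed (not verified) that the repository also contains tokenizer files.

```python
MODEL_ID = "chanchan7/zephyr-7b-dpo-qlora"  # repository ID from the hosting page


def load_adapter(model_id: str = MODEL_ID):
    """Load the base model with this adapter applied, plus a tokenizer (sketch)."""
    # Imports are deferred so that defining this helper does not require
    # torch, peft, or transformers to be installed.
    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # Resolves the base model from the adapter config and attaches the adapter.
    model = AutoPeftModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16
    )
    # Assumption: tokenizer files are stored alongside the adapter weights.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer
```

Calling `load_adapter()` downloads the base model weights as well as the adapter, so it needs network access and enough memory for a 7B-parameter model.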