---
license: apache-2.0
base_model: HuggingFaceTB/SmolLM-135M-Instruct
tags:
  - trl
  - orpo
  - generated_from_trainer
model-index:
  - name: ft-smollm-135M-instruct-on-hf-ultrafeedback
    results: []
---

ft-smollm-135M-instruct-on-hf-ultrafeedback

This model is a fine-tuned version of HuggingFaceTB/SmolLM-135M-Instruct, trained with ORPO (via trl) on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a short note after the list sketches how these ORPO metrics relate to one another):

  • Loss: 1.0652
  • Rewards/chosen: -0.1245
  • Rewards/rejected: -0.1253
  • Rewards/accuracies: 0.4770
  • Rewards/margins: 0.0008
  • Logps/rejected: -1.2525
  • Logps/chosen: -1.2449
  • Logits/rejected: 52.1922
  • Logits/chosen: 51.8967
  • Nll Loss: 0.9899
  • Log Odds Ratio: -0.7525
  • Log Odds Chosen: 0.0414
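
The Nll Loss, Log Odds Ratio and reward columns above are the pieces of the ORPO objective as trl logs them. As a rough reading (assuming trl's standard ORPO formulation and its default β = 0.1, neither of which is stated explicitly in this card):

$$
\mathcal{L}_{\text{ORPO}}
= \underbrace{\mathcal{L}_{\text{NLL}}}_{\text{Nll Loss}}
\;-\; \beta \cdot \underbrace{\log \sigma\!\left(\log \frac{\operatorname{odds}_\theta(y_w \mid x)}{\operatorname{odds}_\theta(y_l \mid x)}\right)}_{\text{Log Odds Ratio}},
\qquad
\operatorname{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$

Under that reading the final evaluation numbers are self-consistent with β = 0.1: Loss ≈ 0.9899 - 0.1 × (-0.7525) ≈ 1.0652, and Rewards/chosen ≈ β × Logps/chosen ≈ 0.1 × (-1.2449) ≈ -0.1245, with Rewards/margins simply being chosen minus rejected.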

Model description

More information needed

Intended uses & limitations

More information needed
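
As a rough illustration only (intended uses are not documented here), the checkpoint can be driven like any other SmolLM-style instruct model through transformers. The hub id below is assumed from the repository name and may need adjusting, and at 135M parameters the outputs should be treated as demo-quality:

```python
# Minimal inference sketch; the hub id is an assumption based on this repository's name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aisuko/ft-smollm-135M-instruct-on-hf-ultrafeedback"  # assumed hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Explain ORPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```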

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0003
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
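
These settings line up with a standard trl ORPOTrainer run. The sketch below is a minimal reconstruction under assumptions the card does not state: a trl 0.8.x-era ORPOConfig/ORPOTrainer API, trl's default ORPO beta of 0.1, and that the preference splits of ultrafeedback_binarized were flattened to plain-text prompt/chosen/rejected fields.

```python
# Minimal ORPO training sketch (assumptions: trl 0.8.x API, beta=0.1, precision/device details omitted).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base = "HuggingFaceTB/SmolLM-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# ultrafeedback_binarized ships preference splits (train_prefs / test_prefs).
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

def to_text(example):
    # Assumption about the column layout: "chosen"/"rejected" are message lists
    # whose last turn is the assistant reply; keep only that reply as plain text.
    return {
        "prompt": example["prompt"],
        "chosen": example["chosen"][-1]["content"],
        "rejected": example["rejected"][-1]["content"],
    }

dataset = dataset.map(to_text)

config = ORPOConfig(
    output_dir="ft-smollm-135M-instruct-on-hf-ultrafeedback",
    learning_rate=3e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,   # total train batch size 8
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,                        # assumption: trl's default ORPO beta
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,
)
trainer.train()
```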

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------:|:--------------:|:---------------:|
| 2.2135 | 0.03 | 100 | 1.1267 | -0.1300 | -0.1302 | 0.4650 | 0.0001 | -1.3019 | -1.3005 | 19.2030 | 19.1013 | 1.0525 | -0.7427 | 0.0087 |
| 1.1677 | 0.05 | 200 | 1.1234 | -0.1281 | -0.1280 | 0.4670 | -0.0001 | -1.2803 | -1.2809 | 27.7444 | 27.6236 | 1.0481 | -0.7528 | 0.0127 |
| 1.1676 | 0.08 | 300 | 1.1302 | -0.1291 | -0.1286 | 0.4660 | -0.0004 | -1.2865 | -1.2908 | 30.9616 | 30.8646 | 1.0544 | -0.7587 | 0.0102 |
| 1.133 | 0.11 | 400 | 1.1538 | -0.1322 | -0.1315 | 0.4510 | -0.0008 | -1.3145 | -1.3223 | 33.4234 | 33.3438 | 1.0772 | -0.7658 | 0.0093 |
| 1.1642 | 0.13 | 500 | 1.1382 | -0.1315 | -0.1309 | 0.4520 | -0.0006 | -1.3092 | -1.3149 | 34.6557 | 34.5676 | 1.0623 | -0.7593 | 0.0099 |
| 1.1315 | 0.16 | 600 | 1.1392 | -0.1315 | -0.1306 | 0.4560 | -0.0009 | -1.3063 | -1.3154 | 36.8073 | 36.6894 | 1.0628 | -0.7639 | 0.0066 |
| 1.1564 | 0.19 | 700 | 1.1323 | -0.1313 | -0.1307 | 0.4710 | -0.0005 | -1.3073 | -1.3126 | 38.2088 | 38.0446 | 1.0565 | -0.7576 | 0.0112 |
| 1.1562 | 0.21 | 800 | 1.1310 | -0.1314 | -0.1313 | 0.4640 | -0.0000 | -1.3133 | -1.3136 | 40.0474 | 39.8232 | 1.0554 | -0.7559 | 0.0252 |
| 1.1665 | 0.24 | 900 | 1.1220 | -0.1307 | -0.1301 | 0.4570 | -0.0006 | -1.3013 | -1.3069 | 40.6970 | 40.5118 | 1.0462 | -0.7580 | 0.0126 |
| 1.1713 | 0.27 | 1000 | 1.1329 | -0.1315 | -0.1309 | 0.4580 | -0.0005 | -1.3093 | -1.3146 | 42.3554 | 42.1528 | 1.0565 | -0.7633 | 0.0184 |
| 1.1306 | 0.29 | 1100 | 1.1211 | -0.1310 | -0.1304 | 0.4560 | -0.0006 | -1.3039 | -1.3098 | 42.6754 | 42.5111 | 1.0451 | -0.7594 | 0.0122 |
| 1.1215 | 0.32 | 1200 | 1.1273 | -0.1313 | -0.1306 | 0.4570 | -0.0007 | -1.3056 | -1.3128 | 44.4291 | 44.2082 | 1.0511 | -0.7615 | 0.0113 |
| 1.1383 | 0.35 | 1300 | 1.1156 | -0.1298 | -0.1293 | 0.4600 | -0.0006 | -1.2926 | -1.2984 | 44.8096 | 44.6178 | 1.0392 | -0.7638 | 0.0168 |
| 1.1549 | 0.37 | 1400 | 1.1090 | -0.1292 | -0.1290 | 0.4640 | -0.0003 | -1.2898 | -1.2924 | 45.3797 | 45.1471 | 1.0332 | -0.7587 | 0.0223 |
| 1.1376 | 0.4 | 1500 | 1.1113 | -0.1296 | -0.1294 | 0.4650 | -0.0002 | -1.2935 | -1.2958 | 46.4136 | 46.1814 | 1.0354 | -0.7591 | 0.0207 |
| 1.1355 | 0.43 | 1600 | 1.1051 | -0.1286 | -0.1284 | 0.4660 | -0.0002 | -1.2839 | -1.2858 | 46.8894 | 46.6616 | 1.0290 | -0.7612 | 0.0219 |
| 1.0894 | 0.45 | 1700 | 1.1001 | -0.1282 | -0.1281 | 0.4670 | -0.0001 | -1.2810 | -1.2824 | 46.8995 | 46.7032 | 1.0238 | -0.7621 | 0.0317 |
| 1.1561 | 0.48 | 1800 | 1.0976 | -0.1283 | -0.1281 | 0.4740 | -0.0002 | -1.2811 | -1.2829 | 47.7268 | 47.4906 | 1.0219 | -0.7573 | 0.0210 |
| 1.0969 | 0.51 | 1900 | 1.0952 | -0.1277 | -0.1274 | 0.4710 | -0.0003 | -1.2738 | -1.2771 | 48.0909 | 47.8791 | 1.0190 | -0.7626 | 0.0221 |
| 1.1034 | 0.53 | 2000 | 1.0971 | -0.1277 | -0.1274 | 0.4650 | -0.0004 | -1.2736 | -1.2774 | 48.6271 | 48.4186 | 1.0209 | -0.7622 | 0.0210 |
| 1.0806 | 0.56 | 2100 | 1.0894 | -0.1275 | -0.1274 | 0.4730 | -0.0001 | -1.2743 | -1.2750 | 48.9781 | 48.7443 | 1.0139 | -0.7556 | 0.0238 |
| 1.1148 | 0.59 | 2200 | 1.0917 | -0.1282 | -0.1290 | 0.4770 | 0.0008 | -1.2896 | -1.2820 | 49.9987 | 49.7273 | 1.0168 | -0.7496 | 0.0411 |
| 1.106 | 0.61 | 2300 | 1.0866 | -0.1273 | -0.1276 | 0.4760 | 0.0003 | -1.2757 | -1.2726 | 49.6562 | 49.4520 | 1.0112 | -0.7538 | 0.0327 |
| 1.1022 | 0.64 | 2400 | 1.0876 | -0.1268 | -0.1268 | 0.4700 | -0.0000 | -1.2682 | -1.2683 | 50.6454 | 50.3935 | 1.0117 | -0.7590 | 0.0296 |
| 1.0777 | 0.67 | 2500 | 1.0871 | -0.1268 | -0.1268 | 0.4690 | 0.0001 | -1.2684 | -1.2677 | 50.7985 | 50.5549 | 1.0112 | -0.7592 | 0.0329 |
| 1.1016 | 0.69 | 2600 | 1.0805 | -0.1265 | -0.1273 | 0.4770 | 0.0008 | -1.2729 | -1.2654 | 51.1070 | 50.8537 | 1.0054 | -0.7503 | 0.0416 |
| 1.1123 | 0.72 | 2700 | 1.0785 | -0.1255 | -0.1253 | 0.4730 | -0.0002 | -1.2534 | -1.2552 | 51.0774 | 50.8296 | 1.0024 | -0.7613 | 0.0234 |
| 1.1172 | 0.75 | 2800 | 1.0736 | -0.1252 | -0.1253 | 0.4750 | 0.0002 | -1.2533 | -1.2517 | 51.2562 | 50.9836 | 0.9979 | -0.7572 | 0.0271 |
| 1.0614 | 0.77 | 2900 | 1.0718 | -0.1252 | -0.1259 | 0.4760 | 0.0007 | -1.2591 | -1.2521 | 51.5419 | 51.2800 | 0.9964 | -0.7537 | 0.0404 |
| 1.0896 | 0.8 | 3000 | 1.0695 | -0.1261 | -0.1277 | 0.4810 | 0.0016 | -1.2773 | -1.2611 | 51.5967 | 51.3290 | 0.9951 | -0.7439 | 0.0530 |
| 1.0908 | 0.83 | 3100 | 1.0711 | -0.1249 | -0.1251 | 0.4760 | 0.0002 | -1.2512 | -1.2489 | 52.0281 | 51.7418 | 0.9954 | -0.7572 | 0.0330 |
| 1.09 | 0.85 | 3200 | 1.0676 | -0.1245 | -0.1247 | 0.4720 | 0.0002 | -1.2467 | -1.2450 | 52.0018 | 51.7152 | 0.9920 | -0.7566 | 0.0315 |
| 1.0677 | 0.88 | 3300 | 1.0657 | -0.1244 | -0.1248 | 0.4740 | 0.0005 | -1.2482 | -1.2435 | 52.0825 | 51.7926 | 0.9902 | -0.7552 | 0.0390 |
| 1.0712 | 0.91 | 3400 | 1.0644 | -0.1244 | -0.1250 | 0.4760 | 0.0007 | -1.2504 | -1.2437 | 52.0637 | 51.7715 | 0.9891 | -0.7529 | 0.0402 |
| 1.0732 | 0.93 | 3500 | 1.0642 | -0.1244 | -0.1251 | 0.4770 | 0.0007 | -1.2510 | -1.2438 | 52.1319 | 51.8349 | 0.9889 | -0.7526 | 0.0404 |
| 1.0669 | 0.96 | 3600 | 1.0647 | -0.1244 | -0.1252 | 0.4770 | 0.0007 | -1.2518 | -1.2443 | 52.1397 | 51.8447 | 0.9894 | -0.7525 | 0.0411 |
| 1.0774 | 0.99 | 3700 | 1.0652 | -0.1245 | -0.1253 | 0.4770 | 0.0008 | -1.2525 | -1.2449 | 52.1922 | 51.8967 | 0.9899 | -0.7525 | 0.0414 |

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2