metadata
license: mit
base_model: gpt2
tags:
  - generated_from_trainer
model-index:
  - name: python-gpt2
    results: []

python-gpt2

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1448 (the validation loss logged at the final step, 1800, in the training results below)
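The card reports only the evaluation loss. Assuming it is the mean per-token cross-entropy in nats (the usual objective for a GPT-2 causal language model trained with the Hugging Face Trainer), it can be converted into perplexity, which is often easier to compare across runs. A minimal sketch; the helper name `perplexity_from_loss` is illustrative:

```python
import math


def perplexity_from_loss(mean_nll: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) into perplexity.

    Assumption: the reported loss is the average natural-log negative
    log-likelihood per predicted token, as for a causal-LM objective.
    """
    return math.exp(mean_nll)


if __name__ == "__main__":
    eval_loss = 2.1448  # evaluation loss reported in this card
    print(f"perplexity ~ {perplexity_from_loss(eval_loss):.2f}")
```

Under that assumption, the reported loss corresponds to a perplexity of roughly 8.54.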

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 256 (32 per step × 8 gradient accumulation steps)
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 1
  • mixed_precision_training: Native AMP
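Two of these settings interact in ways worth making concrete: gradient accumulation sets the effective batch size, and the cosine scheduler with 100 warmup steps shapes the learning rate over the run. The sketch below is intended to follow the same shape as Transformers' cosine-with-warmup schedule (linear warmup from 0 to the peak rate, then a half-cosine decay to 0), but it is an independent re-implementation, so check it against the library before relying on it. The helper names are hypothetical, and `TOTAL_STEPS = 1815` is an assumption: the card does not state it, and it is only roughly inferred from the results table (step 1800 is logged at epoch 0.9913).

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-4
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 100

# Assumption: not stated in the card; approximate value inferred from
# the results table (step 1800 at epoch 0.9913).
TOTAL_STEPS = 1815


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Examples contributing to one optimizer update."""
    return per_device * accum_steps * num_devices


def cosine_with_warmup_lr(step: int, base_lr: float, warmup: int, total: int) -> float:
    """Linear warmup from 0 to base_lr, then half-cosine decay to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
    for step in (0, 50, 100, 500, 1000, 1500, TOTAL_STEPS):
        lr = cosine_with_warmup_lr(step, LEARNING_RATE, WARMUP_STEPS, TOTAL_STEPS)
        print(f"step {step:>4}: lr = {lr:.2e}")
```

Since 32 × 8 already equals the listed total of 256, the run appears to have used a single device, which is why `num_devices` defaults to 1.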

Training results

Training Loss | Epoch  | Step | Validation Loss
------------- | ------ | ---- | ---------------
9.2956        | 0.0138 |   25 | 7.9483
6.8319        | 0.0275 |   50 | 6.0463
5.653         | 0.0413 |   75 | 5.3905
5.0998        | 0.0551 |  100 | 5.0523
4.7296        | 0.0688 |  125 | 4.7295
4.4676        | 0.0826 |  150 | 4.4801
4.2285        | 0.0964 |  175 | 4.2580
4.0335        | 0.1101 |  200 | 4.0891
3.8654        | 0.1239 |  225 | 3.9376
3.7442        | 0.1377 |  250 | 3.8222
3.6155        | 0.1514 |  275 | 3.7006
3.4805        | 0.1652 |  300 | 3.5997
3.3804        | 0.1790 |  325 | 3.4840
3.3074        | 0.1927 |  350 | 3.3887
3.1737        | 0.2065 |  375 | 3.2711
3.0593        | 0.2203 |  400 | 3.1535
2.9634        | 0.2340 |  425 | 3.0443
2.887         | 0.2478 |  450 | 2.9574
2.7808        | 0.2616 |  475 | 2.8775
2.7117        | 0.2753 |  500 | 2.8190
2.6611        | 0.2891 |  525 | 2.7515
2.6141        | 0.3029 |  550 | 2.7097
2.5752        | 0.3167 |  575 | 2.6704
2.5038        | 0.3304 |  600 | 2.6307
2.4852        | 0.3442 |  625 | 2.6004
2.4638        | 0.3580 |  650 | 2.5696
2.4362        | 0.3717 |  675 | 2.5343
2.3896        | 0.3855 |  700 | 2.5131
2.3669        | 0.3993 |  725 | 2.4886
2.3174        | 0.4130 |  750 | 2.4695
2.3152        | 0.4268 |  775 | 2.4478
2.2916        | 0.4406 |  800 | 2.4271
2.2743        | 0.4543 |  825 | 2.4166
2.2555        | 0.4681 |  850 | 2.3959
2.2545        | 0.4819 |  875 | 2.3794
2.2291        | 0.4956 |  900 | 2.3645
2.2032        | 0.5094 |  925 | 2.3499
2.1842        | 0.5232 |  950 | 2.3382
2.1505        | 0.5369 |  975 | 2.3263
2.1668        | 0.5507 | 1000 | 2.3147
2.1649        | 0.5645 | 1025 | 2.3072
2.1427        | 0.5782 | 1050 | 2.2926
2.1051        | 0.5920 | 1075 | 2.2799
2.0792        | 0.6058 | 1100 | 2.2708
2.1171        | 0.6195 | 1125 | 2.2570
2.1012        | 0.6333 | 1150 | 2.2470
2.0853        | 0.6471 | 1175 | 2.2405
2.0786        | 0.6608 | 1200 | 2.2312
2.0664        | 0.6746 | 1225 | 2.2238
2.0706        | 0.6884 | 1250 | 2.2183
2.0557        | 0.7021 | 1275 | 2.2102
2.0404        | 0.7159 | 1300 | 2.2042
2.0493        | 0.7297 | 1325 | 2.1978
2.0373        | 0.7434 | 1350 | 2.1907
2.0093        | 0.7572 | 1375 | 2.1837
2.0228        | 0.7710 | 1400 | 2.1819
2.0147        | 0.7847 | 1425 | 2.1739
2.0206        | 0.7985 | 1450 | 2.1694
2.0156        | 0.8123 | 1475 | 2.1671
2.0126        | 0.8260 | 1500 | 2.1622
1.9834        | 0.8398 | 1525 | 2.1598
2.0182        | 0.8536 | 1550 | 2.1558
1.9876        | 0.8674 | 1575 | 2.1543
1.9914        | 0.8811 | 1600 | 2.1515
1.9933        | 0.8949 | 1625 | 2.1498
1.9945        | 0.9087 | 1650 | 2.1483
1.9733        | 0.9224 | 1675 | 2.1470
1.9778        | 0.9362 | 1700 | 2.1467
1.983         | 0.9500 | 1725 | 2.1454
1.9716        | 0.9637 | 1750 | 2.1453
1.9668        | 0.9775 | 1775 | 2.1449
1.9733        | 0.9913 | 1800 | 2.1448
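The headline evaluation loss can be cross-checked against this log. The sketch below parses a handful of rows copied from the table (the `ROWS` excerpt, `LogRow`, and the helper functions are illustrative, not part of any library) and reports the step with the lowest validation loss and the gap between validation and training loss at the last logged step.

```python
from typing import List, NamedTuple, Tuple

# A few rows copied from the table above:
# training loss, epoch, step, validation loss
ROWS = """\
9.2956 0.0138 25 7.9483
2.5752 0.3167 575 2.6704
2.1668 0.5507 1000 2.3147
1.9733 0.9224 1675 2.1470
1.9716 0.9637 1750 2.1453
1.9668 0.9775 1775 2.1449
1.9733 0.9913 1800 2.1448
"""


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


def parse_rows(text: str) -> List[LogRow]:
    """Parse whitespace-separated log lines into typed rows."""
    rows = []
    for line in text.strip().splitlines():
        train, epoch, step, val = line.split()
        rows.append(LogRow(float(train), float(epoch), int(step), float(val)))
    return rows


def summarize(rows: List[LogRow]) -> Tuple[int, float]:
    """Return (step with lowest validation loss, val - train gap at last row)."""
    best = min(rows, key=lambda r: r.val_loss)
    last = rows[-1]
    return best.step, round(last.val_loss - last.train_loss, 4)


if __name__ == "__main__":
    best_step, gap = summarize(parse_rows(ROWS))
    print(f"lowest validation loss at step {best_step}")
    print(f"validation - training gap at last logged step: {gap}")
```

On these rows, the lowest validation loss is at step 1800 (2.1448), the same value reported at the top of the card, and validation loss sits about 0.17 above the logged training loss at that step.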

Framework versions

  • Transformers 4.40.1
  • PyTorch 2.2.0+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1
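Because the card pins exact versions, checking the local environment first can save debugging time when reproducing the run. A minimal sketch using the standard library's `importlib.metadata`; the helper name `version_mismatches` is hypothetical, and `torch` is the installed distribution name for PyTorch. It compares version strings exactly, so a torch 2.2.0 build without the `+cu121` local label is also reported as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple

# Versions listed in this card, keyed by installed distribution name.
EXPECTED = {
    "transformers": "4.40.1",
    "torch": "2.2.0+cu121",
    "datasets": "2.19.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], str] = version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed or None)} for each mismatch."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    problems = version_mismatches(EXPECTED)
    if not problems:
        print("All listed framework versions match.")
    for name, (wanted, installed) in problems.items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

The `lookup` parameter defaults to `importlib.metadata.version` but can be swapped for a stub, which keeps the comparison logic testable without the real packages installed.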