---
license: mit
base_model: gpt2
tags:
- generated_from_trainer
model-index:
- name: python-gpt2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# python-gpt2

This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.1448
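
For a causal language model, the evaluation loss is usually the mean per-token cross-entropy in nats, so it can be turned into a perplexity by exponentiating. The sketch below assumes that interpretation applies to this run; the variable names are illustrative and not part of the training code.

```python
import math

# Final evaluation loss reported in this model card.
eval_loss = 2.1448

# Assumption: the loss is the mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"eval loss  = {eval_loss:.4f}")
print(f"perplexity = {perplexity:.2f}")
```

Under that assumption the reported loss corresponds to a perplexity of roughly 8.5 per token of the tokenizer used in training.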

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 1
- mixed_precision_training: Native AMP
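
The hyperparameters above imply two derived quantities that are easy to check by hand: the effective batch size and the learning rate at a given optimizer step. The following is a minimal sketch, not the training code. It assumes a single device (so that 32 × 8 = 256 matches `total_train_batch_size`), a schedule with linear warmup followed by a half-cosine decay to zero, and about 1816 optimizer steps in total, a figure inferred from the last row of the results table (step 1800 at epoch 0.9913). The function names are hypothetical.

```python
import math

PER_DEVICE_BATCH = 32   # train_batch_size
GRAD_ACCUM_STEPS = 8    # gradient_accumulation_steps
NUM_DEVICES = 1         # assumption: single device
PEAK_LR = 5e-4          # learning_rate
WARMUP_STEPS = 100      # lr_scheduler_warmup_steps
TOTAL_STEPS = 1816      # assumption: inferred from step 1800 at epoch 0.9913


def effective_batch_size(per_device, accum_steps, num_devices):
    """Number of sequences that contribute to one optimizer step."""
    return per_device * accum_steps * num_devices


def lr_at(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to peak_lr, then half-cosine decay to zero (assumed shape)."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("effective batch size:", effective_batch_size(
    PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))
for step in (0, 50, 100, 958, 1816):
    print(f"step {step:>4}: lr = {lr_at(step):.6f}")
```

With these assumptions the learning rate reaches its peak of 0.0005 at step 100, falls to half of that midway through the decay, and approaches zero at the end of the epoch.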

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 9.2956        | 0.0138 | 25   | 7.9483          |
| 6.8319        | 0.0275 | 50   | 6.0463          |
| 5.653         | 0.0413 | 75   | 5.3905          |
| 5.0998        | 0.0551 | 100  | 5.0523          |
| 4.7296        | 0.0688 | 125  | 4.7295          |
| 4.4676        | 0.0826 | 150  | 4.4801          |
| 4.2285        | 0.0964 | 175  | 4.2580          |
| 4.0335        | 0.1101 | 200  | 4.0891          |
| 3.8654        | 0.1239 | 225  | 3.9376          |
| 3.7442        | 0.1377 | 250  | 3.8222          |
| 3.6155        | 0.1514 | 275  | 3.7006          |
| 3.4805        | 0.1652 | 300  | 3.5997          |
| 3.3804        | 0.1790 | 325  | 3.4840          |
| 3.3074        | 0.1927 | 350  | 3.3887          |
| 3.1737        | 0.2065 | 375  | 3.2711          |
| 3.0593        | 0.2203 | 400  | 3.1535          |
| 2.9634        | 0.2340 | 425  | 3.0443          |
| 2.887         | 0.2478 | 450  | 2.9574          |
| 2.7808        | 0.2616 | 475  | 2.8775          |
| 2.7117        | 0.2753 | 500  | 2.8190          |
| 2.6611        | 0.2891 | 525  | 2.7515          |
| 2.6141        | 0.3029 | 550  | 2.7097          |
| 2.5752        | 0.3167 | 575  | 2.6704          |
| 2.5038        | 0.3304 | 600  | 2.6307          |
| 2.4852        | 0.3442 | 625  | 2.6004          |
| 2.4638        | 0.3580 | 650  | 2.5696          |
| 2.4362        | 0.3717 | 675  | 2.5343          |
| 2.3896        | 0.3855 | 700  | 2.5131          |
| 2.3669        | 0.3993 | 725  | 2.4886          |
| 2.3174        | 0.4130 | 750  | 2.4695          |
| 2.3152        | 0.4268 | 775  | 2.4478          |
| 2.2916        | 0.4406 | 800  | 2.4271          |
| 2.2743        | 0.4543 | 825  | 2.4166          |
| 2.2555        | 0.4681 | 850  | 2.3959          |
| 2.2545        | 0.4819 | 875  | 2.3794          |
| 2.2291        | 0.4956 | 900  | 2.3645          |
| 2.2032        | 0.5094 | 925  | 2.3499          |
| 2.1842        | 0.5232 | 950  | 2.3382          |
| 2.1505        | 0.5369 | 975  | 2.3263          |
| 2.1668        | 0.5507 | 1000 | 2.3147          |
| 2.1649        | 0.5645 | 1025 | 2.3072          |
| 2.1427        | 0.5782 | 1050 | 2.2926          |
| 2.1051        | 0.5920 | 1075 | 2.2799          |
| 2.0792        | 0.6058 | 1100 | 2.2708          |
| 2.1171        | 0.6195 | 1125 | 2.2570          |
| 2.1012        | 0.6333 | 1150 | 2.2470          |
| 2.0853        | 0.6471 | 1175 | 2.2405          |
| 2.0786        | 0.6608 | 1200 | 2.2312          |
| 2.0664        | 0.6746 | 1225 | 2.2238          |
| 2.0706        | 0.6884 | 1250 | 2.2183          |
| 2.0557        | 0.7021 | 1275 | 2.2102          |
| 2.0404        | 0.7159 | 1300 | 2.2042          |
| 2.0493        | 0.7297 | 1325 | 2.1978          |
| 2.0373        | 0.7434 | 1350 | 2.1907          |
| 2.0093        | 0.7572 | 1375 | 2.1837          |
| 2.0228        | 0.7710 | 1400 | 2.1819          |
| 2.0147        | 0.7847 | 1425 | 2.1739          |
| 2.0206        | 0.7985 | 1450 | 2.1694          |
| 2.0156        | 0.8123 | 1475 | 2.1671          |
| 2.0126        | 0.8260 | 1500 | 2.1622          |
| 1.9834        | 0.8398 | 1525 | 2.1598          |
| 2.0182        | 0.8536 | 1550 | 2.1558          |
| 1.9876        | 0.8674 | 1575 | 2.1543          |
| 1.9914        | 0.8811 | 1600 | 2.1515          |
| 1.9933        | 0.8949 | 1625 | 2.1498          |
| 1.9945        | 0.9087 | 1650 | 2.1483          |
| 1.9733        | 0.9224 | 1675 | 2.1470          |
| 1.9778        | 0.9362 | 1700 | 2.1467          |
| 1.983         | 0.9500 | 1725 | 2.1454          |
| 1.9716        | 0.9637 | 1750 | 2.1453          |
| 1.9668        | 0.9775 | 1775 | 2.1449          |
| 1.9733        | 0.9913 | 1800 | 2.1448          |
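
The last logged row of the table can be used to estimate the length of the epoch and the gap between training and validation loss. The sketch below is a rough back-of-the-envelope calculation: the Epoch column is rounded, so the step count is approximate, and the sequence count assumes every optimizer step used the full `total_train_batch_size` of 256. All variable names are illustrative.

```python
# Values copied from the last row of the training results table.
last_step = 1800
last_epoch = 0.9913
final_train_loss = 1.9733
final_val_loss = 2.1448

# From the hyperparameter list.
total_train_batch_size = 256

# Approximate number of optimizer steps in one full epoch.
steps_per_epoch = round(last_step / last_epoch)

# Approximate number of training sequences seen in one epoch,
# assuming every step used a full batch.
approx_train_sequences = steps_per_epoch * total_train_batch_size

# Difference between validation and training loss at the last logged step.
loss_gap = final_val_loss - final_train_loss

print("estimated steps per epoch:", steps_per_epoch)
print("estimated training sequences:", approx_train_sequences)
print(f"validation minus training loss: {loss_gap:.4f}")
```

The validation loss is still decreasing at the final rows, though by less than 0.001 per 25 steps, which is consistent with the learning rate having decayed close to zero by the end of the single epoch.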


### Framework versions

- Transformers 4.40.1
- PyTorch 2.2.0+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1
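
When trying to reproduce the run, it can help to compare the local environment against the versions listed above. The following is a small sketch using only the standard library; it assumes the libraries are installed under the distribution names `transformers`, `torch`, `datasets`, and `tokenizers`, and the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Versions listed in this model card, keyed by distribution name.
# Assumption: PyTorch is installed under the distribution name "torch".
EXPECTED = {
    "transformers": "4.40.1",
    "torch": "2.2.0+cu121",
    "datasets": "2.19.0",
    "tokenizers": "0.19.1",
}


def installed_versions(expected):
    """Map each package name to (expected version, installed version or None)."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found)
    return report


for name, (wanted, found) in installed_versions(EXPECTED).items():
    status = "ok" if found == wanted else "differs"
    print(f"{name}: expected {wanted}, found {found} ({status})")
```

A mismatch does not necessarily prevent the model from loading; the check only flags where the environment differs from the one recorded at training time.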
