	---
	tags:
	- generated_from_trainer
	model-index:
	- name: led-large
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# led-large

	This model was trained from scratch on an unspecified dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.1850

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 3e-05
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- gradient_accumulation_steps: 64
	- total_train_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: polynomial
	- lr_scheduler_warmup_steps: 500
	- training_steps: 20000
	- mixed_precision_training: Native AMP
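The hyperparameters above imply an effective batch size and a learning-rate curve that the card does not spell out. The following is a minimal sketch of both in plain Python. It assumes a single training device and the polynomial scheduler's default `lr_end=1e-7` and `power=1.0` from Transformers; none of these three values is recorded in the card, so treat the curve as illustrative rather than as the exact schedule used.

```python
# Sketch of the batch size and learning-rate schedule implied by the listed
# hyperparameters. Values marked "assumed" are NOT stated in the model card.

LEARNING_RATE = 3e-05      # learning_rate
WARMUP_STEPS = 500         # lr_scheduler_warmup_steps
TRAINING_STEPS = 20_000    # training_steps
TRAIN_BATCH_SIZE = 1       # train_batch_size
GRAD_ACCUM_STEPS = 64      # gradient_accumulation_steps

NUM_DEVICES = 1            # assumed: consistent with 1 * 64 = 64
LR_END = 1e-7              # assumed: Transformers default for polynomial decay
POWER = 1.0                # assumed: Transformers default (linear decay)


def effective_batch_size() -> int:
    """Examples consumed per optimizer step."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then polynomial decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    if step > TRAINING_STEPS:
        return LR_END
    remaining = 1 - (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return (LEARNING_RATE - LR_END) * remaining ** POWER + LR_END


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 250, 500, 10_000, 20_000):
        print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under these assumptions the rate climbs linearly to 3e-05 over the first 500 steps and then decays to the assumed floor by step 20000.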

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:-----:|:---------------:|
	| 0.1479 | 0.11 | 500 | 0.1901 |
	| 0.1442 | 0.22 | 1000 | 0.1917 |
	| 0.1466 | 0.33 | 1500 | 0.1959 |
	| 0.1447 | 0.45 | 2000 | 0.1918 |
	| 0.1633 | 0.56 | 2500 | 0.1874 |
	| 0.171 | 0.67 | 3000 | 0.1849 |
	| 0.1662 | 0.78 | 3500 | 0.1843 |
	| 0.1743 | 0.89 | 4000 | 0.1837 |
	| 0.1492 | 1.0 | 4500 | 0.1842 |
	| 0.1515 | 1.11 | 5000 | 0.1849 |
	| 0.1497 | 1.23 | 5500 | 0.1840 |
	| 0.1515 | 1.34 | 6000 | 0.1839 |
	| 0.1482 | 1.45 | 6500 | 0.1841 |
	| 0.145 | 1.56 | 7000 | 0.1849 |
	| 0.1467 | 1.67 | 7500 | 0.1824 |
	| 0.1509 | 1.78 | 8000 | 0.1809 |
	| 0.15 | 1.89 | 8500 | 0.1832 |
	| 0.1383 | 2.01 | 9000 | 0.1831 |
	| 0.1331 | 2.12 | 9500 | 0.1820 |
	| 0.1406 | 2.23 | 10000 | 0.1830 |
	| 0.1362 | 2.34 | 10500 | 0.1844 |
	| 0.1373 | 2.45 | 11000 | 0.1836 |
	| 0.1269 | 2.56 | 11500 | 0.1842 |
	| 0.1362 | 2.67 | 12000 | 0.1819 |
	| 0.14 | 2.79 | 12500 | 0.1832 |
	| 0.1319 | 2.9 | 13000 | 0.1837 |
	| 0.1304 | 3.01 | 13500 | 0.1845 |
	| 0.1278 | 3.12 | 14000 | 0.1844 |
	| 0.1235 | 3.23 | 14500 | 0.1832 |
	| 0.1293 | 3.34 | 15000 | 0.1855 |
	| 0.1302 | 3.45 | 15500 | 0.1836 |
	| 0.1285 | 3.57 | 16000 | 0.1860 |
	| 0.1274 | 3.68 | 16500 | 0.1860 |
	| 0.1261 | 3.79 | 17000 | 0.1854 |
	| 0.1304 | 3.9 | 17500 | 0.1859 |
	| 0.1223 | 4.01 | 18000 | 0.1862 |
	| 0.1235 | 4.12 | 18500 | 0.1849 |
	| 0.1286 | 4.23 | 19000 | 0.1858 |
	| 0.1186 | 4.35 | 19500 | 0.1856 |
	| 0.1293 | 4.46 | 20000 | 0.1850 |
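The reported evaluation loss (0.1850) is the value at the final step, which is not the lowest entry in the table. The sketch below copies the validation-loss column into a list and locates the minimum. It also derives a rough dataset size from the step/epoch ratio; that figure is an estimate only, since the epoch column is rounded to two decimals and the effective batch size of 64 is taken from `total_train_batch_size`.

```python
# Validation losses copied from the table, in logging order.
# Evaluation ran every 500 steps, so entry i corresponds to step (i + 1) * 500.
EVAL_EVERY = 500
VAL_LOSS = [
    0.1901, 0.1917, 0.1959, 0.1918, 0.1874, 0.1849, 0.1843, 0.1837,
    0.1842, 0.1849, 0.1840, 0.1839, 0.1841, 0.1849, 0.1824, 0.1809,
    0.1832, 0.1831, 0.1820, 0.1830, 0.1844, 0.1836, 0.1842, 0.1819,
    0.1832, 0.1837, 0.1845, 0.1844, 0.1832, 0.1855, 0.1836, 0.1860,
    0.1860, 0.1854, 0.1859, 0.1862, 0.1849, 0.1858, 0.1856, 0.1850,
]


def best_checkpoint(losses, every):
    """Return (step, loss) for the lowest validation loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return (idx + 1) * every, losses[idx]


def approx_examples_per_epoch(final_step, final_epoch, batch_size):
    """Rough dataset size implied by the last table row (epoch is rounded)."""
    return final_step / final_epoch * batch_size


if __name__ == "__main__":
    step, loss = best_checkpoint(VAL_LOSS, EVAL_EVERY)
    print(f"lowest validation loss: {loss:.4f} at step {step}")
    print(f"final validation loss:  {VAL_LOSS[-1]:.4f} at step {len(VAL_LOSS) * EVAL_EVERY}")
    print(f"approx. examples per epoch: {approx_examples_per_epoch(20_000, 4.46, 64):,.0f}")
```

The minimum falls at step 8000 (0.1809). Validation loss drifts slightly upward afterwards while training loss keeps falling, so an earlier checkpoint may generalize marginally better than the final one, if intermediate checkpoints were kept.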


	### Framework versions

	- Transformers 4.37.2
	- PyTorch 2.2.2+cu121
	- Datasets 2.18.0
	- Tokenizers 0.15.1
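To reproduce the environment, it can help to compare the installed packages against the versions listed above. The following sketch uses only the standard library. The PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual ones for these frameworks, and the comparison ignores the local build suffix (`+cu121`), since that part depends on how PyTorch was installed.

```python
from importlib import metadata

# Versions listed in the model card, keyed by PyPI distribution name.
PINNED = {
    "transformers": "4.37.2",
    "torch": "2.2.2+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.1",
}


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu121'."""
    return version.split("+", 1)[0]


def check_versions(pinned=PINNED):
    """Map each package to (wanted, installed or None, matches)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        matches = installed is not None and base_version(installed) == base_version(wanted)
        report[name] = (wanted, installed, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:<13} wanted {wanted:<12} installed {installed!s:<12} {status}")
```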
