	---
	license: apache-2.0
	base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
	tags:
	- alignment-handbook
	- ndcg
	- trl
	- expo
	- generated_from_trainer
	datasets:
	- hZzy/train_pairwise_weighted
	model-index:
	- name: qwen2.5-0.5b-expo-L2EXPO-W0-noES4-0.1
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/n7w9uz5i)
	# qwen2.5-0.5b-expo-L2EXPO-W0-noES4-0.1

	This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the hZzy/train_pairwise_weighted dataset.
	It achieves the following results on the evaluation set:
	- Loss: 179.2621
	- Logps: -92.2613
	- Logits: -1.4975
	- Objective: 175.9752
	- Dpo Loss: 0.6785
	- Regularize: 0.3992
	- Ranking Simple: 0.5280
	- Ranking Idealized: 0.6025
	- Ranking Idealized Expo: 0.5233
	- Wo Beta: 16.5856
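
For context on the `Dpo Loss` value, the sketch below assumes the metric is the standard sigmoid DPO loss, `-log(sigmoid(scaled_margin))`. The card does not define the metric, so this is an assumption and not a description of the trainer. Under that assumption, a model that cannot separate chosen from rejected responses (margin 0) scores ln 2 ≈ 0.6931, so 0.6785 is only slightly below that baseline. The function name `dpo_sigmoid_loss` is hypothetical.

```python
import math


def dpo_sigmoid_loss(scaled_margin: float) -> float:
    """Return -log(sigmoid(scaled_margin)), computed stably as softplus(-x).

    Assumption: this is the standard sigmoid DPO loss; the card does not
    confirm that the reported "Dpo Loss" uses exactly this form.
    """
    x = -scaled_margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


baseline = dpo_sigmoid_loss(0.0)  # loss at zero margin, equal to ln 2
reported = 0.6785                 # "Dpo Loss" from the evaluation results

print(f"zero-margin baseline: {baseline:.4f}")
print(f"reported minus baseline: {reported - baseline:.4f}")
```

A larger positive margin drives this loss toward zero, so under this reading the gap to ln 2 gives a rough sense of how far training moved the preference margin.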

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-07
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 3
	- gradient_accumulation_steps: 12
	- total_train_batch_size: 144
	- total_eval_batch_size: 12
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 3
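
The two `total_*` values in the list follow from the per-device settings. The short sketch below reproduces that arithmetic with the numbers copied from the list; the variable names simply mirror the hyperparameter names.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 4              # per-device training batch size
eval_batch_size = 4               # per-device evaluation batch size
num_devices = 3
gradient_accumulation_steps = 12

# One optimizer step consumes: per-device batch x devices x accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 144
print(total_eval_batch_size)   # 12
```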

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss | Logps    | Logits  | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo | Wo Beta |
	|:-------------:|:------:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-----------------:|:----------------------:|:-------:|
	| 182.5182      | 0.1417 | 50   | 182.5003        | -90.8517 | -1.4200 | 180.4893  | 0.6895   | 0.4093     | 0.5248         | 0.6025            | 0.5233                 | 16.3100 |
	| 159.305       | 0.2834 | 100  | 182.1522        | -91.3531 | -1.4622 | 180.5219  | 0.6860   | 0.4103     | 0.5311         | 0.6025            | 0.5233                 | 16.3819 |
	| 150.2379      | 0.4251 | 150  | 180.0575        | -90.2469 | -1.4576 | 177.1578  | 0.6806   | 0.4010     | 0.5331         | 0.6025            | 0.5233                 | 16.6107 |
	| 135.925       | 0.5668 | 200  | 179.9740        | -91.1249 | -1.4453 | 177.0413  | 0.6795   | 0.4006     | 0.5305         | 0.6025            | 0.5233                 | 16.2687 |
	| 130.7065      | 0.7085 | 250  | 181.5092        | -91.6178 | -1.5061 | 178.2784  | 0.6800   | 0.4049     | 0.5305         | 0.6025            | 0.5233                 | 16.6407 |
	| 109.74        | 0.8503 | 300  | 180.4924        | -92.4236 | -1.4760 | 178.1365  | 0.6815   | 0.4047     | 0.5305         | 0.6025            | 0.5233                 | 16.4981 |
	| 104.2663      | 0.9920 | 350  | 182.2591        | -92.8005 | -1.5066 | 178.8644  | 0.6808   | 0.4058     | 0.5290         | 0.6025            | 0.5233                 | 16.5694 |
	| 91.3585       | 1.1337 | 400  | 180.0295        | -92.3854 | -1.4789 | 177.7148  | 0.6800   | 0.4024     | 0.5280         | 0.6025            | 0.5233                 | 16.5852 |
	| 77.8925       | 1.2754 | 450  | 179.2441        | -92.7062 | -1.4746 | 175.8475  | 0.6792   | 0.3989     | 0.5274         | 0.6025            | 0.5233                 | 16.5269 |
	| 73.5844       | 1.4171 | 500  | 180.3643        | -93.2695 | -1.4849 | 176.2332  | 0.6786   | 0.3994     | 0.5305         | 0.6025            | 0.5233                 | 16.5003 |
	| 74.752        | 1.5588 | 550  | 181.3646        | -92.8892 | -1.4832 | 177.2267  | 0.6795   | 0.4020     | 0.5274         | 0.6025            | 0.5233                 | 16.5546 |
	| 66.606        | 1.7005 | 600  | 179.4953        | -91.6158 | -1.4675 | 176.2793  | 0.6789   | 0.3999     | 0.5311         | 0.6025            | 0.5233                 | 16.6183 |
	| 65.4503       | 1.8422 | 650  | 180.1248        | -91.8974 | -1.5046 | 176.5553  | 0.6790   | 0.4003     | 0.5285         | 0.6025            | 0.5233                 | 16.5373 |
	| 62.3615       | 1.9839 | 700  | 179.3857        | -91.5875 | -1.4984 | 176.0021  | 0.6784   | 0.3992     | 0.5300         | 0.6025            | 0.5233                 | 16.5863 |
	| 48.9708       | 2.1256 | 750  | 179.8103        | -92.1933 | -1.4919 | 176.7028  | 0.6794   | 0.4011     | 0.5274         | 0.6025            | 0.5233                 | 16.5884 |
	| 51.9463       | 2.2674 | 800  | 179.2178        | -92.0065 | -1.4993 | 175.7036  | 0.6782   | 0.3986     | 0.5290         | 0.6025            | 0.5233                 | 16.5689 |
	| 44.3463       | 2.4091 | 850  | 179.1735        | -92.2372 | -1.4918 | 175.7777  | 0.6783   | 0.3988     | 0.5285         | 0.6025            | 0.5233                 | 16.5682 |
	| 44.3015       | 2.5508 | 900  | 179.1590        | -92.1898 | -1.4983 | 175.8240  | 0.6784   | 0.3990     | 0.5280         | 0.6025            | 0.5233                 | 16.5905 |
	| 43.4164       | 2.6925 | 950  | 179.2801        | -92.2046 | -1.4967 | 176.0408  | 0.6785   | 0.3993     | 0.5274         | 0.6025            | 0.5233                 | 16.5891 |
	| 43.6009       | 2.8342 | 1000 | 179.2791        | -92.2705 | -1.4978 | 175.9963  | 0.6785   | 0.3992     | 0.5280         | 0.6025            | 0.5233                 | 16.5880 |
	| 47.7054       | 2.9759 | 1050 | 179.2622        | -92.2613 | -1.4975 | 175.9752  | 0.6785   | 0.3992     | 0.5280         | 0.6025            | 0.5233                 | 16.5856 |
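
The Step and Epoch columns, together with the `cosine` scheduler and `lr_scheduler_warmup_ratio: 0.1` listed under the hyperparameters, allow a rough reconstruction of the learning-rate curve. The sketch below estimates the step counts from the last table row and applies a common linear-warmup plus cosine-decay formula. It makes several assumptions: the epoch column is rounded, so the step totals are estimates. The exact warmup rounding used by the trainer is not stated in the card. The helper `lr_at` is hypothetical.

```python
import math

# (step, epoch) copied from the last row of the table above.
last_step, last_epoch = 1050, 2.9759
num_epochs = 3
peak_lr = 5e-07       # learning_rate
warmup_ratio = 0.1    # lr_scheduler_warmup_ratio

# Estimates only: the epoch column is rounded to four decimals.
steps_per_epoch = last_step / last_epoch
total_steps = math.ceil(steps_per_epoch * num_epochs)
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero.

    Assumption: this mirrors the usual cosine-with-warmup shape; it is not
    taken from the training code behind this card.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"estimated steps per epoch: {steps_per_epoch:.1f}")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, lr_at(step))
```

Under these assumptions the run takes about 353 optimizer steps per epoch, and the learning rate peaks at 5e-07 once warmup ends, then decays toward zero by the final step.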


	### Framework versions

	- Transformers 4.42.0
	- Pytorch 2.3.0+cu121
	- Datasets 3.2.0
	- Tokenizers 0.19.1
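
To compare a local environment against the versions listed above, a minimal sketch using only the standard library is shown below. It assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers`; the helper `check_versions` is hypothetical, and a local PyTorch build may report a different suffix than `+cu121`.

```python
from importlib import metadata

# Versions copied from the list above; "Pytorch" is distributed as "torch".
PINNED = {
    "transformers": "4.42.0",
    "torch": "2.3.0+cu121",
    "datasets": "3.2.0",
    "tokenizers": "0.19.1",
}


def check_versions(pinned):
    """Map each package to (installed_version_or_None, matches_pinned)."""
    report = {}
    for package, wanted in pinned.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (installed, installed == wanted)
    return report


for package, (installed, ok) in check_versions(PINNED).items():
    status = "ok" if ok else "differs"
    print(f"{package}: installed={installed} pinned={PINNED[package]} ({status})")
```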