pythia-410m-deduped

This model is a fine-tuned version of EleutherAI/pythia-410m-deduped on the princeton-nlp/llama3-ultrafeedback dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6928
  • Original Losses: 1.7344
  • Weight: 1.0
  • Abs Diff: 0.3008
  • Rewards/chosen: -5.4375
  • Rewards/rejected: -5.4688
  • Rewards/accuracies: 0.4758
  • Rewards/margins: 0.0228
  • Logps/rejected: -2.1875
  • Logps/chosen: -2.1719
  • Logits/rejected: 5.7188
  • Logits/chosen: 5.7188
  • All Logps 1: -811.2697
  • All Logps 1 Values: -811.2697
  • All Logps 2: 447.4254
  • All Logps 2 Values: 447.4254
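
The reward metrics above are usually derived from per-example reward pairs. The sketch below shows one common way preference-optimization trainers summarize them: the margin is the mean of chosen minus rejected, and accuracy is the fraction of pairs where the chosen reward is higher. This card does not state the exact logging code, so the function name `preference_metrics` and the sample values are illustrative assumptions, not data from this run.

```python
from statistics import mean


def preference_metrics(chosen_rewards, rejected_rewards):
    """Summarize paired rewards (hypothetical helper, not the card's own code)."""
    if len(chosen_rewards) != len(rejected_rewards) or not chosen_rewards:
        raise ValueError("need equally sized, non-empty reward lists")
    # Per-pair difference: positive means the chosen response scored higher.
    diffs = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    return {
        "rewards/chosen": mean(chosen_rewards),
        "rewards/rejected": mean(rejected_rewards),
        "rewards/margins": mean(diffs),
        "rewards/accuracies": mean(1.0 if d > 0 else 0.0 for d in diffs),
    }


# Made-up per-example rewards, NOT taken from this model's evaluation set.
example = preference_metrics(
    chosen_rewards=[-5.0, -5.5, -6.0, -5.25],
    rejected_rewards=[-5.5, -5.25, -6.5, -5.0],
)
print(example)
```

Under this definition, the reported Rewards/accuracies of 0.4758 would mean the chosen response outscored the rejected one in slightly fewer than half of the evaluation pairs.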

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
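
The derived values in the list follow from the per-device settings, and the schedule can be sketched in plain Python. The block below recomputes the total batch sizes and evaluates a linear-warmup, cosine-decay learning rate. The total step count is not stated in this card; `ASSUMED_TOTAL_STEPS = 468` is an estimate from the results table (step 460 at epoch 0.9832), and `cosine_lr` is a hypothetical helper that mirrors the usual warmup-plus-cosine formula rather than the exact training code.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 2
eval_batch_size = 4
num_devices = 8
gradient_accumulation_steps = 8
learning_rate = 1e-06
warmup_ratio = 0.1

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 128 32

# ASSUMPTION: estimated from the results table, not stated in the card.
ASSUMED_TOTAL_STEPS = 468


def cosine_lr(step, total_steps, base_lr=learning_rate, ratio=warmup_ratio):
    """Linear warmup followed by half-cosine decay to zero (illustrative sketch)."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 20, 47, 240, 460):
    print(step, cosine_lr(step, ASSUMED_TOTAL_STEPS))
```

With these assumptions the learning rate peaks at 1e-06 around step 47 and decays toward zero by the end of the single epoch.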

Training results

| Training Loss | Epoch | Step | Validation Loss | Original Losses | Weight | Abs Diff | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | All Logps 1 | All Logps 1 Values | All Logps 2 | All Logps 2 Values |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.9439 | 0.0427 | 20 | 1.7861 | 1.8125 | 1.0 | 0.3574 | -4.9688 | -5.0 | 0.4556 | 0.0187 | -1.9922 | -1.9844 | 5.1875 | 5.2188 | -694.3344 | -694.3344 | 447.4254 | 447.4254 |
| 1.8637 | 0.0855 | 40 | 1.7850 | 1.8125 | 1.0 | 0.3574 | -4.9688 | -4.9688 | 0.4637 | 0.0112 | -1.9922 | -1.9844 | 5.1875 | 5.25 | -694.3014 | -694.3014 | 447.4254 | 447.4254 |
| 1.8856 | 0.1282 | 60 | 1.7741 | 1.8125 | 1.0 | 0.3496 | -4.9375 | -4.9375 | 0.4435 | -0.0004 | -1.9766 | -1.9766 | 5.2188 | 5.25 | -695.6515 | -695.6515 | 447.4254 | 447.4254 |
| 1.8193 | 0.1710 | 80 | 1.7628 | 1.8047 | 1.0 | 0.3477 | -4.9375 | -4.9375 | 0.4637 | 0.0016 | -1.9844 | -1.9766 | 5.3125 | 5.3438 | -699.6716 | -699.6716 | 447.4254 | 447.4254 |
| 1.8542 | 0.2137 | 100 | 1.7501 | 1.7891 | 1.0 | 0.3340 | -4.9375 | -4.9688 | 0.4758 | 0.0138 | -1.9844 | -1.9766 | 5.4062 | 5.4375 | -707.3261 | -707.3261 | 447.4254 | 447.4254 |
| 1.7907 | 0.2565 | 120 | 1.7458 | 1.7891 | 1.0 | 0.3301 | -5.0 | -4.9688 | 0.4315 | -0.0052 | -1.9922 | -1.9922 | 5.4688 | 5.5 | -714.8251 | -714.8251 | 447.4254 | 447.4254 |
| 1.8332 | 0.2992 | 140 | 1.7375 | 1.7969 | 1.0 | 0.3281 | -5.0312 | -5.0 | 0.4637 | -0.0200 | -2.0 | -2.0156 | 5.5312 | 5.5625 | -723.8403 | -723.8403 | 447.4254 | 447.4254 |
| 1.7599 | 0.3420 | 160 | 1.7328 | 1.7969 | 1.0 | 0.3301 | -5.0938 | -5.0625 | 0.4355 | -0.0156 | -2.0312 | -2.0312 | 5.5625 | 5.5938 | -734.5149 | -734.5149 | 447.4254 | 447.4254 |
| 1.8462 | 0.3847 | 180 | 1.7246 | 1.7734 | 1.0 | 0.3184 | -5.125 | -5.125 | 0.4516 | -0.0015 | -2.0469 | -2.0469 | 5.5625 | 5.5938 | -745.0103 | -745.0103 | 447.4254 | 447.4254 |
| 1.8253 | 0.4275 | 200 | 1.7154 | 1.7656 | 1.0 | 0.3145 | -5.1562 | -5.1875 | 0.4476 | 0.0043 | -2.0625 | -2.0625 | 5.5625 | 5.5938 | -755.3181 | -755.3181 | 447.4254 | 447.4254 |
| 1.8056 | 0.4702 | 220 | 1.7119 | 1.7734 | 1.0 | 0.3203 | -5.2188 | -5.2188 | 0.4476 | 0.0032 | -2.0938 | -2.0938 | 5.5938 | 5.625 | -762.7902 | -762.7902 | 447.4254 | 447.4254 |
| 1.7958 | 0.5130 | 240 | 1.7096 | 1.7734 | 1.0 | 0.3164 | -5.25 | -5.25 | 0.4556 | -0.0002 | -2.1094 | -2.1094 | 5.5938 | 5.625 | -770.9695 | -770.9695 | 447.4254 | 447.4254 |
| 1.7141 | 0.5557 | 260 | 1.7073 | 1.7578 | 1.0 | 0.3086 | -5.2812 | -5.2812 | 0.4355 | 0.0052 | -2.1094 | -2.1094 | 5.625 | 5.625 | -775.2407 | -775.2407 | 447.4254 | 447.4254 |
| 1.7021 | 0.5985 | 280 | 1.7085 | 1.7656 | 1.0 | 0.3125 | -5.2812 | -5.2812 | 0.4597 | -0.0014 | -2.1094 | -2.1094 | 5.625 | 5.6562 | -778.4560 | -778.4560 | 447.4254 | 447.4254 |
| 1.7788 | 0.6412 | 300 | 1.7020 | 1.7578 | 1.0 | 0.3066 | -5.3125 | -5.3125 | 0.4677 | 0.0104 | -2.125 | -2.125 | 5.6562 | 5.6875 | -784.0049 | -784.0049 | 447.4254 | 447.4254 |
| 1.679 | 0.6839 | 320 | 1.7053 | 1.7578 | 1.0 | 0.3105 | -5.3438 | -5.3438 | 0.4476 | 0.0002 | -2.1406 | -2.1406 | 5.6562 | 5.6875 | -791.0703 | -791.0703 | 447.4254 | 447.4254 |
| 1.751 | 0.7267 | 340 | 1.7006 | 1.7578 | 1.0 | 0.3105 | -5.375 | -5.4062 | 0.4919 | 0.0085 | -2.1562 | -2.1562 | 5.6562 | 5.6875 | -797.0882 | -797.0882 | 447.4254 | 447.4254 |
| 1.7191 | 0.7694 | 360 | 1.6990 | 1.7656 | 1.0 | 0.3086 | -5.4375 | -5.4062 | 0.4476 | -0.0044 | -2.1719 | -2.1719 | 5.6875 | 5.6875 | -803.0909 | -803.0909 | 447.4254 | 447.4254 |
| 1.7226 | 0.8122 | 380 | 1.6993 | 1.7578 | 1.0 | 0.3086 | -5.4375 | -5.4375 | 0.4758 | 0.0093 | -2.1719 | -2.1719 | 5.6875 | 5.7188 | -806.9357 | -806.9357 | 447.4254 | 447.4254 |
| 1.7198 | 0.8549 | 400 | 1.6968 | 1.7578 | 1.0 | 0.3066 | -5.4688 | -5.4688 | 0.4556 | 0.0020 | -2.1875 | -2.1875 | 5.6875 | 5.7188 | -810.5368 | -810.5368 | 447.4254 | 447.4254 |
| 1.7057 | 0.8977 | 420 | 1.6963 | 1.75 | 1.0 | 0.3047 | -5.4688 | -5.4688 | 0.4718 | 0.0151 | -2.1875 | -2.1875 | 5.6875 | 5.7188 | -811.7772 | -811.7772 | 447.4254 | 447.4254 |
| 1.75 | 0.9404 | 440 | 1.6973 | 1.7578 | 1.0 | 0.3086 | -5.4688 | -5.4688 | 0.4677 | 0.0077 | -2.1875 | -2.1875 | 5.6875 | 5.7188 | -811.8970 | -811.8970 | 447.4254 | 447.4254 |
| 1.6912 | 0.9832 | 460 | 1.6928 | 1.7344 | 1.0 | 0.3008 | -5.4375 | -5.4688 | 0.4758 | 0.0228 | -2.1875 | -2.1719 | 5.7188 | 5.7188 | -811.2697 | -811.2697 | 447.4254 | 447.4254 |

Framework versions

  • Transformers 4.42.3
  • Pytorch 2.2.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 405M params (Safetensors)
Tensor type: BF16

Model repository: RAY2L/pythia-410m-deduped-SimPOW-1