---
library_name: peft
tags:
- alignment-handbook
- generated_from_trainer
base_model: g8a9/tweety-mistral-7b
datasets:
- giux78/ultrafeedback-binarized-preferences-cleaned-ita
model-index:
- name: dpo
  results: []
---


# dpo

This model is a DPO fine-tune of an SFT checkpoint of [g8a9/tweety-mistral-7b](https://huggingface.co/g8a9/tweety-mistral-7b) (local checkpoint path: `/leonardo_scratch/fast/IscrC_ItaLLM_0/tweety_models/sft`) on the [giux78/ultrafeedback-binarized-preferences-cleaned-ita](https://huggingface.co/datasets/giux78/ultrafeedback-binarized-preferences-cleaned-ita) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6931
- Rewards/chosen: -0.0430
- Rewards/rejected: -0.0430
- Rewards/accuracies: 0.0
- Rewards/margins: 0.0
- Logps/rejected: -310.7832
- Logps/chosen: -310.7832
- Logits/rejected: -2.3909
- Logits/chosen: -2.3909

Note that the evaluation loss is pinned at ln 2 ≈ 0.6931 and the chosen/rejected statistics are identical throughout: the implicit reward never separated preferred from rejected responses (see the worked equation below).
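As a sanity check, here is why a zero reward margin pins the loss at exactly ln 2 under the standard DPO objective (σ is the logistic function, β the DPO temperature):

```latex
% Standard DPO loss over a preference pair (y_w chosen, y_l rejected):
\mathcal{L}_{\mathrm{DPO}}
  = -\log \sigma\!\left(\beta\left[
      \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right]\right)
% With identical log-probabilities for chosen and rejected, the margin is zero:
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma(0) = \ln 2 \approx 0.6931
```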

## Model description

This is a LoRA adapter (PEFT) trained with Direct Preference Optimization (DPO) via the [alignment-handbook](https://github.com/huggingface/alignment-handbook) recipes, on top of an SFT checkpoint of `g8a9/tweety-mistral-7b`, targeting Italian preference alignment.
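A minimal loading sketch. The adapter repo id is not stated in this card, so `your-org/dpo` below is a placeholder; note also that the adapter was trained on an SFT checkpoint rather than the raw base model, so attaching it directly to the base may not reproduce the training-time setup exactly:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "g8a9/tweety-mistral-7b"  # base model from the card metadata
adapter_id = "your-org/dpo"         # placeholder: adapter repo id is not stated in this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO LoRA adapter

prompt = "Qual è la capitale d'Italia?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```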

## Intended uses & limitations

Given the training curves below (loss flat at ln 2, zero reward margins and accuracies), the adapter does not appear to have learned a preference signal. Treat it as experimental and evaluate carefully before any downstream use.

## Training and evaluation data

The model was trained and evaluated on [giux78/ultrafeedback-binarized-preferences-cleaned-ita](https://huggingface.co/datasets/giux78/ultrafeedback-binarized-preferences-cleaned-ita), which, as the name suggests, is an Italian version of the binarized, cleaned UltraFeedback preference dataset (pairs of chosen/rejected responses).
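A hedged sketch for inspecting the dataset. The split name and column layout (the usual `prompt`/`chosen`/`rejected` of UltraFeedback-binarized datasets) are assumptions, not confirmed by this card, so the snippet only prints what is actually there:

```python
from datasets import load_dataset

# Split name "train" is an assumption based on the usual
# ultrafeedback-binarized layout; adjust if the repo differs.
ds = load_dataset("giux78/ultrafeedback-binarized-preferences-cleaned-ita", split="train")
print(ds)             # dataset size and features
print(ds[0].keys())   # expected (but unconfirmed): prompt / chosen / rejected
```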

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a hedged sketch of how they map onto a TRL DPO run follows the list:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
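A minimal sketch of how these settings map onto TRL's `DPOTrainer`. The exact alignment-handbook invocation is not included in this card, so the script structure, the DPO β, and the LoRA hyperparameters below are assumptions; the kwargs shown target recent TRL (older versions took `tokenizer=` instead of `processing_class=`):

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "g8a9/tweety-mistral-7b"  # the original run started from a local SFT checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

dataset = load_dataset("giux78/ultrafeedback-binarized-preferences-cleaned-ita", split="train")

# Assumption: LoRA hyperparameters are not reported in this card.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

args = DPOConfig(
    output_dir="dpo",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    beta=0.1,  # assumption: beta is not reported in this card
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,
    peft_config=peft_config,  # train a LoRA adapter rather than full weights
)
trainer.train()
```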

### Training results

| Training Loss | Epoch  | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|:-------------:|:------:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
| 0.6931        | 0.0292 | 100  | -2.3941       | -2.3941         | -306.3899    | -306.3899      | 0.6931          | 0.0                | 0.0009         | 0.0             | 0.0009           |
| 0.6931        | 0.0584 | 200  | -2.3946       | -2.3946         | -306.5539    | -306.5539      | 0.6931          | 0.0                | -0.0008        | 0.0             | -0.0008          |
| 0.6931        | 0.0876 | 300  | -2.3942       | -2.3942         | -307.0490    | -307.0490      | 0.6931          | 0.0                | -0.0057        | 0.0             | -0.0057          |
| 0.6931        | 0.1168 | 400  | -2.3940       | -2.3940         | -307.3796    | -307.3796      | 0.6931          | 0.0                | -0.0090        | 0.0             | -0.0090          |
| 0.6931        | 0.1460 | 500  | -2.3937       | -2.3937         | -307.1581    | -307.1581      | 0.6931          | 0.0                | -0.0068        | 0.0             | -0.0068          |
| 0.6931        | 0.1751 | 600  | -2.3950       | -2.3950         | -306.9631    | -306.9631      | 0.6931          | 0.0                | -0.0048        | 0.0             | -0.0048          |
| 0.6931        | 0.2043 | 700  | -2.3949       | -2.3949         | -307.6349    | -307.6349      | 0.6931          | 0.0                | -0.0116        | 0.0             | -0.0116          |
| 0.6931        | 0.2335 | 800  | -2.3947       | -2.3947         | -307.6957    | -307.6957      | 0.6931          | 0.0                | -0.0122        | 0.0             | -0.0122          |
| 0.6931        | 0.2627 | 900  | -2.3968       | -2.3968         | -307.1708    | -307.1708      | 0.6931          | 0.0                | -0.0069        | 0.0             | -0.0069          |
| 0.6931        | 0.2919 | 1000 | -2.3967       | -2.3967         | -308.2130    | -308.2130      | 0.6931          | 0.0                | -0.0173        | 0.0             | -0.0173          |
| 0.6931        | 0.3211 | 1100 | -2.3971       | -2.3971         | -309.4724    | -309.4724      | 0.6931          | 0.0                | -0.0299        | 0.0             | -0.0299          |
| 0.6931        | 0.3503 | 1200 | -2.3976       | -2.3976         | -310.0194    | -310.0194      | 0.6931          | 0.0                | -0.0354        | 0.0             | -0.0354          |
| 0.6931        | 0.3795 | 1300 | -2.3963       | -2.3963         | -309.5114    | -309.5114      | 0.6931          | 0.0                | -0.0303        | 0.0             | -0.0303          |
| 0.6931        | 0.4087 | 1400 | -2.3955       | -2.3955         | -309.2061    | -309.2061      | 0.6931          | 0.0                | -0.0273        | 0.0             | -0.0273          |
| 0.6931        | 0.4379 | 1500 | -2.3943       | -2.3943         | -308.9652    | -308.9652      | 0.6931          | 0.0                | -0.0249        | 0.0             | -0.0249          |
| 0.6931        | 0.4671 | 1600 | -2.3954       | -2.3954         | -309.1586    | -309.1586      | 0.6931          | 0.0                | -0.0268        | 0.0             | -0.0268          |
| 0.6931        | 0.4962 | 1700 | -2.3913       | -2.3913         | -309.4055    | -309.4055      | 0.6931          | 0.0                | -0.0293        | 0.0             | -0.0293          |
| 0.6931        | 0.5254 | 1800 | -2.3927       | -2.3927         | -310.2643    | -310.2643      | 0.6931          | 0.0                | -0.0379        | 0.0             | -0.0379          |
| 0.6931        | 0.5546 | 1900 | -2.3927       | -2.3927         | -310.4164    | -310.4164      | 0.6931          | 0.0                | -0.0394        | 0.0             | -0.0394          |
| 0.6931        | 0.5838 | 2000 | -2.3920       | -2.3920         | -310.4427    | -310.4427      | 0.6931          | 0.0                | -0.0396        | 0.0             | -0.0396          |
| 0.6931        | 0.6130 | 2100 | -2.3901       | -2.3901         | -310.7150    | -310.7150      | 0.6931          | 0.0                | -0.0424        | 0.0             | -0.0424          |
| 0.6931        | 0.6422 | 2200 | -2.3911       | -2.3911         | -311.0310    | -311.0310      | 0.6931          | 0.0                | -0.0455        | 0.0             | -0.0455          |
| 0.6931        | 0.6714 | 2300 | -2.3912       | -2.3912         | -310.7881    | -310.7881      | 0.6931          | 0.0                | -0.0431        | 0.0             | -0.0431          |
| 0.6931        | 0.7006 | 2400 | -2.3899       | -2.3899         | -310.6455    | -310.6455      | 0.6931          | 0.0                | -0.0417        | 0.0             | -0.0417          |
| 0.6931        | 0.7298 | 2500 | -2.3915       | -2.3915         | -310.8196    | -310.8196      | 0.6931          | 0.0                | -0.0434        | 0.0             | -0.0434          |
| 0.6931        | 0.7590 | 2600 | -2.3919       | -2.3919         | -310.8546    | -310.8546      | 0.6931          | 0.0                | -0.0438        | 0.0             | -0.0438          |
| 0.6931        | 0.7881 | 2700 | -2.3916       | -2.3916         | -310.8407    | -310.8407      | 0.6931          | 0.0                | -0.0436        | 0.0             | -0.0436          |
| 0.6931        | 0.8173 | 2800 | -2.3915       | -2.3915         | -310.7981    | -310.7981      | 0.6931          | 0.0                | -0.0432        | 0.0             | -0.0432          |
| 0.6931        | 0.8465 | 2900 | -2.3920       | -2.3920         | -310.7943    | -310.7943      | 0.6931          | 0.0                | -0.0432        | 0.0             | -0.0432          |
| 0.6931        | 0.8757 | 3000 | -2.3918       | -2.3918         | -310.7866    | -310.7866      | 0.6931          | 0.0                | -0.0431        | 0.0             | -0.0431          |
| 0.6931        | 0.9049 | 3100 | -2.3908       | -2.3908         | -310.7794    | -310.7794      | 0.6931          | 0.0                | -0.0430        | 0.0             | -0.0430          |
| 0.6931        | 0.9341 | 3200 | -2.3911       | -2.3911         | -310.7812    | -310.7812      | 0.6931          | 0.0                | -0.0430        | 0.0             | -0.0430          |
| 0.6931        | 0.9633 | 3300 | -2.3915       | -2.3915         | -310.7767    | -310.7767      | 0.6931          | 0.0                | -0.0430        | 0.0             | -0.0430          |
| 0.6931        | 0.9925 | 3400 | -2.3909       | -2.3909         | -310.7832    | -310.7832      | 0.6931          | 0.0                | -0.0430        | 0.0             | -0.0430          |


### Framework versions

- PEFT 0.7.1
- Transformers 4.40.2
- Pytorch 2.1.2+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1