Tim-05369 committed on
Commit 100a631 · verified · 1 Parent(s): 89d3223

Tim-05369/orpo_train_using_space_demo

Files changed (3)
  1. README.md +13 -13
  2. model.safetensors +1 -1
  3. training_args.bin +1 -1
README.md CHANGED
@@ -17,18 +17,18 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.8073
- - Rewards/chosen: -0.0719
- - Rewards/rejected: -0.1594
+ - Loss: 2.2772
+ - Rewards/chosen: -0.1134
+ - Rewards/rejected: -0.0902
  - Rewards/accuracies: 0.6667
- - Rewards/margins: 0.0874
- - Logps/rejected: -3.1874
- - Logps/chosen: -1.4385
- - Logits/rejected: -2.2859
- - Logits/chosen: -2.4167
- - Nll Loss: 1.7887
- - Log Odds Ratio: -0.3720
- - Log Odds Chosen: 2.6873
+ - Rewards/margins: -0.0232
+ - Logps/rejected: -1.8039
+ - Logps/chosen: -2.2683
+ - Logits/rejected: -2.3984
+ - Logits/chosen: -2.3109
+ - Nll Loss: 2.2241
+ - Log Odds Ratio: -1.0632
+ - Log Odds Chosen: -0.4972
 
  ## Model description
 
@@ -48,8 +48,8 @@ More information needed
 
  The following hyperparameters were used during training:
  - learning_rate: 5e-05
- - train_batch_size: 8
- - eval_batch_size: 8
+ - train_batch_size: 32
+ - eval_batch_size: 32
  - seed: 42
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
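The evaluation metrics reported before and after this commit can be cross-checked against each other. The sketch below is a sanity check, not the training code: it assumes the usual ORPO-style bookkeeping, where the margin is the chosen reward minus the rejected reward, each reward is a scaling factor `beta` times the corresponding log-probability, and the total loss is the NLL loss minus `beta` times the log odds ratio term. The value of `beta` is not stated in the README, so it is inferred from the reported numbers here; the dictionary keys and the `check` helper are hypothetical names.

```python
# Reported evaluation metrics, copied from the README diff.
before = {
    "loss": 1.8073, "nll_loss": 1.7887, "log_odds_ratio": -0.3720,
    "rewards_chosen": -0.0719, "rewards_rejected": -0.1594,
    "rewards_margins": 0.0874, "logps_chosen": -1.4385,
}
after = {
    "loss": 2.2772, "nll_loss": 2.2241, "log_odds_ratio": -1.0632,
    "rewards_chosen": -0.1134, "rewards_rejected": -0.0902,
    "rewards_margins": -0.0232, "logps_chosen": -2.2683,
}


def check(m, tol=1e-3):
    """Recompute derived metrics under the assumptions stated above."""
    # Assumption: margin = chosen reward - rejected reward.
    margin = m["rewards_chosen"] - m["rewards_rejected"]
    # Assumption: reward = beta * logp, so beta is inferred, not read from config.
    beta = m["rewards_chosen"] / m["logps_chosen"]
    # Assumption: loss = nll_loss - beta * log_odds_ratio.
    loss = m["nll_loss"] - beta * m["log_odds_ratio"]
    return {
        "margin": margin,
        "beta": beta,
        "loss": loss,
        "margin_ok": abs(margin - m["rewards_margins"]) < tol,
        "loss_ok": abs(loss - m["loss"]) < tol,
    }


for name, metrics in (("before", before), ("after", after)):
    print(name, check(metrics))
```

Under these assumptions both sets of numbers are internally consistent with an inferred `beta` of roughly 0.05. The reported values themselves show the evaluation loss rising from 1.8073 to 2.2772 and the reward margin turning negative, while the reward accuracy stays at 0.6667.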
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:74ddbbdceecb03ec6e89c6ae9a82ff27449b7190a035a4c92a99d41dc6a682c0
+ oid sha256:e342c4848f6e6352a74f56ba62320f875186805671cf32b49f37cd80bca8d59c
  size 4400216536
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ccba72281dc9a290f89ffb1ade8e85412a27e156a73f32f98f07081e5a0b7c57
+ oid sha256:e59fa5bdfb1ef5d310d364528550c13174df64b6cdcd5d935757d10e0bc07d39
  size 5496
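The two binary files are stored as Git LFS pointers, so the diff shows only a new `oid` with an unchanged `size`. As a minimal sketch, assuming a downloaded copy of the file is available locally, the pointer can be parsed and the file checked against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any library.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so multi-gigabyte weight files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer for the new training_args.bin, copied from the diff.
pointer = parse_lfs_pointer("""\
version https://git-lfs.github.com/spec/v1
oid sha256:e59fa5bdfb1ef5d310d364528550c13174df64b6cdcd5d935757d10e0bc07d39
size 5496
""")
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("training_args.bin", pointer)` on a local download would then confirm that the file corresponds to this commit rather than its parent, since the size alone is identical in both versions.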