Update README.md
README.md
## Training details

This model is a fine-tuned version of [starchat2-15b-sft-v0.1](https://huggingface.co/HuggingFaceH4/starchat2-15b-sft-v0.1) on the HuggingFaceH4/ultrafeedback_binarized and the HuggingFaceH4/orca_dpo_pairs datasets. Check out the recipe in the [Alignment Handbook](https://github.com/huggingface/alignment-handbook) for more details.

It achieves the following results on the evaluation set:
- Loss: 0.4347
- Rewards/chosen: -0.9461
- Logits/rejected: -2.3817
- Logits/chosen: -2.3005
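The Rewards/* metrics reported above are the implicit rewards tracked during DPO training. As an illustrative sketch only (this is not the card's training code, which follows the Alignment Handbook recipe; the helper name `dpo_loss` and the value `beta=0.1` are assumptions for the example), the per-pair DPO loss and implicit rewards can be computed like this:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sketch of the DPO objective for a single preference pair.

    The implicit reward of a completion is beta times the gap between
    the policy and reference log-probabilities; the loss is
    -log(sigmoid(reward margin)) between chosen and rejected.
    Note: beta=0.1 is an assumed hyperparameter, not taken from the card.
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, reward_chosen, reward_rejected

# Toy log-probabilities, purely for illustration.
loss, rc, rr = dpo_loss(-20.0, -30.0, -22.0, -25.0)
```

A negative Rewards/chosen (as in the table above) simply means the policy assigns the chosen responses lower log-probability than the reference model does; what DPO optimizes is the margin between chosen and rejected, not the sign of either reward.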
## Training procedure

### Training hyperparameters