Update README.md
README.md
CHANGED
@@ -19,6 +19,8 @@ The framework is shown above. The introduced text generation regularization mark
 
 This reward model is finetuned from [gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) using the [weqweasdas/preference_dataset_mixture2_and_safe_pku](https://huggingface.co/datasets/weqweasdas/preference_dataset_mixture2_and_safe_pku) dataset.
 
+Check our GRM series at 🤗[hugging face](https://huggingface.co/collections/Ray2333/grm-66882bdf7152951779506c7b) and github repo at [Github](https://github.com/YangRui2015/Generalizable-Reward-Model).
+
 
 ## Evaluation
 We evaluate GRM-Gemma2-2B-sftreg on the [reward model benchmark](https://huggingface.co/spaces/allenai/reward-bench), where it achieves a score of 84.7.