Update README.md
README.md
CHANGED
@@ -6,7 +6,7 @@ pipeline_tag: text-classification
 ---

 # Introduction
-This reward model achieves a score of 92.
+This reward model achieves a score of 92.6 on reward-bench, which is finetuned from a GRM-Llama3.1-8B-sftreg model using the decontaminated [Skywork preference dataset v0.2](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.2).

 Check our GRM series at 🤗[hugging face](https://huggingface.co/collections/Ray2333/grm-66882bdf7152951779506c7b), our paper at [Arxiv](https://arxiv.org/abs/2406.10216), and github repo at [Github](https://github.com/YangRui2015/Generalizable-Reward-Model).

@@ -19,7 +19,7 @@ We evaluate GRM_Llama3.1_8B_rewardmodel-ft on the [reward model benchmark](https

 | Model | Average | Chat | Chat Hard | Safety | Reasoning |
 |:-------------------------:|:-------------:|:---------:|:---------:|:--------:|:-----------:|
-|GRM_Llama3.1_8B_rewardmodel-ft| 92.
+|GRM_Llama3.1_8B_rewardmodel-ft| 92.6|95.0 |87.7|91.4|96.4|
 |[GRM-Llama3-8B-rewardmodel-ft](https://huggingface.co/Ray2333/GRM-Llama3-8B-rewardmodel-ft)**(8B)**|91.5|95.5|86.2|90.8|93.6|
 |[GRM-Llama3.2-3B-rewardmodel-ft](https://huggingface.co/Ray2333/GRM-Llama3.2-3B-rewardmodel-ft)**(ours, 3B)**|90.9|91.6|84.9|92.7|94.6|
 | [GRM-gemma2-2B-rewardmodel-ft](https://huggingface.co/Ray2333/GRM-gemma2-2B-rewardmodel-ft) **(Ours, 2B)**| 88.4 | 93.0 | 77.2 | 92.2 | 91.2 |