Ray2333 committed · Commit 2bc4498 · verified · 1 Parent(s): bfee1ec

Update README.md

  1. README.md +2 -2
README.md CHANGED
@@ -6,7 +6,7 @@ pipeline_tag: text-classification
  ---
 
  # Introduction
- This reward model achieves a score of 92.8 on reward-bench, which is finetuned from a GRM-Llama3.1-8B-sftreg model using the decontaminated [Skywork preference dataset v0.2](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.2).
+ This reward model achieves a score of 92.6 on reward-bench, which is finetuned from a GRM-Llama3.1-8B-sftreg model using the decontaminated [Skywork preference dataset v0.2](https://huggingface.co/datasets/Skywork/Skywork-Reward-Preference-80K-v0.2).
 
  Check our GRM series at 🤗[hugging face](https://huggingface.co/collections/Ray2333/grm-66882bdf7152951779506c7b), our paper at [Arxiv](https://arxiv.org/abs/2406.10216), and github repo at [Github](https://github.com/YangRui2015/Generalizable-Reward-Model).
 
@@ -19,7 +19,7 @@ We evaluate GRM_Llama3.1_8B_rewardmodel-ft on the [reward model benchmark](https
 
  | Model | Average | Chat | Chat Hard | Safety | Reasoning |
  |:-------------------------:|:-------------:|:---------:|:---------:|:--------:|:-----------:|
- |GRM_Llama3.1_8B_rewardmodel-ft| 92.8|96.1 |87.3|91.1|96.9|
+ |GRM_Llama3.1_8B_rewardmodel-ft| 92.6|95.0 |87.7|91.4|96.4|
  |[GRM-Llama3-8B-rewardmodel-ft](https://huggingface.co/Ray2333/GRM-Llama3-8B-rewardmodel-ft)**(8B)**|91.5|95.5|86.2|90.8|93.6|
  |[GRM-Llama3.2-3B-rewardmodel-ft](https://huggingface.co/Ray2333/GRM-Llama3.2-3B-rewardmodel-ft)**(ours, 3B)**|90.9|91.6|84.9|92.7|94.6|
  | [GRM-gemma2-2B-rewardmodel-ft](https://huggingface.co/Ray2333/GRM-gemma2-2B-rewardmodel-ft) **(Ours, 2B)**| 88.4 | 93.0 | 77.2 | 92.2 | 91.2 |
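
As a quick sanity check on the changed row, the sketch below recomputes the Average column from the four category scores. It assumes that Average is the unweighted mean of Chat, Chat Hard, Safety, and Reasoning; the README does not state this, so treat it as an assumption. The helper name `unweighted_mean` is hypothetical and exists only for this illustration.

```python
# Category scores copied from the removed (-) and added (+) table rows.
old_row = {"Chat": 96.1, "Chat Hard": 87.3, "Safety": 91.1, "Reasoning": 96.9}
new_row = {"Chat": 95.0, "Chat Hard": 87.7, "Safety": 91.4, "Reasoning": 96.4}


def unweighted_mean(scores):
    """Assumed aggregation: plain mean over the four category scores."""
    return sum(scores.values()) / len(scores)


old_avg = unweighted_mean(old_row)
new_avg = unweighted_mean(new_row)

# Print the recomputed means next to the values reported in the diff.
print(f"old row: mean={old_avg:.3f}, reported=92.8")
print(f"new row: mean={new_avg:.3f}, reported=92.6")
```

Under that assumption, the new row's mean comes out near 92.6, matching the updated Introduction sentence, and the old row's mean lands near the previously reported 92.8.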