nvidia
/

Nemotron-4-340B-Reward

Model card Files Files and versions Community

zhilinw commited on Jun 14, 2024

Commit

3d6fe6d

·

verified ·

1 Parent(s): 643bdbe

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -23,7 +23,7 @@ datasets:
 ### Model Overview
-The Nemotron-4-340B-Reward is a multi-dimensional Reward Model that can be used as part of a synthetic data generation pipeline to create training data that helps researchers and developers build their own LLMs; Nemotron-4-340B-Reward consists of the Nemotron-4-340B-Base model and a linear layer that converts the final layer representation of the end-of-response token into five scalar values, each corresponding to a [HelpSteer](https://arxiv.org/abs/2311.09528) attribute.
 Given a conversation with multiple turns between user and assistant, it rates the following attributes (typically between 0 and 4) for every assistant turn.

 ### Model Overview
+The Nemotron-4-340B-Reward is a multi-dimensional Reward Model that can be used as part of a synthetic data generation pipeline to create training data that helps researchers and developers build their own LLMs; Nemotron-4-340B-Reward consists of the Nemotron-4-340B-Base model and a linear layer that converts the final layer representation of the end-of-response token into five scalar values, each corresponding to a [HelpSteer2](https://arxiv.org/abs/2406.08673) attribute.
 Given a conversation with multiple turns between user and assistant, it rates the following attributes (typically between 0 and 4) for every assistant turn.