Update README.md
README.md CHANGED

@@ -1,7 +1,7 @@
 ---
 license: other
 license_name: nvidia-open-model-license
-license_link:
+license_link: https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
 library_name: nemo
 language:
 - en
@@ -25,6 +25,8 @@ datasets:
 
 The Nemotron-4-340B-Reward is a multi-dimensional Reward Model that can be used as part of a synthetic data generation pipeline to create training data that helps researchers and developers build their own LLMs. It consists of the Nemotron-4-340B-Base model and a linear layer that converts the final-layer representation of the end-of-response token into five scalar values, each corresponding to a [HelpSteer2](https://arxiv.org/abs/2406.08673) attribute.
 
+It supports a context length of up to 4,096 tokens.
+
 Given a conversation with multiple turns between user and assistant, it rates the following attributes (typically between 0 and 4) for every assistant turn.
 
 1. **Helpfulness**: Overall helpfulness of the response to the prompt.
@@ -36,9 +38,10 @@ Given a conversation with multiple turns between user and assistant, it rates th
 Nonetheless, if you are only interested in using it as a conventional reward model that outputs a single scalar, we recommend using the weights ```[0, 0, 0, 0, 0.3, 0.74, 0.46, 0.47, -0.33]``` to multiply the predicted attributes elementwise and sum the results (the model outputs 9 float values in line with [Llama2-13B-SteerLM-RM](https://huggingface.co/nvidia/Llama2-13B-SteerLM-RM), but the first four are not trained or used).
 
 Under the NVIDIA Open Model License, NVIDIA confirms:
-
-
-
+
+* Models are commercially usable.
+* You are free to create and distribute Derivative Models.
+* NVIDIA does not claim ownership to any outputs generated using the Models or Derivative Models.
 
 ### License:
 
@@ -54,11 +57,9 @@ Nemotron-4 340B-Reward can be used in the alignment stage to align pretrained mo
 
 **Model Input:** Text only
 **Input Format:** String
-**Input Parameters:** One-Dimensional (1D)
 
 **Model Output:** Scalar Values (List of 9 Floats)
 **Output Format:** Float
-**Output Parameters:** 1D
 
 **Model Dates:** Nemotron-4-340B-Reward was trained between December 2023 and May 2024
 
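For reference, here is a minimal sketch of the single-scalar usage described in the diff above. It assumes the 9 returned floats leave the first four positions unused and place the five HelpSteer2 attributes last, in the order helpfulness, correctness, coherence, complexity, verbosity; the helper name and example scores below are illustrative, not part of the model's API.

```python
# Minimal sketch (not the official API): collapse the 9 predicted attribute
# scores into one reward scalar via elementwise multiplication and a sum.
# Assumed ordering: 4 unused slots, then helpfulness, correctness, coherence,
# complexity, verbosity (each typically rated between 0 and 4).

WEIGHTS = [0, 0, 0, 0, 0.3, 0.74, 0.46, 0.47, -0.33]

def single_scalar_reward(attribute_scores: list[float]) -> float:
    """Weight each of the 9 predicted attributes and sum the results."""
    if len(attribute_scores) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} scores, got {len(attribute_scores)}")
    return sum(w * s for w, s in zip(WEIGHTS, attribute_scores))

# Made-up example scores for one assistant turn (not real model output).
example_scores = [0.0, 0.0, 0.0, 0.0, 3.2, 3.5, 3.8, 1.9, 1.4]
print(single_scalar_reward(example_scores))  # weighted sum of the last five scores
```

The zero weights keep the output shape compatible with Llama2-13B-SteerLM-RM while only the HelpSteer2 attributes contribute to the final scalar.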