Update README.md
README.md CHANGED

@@ -1,7 +1,7 @@
 ---
 license: other
 license_name: nvidia-open-model-license
-license_link:
+license_link: https://developer.download.nvidia.com/licenses/nvidia-open-model-license-agreement-june-2024.pdf
 library_name: nemo
 language:
 - en
@@ -25,6 +25,8 @@ datasets:
 
 The Nemotron-4-340B-Reward is a multi-dimensional Reward Model that can be used as part of a synthetic data generation pipeline to create training data that helps researchers and developers build their own LLMs. It consists of the Nemotron-4-340B-Base model and a linear layer that converts the final-layer representation of the end-of-response token into five scalar values, each corresponding to a [HelpSteer2](https://arxiv.org/abs/2406.08673) attribute.
 
+It supports a context length of up to 4,096 tokens.
+
 Given a conversation with multiple turns between user and assistant, it rates the following attributes (typically between 0 and 4) for every assistant turn.
 
 1. **Helpfulness**: Overall helpfulness of the response to the prompt.
@@ -36,9 +38,10 @@ Given a conversation with multiple turns between user and assistant, it rates th
 Nonetheless, if you are only interested in using it as a conventional reward model that outputs a single scalar, we recommend using the weights ```[0, 0, 0, 0, 0.3, 0.74, 0.46, 0.47, -0.33]``` to multiply the predicted attributes elementwise and sum the results (the model outputs 9 float values in line with [Llama2-13B-SteerLM-RM](https://huggingface.co/nvidia/Llama2-13B-SteerLM-RM), but the first four are not trained or used).
 
 Under the NVIDIA Open Model License, NVIDIA confirms:
-
-
-
+
+* Models are commercially usable.
+* You are free to create and distribute Derivative Models.
+* NVIDIA does not claim ownership to any outputs generated using the Models or Derivative Models.
 
 ### License:
 
@@ -54,11 +57,9 @@ Nemotron-4 340B-Reward can be used in the alignment stage to align pretrained mo
 
 **Model Input:** Text only
 **Input Format:** String
-**Input Parameters:** One-Dimensional (1D)
 
 **Model Output:** Scalar Values (List of 9 Floats)
 **Output Format:** Float
-**Output Parameters:** 1D
 
 **Model Dates:** Nemotron-4-340B-Reward was trained between December 2023 and May 2024
 
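For reference, here is a minimal sketch of the single-scalar usage described in the diff above. It assumes the 9 returned floats leave the first four positions unused and place the five HelpSteer2 attributes last, in the order helpfulness, correctness, coherence, complexity, verbosity; the helper name and example scores below are illustrative, not part of the model's API.

```python
# Minimal sketch (not the official API): collapse the 9 predicted attribute
# scores into one reward scalar via elementwise multiplication and a sum.
# Assumed ordering: 4 unused slots, then helpfulness, correctness, coherence,
# complexity, verbosity (each typically rated between 0 and 4).

WEIGHTS = [0, 0, 0, 0, 0.3, 0.74, 0.46, 0.47, -0.33]

def single_scalar_reward(attribute_scores: list[float]) -> float:
    """Weight each of the 9 predicted attributes and sum the results."""
    if len(attribute_scores) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} scores, got {len(attribute_scores)}")
    return sum(w * s for w, s in zip(WEIGHTS, attribute_scores))

# Made-up example scores for one assistant turn (not real model output).
example_scores = [0.0, 0.0, 0.0, 0.0, 3.2, 3.5, 3.8, 1.9, 1.4]
print(single_scalar_reward(example_scores))  # weighted sum of the last five scores
```

The zero weights keep the output shape compatible with Llama2-13B-SteerLM-RM while only the HelpSteer2 attributes contribute to the final scalar.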