Update README.md
README.md CHANGED

@@ -19,7 +19,7 @@ license: apache-2.0
 Tulu is a series of language models that are trained to act as helpful assistants.
 Tulu V2.5 is a series of models trained using DPO and PPO starting from the [Tulu 2 suite](https://huggingface.co/collections/allenai/tulu-v2-suite-6551b56e743e6349aab45101).
 This is a 70B reward model used for PPO training trained on the UltraFeedback dataset.
-It was used to train [this](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm) model.
+It was used to train [this](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm) model, and [this](https://huggingface.co/allenai/tulu-v2.5-ppo-13b-uf-mean-70b-uf-rm-mixed-prompts) model.
 
 For more details, read the paper:
 [Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback](https://link.todo).