Update README.md
README.md CHANGED

@@ -7,7 +7,7 @@ datasets:
 - allenai/tulu-v2-sft-mixture
 language:
 - en
-base_model: allenai/tulu-2-
+base_model: allenai/tulu-2-13b
 license: apache-2.0
 ---
 <center>
@@ -18,7 +18,7 @@ license: apache-2.0
 
 Tulu is a series of language models that are trained to act as helpful assistants.
 Tulu V2.5 is a series of models trained using DPO and PPO starting from the [Tulu 2 suite](https://huggingface.co/collections/allenai/tulu-v2-suite-6551b56e743e6349aab45101).
-This model is trained
+This model is trained using PPO.
 We used a 70B RM trained on our preference data mix, and then used the UltraFeedback prompts during PPO training.
 
 For more details, read the paper:
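The model card mentions DPO as one of the two training methods in the Tulu V2.5 suite. As a minimal sketch of the per-pair DPO objective (the β value and the log-probabilities below are illustrative placeholders, not numbers from the paper or this model's training run):

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Direct Preference Optimization loss for a single preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the trained policy or the frozen
    reference model (here, a hypothetical SFT checkpoint).
    """
    # Implicit reward of each response, measured relative to the reference
    chosen_margin = policy_chosen_lp - ref_chosen_lp
    rejected_margin = policy_rejected_lp - ref_rejected_lp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), computed in a numerically stable form
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# The loss shrinks as the policy prefers the chosen response more
# strongly than the reference model does, and grows otherwise.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))  # policy favors chosen
print(dpo_loss(-12.0, -10.0, -11.0, -11.0))  # policy favors rejected
```

This is only the pairwise loss term; a real run (such as the ones behind this suite) batches these pairs and backpropagates through the policy's log-probabilities.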