Junxiong Wang committed
Commit 1f9986e
Parent: 24a7edb

add models

Files changed (2)
  1. README.md +4 -4
  2. configs.yaml +3 -3
README.md CHANGED
@@ -1,21 +1,21 @@
 ---
-base_model: /data/junxiong/sft/zephyr_0_5_sft_open_not_openhermes_progressive_train_largest_dataset/
+base_model: JunxiongWang/mamba_0_5_sft
 tags:
 - alignment-handbook
 - generated_from_trainer
 datasets:
 - HuggingFaceH4/ultrafeedback_binarized
 model-index:
-- name: zephyr_0_5_dpo_open_not_openhermes_progressive_train_largest_dataset_ep3
+- name: mamba_0_5_dpo_ep3
   results: []
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 
-# zephyr_0_5_dpo_open_not_openhermes_progressive_train_largest_dataset_ep3
+# mamba_0_5_dpo_ep3
 
-This model is a fine-tuned version of [/data/junxiong/sft/zephyr_0_5_sft_open_not_openhermes_progressive_train_largest_dataset/](https://huggingface.co//data/junxiong/sft/zephyr_0_5_sft_open_not_openhermes_progressive_train_largest_dataset/) on the HuggingFaceH4/ultrafeedback_binarized dataset.
+This model is a fine-tuned version of [JunxiongWang/mamba_0_5_dpo_ep3](https://huggingface.co/JunxiongWang/mamba_0_5_dpo_ep3) on the HuggingFaceH4/ultrafeedback_binarized dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.7141
 - Rewards/chosen: -5.3346
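
For reference, a minimal usage sketch for the renamed checkpoint. This is an assumption-laden sketch, not the repo's documented loading path: it assumes JunxiongWang/mamba_0_5_dpo_ep3 loads through the standard transformers Auto classes, whereas a Mamba-based checkpoint may ship its own loading utilities.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the checkpoint is compatible with the transformers Auto classes.
model_id = "JunxiongWang/mamba_0_5_dpo_ep3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches torch_dtype: 'bfloat16' in configs.yaml
    device_map="auto",
)

# Zephyr-style chat usage; the prompt content here is illustrative.
messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=256, do_sample=True, top_p=1.0)
print(tokenizer.decode(output[0], skip_special_tokens=True))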
configs.yaml CHANGED
@@ -1,8 +1,8 @@
-mamba_0_5:
+mamba_0_5_dpo_ep3:
   prompt_template: "zephyr-7b-alpha/prompt.txt"
   fn_completions: "huggingface_local_completions"
   completions_kwargs:
-    model_name: "/data/junxiong/sft/zephyr_0_5_dpo_open_not_openhermes_progressive_train_largest_dataset_ep3/"
+    model_name: "JunxiongWang/mamba_0_5_dpo_ep3"
     model_kwargs:
       torch_dtype: 'bfloat16'
     max_new_tokens: 2048
@@ -10,4 +10,4 @@ mamba_0_5:
     top_p: 1.0
     do_sample: True
   pretty_name: "Mamba 0 5 From Zephyr 7B Beta"
-  link: "https://huggingface.co/HuggingFaceH4/zephyr-7b-beta"
+  link: "https://huggingface.co/JunxiongWang/mamba_0_5_dpo_ep3"
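
This entry follows the AlpacaEval model-config layout, where fn_completions: "huggingface_local_completions" runs the model locally with the listed completions_kwargs. Below is a small sketch of how the updated entry maps onto generation settings; the file path and the printout are illustrative, and the field names are taken from the diff above.

import torch
import yaml

# Hypothetical path; point this at the repo's configs.yaml.
with open("configs.yaml") as f:
    configs = yaml.safe_load(f)

cfg = configs["mamba_0_5_dpo_ep3"]  # key renamed in this commit
ck = cfg["completions_kwargs"]

model_name = ck["model_name"]  # "JunxiongWang/mamba_0_5_dpo_ep3"
dtype = getattr(torch, ck["model_kwargs"]["torch_dtype"])  # torch.bfloat16

generation_kwargs = {
    "max_new_tokens": ck["max_new_tokens"],  # 2048
    "top_p": ck["top_p"],                    # 1.0
    "do_sample": ck["do_sample"],            # True
}
print(model_name, dtype, generation_kwargs)

With AlpacaEval installed, the same entry can be exercised end to end with something like alpaca_eval evaluate_from_model --model_configs configs.yaml.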