---
license: apache-2.0
datasets:
- Intel/orca_dpo_pairs
pipeline_tag: text-generation
---

# DeciDPObyBB - a 7B DeciLM Fine-tune Using DPO

Built by fine-tuning [DeciLM-7B-Instruct](https://huggingface.co/Deci/DeciLM-7B-instruct) with Direct Preference Optimization (DPO) on the [Intel Orca DPO Pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) dataset.

Created by [bhaiyabot](https://bhaiyabot.in), built for research and learning purposes!

## Usage

```python
from transformers import AutoTokenizer, pipeline

new_model = "DeciDPObyBB"  # local path or Hub id of this model

tokenizer = AutoTokenizer.from_pretrained(new_model)
# DeciLM models ship custom modeling code, hence trust_remote_code=True
generator = pipeline("text-generation", model=new_model, tokenizer=tokenizer, trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a very helpful assistant chatbot that thinks step by step"},
    {"role": "user", "content": "Why is the sky blue?"},  # replace with your own prompt
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

sequences = generator(
    prompt,
    do_sample=True,
    temperature=1.0,
    num_beams=5,
    max_length=1000,
    pad_token_id=tokenizer.eos_token_id,
)
print(sequences[0]["generated_text"])
```

## Citations

```bibtex
@misc{DeciFoundationModels,
  title = {DeciLM-7B-instruct},
  author = {DeciAI Research Team},
  year = {2023},
  url = {https://huggingface.co/Deci/DeciLM-7B-instruct}
}

@misc{rafailov2023direct,
  title = {Direct Preference Optimization: Your Language Model is Secretly a Reward Model},
  author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Stefano Ermon and Christopher D. Manning and Chelsea Finn},
  year = {2023},
  eprint = {2305.18290},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}
```

More details to come soon!
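## Training Sketch

The exact training script isn't published here yet. In the meantime, the following is a minimal, hypothetical sketch of how a DPO fine-tune like this one can be set up with TRL's `DPOTrainer`. The prompt formatting, the hyperparameters (`beta`, batch size, sequence lengths), and the TRL version (argument names follow the trl 0.7.x API, which has since changed) are all assumptions, not the exact values used for this model.

```python
# Hypothetical sketch, not the actual training script for DeciDPObyBB.
# Assumes trl 0.7.x; the DPOTrainer API has changed in later releases.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model = "Deci/DeciLM-7B-instruct"

model = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Intel/orca_dpo_pairs rows have "system", "question", "chosen", "rejected";
# DPOTrainer expects "prompt", "chosen", "rejected". The plain-text join
# below is an assumption; ideally it would use the model's chat template.
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.map(
    lambda row: {"prompt": row["system"] + "\n" + row["question"]},
    remove_columns=["system", "question"],
)

training_args = TrainingArguments(
    output_dir="DeciDPObyBB",
    per_device_train_batch_size=1,   # assumed; tune for your hardware
    gradient_accumulation_steps=8,   # assumed
    learning_rate=5e-6,              # assumed
    remove_unused_columns=False,     # DPOTrainer needs the raw text columns
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,         # TRL then uses a frozen copy of the model as the reference
    args=training_args,
    beta=0.1,               # strength of the KL penalty toward the reference (assumed)
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_length=1024,        # assumed
    max_prompt_length=512,  # assumed
)
trainer.train()
trainer.save_model("DeciDPObyBB")
```

With `ref_model=None`, TRL keeps a frozen copy of the starting policy as the reference, and the DPO loss pushes the fine-tuned model to rank each "chosen" response above its "rejected" counterpart while staying close to that reference.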