---
license: apache-2.0
datasets:
- Intel/orca_dpo_pairs
pipeline_tag: text-generation
---

# DeciDPObyBB - a 7B DeciLM Fine-tune Using DPO

Built by fine-tuning [DeciLM-7B-Instruct](https://huggingface.co/Deci/DeciLM-7B-instruct) with Direct Preference Optimization (DPO) on the [Intel Orca DPO Pairs](https://huggingface.co/datasets/Intel/orca_dpo_pairs) dataset.

Created by [bhaiyabot](https://bhaiyabot.in), built for research and learning purposes!

## Usage

```python
from transformers import AutoTokenizer, pipeline

new_model = "DeciDPObyBB"  # local path or Hub id of this model

tokenizer = AutoTokenizer.from_pretrained(new_model)
# DeciLM models ship custom modeling code, hence trust_remote_code=True
generator = pipeline("text-generation", model=new_model, tokenizer=tokenizer, trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a very helpful assistant chatbot that thinks step by step"},
    {"role": "user", "content": "Why is the sky blue?"},  # replace with your own prompt
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

sequences = generator(
    prompt,
    do_sample=True,
    temperature=1.0,
    num_beams=5,
    max_length=1000,
    pad_token_id=tokenizer.eos_token_id,
)
print(sequences[0]["generated_text"])
```

## Citations

```bibtex
@misc{DeciFoundationModels,
  title = {DeciLM-7B-instruct},
  author = {DeciAI Research Team},
  year = {2023},
  url = {https://huggingface.co/Deci/DeciLM-7B-instruct}
}

@misc{rafailov2023direct,
  title = {Direct Preference Optimization: Your Language Model is Secretly a Reward Model},
  author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Stefano Ermon and Christopher D. Manning and Chelsea Finn},
  year = {2023},
  eprint = {2305.18290},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG}
}
```

More details to come soon!
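## Training Sketch

The exact training script isn't published here yet. In the meantime, the following is a minimal, hypothetical sketch of how a DPO fine-tune like this one can be set up with TRL's `DPOTrainer`. The prompt formatting, the hyperparameters (`beta`, batch size, sequence lengths), and the TRL version (argument names follow the trl 0.7.x API, which has since changed) are all assumptions, not the exact values used for this model.

```python
# Hypothetical sketch, not the actual training script for DeciDPObyBB.
# Assumes trl 0.7.x; the DPOTrainer API has changed in later releases.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_model = "Deci/DeciLM-7B-instruct"

model = AutoModelForCausalLM.from_pretrained(base_model, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Intel/orca_dpo_pairs rows have "system", "question", "chosen", "rejected";
# DPOTrainer expects "prompt", "chosen", "rejected". The plain-text join
# below is an assumption; ideally it would use the model's chat template.
dataset = load_dataset("Intel/orca_dpo_pairs", split="train")
dataset = dataset.map(
    lambda row: {"prompt": row["system"] + "\n" + row["question"]},
    remove_columns=["system", "question"],
)

training_args = TrainingArguments(
    output_dir="DeciDPObyBB",
    per_device_train_batch_size=1,   # assumed; tune for your hardware
    gradient_accumulation_steps=8,   # assumed
    learning_rate=5e-6,              # assumed
    remove_unused_columns=False,     # DPOTrainer needs the raw text columns
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,         # TRL then uses a frozen copy of the model as the reference
    args=training_args,
    beta=0.1,               # strength of the KL penalty toward the reference (assumed)
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_length=1024,        # assumed
    max_prompt_length=512,  # assumed
)
trainer.train()
trainer.save_model("DeciDPObyBB")
```

With `ref_model=None`, TRL keeps a frozen copy of the starting policy as the reference, and the DPO loss pushes the fine-tuned model to rank each "chosen" response above its "rejected" counterpart while staying close to that reference.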