---
license: mit
base_model: jpacifico/Chocolatine-3B-Instruct-DPO-Revised
pipeline_tag: text-generation
inference: false
model_creator: jpacifico
model_name: Chocolatine-3B-Instruct-DPO-Revised
model_type: phi3
language:
- fr
- en
datasets:
- jpacifico/french-orca-dpo-pairs-revised
library_name: transformers
quantized_by: ThiloteE
tags:
- text-generation-inference
- transformers
- GGUF
- GPT4All-community
- GPT4All
- conversational
- french
- chocolatine
---

> [!NOTE]
> This model is assumed to perform well but may require more testing and user feedback. Be aware that only models featured within the GUI of GPT4All are curated and officially supported by Nomic. Use at your own risk.

# About

- Static quants of https://huggingface.co/jpacifico/Chocolatine-3B-Instruct-DPO-Revised at commit [fa3e742](https://huggingface.co/jpacifico/Chocolatine-3B-Instruct-DPO-Revised/commit/fa3e742dd80b3f38127fb62f5fc66eaf468fb95c)
- Quantized by [ThiloteE](https://huggingface.co/ThiloteE) with llama.cpp commit [e09a800](https://github.com/ggerganov/llama.cpp/commit/e09a800f9a9b19c73aa78e03b4c4be8ed988f3e6)

These quants were created with a customized configuration that has been shown not to emit visible end-of-string (EOS) tokens during inference with [GPT4All](https://www.nomic.ai/gpt4all). The config.json, generation_config.json and tokenizer_config.json therefore differ from those found in the original model's repository at the time these quants were created.

# Prompt Template (for GPT4All)

Example System Prompt:

```
<|system|>
Vous trouverez ci-dessous une instruction décrivant une tâche. Rédigez une réponse qui réponde de manière appropriée à la demande.<|end|>
```

(French for: "Below is an instruction describing a task. Write a response that appropriately addresses the request.")

Chat Template:

```
<|user|>
%1<|end|>
<|assistant|>
%2<|end|>
```

# Context Length

`4096`

Use a lower value during inference if you do not have enough RAM or VRAM.
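When running the quant outside the GPT4All GUI, the context window is set at load time. Below is a minimal sketch, assuming the llama-cpp-python bindings and the Q4_0 file from this repository downloaded locally; this example is not part of the original card, and other llama.cpp frontends expose an equivalent setting.

```python
# Minimal sketch, assuming the llama-cpp-python package is installed and
# the Q4_0 GGUF from this repository has been downloaded locally.
from llama_cpp import Llama

llm = Llama(
    model_path="Chocolatine-3B-Instruct-DPO-Revised-Q4_0.gguf",
    n_ctx=2048,  # below the 4096 maximum, to reduce RAM/VRAM use
)

# Phi-3-style prompt, matching the template above
prompt = (
    "<|system|>\nVous êtes un assistant utile.<|end|>\n"
    "<|user|>\nBonjour !<|end|>\n<|assistant|>\n"
)
out = llm(prompt, max_tokens=128, stop=["<|end|>"])
print(out["choices"][0]["text"])
```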
# Provided Quants

| Link | Type | Size/GB | Notes |
|:-----|:-----|--------:|:------|
| [GGUF](https://huggingface.co/GPT4All-Community/Chocolatine-3B-Instruct-DPO-Revised-GGUF/resolve/main/Chocolatine-3B-Instruct-DPO-Revised-Q4_0.gguf?download=true) | Q4_0 | 2.44 | fast, recommended |

# About GGUF

If you are unsure how to use GGUF files, refer to one of [TheBloke's READMEs](https://huggingface.co/TheBloke/DiscoLM_German_7b_v1-GGUF) for more details, including on how to concatenate multi-part files.

Here is a handy graph by ikawrakow comparing some quant types (lower is better):

![image.png](https://www.nethype.de/huggingface_embed/quantpplgraph.png)

And here are Artefact2's thoughts on the matter: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9

# Thanks

Thanks to Mradermacher and TheBloke for the inspiration for this model card and for their contributions to open source, and to 3Simplex for lots of help along the way. Shoutout to the GPT4All and llama.cpp communities :-)

------
------
------

# Original Model card:

### Chocolatine-3B-Instruct-DPO-Revised

DPO fine-tune of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) (3.82B params) using the [jpacifico/french-orca-dpo-pairs-revised](https://huggingface.co/datasets/jpacifico/french-orca-dpo-pairs-revised) RLHF dataset. Training in French also improves the model in English, surpassing the performance of its base model.

Window context = 4k tokens

### Benchmarks

Chocolatine is the best-performing 3B model on the [OpenLLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) (August 2024).

![image/png](https://github.com/jpacifico/Chocolatine-LLM/blob/main/Assets/openllm_choco3b_revised.png?raw=true)

| Metric              | Value |
|---------------------|------:|
| **Avg.**            | **27.63** |
| IFEval (0-Shot)     | 56.23 |
| BBH (3-Shot)        | 37.16 |
| MATH Lvl 5 (4-Shot) | 14.5  |
| GPQA (0-shot)       |  9.62 |
| MuSR (0-shot)       | 15.1  |
| MMLU-PRO (5-shot)   | 33.21 |

### MT-Bench-French

Chocolatine-3B-Instruct-DPO-Revised outperforms GPT-3.5-Turbo on [MT-Bench-French](https://huggingface.co/datasets/bofenghuang/mt-bench-french) by Bofeng Huang, used with [multilingual-mt-bench](https://github.com/Peter-Devine/multilingual_mt_bench).

```
########## First turn ##########
                                           score
model                                turn
gpt-3.5-turbo                        1     8.1375
Chocolatine-3B-Instruct-DPO-Revised  1     7.9875
Daredevil-8B                         1     7.8875
Daredevil-8B-abliterated             1     7.8375
Chocolatine-3B-Instruct-DPO-v1.0     1     7.6875
NeuralDaredevil-8B-abliterated       1     7.6250
Phi-3-mini-4k-instruct               1     7.2125
Meta-Llama-3-8B-Instruct             1     7.1625
vigostral-7b-chat                    1     6.7875
Mistral-7B-Instruct-v0.3             1     6.7500
Mistral-7B-Instruct-v0.2             1     6.2875
French-Alpaca-7B-Instruct_beta       1     5.6875
vigogne-2-7b-chat                    1     5.6625
vigogne-2-7b-instruct                1     5.1375

########## Second turn ##########
                                           score
model                                turn
Chocolatine-3B-Instruct-DPO-Revised  2     7.937500
gpt-3.5-turbo                        2     7.679167
Chocolatine-3B-Instruct-DPO-v1.0     2     7.612500
NeuralDaredevil-8B-abliterated       2     7.125000
Daredevil-8B                         2     7.087500
Daredevil-8B-abliterated             2     6.873418
Meta-Llama-3-8B-Instruct             2     6.800000
Mistral-7B-Instruct-v0.2             2     6.512500
Mistral-7B-Instruct-v0.3             2     6.500000
Phi-3-mini-4k-instruct               2     6.487500
vigostral-7b-chat                    2     6.162500
French-Alpaca-7B-Instruct_beta       2     5.487395
vigogne-2-7b-chat                    2     2.775000
vigogne-2-7b-instruct                2     2.240506

########## Average ##########
                                      score
model
Chocolatine-3B-Instruct-DPO-Revised   7.962500
gpt-3.5-turbo                         7.908333
Chocolatine-3B-Instruct-DPO-v1.0      7.650000
Daredevil-8B                          7.487500
NeuralDaredevil-8B-abliterated        7.375000
Daredevil-8B-abliterated              7.358491
Meta-Llama-3-8B-Instruct              6.981250
Phi-3-mini-4k-instruct                6.850000
Mistral-7B-Instruct-v0.3              6.625000
vigostral-7b-chat                     6.475000
Mistral-7B-Instruct-v0.2              6.400000
French-Alpaca-7B-Instruct_beta        5.587866
vigogne-2-7b-chat                     4.218750
vigogne-2-7b-instruct                 3.698113
```

### Usage

You can run this model using my [Colab notebook](https://github.com/jpacifico/Chocolatine-LLM/blob/main/Chocolatine_3B_inference_test_colab.ipynb).

You can also run Chocolatine using the following code:

```python
import transformers
from transformers import AutoTokenizer

# Model to load
new_model = "jpacifico/Chocolatine-3B-Instruct-DPO-Revised"

# Format prompt
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])
```
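If the full-precision checkpoint does not fit in memory, the weights can be loaded explicitly in half precision, which roughly halves memory use. A minimal sketch, not from the original card, assuming a recent transformers release with native Phi-3 support and the accelerate package installed for `device_map="auto"`:

```python
# Hedged variant: half-precision loading to reduce memory use.
# Assumes torch, transformers >= 4.41 (native phi3 support) and accelerate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "jpacifico/Chocolatine-3B-Instruct-DPO-Revised"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # halves memory vs. float32
    device_map="auto",          # requires the accelerate package
)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "Qu'est-ce qu'un grand modèle de langage ?"},
]
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)
print(pipe(prompt, max_new_tokens=200, do_sample=True, temperature=0.7)[0]["generated_text"])
```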
* **4-bit quantized version** is available here: [jpacifico/Chocolatine-3B-Instruct-DPO-Revised-Q4_K_M-GGUF](https://huggingface.co/jpacifico/Chocolatine-3B-Instruct-DPO-Revised-Q4_K_M-GGUF)
* **Ollama**: [jpacifico/chocolatine-3b](https://ollama.com/jpacifico/chocolatine-3b)

```bash
ollama run jpacifico/chocolatine-3b
```

Ollama *Modelfile* example:

```bash
FROM ./chocolatine-3b-instruct-dpo-revised-q4_k_m.gguf
TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
{{ .Response }}<|end|>
"""
PARAMETER stop "<|end|>"
PARAMETER stop "<|user|>"
PARAMETER stop "<|assistant|>"
SYSTEM """You are a friendly assistant called Chocolatine."""
```

### Limitations

The Chocolatine model is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanism.

- **Developed by:** Jonathan Pacifico, 2024
- **Model type:** LLM
- **Language(s) (NLP):** French, English
- **License:** MIT