Which model is better for a chatbot fine-tuned on healthcare data?
Meta-Llama-3-8B-Instruct
Mistral-7B-Instruct-v0.2
We have been getting great results with Mistral and were about to kick off our final training run, but now Meta has released this new version, so I'm hoping people can offer their two cents to aid our decision.
We’ve been using the Mixtral-8x7B-Instruct-v0.1 model to preprocess our training samples through the fireworks.ai API at $0.50/1M tokens. It’s cheaper than GPT-4, fireworks.ai has been reliable, and the model has been great at formatting JSON.
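For anyone curious, here's a rough sketch of the kind of request we send to fireworks.ai's OpenAI-compatible chat completions endpoint to turn a raw sample into structured JSON. The endpoint path and model ID are my best understanding of fireworks.ai's public conventions, and the prompt here is a simplified stand-in for our real one:

```python
# Sketch: building a fireworks.ai chat-completions request body that asks
# Mixtral to reformat a raw sample as strict JSON. Endpoint path and model ID
# are assumptions based on fireworks.ai's docs; the prompt is illustrative.
import json

FIREWORKS_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_preprocess_request(raw_sample: str) -> dict:
    """Build a request body asking the model to emit a JSON record only."""
    return {
        "model": "accounts/fireworks/models/mixtral-8x7b-instruct",
        "temperature": 0.0,  # deterministic output helps keep the JSON well-formed
        "messages": [
            {
                "role": "user",
                "content": (
                    "Convert this note into JSON with keys 'question' and "
                    "'answer'. Output JSON only, no commentary.\n\n" + raw_sample
                ),
            },
        ],
    }

body = json.dumps(build_preprocess_request("Patient reports mild headache..."))
```

You'd POST `body` to `FIREWORKS_URL` with your API key in the `Authorization` header; pinning temperature to 0 made the JSON formatting much more consistent for us.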
We’ve then been taking the preprocessed samples and fine-tuning Mistral-7B-Instruct-v0.2 on together.ai, and our first two test training runs blew us away. We’re almost finished preprocessing the full dataset and are about to fine-tune on 1M samples, so we’re really excited!
I plan on doing a smaller test fine-tune with Llama 3 8B Instruct after our full training run. I can't see any major benefit that justifies changing our strategy now, since we're focused on just shipping our version 1 model. But we do plan on running side-by-side tests with a smaller dataset of around 50K samples so we can consider Llama 3 for our version 2 model.
One thing I will say about Mistral is that, as far as I'm aware, it doesn't have a dedicated syntax for a system prompt. We emulate one by putting a user/assistant message pair at the start of the messages array, where the role and system-prompty instructions get established, and it works great.