How to get results as good as Hugging Face Chat Mixtral-8x7b-Instruct

#107
by Panckackes - opened

Hello everyone. I am trying to use Mixtral at work to integrate it into a work process. However, when I use the same prompt in Hugging Face Chat and locally, I don't get the same results at all: locally, the results are absolutely unusable.

Here is my code:

# Install packages
!pip install -q langchain
!pip install -q transformers
!pip install -q ctransformers[cuda]

# Import the model
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
# Import path varies across langchain versions (newer releases use langchain_community.llms)
from langchain.llms import HuggingFacePipeline

device = "cuda" # the device to load the model onto
#model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", load_in_8bit=True, pad_token_id=0)
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Build the text-generation pipeline (pipeline is imported directly above)
text_generation_pipeline = pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    temperature=0.2,
    repetition_penalty=1.2,
    max_new_tokens=500,
    device=0, # -1 CPU, 0 GPU
    top_k=50,
    top_p=0.95,
    do_sample=True,
    # 50256 is the GPT-2 EOS id; use this tokenizer's own EOS id for padding
    pad_token_id=tokenizer.eos_token_id,
)
mistral_llm = HuggingFacePipeline(pipeline=text_generation_pipeline)

#Define the Prompt
prompt = """

Hugging_Face_Mixtral.PNG
Local_Mixtral.PNG

I have the exact same parameters as described in the Hugging Face source code (temperature, etc.), but I get something unusable locally and something very usable on Hugging Face Chat.

Have you tried looking into the prompt template? In my experience, Mistral and Mixtral models are very sensitive to those symbols: one extra space and the response goes down a very wrong path! You can use the HuggingChat UI to see the actual prompt that goes into the LLM. When you see a good response, just click the download button on the user's question:

image.png

That will show you exactly how the prompt was constructed (in addition to the generation config):

{
  "note": "This is a preview of the prompt that will be sent to the model when retrying the message. It may differ from what was sent in the past if the parameters have been updated since",
  "prompt": "<s> [INST]You are a helpful assistant.\n This is a test, just say test back! [/INST]",
  "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
  "parameters": {
    "temperature": 0.6,
    "truncate": 24576,
    "max_new_tokens": 8192,
    "stop": [
      "</s>"
    ],
    "top_p": 0.95,
    "top_k": 50,
    "repetition_penalty": 1.2,
    "stop_sequences": [
      "</s>"
    ],
    "return_full_text": false
  }
}
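
A minimal sketch of rebuilding that prompt string locally, so the local run receives the same text as HuggingChat. The layout (system text, a newline and a space, then the user message) is only inferred from the single preview above, and `build_hf_chat_prompt` is a hypothetical helper, not part of any library.

```python
def build_hf_chat_prompt(system: str, user: str) -> str:
    """Rebuild the prompt string in the layout seen in the preview above.

    Assumption: the layout is inferred from that single preview; other
    models or later HuggingChat versions may format it differently.
    """
    return f"<s> [INST]{system}\n {user} [/INST]"


prompt = build_hf_chat_prompt(
    "You are a helpful assistant.",
    "This is a test, just say test back!",
)
print(repr(prompt))
```

Since the string already starts with `<s>`, it is probably worth checking that the tokenizer does not prepend a second BOS token, for example by tokenizing with `add_special_tokens=False`.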

@MaziyarPanahi thank you for these instructions. Where do I see this download button? I went to HuggingChat and started a chat with mistralai/Mixtral-8x7B-Instruct-v0.1 but could not find it.
I was able to find some templates in the source code here and here

You are welcome @kristjan
On HuggingFace/chat, when you ask a question you can hover over the question and these icons will appear:

image.png

It is right next to the re-generate icon/button.

Your lines (below) seem to refer to Mistral 7B, not Mixtral 8x7B?

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

HuggingChat does also offer the 7B Mistral model. Pretty solid model, I really like it. (It is also sensitive to the prompt template: one extra space before EOS and the whole thing changes.)

Yes, but if the goal is to reproduce HuggingChat's Mixtral 8x7B results, you probably can't do that straightforwardly when calling Mistral 7B instead of Mixtral 8x7B ^_^'

You can switch models in HuggingChat. Choose whichever model you want to test there, and if it performs better hosted on HuggingChat, you can use this feature to see the exact prompt template and parameters and reproduce them locally. (I was just giving an example; you should adjust the model's name accordingly.) This should answer the question "why does a model perform better on HuggingChat than in my local environment?"
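
As a rough sketch of reusing those parameters locally, the "parameters" object from the preview shown earlier in the thread can be filtered down to the keys that `transformers` generation accepts under the same names. `to_generate_kwargs` is a hypothetical helper, and dropping `truncate`, `stop`, `stop_sequences` and `return_full_text` is an assumption that they are handled elsewhere (server side or at pipeline level).

```python
def to_generate_kwargs(hf_chat_params: dict) -> dict:
    """Keep only the sampling settings that generate() takes under the same name."""
    shared = ("temperature", "top_p", "top_k", "repetition_penalty", "max_new_tokens")
    kwargs = {key: hf_chat_params[key] for key in shared if key in hf_chat_params}
    # Sampling settings only take effect when sampling is enabled.
    kwargs["do_sample"] = True
    return kwargs


# Values copied from the HuggingChat preview earlier in the thread.
hf_chat_params = {
    "temperature": 0.6,
    "truncate": 24576,
    "max_new_tokens": 8192,
    "stop": ["</s>"],
    "top_p": 0.95,
    "top_k": 50,
    "repetition_penalty": 1.2,
    "stop_sequences": ["</s>"],
    "return_full_text": False,
}
generate_kwargs = to_generate_kwargs(hf_chat_params)
print(generate_kwargs)
```

The resulting dict could then be passed as keyword arguments to `model.generate(...)` or to the pipeline call. Note that the preview shows a temperature of 0.6, while the original post used 0.2.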

I think there is a misunderstanding @MaziyarPanahi. Kerea spotted a glaring error in the original post that instantly explains why it's not working. The main reason the results differ is this line: AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1"). Obviously the author of the post will not get the same results as Mixtral if they use Mistral! Of course other parameters will have an impact, but tuning them is of little use if the wrong model is being loaded altogether.
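
A hedged sketch of what loading the intended model could look like, using the repo name from the HuggingChat preview earlier in the thread. `load_mixtral` is a hypothetical wrapper; half precision and `device_map="auto"` (which needs the `accelerate` package) are assumptions about the setup, not settings confirmed by HuggingChat.

```python
# Repo name taken from the HuggingChat preview earlier in the thread.
MODEL_ID = "mistralai/Mixtral-8x7B-Instruct-v0.1"


def load_mixtral(model_id: str = MODEL_ID):
    """Load the tokenizer and model for the given repo.

    Not called here: the download is very large and needs substantial GPU memory.
    """
    # Imported lazily so this sketch can be read and imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # assumption: half precision to reduce memory use
        device_map="auto",          # assumption: requires the accelerate package
    )
    return tokenizer, model
```

Calling `tokenizer, model = load_mixtral()` would replace the two `from_pretrained` lines in the original post; whether the model fits on the available hardware is a separate question.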

I understand now and thanks for the clarification :)
