Why is the model returning "biasedbiasedbiased..." over and over?!
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Define quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
)

# Model path
mistral_models_path = "mistralai/Mistral-Nemo-Instruct-2407"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    mistral_models_path,
    cache_dir=r'D:\ai\mistral_models\bosta'
)

# Load model with updated configuration
model = AutoModelForCausalLM.from_pretrained(
    mistral_models_path,
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
    cache_dir=r'D:\ai\mistral_models\bosta'
)
conversation = [{"role": "user", "content": "tell me a history of a bread"}]
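# NOTE: get_current_weather is not defined anywhere in this snippet, so the
# next line raises a NameError as posted. A minimal hypothetical stub, just
# so the example runs end to end, could look like this:
def get_current_weather(location: str, unit: str = "celsius") -> str:
    """Hypothetical placeholder tool: returns a canned weather string."""
    return f"The weather in {location} is 22 degrees {unit}."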
tools = [get_current_weather]
# Format and tokenize the tool-use prompt
inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs.to(model.device)
a = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.7,
    top_k=50,
    top_p=0.9,
)
print(tokenizer.decode(a[0], skip_special_tokens=True))
# Returned:
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
C:\Users\joaom\AppData\Roaming\Python\Python311\site-packages\transformers\generation\utils.py:1259: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
tell me a history of a breadSurebiasedbiasedbiasedbiasedbiasedbiasedbiasedbiased
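Note: as far as I understand, temperature, top_k and top_p are ignored by model.generate unless do_sample=True is also passed, so the run above was effectively greedy decoding. A sketch of the same call with sampling explicitly enabled (untested, parameter values are just my guesses):

outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,       # needed for temperature/top_k/top_p to take effect
    temperature=0.7,
    top_k=50,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))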