I have fine-tuned a "meta-llama/Llama-2-7b-chat-hf" model using the transformers library. Since my model uses additional tokens, I added them to the tokenizer before training and fine-tuned the embed_tokens and lm_head modules of the network. My training code looked like this:
from transformers import AutoTokenizer, AutoModelForCausalLM, AddedToken
from peft import LoraConfig

model_name = "meta-llama/Llama-2-7b-chat-hf"

# Add the new special tokens to the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, token=hf_token)
tokenizer.add_special_tokens({
    "additional_special_tokens": [
        AddedToken("<|move|>"),
        AddedToken("<|endmove|>"),
        AddedToken("<|end|>"),
    ]
})

# Load the base model (bnb_config, device_map and hf_token are defined elsewhere)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,
    token=hf_token,
)

# Grow the embedding matrix to cover the three new tokens (32000 -> 32003)
model.resize_token_embeddings(len(tokenizer))

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    # Train full copies of the embedding and output layers alongside the LoRA adapters
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
The model trained and saved successfully. However, loading the saved checkpoint back with AutoModelForCausalLM.from_pretrained fails.
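The load call itself is nothing unusual; the path below is a placeholder for the directory the checkpoint was saved to:

from transformers import AutoModelForCausalLM

# "./llama2-finetuned" is a placeholder for the actual output directory
model = AutoModelForCausalLM.from_pretrained(
    "./llama2-finetuned",
    device_map=device_map,
    token=hf_token,
)

This raises the following error: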
Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.embed_tokens.modules_to_save.default.weight: copying a param with shape torch.Size([32003, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([32003, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096])
I understand the error arises because the fine-tuned checkpoint contains three additional tokens, so its embed_tokens and lm_head matrices (32003 x 4096) no longer match the base model's (32000 x 4096). But how should I load a fine-tuned model whose embedding shape differs from the base model, like mine?
I looked into the transformers API docs for a way to load models with added tokens, but I couldn't find anything. I read a blog post mentioning that passing ignore_mismatched_sizes=True to from_pretrained would solve the issue, but it didn't work for me (attempt sketched below).
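For reference, the attempted workaround looked like this (again with a placeholder checkpoint path):

from transformers import AutoModelForCausalLM

# ignore_mismatched_sizes skips weights whose shapes differ from the base config,
# but it did not resolve the problem for me.
model = AutoModelForCausalLM.from_pretrained(
    "./llama2-finetuned",  # placeholder for the saved checkpoint directory
    ignore_mismatched_sizes=True,
    token=hf_token,
)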