Loading fine-tuned models with AddedTokens

I have fine-tuned a "meta-llama/Llama-2-7b-chat-hf" model using the transformers library. Since my model uses additional tokens, I added them to the tokenizer before training and fine-tuned the "embed_tokens" and "lm_head" modules of the network along with the LoRA adapters. My training code looked like this:

  from peft import LoraConfig
  from transformers import AddedToken, AutoModelForCausalLM, AutoTokenizer

  # bnb_config, device_map, hf_token and the lora_* hyperparameters are defined elsewhere in the script
  model_name = "meta-llama/Llama-2-7b-chat-hf"

  tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, token=hf_token)
  tokenizer.add_special_tokens({"additional_special_tokens": [AddedToken("<|move|>"),
                                                              AddedToken("<|endmove|>"),
                                                              AddedToken("<|end|>")]})

  model = AutoModelForCausalLM.from_pretrained(
      model_name,
      quantization_config=bnb_config,
      device_map=device_map,
      token=hf_token,
  )
  # Grow the input embeddings and lm_head from 32000 to 32003 rows for the added tokens
  model.resize_token_embeddings(len(tokenizer))

  peft_config = LoraConfig(
      lora_alpha=lora_alpha,
      lora_dropout=lora_dropout,
      r=lora_r,
      bias="none",
      modules_to_save=["embed_tokens", "lm_head"],  # train the resized embeddings alongside the LoRA adapters
      task_type="CAUSAL_LM",
  )
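
The rest of the training script is omitted here; the usual pattern at this point is to wrap the model with the adapter config and, after training, save the adapter together with the resized tokenizer. A minimal sketch of that step, with a placeholder output directory:

  from peft import get_peft_model

  model = get_peft_model(model, peft_config)
  # ... run training (e.g. with a Trainer / SFTTrainer) ...

  # Save the adapter weights (including the resized embed_tokens / lm_head copies)
  # and the tokenizer with its three added tokens in the same place.
  output_dir = "./llama2-chat-added-tokens"  # placeholder path
  model.save_pretrained(output_dir)
  tokenizer.save_pretrained(output_dir)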

The model trained and saved successfully. However, when trying to load it using AutoModelForCausalLM.from_pretrained, I get the following error:

Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.embed_tokens.modules_to_save.default.weight: copying a param with shape torch.Size([32003, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([32003, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096])

I understand the error is due to the three additional tokens in the fine-tuned model, which make its embedding matrix larger than the base model's (32003 vs. 32000 rows), but how should I load a fine-tuned model whose input embeddings have been resized like this?

I looked into the transformers API docs for a way to load models with AddedTokens, but I couldn’t find anything. I read a blog post mentioning that passing ignore_mismatched_sizes=True to the from_pretrained function would solve the issue, but it didn’t work for me.
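
What I suspect is needed is to repeat the same resize step on a freshly loaded base model before attaching the saved adapter, so the shapes match the checkpoint. Something along these lines (the adapter directory is a placeholder, and I haven't confirmed this is the intended way):

  from peft import PeftModel
  from transformers import AutoModelForCausalLM, AutoTokenizer

  adapter_dir = "./llama2-chat-added-tokens"  # placeholder: where the adapter and tokenizer were saved

  # The saved tokenizer already contains the three added tokens (32003 entries).
  tokenizer = AutoTokenizer.from_pretrained(adapter_dir)

  # Load the original base model (still 32000 embeddings) ...
  base_model = AutoModelForCausalLM.from_pretrained(
      "meta-llama/Llama-2-7b-chat-hf",
      device_map="auto",
      token=hf_token,  # same token variable as in the training script
  )
  # ... resize it to match the tokenizer, exactly as was done before training ...
  base_model.resize_token_embeddings(len(tokenizer))

  # ... and only then load the adapter, so the embed_tokens / lm_head shapes line up.
  model = PeftModel.from_pretrained(base_model, adapter_dir)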

Was this ever solved? I have basically the exact same problem:

RuntimeError: Error(s) in loading state_dict for MistralForCausalLM:
size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([32002, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for lm_head.weight: copying a param with shape torch.Size([32002, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).

This may be a completely different issue from the one in the original post, but sometimes bugs are specific to the PEFT version, so you might want to try changing the versions of the libraries involved.
In my recent experience, some scripts do not work if the transformers library is newer than 4.44.
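
If you want to try that, the idea is to pin to a set of releases from around the same time; the exact versions below are only an example of such a combination, not a known-good set:

  pip install "transformers==4.44.2" "peft==0.12.0" "accelerate==0.33.0" "bitsandbytes==0.43.3"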