I have fine-tuned a "meta-llama/Llama-2-7b-chat-hf" model using the transformers library. Since my model uses additional tokens, I added them to the tokenizer before training and fine-tuned the embed_tokens and lm_head modules of the network. My training code looked like this:
from transformers import AutoTokenizer, AutoModelForCausalLM, AddedToken
from peft import LoraConfig

model_name = "meta-llama/Llama-2-7b-chat-hf"

# Add the new special tokens to the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, token=hf_token)
tokenizer.add_special_tokens({
    "additional_special_tokens": [
        AddedToken("<|move|>"),
        AddedToken("<|endmove|>"),
        AddedToken("<|end|>"),
    ]
})

# Load the base model (bnb_config, device_map and hf_token are defined elsewhere)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,
    token=hf_token,
)

# Grow the embedding matrix to cover the three new tokens (32000 -> 32003)
model.resize_token_embeddings(len(tokenizer))

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    # Train full copies of the embedding and output layers alongside the LoRA adapters
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
The model trained and saved successfully. However, loading the saved checkpoint back with AutoModelForCausalLM.from_pretrained fails.
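The load call itself is nothing unusual; the path below is a placeholder for the directory the checkpoint was saved to:

from transformers import AutoModelForCausalLM

# "./llama2-finetuned" is a placeholder for the actual output directory
model = AutoModelForCausalLM.from_pretrained(
    "./llama2-finetuned",
    device_map=device_map,
    token=hf_token,
)

This raises the following error: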
Error(s) in loading state_dict for LlamaForCausalLM:
size mismatch for model.embed_tokens.modules_to_save.default.weight: copying a param with shape torch.Size([32003, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([32003, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096])
I understand the error arises because the fine-tuned checkpoint contains three additional tokens, so its embed_tokens and lm_head matrices (32003 x 4096) no longer match the base model's (32000 x 4096). But how should I load a fine-tuned model whose embedding shape differs from the base model, like mine?
I looked into the transformers API docs for a way to load models with added tokens, but I couldn't find anything. I read a blog post mentioning that passing ignore_mismatched_sizes=True to from_pretrained would solve the issue, but it didn't work for me (attempt sketched below).
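For reference, the attempted workaround looked like this (again with a placeholder checkpoint path):

from transformers import AutoModelForCausalLM

# ignore_mismatched_sizes skips weights whose shapes differ from the base config,
# but it did not resolve the problem for me.
model = AutoModelForCausalLM.from_pretrained(
    "./llama2-finetuned",  # placeholder for the saved checkpoint directory
    ignore_mismatched_sizes=True,
    token=hf_token,
)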