Fine tuning with transformers?
I've had luck fine-tuning Zephyr and other Mistral fine-tunes with transformers 4.35.2, but Starling throws an error that looks like a vocab size mismatch (likely because I'm using Mistral):
shape '[-1, 32000]' is invalid for input of size 19457216
For those interested, this originates from transformers/models/mistral/modeling_mistral.py, line 1032:
shift_logits.view(-1, self.config.vocab_size)
Does anyone know of a workaround until we have first-class support in transformers?
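A quick sanity check on the numbers in that error message, as a rough sketch: 19457216 is not divisible by 32000, but it is divisible by 32002, which would fit a checkpoint whose vocabulary has two more tokens than base Mistral. The search window of 16 extra tokens is an arbitrary assumption made for illustration.

```python
# Numbers copied from the error message in this thread.
numel = 19457216        # total elements in the logits tensor
config_vocab = 32000    # vocab size that view(-1, vocab_size) assumed

# A non-zero remainder means the tensor cannot be reshaped to [-1, 32000].
remainder = numel % config_vocab
print(remainder)        # 1216

# Assumption: the real vocab is the base vocab plus a handful of added tokens,
# so only sizes in a small window above 32000 are tried.
candidates = [v for v in range(config_vocab, config_vocab + 17) if numel % v == 0]
print(candidates)       # [32002]

# Number of token positions (batch * sequence) if the vocab size is 32002.
rows = numel // 32002
print(rows)             # 608
```

So the logits appear to have a last dimension of 32002 while config.vocab_size is 32000, which is what makes the view call fail.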
For what it’s worth, I’ve successfully done a full fine-tune on Starling with Transformers 4.35.2 (with Axolotl). Are you perhaps adding tokens and changing the vocab size? I trained with the OpenChat prompt format and stuck with the default EOS and BOS tokens, so no added tokens were necessary. I think the openchat.json file may also be relevant?
When do you use the openchat.json file? I didn't even pull that down ahead of converting weights/fine tuning so I'm curious to learn more
https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha/blob/main/openchat.json
I may be wrong about that. It was one of a couple of files added after the fact that seemed to fix early training issues, but that one seems more related to compatibility with the OpenChat API. I'm working with Transformers indirectly, via Axolotl, so it's difficult to tease out why it's working in my instance versus yours. The OpenChat 3.5 format used by Starling adds a couple of tokens to the vocabulary that I suspect are the source of your issues. Hopefully the devs can help.
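If it helps to narrow this down, here is a minimal sketch for comparing the three sizes that have to agree: config.vocab_size, the tokenizer length, and the number of rows in the output embedding (lm_head). The helper names find_vocab_mismatch and diagnose are made up for illustration. The transformers calls used (AutoTokenizer.from_pretrained, AutoModelForCausalLM.from_pretrained, get_output_embeddings) are standard, but this has not been verified against Starling specifically.

```python
def find_vocab_mismatch(config_vocab_size, tokenizer_len, lm_head_rows):
    """Return a list of problems; the list is empty when the sizes are consistent."""
    problems = []
    if lm_head_rows != config_vocab_size:
        # The loss code reshapes logits with config.vocab_size, so these must match.
        problems.append(
            f"lm_head has {lm_head_rows} rows but config.vocab_size is {config_vocab_size}"
        )
    if tokenizer_len > lm_head_rows:
        # The tokenizer can emit ids that the model has no embedding row for.
        problems.append(
            f"tokenizer has {tokenizer_len} tokens but lm_head has only {lm_head_rows} rows"
        )
    return problems


def diagnose(model_path):
    """Load a checkpoint and report vocab mismatches. Note: loads the full weights."""
    # Imported lazily so the pure helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)
    lm_head_rows = model.get_output_embeddings().weight.shape[0]
    return find_vocab_mismatch(model.config.vocab_size, len(tokenizer), lm_head_rows)


# Example with the sizes suggested by the error in this thread.
print(find_vocab_mismatch(32000, 32002, 32002))
```

If diagnose reports that the tokenizer is larger than the embedding matrix, model.resize_token_embeddings(len(tokenizer)) resizes the embeddings and updates config.vocab_size to match. If only the config value is off, loading with the checkpoint's own config.json rather than a base Mistral config is probably the simpler fix.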