`model_max_length` might be missing from the `tokenizer_config.json`
New to HF and a total beginner with language models. Not sure if this is a feature or a bug, though. Hope this helps.
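I ran into an issue after loading the model and tokenizer like this:

```python
from transformers import AutoTokenizer, CLIPModel

clip_model_name: str = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
clip_model = CLIPModel.from_pretrained(clip_model_name, local_files_only=True)
clip_tokenizer = AutoTokenizer.from_pretrained(clip_model_name)
```

Using the output of `clip_tokenizer` often gives errors about tensor shapes not matching. This isn't an issue for the other OpenAI CLIP models, since they all have `model_max_length: 77` in their `tokenizer_config.json`. Without that key, the tokenizer apparently doesn't truncate inputs to the 77 tokens that CLIP's text encoder expects, so longer texts come out too long.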
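To see why a missing key matters even with `truncation=True`, here is a simplified model of how the cut-off length gets chosen. It assumes, as in transformers' `tokenization_utils_base`, that a tokenizer without `model_max_length` falls back to a huge sentinel (`VERY_LARGE_INTEGER = int(1e30)`); the helper functions are hypothetical and are not the library's actual code.

```python
from typing import List, Optional

# Assumption: mirrors the sentinel transformers falls back to when
# model_max_length is absent from tokenizer_config.json.
VERY_LARGE_INTEGER = int(1e30)


def effective_max_length(tokenizer_config: dict, max_length: Optional[int] = None) -> int:
    """Hypothetical: the limit truncation=True would cut to in this simplified model."""
    if max_length is not None:
        return max_length  # an explicit max_length always wins
    return tokenizer_config.get("model_max_length", VERY_LARGE_INTEGER)


def truncate(token_ids: List[int], limit: int) -> List[int]:
    """Plain slicing; the real tokenizer also accounts for special tokens."""
    return token_ids[:limit]


long_ids = list(range(120))  # stand-in for a caption that tokenizes to 120 ids

print(len(truncate(long_ids, effective_max_length({"model_max_length": 77}))))  # 77
print(len(truncate(long_ids, effective_max_length({}))))  # 120, too long for 77 positions
print(len(truncate(long_ids, effective_max_length({}, max_length=77))))  # 77, explicit limit
```

With the key present, truncation lands on 77; without it, nothing is cut unless `max_length` is passed explicitly.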
My temporary workaround is to set the length explicitly: `token_features = clip_tokenizer([something], return_tensors="pt", truncation=True, max_length=77)`
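If you keep a local copy of the checkpoint, another option is to add the missing key to its `tokenizer_config.json` so you don't have to pass `max_length` on every call. A minimal sketch, assuming a writable local copy; `ensure_model_max_length` is a hypothetical helper, and 77 is the value the OpenAI CLIP configs use.

```python
import json
import tempfile
from pathlib import Path


def ensure_model_max_length(config_path: Path, max_length: int = 77) -> bool:
    """Hypothetical helper: add model_max_length to a tokenizer_config.json
    if it is missing. Returns True if the file was changed."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if "model_max_length" in config:
        return False  # leave an existing value untouched
    config["model_max_length"] = max_length
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return True


# Demo on a throwaway config that lacks the key.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tokenizer_config.json"
    path.write_text(json.dumps({"unk_token": "<|endoftext|>"}), encoding="utf-8")
    print(ensure_model_max_length(path))  # True: key was added
    print(ensure_model_max_length(path))  # False: already present
    print(json.loads(path.read_text(encoding="utf-8"))["model_max_length"])  # 77
```

After patching, loading the tokenizer from that local directory should pick up the limit. Without editing any file, passing `model_max_length=77` to `AutoTokenizer.from_pretrained(...)` should also override the value at load time.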
@rwightman, would you know how to fix this?
The tokenizer config was taken from https://huggingface.co/openai/clip-vit-base-patch32/blob/main/tokenizer_config.json ... so I'm surprised that one works and this one doesn't. Perhaps that specific OpenAI CLIP model was missing the max length too... will look closer.