Is finetuning supported for this model? If so, can anyone give me some pointers?
I found this post: https://discuss.huggingface.co/t/finetune-blip-on-customer-dataset-20893/28446
but it is about Salesforce/blip-vqa-base instead of this model, Salesforce/blip-image-captioning-large.
Hi @husjerry,
Fine-tuning for VQA should be done the same way as image-captioning fine-tuning; I think the only difference is how you prompt the model (but I am not sure).
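To illustrate what I mean by the prompting difference, here is a small inference sketch: for captioning the text is an optional prefix the caption continues from, while for VQA the text is the question itself. The image path and prompts are placeholders I made up, not anything from the thread:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration, BlipForQuestionAnswering

image = Image.open("example.jpg").convert("RGB")  # placeholder image

# Captioning: the text (if any) is a prefix that the generated caption continues.
cap_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
cap_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")
inputs = cap_processor(images=image, text="a photography of", return_tensors="pt")
print(cap_processor.decode(cap_model.generate(**inputs)[0], skip_special_tokens=True))

# VQA: the text is the question you want answered about the image.
vqa_processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
vqa_model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")
inputs = vqa_processor(images=image, text="how many dogs are in the picture?", return_tensors="pt")
print(vqa_processor.decode(vqa_model.generate(**inputs)[0], skip_special_tokens=True))
```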
You can have a look at the instructions shared in that thread, or refer to the original VQA fine-tuning script: https://github.com/salesforce/BLIP/blob/main/train_vqa.py. You can try it with the HF model, or use their model and then convert it to the HF version with the conversion script here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/blip/convert_blip_original_pytorch_to_hf.py
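In case it helps, below is a minimal captioning fine-tuning sketch using the HF model directly rather than the original BLIP script. The dataset class, file names, and hyperparameters are all placeholder assumptions for illustration; adapt them to your own data:

```python
import torch
from torch.utils.data import DataLoader, Dataset
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

class CaptionDataset(Dataset):
    """Hypothetical dataset of (image_path, caption) pairs."""
    def __init__(self, pairs):
        self.pairs = pairs
    def __len__(self):
        return len(self.pairs)
    def __getitem__(self, idx):
        image_path, caption = self.pairs[idx]
        image = Image.open(image_path).convert("RGB")
        # The processor resizes/normalizes the image and tokenizes the caption.
        encoding = processor(images=image, text=caption, padding="max_length",
                             truncation=True, return_tensors="pt")
        return {k: v.squeeze(0) for k, v in encoding.items()}

# Placeholder data; replace with your own (image_path, caption) pairs.
pairs = [("img1.jpg", "a photo of a cat"), ("img2.jpg", "a photo of a dog")]
loader = DataLoader(CaptionDataset(pairs), batch_size=2, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):
    for batch in loader:
        batch = {k: v.to(device) for k, v in batch.items()}
        # For captioning, the tokenized caption serves as both input and target;
        # for a cleaner loss you could mask pad tokens in the labels with -100.
        outputs = model(input_ids=batch["input_ids"],
                        pixel_values=batch["pixel_values"],
                        labels=batch["input_ids"])
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

This follows the standard pattern for BlipForConditionalGeneration: passing the tokenized caption as labels makes the model return a language-modeling loss you can backpropagate.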