OWL-ViT batch image inference

Dear Hugging Face users,

I’m trying to implement batched image inference with OWL-ViT. At the moment, I’m working on a set of 11 images, with 72 labels and batch_size=2. I learned how to implement batching from here:

https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/zeroshot_object_detection_with_owlvit.ipynb#scrollTo=-Wc92cWK-Aas

with the only difference being that I’m using the “google/owlvit-large-patch14” model instead of “google/owlvit-base-patch32”. The code works well for the first two images, but on the third I get:

RuntimeError: shape '[4, 37, 768]' is invalid for input of size 115200

here:

with torch.no_grad():
    outputs = model(**inputs)

I don’t understand what these shapes are. Do they refer to the image being processed or to the underlying network? Maybe I made a mistake somewhere? Am I using too many labels? Thanks.

cc @adirik

A bit late to answer this, but it might be because you’re not batching your text queries. Also keep in mind that you may have hit the maximum number of text tokens you can pass to OWL-ViT, which uses the CLIP tokenizer.
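To illustrate the text-batching point: the processor expects one list of text queries per image in the batch, so when you feed it two images you need two copies of your label list. Here is a minimal sketch of that batching logic using stand-in images and labels (the variable names and the commented-out processor call are illustrative, not from the original post):

```python
from itertools import islice

def chunked(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch

# Hypothetical inputs mirroring the question: 11 images, 72 labels.
images = [f"image_{i}" for i in range(11)]   # stand-ins for PIL images
labels = [f"label_{i}" for i in range(72)]

for image_batch in chunked(images, 2):
    # One copy of the label list per image in the batch, so the text
    # and image batch dimensions agree. With the real processor this
    # would look like:
    #   inputs = processor(text=[labels] * len(image_batch),
    #                      images=image_batch, return_tensors="pt")
    text_queries = [labels] * len(image_batch)
    assert len(text_queries) == len(image_batch)
```

Note that with 11 images and batch_size=2 the last batch contains a single image, so hard-coding the text queries for a batch of two would also break on the final iteration.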