Inference with VitMAE by providing a mask

ManuD · January 3, 2024, 10:29pm

Hi,

I am trying to use ViTMAE (https://huggingface.co/docs/transformers/model_doc/vit_mae). More specifically, I have an image and a mask that specifies the parts of the image I’d like to reconstruct.

As I understand the paper, the model is designed for this task, but in the code and demos I have looked at, the mask is always generated inside the forward method of the MAE model.

Is my understanding correct or am I missing some essential parts?
Is there a way to achieve my goal without changing too much of the original code?
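
To make the question concrete, here is a minimal sketch of the kind of workaround I have in mind. It assumes that the optional `noise` argument of the ViTMAE forward method decides which patches get masked (patches with the lowest noise are kept), and that the number of kept patches is still fixed by the config's `mask_ratio`. The helper names `mask_to_noise` and `kept_patches` are my own and not part of the library, and `kept_patches` only imitates the selection step in plain Python, so I may well be wrong about the details.

```python
def mask_to_noise(patch_mask):
    """Turn a per-patch mask (1 = reconstruct, 0 = keep) into a noise vector.

    Assumption: the model sorts patches by noise and keeps the ones with the
    smallest values, so kept patches get low noise and masked ones high noise.
    """
    n = len(patch_mask)
    # The small index-based offset (< 0.5) keeps the order deterministic
    # inside each group without ever mixing the two groups.
    return [float(m) + i / (2 * n) for i, m in enumerate(patch_mask)]


def kept_patches(noise, mask_ratio):
    """Imitate the selection step: keep the len_keep lowest-noise patches."""
    len_keep = int(len(noise) * (1 - mask_ratio))
    order = sorted(range(len(noise)), key=lambda i: noise[i])
    return sorted(order[:len_keep])


# Toy example with 8 patches, 4 of which I want reconstructed.
patch_mask = [0, 0, 1, 1, 0, 1, 0, 1]
noise = mask_to_noise(patch_mask)

# The mask ratio has to match the fraction of masked patches, otherwise
# some patches would end up on the wrong side of the cut.
mask_ratio = sum(patch_mask) / len(patch_mask)
kept = kept_patches(noise, mask_ratio)
print(kept)  # [0, 1, 4, 6]

# Hypothetical use with the real model (not run here, untested):
#   outputs = model(pixel_values, noise=torch.tensor([noise]))
```

If this is roughly right, the mask would have to be defined per patch rather than per pixel, and every image in a batch would need the same number of masked patches.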

Thanks for your help!
