How to use I-JEPA for image classficiation

Nathanaj · July 15, 2023, 6:22pm

Hello everyone,

I am currently trying to implement image classification using I-JEPA. The paper from Yann Lecun ([2111.06377] Masked Autoencoders Are Scalable Vision Learners) mentioned that it could be applied for image classification, which piqued my interest in exploring its usage for the same. However, I’m facing a bit of confusion when it comes to actual implementation.

From the repository provided on GitHub, I am finding it hard to understand how to modify the model to add a linear classifier. More so, I am unclear about how to re-train the model on my data. The pre-trained models available on their GitHub are also present, but I must admit, I am finding it difficult to grasp how to leverage these for my purpose.

Could anyone who has some experience with I-JEPA help me understand the process? Any guidance on how to adapt the model for image classification, and potentially how to use the pre-trained models, would be extremely appreciated.

Looking forward to your suggestions and guidance. Thanks in advance!

ChristopherMarais · March 13, 2024, 2:10am

Hi Nathan,

Did you figure this out. I have been thinking about doing the same and have finally come round to actually attempting this. have you tested this?

nielsr · March 13, 2024, 7:43am

Hi,

The paper you refer to (MAE, or masked autoencoders) is available as ViTMAEForImageClassification in the Transformers library. It adds a linear classifier on top of the base ViTMAEModel. There’s also the ViTMAEForPreTraining class which adds the decoder used for pre-training.

Refer to the docs: ViTMAE. You can fine-tune it easily on your custom dataset by following the image classification notebook or example scripts.

I-JEPA is a different paper: [2301.08243] Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture, but the architecture is very similar (namely, a Vision Transformer). One would need to add I-JEPA as a separate model in the Transformers library.

ChristopherMarais · March 26, 2024, 10:01pm

How would you go about adding it to the library? I’d love to do that. @nielsr

I have managed to add a classification layer on top of the I_JEPA encoder, but it seems to not be very accurate and takes a while to train for now. I need to play with the hyperparameters a bit, but still find it strange that the out of the box classification abilities are so limited.

nielsr · December 6, 2024, 6:41pm

Hi,

I-JEPA is now supported in Transformers! I-JEPA

Topic		Replies	Views
How to use ViT MAE for image classification? 🤗Transformers	4	2024	December 3, 2024
New Paper: Masked Autoencoders Are Scalable Vision Learners Research	0	1360	November 14, 2021
Img2seq model with pretrained weights Beginners	7	1170	November 18, 2021
Using ViTMAEModel as an encoder for a UNet decoder for semantic segmentation 🤗Transformers	0	133	June 9, 2024
Inference with VitMAE by providing a mask 🤗Transformers	0	266	January 3, 2024

How to use I-JEPA for image classficiation

Related topics