(Meta-comment: I'm not sure which forum this fits best in - it seems like it would be useful to have a place where we can discuss new papers.)
This new work by Kaiming He et al. seems pretty interesting - they use a very simple masking setup for pre-training a ViT, and it looks like they get very good results across a variety of tasks.
So far, I see an implementation by lucidrains.
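To make the masking idea a bit more concrete, here is a minimal sketch of how I understand random patch masking: pick a random subset of patch indices to hide, and feed only the rest to the encoder. This is my own illustration, not code from the paper or from the lucidrains implementation. The function name `random_patch_mask`, the 0.75 mask ratio, and the 224x224 image with 16x16 patches are all assumptions I picked for the example.

```python
import random


def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) by uniform random sampling.

    Hypothetical helper for illustration only; mask_ratio=0.75 is an
    assumed default, not a value taken from the post.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")

    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)

    # The first chunk of the shuffled indices is hidden; the rest stays visible.
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


# Assumed setup: a 224x224 image cut into 16x16 patches gives 14 * 14 = 196 patches.
visible, masked = random_patch_mask(num_patches=196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

In an actual pre-training loop, only the `visible` patches would go through the encoder, and the model would be trained to reconstruct the `masked` ones.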