Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model (Paper • 2401.09417 • Published Jan 17, 2024)
FiT: Flexible Vision Transformer for Diffusion Model (Paper • 2402.12376 • Published Feb 19, 2024)
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution (Paper • 2307.06304 • Published Jul 12, 2023)