Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21, 2024
eP-ALM: Efficient Perceptual Augmentation of Language Models Paper • 2303.11403 • Published Mar 20, 2023
Unified Model for Image, Video, Audio and Language Tasks Paper • 2307.16184 • Published Jul 30, 2023