Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization Paper • 2402.03161 • Published Feb 5, 2024
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization Paper • 2309.04669 • Published Sep 9, 2023