Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding Paper • 2412.00493 • Published Nov 30, 2024 • 16
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3, 2024 • 65
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper • 2403.09394 • Published Mar 14, 2024 • 25