2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 4 days ago • 75
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published 6 days ago • 35
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models Paper • 2411.13503 • Published Nov 20, 2024 • 30
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework Paper • 2411.06176 • Published Nov 9, 2024 • 45
Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss Paper • 2410.17243 • Published Oct 22, 2024 • 89
VideoBooth: Diffusion-based Video Generation with Image Prompts Paper • 2312.00777 • Published Dec 1, 2023 • 21
FreeInit: Bridging Initialization Gap in Video Diffusion Models Paper • 2312.07537 • Published Dec 12, 2023 • 25
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models Paper • 2309.15103 • Published Sep 26, 2023 • 42