ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality Paper • 2412.04062 • Published Dec 5, 2024 • 7
Mimir: Improving Video Diffusion Models for Precise Text Understanding Paper • 2412.03085 • Published Dec 4, 2024 • 12
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale Paper • 2410.20280 • Published Oct 26, 2024 • 23
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Paper • 2410.14669 • Published Oct 18, 2024 • 36
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation Paper • 2410.11779 • Published Oct 15, 2024 • 25
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 84
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20, 2024 • 58
Seeing Faces in Things: A Model and Dataset for Pareidolia Paper • 2409.16143 • Published Sep 24, 2024 • 17
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Paper • 2409.15278 • Published Sep 23, 2024 • 24
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18, 2024 • 76
CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation Paper • 2409.03643 • Published Sep 5, 2024 • 19
Real-Time Video Generation with Pyramid Attention Broadcast Paper • 2408.12588 • Published Aug 22, 2024 • 16