MLLM-as-a-Judge for Image Safety without Human Labeling • Paper • 2501.00192 • Published 6 days ago • 22
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining • Paper • 2501.00958 • Published 4 days ago • 75
Are Vision-Language Models Truly Understanding Multi-vision Sensor? • Paper • 2412.20750 • Published 7 days ago • 17
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization • Paper • 2412.18525 • Published 13 days ago • 62
On the Compositional Generalization of Multimodal LLMs for Medical Imaging • Paper • 2412.20070 • Published 9 days ago • 40
The Superposition of Diffusion Models Using the Itô Density Estimator • Paper • 2412.17762 • Published 14 days ago • 12
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey • Paper • 2412.18619 • Published 21 days ago • 49
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment • Paper • 2412.19326 • Published 11 days ago • 18
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs • Paper • 2412.18925 • Published 12 days ago • 86
MMFactory: A Universal Solution Search Engine for Vision-Language Tasks • Paper • 2412.18072 • Published 13 days ago • 14
A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression • Paper • 2412.17483 • Published 14 days ago • 29
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search • Paper • 2412.18319 • Published 13 days ago • 34
Revisiting In-Context Learning with Long Context Language Models • Paper • 2412.16926 • Published 15 days ago • 27
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization • Paper • 2412.17739 • Published 14 days ago • 37
Deliberation in Latent Space via Differentiable Cache Augmentation • Paper • 2412.17747 • Published 14 days ago • 28
Diving into Self-Evolving Training for Multimodal Reasoning • Paper • 2412.17451 • Published 14 days ago • 41
SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation • Paper • 2412.13649 • Published 19 days ago • 20