Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant Paper • 2410.13360 • Published Oct 17, 2024 • 8
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning Paper • 2411.18203 • Published Nov 27, 2024 • 33
Towards Interpreting Visual Information Processing in Vision-Language Models Paper • 2410.07149 • Published Oct 9, 2024 • 1
Understanding Alignment in Multimodal LLMs: A Comprehensive Study Paper • 2407.02477 • Published Jul 2, 2024 • 21
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy Paper • 2411.15453 • Published Nov 23, 2024
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models Paper • 2411.14982 • Published Nov 22, 2024 • 16
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token Paper • 2412.06676 • Published 28 days ago • 9
From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding Paper • 2412.06474 • Published 28 days ago
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation Paper • 2412.09585 • Published 25 days ago • 10
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Paper • 2412.09604 • Published 25 days ago • 35
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer Paper • 2412.13871 • Published 19 days ago • 17
FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published 20 days ago • 13
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published 21 days ago • 49
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment Paper • 2412.19326 • Published 11 days ago • 18
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published 13 days ago • 62