VLM - a poonyZ Collection

updated 4 days ago

Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant
Paper • 2410.13360 • Published Oct 17, 2024 • 8

Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Paper • 2411.18203 • Published Nov 27, 2024 • 33

Towards Interpreting Visual Information Processing in Vision-Language Models
Paper • 2410.07149 • Published Oct 9, 2024 • 1

Understanding Alignment in Multimodal LLMs: A Comprehensive Study
Paper • 2407.02477 • Published Jul 2, 2024 • 21

Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy
Paper • 2411.15453 • Published Nov 23, 2024

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
Paper • 2411.14982 • Published Nov 22, 2024 • 16

I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
Paper • 2412.06676 • Published 28 days ago • 9

From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding
Paper • 2412.06474 • Published 28 days ago

OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation
Paper • 2412.09585 • Published 25 days ago • 10

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
Paper • 2412.09604 • Published 25 days ago • 35

Analyzing The Language of Visual Tokens
Paper • 2411.05001 • Published Nov 7, 2024 • 23

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
Paper • 2412.13871 • Published 19 days ago • 17

FastVLM: Efficient Vision Encoding for Vision Language Models
Paper • 2412.13303 • Published 20 days ago • 13

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Paper • 2412.18619 • Published 21 days ago • 49

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
Paper • 2412.19326 • Published 11 days ago • 18

Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization
Paper • 2412.18525 • Published 13 days ago • 62