MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation Paper • 2412.07147 • Published 27 days ago
Grounding Descriptions in Images informs Zero-Shot Visual Recognition Paper • 2412.04429 • Published Dec 5, 2024
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models Paper • 2412.05939 • Published 29 days ago
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions Paper • 2412.08737 • Published 26 days ago
VisionArena: 230K Real World User-VLM Conversations with Preference Labels Paper • 2412.08687 • Published 26 days ago
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities Paper • 2412.07769 • Published 27 days ago
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval Paper • 2412.14475 • Published 18 days ago
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage Paper • 2412.15484 • Published 17 days ago