Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 129
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published Dec 6, 2024 • 47
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 113
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese Paper • 2408.12480 • Published Aug 22, 2024 • 20
InternVL2.0 Collection Expanding Performance Boundaries of Open-Source MLLM • 15 items • Updated 12 days ago • 89