2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 4 days ago • 75
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published 12 days ago • 86
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design Paper • 2412.14590 • Published 18 days ago • 13
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 19 days ago • 116
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 24 days ago • 136
POINTS1.5: Building a Vision-Language Model towards Real World Applications Paper • 2412.08443 • Published 26 days ago • 38
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published 28 days ago • 72
CompCap: Improving Multimodal Large Language Models with Composite Captions Paper • 2412.05243 • Published about 1 month ago • 18
APOLLO: SGD-like Memory, AdamW-level Performance Paper • 2412.05270 • Published about 1 month ago • 38
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published Dec 5, 2024 • 59
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 121
PaliGemma 2 Release Collection Vision-Language Models available in multiple 3B, 10B and 28B variants. • 23 items • Updated 24 days ago • 123
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs Paper • 2411.19146 • Published Nov 28, 2024 • 13