MLLM-as-a-Judge for Image Safety without Human Labeling Paper • 2501.00192 • Published 4 days ago • 14
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published 2 days ago • 47
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published 11 days ago • 59
MEDEC: A Benchmark for Medical Error Detection and Correction in Clinical Notes Paper • 2412.19260 • Published 9 days ago • 1
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published 10 days ago • 82
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models Paper • 2412.18609 • Published 10 days ago • 13
Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation Paper • 2412.18176 • Published 11 days ago • 15
Revisiting In-Context Learning with Long Context Language Models Paper • 2412.16926 • Published 13 days ago • 27
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published 19 days ago • 41
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 17 days ago • 116
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 22 days ago • 80
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published 23 days ago • 43
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions Paper • 2412.08737 • Published 23 days ago • 52
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published 22 days ago • 92
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities Paper • 2412.07769 • Published 24 days ago • 26