Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 14 days ago • 113
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 13 days ago • 107
The Open Source Advantage in Large Language Models (LLMs) Paper • 2412.12004 • Published 16 days ago • 9
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Paper • 2412.09604 • Published 20 days ago • 35
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 19 days ago • 132
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions Paper • 2412.08737 • Published 21 days ago • 51
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published 20 days ago • 92
POINTS1.5: Building a Vision-Language Model towards Real World Applications Paper • 2412.08443 • Published 21 days ago • 38
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations Paper • 2412.08580 • Published 21 days ago • 45
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published 22 days ago • 50
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published 22 days ago • 46
Evaluating and Aligning CodeLLMs on Human Preference Paper • 2412.05210 • Published 26 days ago • 47
STIV: Scalable Text and Image Conditioned Video Generation Paper • 2412.07730 • Published 22 days ago • 70
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 23 days ago • 63
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published 23 days ago • 69
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation Paper • 2412.06531 • Published 23 days ago • 71
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published 26 days ago • 46
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases Paper • 2412.04862 • Published 26 days ago • 48