Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 21 days ago • 136
YuLan-Mini Collection A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. • 5 items • Updated 6 days ago • 10
Large Motion Video Autoencoding with Cross-modal Video VAE Paper • 2412.17805 • Published 11 days ago • 23
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published 29 days ago • 54
LLM-jp-3 Pre-trained Models Collection Pre-trained models in the LLM-jp-3 model series • 4 items • Updated 11 days ago • 5
AIMv2 Collection A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22, 2024 • 68
Gemma-APS Release Collection Gemma models for text-to-propositions segmentation. The models are distilled from fine-tuned Gemini Pro model applied to multi-domain synthetic data. • 3 items • Updated 21 days ago • 20
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published Oct 3, 2024 • 36
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19, 2024 • 48
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 452