Collections
Discover the best community collections!
Collections including paper arxiv:2409.18869
-
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
Paper • 2409.17481 • Published • 47 -
Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling
Paper • 2409.14683 • Published • 10 -
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
Paper • 2409.17422 • Published • 25 -
Emu3: Next-Token Prediction is All You Need
Paper • 2409.18869 • Published • 94
-
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection
Paper • 2409.08513 • Published • 12 -
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Paper • 2409.08264 • Published • 44 -
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper • 2409.12191 • Published • 76 -
LLMs + Persona-Plug = Personalized LLMs
Paper • 2409.11901 • Published • 32
-
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Paper • 2407.08083 • Published • 28 -
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 58 -
The Mamba in the Llama: Distilling and Accelerating Hybrid Models
Paper • 2408.15237 • Published • 39 -
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Paper • 2409.11355 • Published • 29
-
INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers
Paper • 2307.03712 • Published • 1 -
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
Paper • 2408.04093 • Published • 4 -
Arcee's MergeKit: A Toolkit for Merging Large Language Models
Paper • 2403.13257 • Published • 20 -
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Paper • 2408.10188 • Published • 51
-
VILA^2: VILA Augmented VILA
Paper • 2407.17453 • Published • 40 -
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 117 -
Octo-planner: On-device Language Model for Planner-Action Agents
Paper • 2406.18082 • Published • 48 -
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models
Paper • 2408.15518 • Published • 43
-
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Paper • 2406.16860 • Published • 60 -
PaliGemma: A versatile 3B VLM for transfer
Paper • 2407.07726 • Published • 68 -
E5-V: Universal Embeddings with Multimodal Large Language Models
Paper • 2407.12580 • Published • 40 -
Emu3: Next-Token Prediction is All You Need
Paper • 2409.18869 • Published • 94
-
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Paper • 2401.14405 • Published • 13 -
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
Paper • 2406.18521 • Published • 29 -
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Paper • 2408.12590 • Published • 35 -
Law of Vision Representation in MLLMs
Paper • 2408.16357 • Published • 93