Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2409.18869

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published Sep 27, 2024 • 94
Harnessing Webpage UIs for Text-Rich Visual Understanding

Paper • 2410.13824 • Published Oct 17, 2024 • 30

MIO: A Foundation Model on Multimodal Tokens

Paper • 2409.17692 • Published Sep 26, 2024 • 53
Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published Sep 27, 2024 • 94

Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published Sep 27, 2024 • 94
Running on Zero

3.21k

🏢

TRELLIS

Scalable and Versatile 3D Generation from images

random interest papers

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

Paper • 2409.17481 • Published Sep 26, 2024 • 47
Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling

Paper • 2409.14683 • Published Sep 23, 2024 • 10
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction

Paper • 2409.17422 • Published Sep 25, 2024 • 25
Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published Sep 27, 2024 • 94

Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection

Paper • 2409.08513 • Published Sep 13, 2024 • 12
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published Sep 12, 2024 • 44
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published Sep 18, 2024 • 76
LLMs + Persona-Plug = Personalized LLMs

Paper • 2409.11901 • Published Sep 18, 2024 • 32

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Paper • 2407.08083 • Published Jul 10, 2024 • 28
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20, 2024 • 58
The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Paper • 2408.15237 • Published Aug 27, 2024 • 39
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published Sep 17, 2024 • 29

This collection is for Transformer Articles

INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers

Paper • 2307.03712 • Published Jul 7, 2023 • 1
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters

Paper • 2408.04093 • Published Aug 7, 2024 • 4
Arcee's MergeKit: A Toolkit for Merging Large Language Models

Paper • 2403.13257 • Published Mar 20, 2024 • 20
LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 51

Perception and abstraction. Each modality is tokenized and embedded into vectors for model to comprehend.

VILA^2: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24, 2024 • 40
Octopus v4: Graph of language models

Paper • 2404.19296 • Published Apr 30, 2024 • 117
Octo-planner: On-device Language Model for Planner-Action Agents

Paper • 2406.18082 • Published Jun 26, 2024 • 48
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

Paper • 2408.15518 • Published Aug 28, 2024 • 43

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24, 2024 • 60
PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10, 2024 • 68
E5-V: Universal Embeddings with Multimodal Large Language Models

Paper • 2407.12580 • Published Jul 17, 2024 • 40
Emu3: Next-Token Prediction is All You Need

Paper • 2409.18869 • Published Sep 27, 2024 • 94

General Multimodal Learning

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Paper • 2401.14405 • Published Jan 25, 2024 • 13
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

Paper • 2406.18521 • Published Jun 26, 2024 • 29
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Paper • 2408.12590 • Published Aug 22, 2024 • 35
Law of Vision Representation in MLLMs

Paper • 2408.16357 • Published Aug 29, 2024 • 93

Previous
1
2
3
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs