PeppePasti's Collections
Multimodal LLMs
Building and better understanding vision-language models: insights and future directions
Paper • 2408.12637 • Published • 124
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 58
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Paper • 2408.16725 • Published • 52
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Paper • 2408.15998 • Published • 84
MAVIS: Mathematical Visual Instruction Tuning
Paper • 2407.08739 • Published • 31
Law of Vision Representation in MLLMs
Paper • 2408.16357 • Published • 92
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Paper • 2406.16860 • Published • 59
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Paper • 2403.09611 • Published • 125
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
Paper • 2409.02889 • Published • 55
FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation
Paper • 2409.03525 • Published • 12
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
Paper • 2409.07239 • Published • 11
One missing piece in Vision and Language: A Survey on Comics Understanding
Paper • 2409.09502 • Published • 23
NVLM: Open Frontier-Class Multimodal LLMs
Paper • 2409.11402 • Published • 72
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper • 2409.12191 • Published • 76
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines
Paper • 2409.12959 • Published • 37
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Paper • 2409.12568 • Published • 48
Phantom of Latent for Large Language and Vision Models
Paper • 2409.14713 • Published • 28
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper • 2409.17146 • Published • 106
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Paper • 2409.18042 • Published • 36