weleen's Collections
foundation model
updated
DreamLLM: Synergistic Multimodal Comprehension and Creation
Paper
•
2309.11499
•
Published
•
58
An Introduction to Vision-Language Modeling
Paper
•
2405.17247
•
Published
•
87
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper
•
2405.09818
•
Published
•
126
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding
Paper
•
2405.08344
•
Published
•
12
KAN or MLP: A Fairer Comparison
Paper
•
2407.16674
•
Published
•
42
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
Paper
•
2407.11895
•
Published
•
7
VILA^2: VILA Augmented VILA
Paper
•
2407.17453
•
Published
•
39
Improving 2D Feature Representations by 3D-Aware Fine-Tuning
Paper
•
2407.20229
•
Published
•
7
POA: Pre-training Once for Models of All Sizes
Paper
•
2408.01031
•
Published
•
26
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning
Paper
•
2408.00690
•
Published
•
24
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Paper
•
2408.10188
•
Published
•
51
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper
•
2408.11039
•
Published
•
58
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Paper
•
2408.12528
•
Published
•
51
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models
Paper
•
2408.11318
•
Published
•
55