foundation model - a weleen Collection

weleen 's Collections

foundation model

aigc

aigc acceleration

gs

foundation model

updated Aug 26, 2024

DreamLLM: Synergistic Multimodal Comprehension and Creation

Paper • 2309.11499 • Published Sep 20, 2023 • 58
An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27, 2024 • 87
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16, 2024 • 126
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

Paper • 2405.08344 • Published May 14, 2024 • 12
KAN or MLP: A Fairer Comparison

Paper • 2407.16674 • Published Jul 23, 2024 • 42
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

Paper • 2407.11895 • Published Jul 16, 2024 • 7
VILA^2: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24, 2024 • 39
Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Paper • 2407.20229 • Published Jul 29, 2024 • 7
POA: Pre-training Once for Models of All Sizes

Paper • 2408.01031 • Published Aug 2, 2024 • 26
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning

Paper • 2408.00690 • Published Aug 1, 2024 • 24
LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 51
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Paper • 2408.11039 • Published Aug 20, 2024 • 58
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22, 2024 • 51
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models

Paper • 2408.11318 • Published Aug 21, 2024 • 55