DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published Dec 10, 2024 • 45
VILA-U-7B Collection VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation • 2 items • Updated 19 days ago • 5
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action Paper • 2312.17172 • Published Dec 28, 2023 • 28