LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024 • 51
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation Paper • 2409.04429 • Published Sep 6, 2024
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers Paper • 2410.10629 • Published Oct 14, 2024 • 10
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training Paper • 2410.19313 • Published Oct 25, 2024 • 19
TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning Paper • 2007.11622 • Published Jul 22, 2020