- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 605
- CLEAR: Character Unlearning in Textual and Visual Modalities
  Paper • 2410.18057 • Published • 200
- Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
  Paper • 2410.22366 • Published • 77
- Emu3: Next-Token Prediction is All You Need
  Paper • 2409.18869 • Published • 94
Collections
Collections including paper arxiv:2402.13144
- Training-Free Consistent Text-to-Image Generation
  Paper • 2402.03286 • Published • 65
- ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
  Paper • 2402.04324 • Published • 23
- λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
  Paper • 2402.05195 • Published • 18
- FiT: Flexible Vision Transformer for Diffusion Model
  Paper • 2402.12376 • Published • 48

- Neural Network Diffusion
  Paper • 2402.13144 • Published • 95
- Genie: Generative Interactive Environments
  Paper • 2402.15391 • Published • 70
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
  Paper • 2402.17177 • Published • 88
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 44