- LinFusion: 1 GPU, 1 Minute, 16K Image
  Paper • 2409.02097 • Published • 33
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
  Paper • 2409.11406 • Published • 26
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 123
- Segment Anything with Multiple Modalities
  Paper • 2408.09085 • Published • 22

Collections including paper arxiv:2410.22366
- HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
  Paper • 2411.02959 • Published • 66
- GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details
  Paper • 2411.03047 • Published • 8
- MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
  Paper • 2411.02336 • Published • 23
- GenXD: Generating Any 3D and 4D Scenes
  Paper • 2411.02319 • Published • 20

- Animate-X: Universal Character Image Animation with Enhanced Motion Representation
  Paper • 2410.10306 • Published • 54
- ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
  Paper • 2411.05003 • Published • 70
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
  Paper • 2411.04709 • Published • 25
- IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
  Paper • 2410.07171 • Published • 42

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 607
- CLEAR: Character Unlearning in Textual and Visual Modalities
  Paper • 2410.18057 • Published • 200
- Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
  Paper • 2410.22366 • Published • 77
- Emu3: Next-Token Prediction is All You Need
  Paper • 2409.18869 • Published • 94

- Learning Video Representations without Natural Videos
  Paper • 2410.24213 • Published • 15
- Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks
  Paper • 2410.24032 • Published • 9
- Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
  Paper • 2410.22366 • Published • 77
- Stealing User Prompts from Mixture of Experts
  Paper • 2410.22884 • Published • 14

- MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms
  Paper • 2410.18977 • Published • 14
- FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
  Paper • 2410.16271 • Published • 81
- GS^3: Efficient Relighting with Triple Gaussian Splatting
  Paper • 2410.11419 • Published • 11
- ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion
  Paper • 2410.08168 • Published • 9

- FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model
  Paper • 2410.13925 • Published • 23
- BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
  Paper • 2410.14672 • Published • 7
- Scalable Ranked Preference Optimization for Text-to-Image Generation
  Paper • 2410.18013 • Published • 14
- DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
  Paper • 2410.18666 • Published • 19