stablegravity's Collections
aigc
VideoBooth: Diffusion-based Video Generation with Image Prompts
Paper • 2312.00777 • Published • 21
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation
Paper • 2312.03641 • Published • 20
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
Paper • 2312.04557 • Published • 13
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Paper • 2312.04433 • Published • 10
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning
Paper • 2402.00769 • Published • 22
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
Paper • 2401.15977 • Published • 37
Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding
Paper • 2401.15708 • Published • 11
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All
Paper • 2401.13795 • Published • 66
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Paper • 2401.14404 • Published • 17
BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models
Paper • 2401.13974 • Published • 12
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
Paper • 2401.13627 • Published • 73
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 86
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
Paper • 2401.12070 • Published • 44
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
Paper • 2401.11053 • Published • 10
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers
Paper • 2401.11605 • Published • 22
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Paper • 2401.10891 • Published • 60
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Paper • 2401.10774 • Published • 54
Synthesizing Moving People with 3D Control
Paper • 2401.10889 • Published • 12
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Paper • 2401.09985 • Published • 15
ActAnywhere: Subject-Aware Video Background Generation
Paper • 2401.10822 • Published • 13
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Paper • 2401.09047 • Published • 14
InstantID: Zero-shot Identity-Preserving Generation in Seconds
Paper • 2401.07519 • Published • 54
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 105
Design2Code: How Far Are We From Automating Front-End Engineering?
Paper • 2403.03163 • Published • 94
FlashFace: Human Image Personalization with High-fidelity Identity Preservation
Paper • 2403.17008 • Published • 20
KAN: Kolmogorov-Arnold Networks
Paper • 2404.19756 • Published • 109
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation
Paper • 2404.19427 • Published • 72
Octopus v4: Graph of language models
Paper • 2404.19296 • Published • 117
Make Your LLM Fully Utilize the Context
Paper • 2404.16811 • Published • 53
ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving
Paper • 2404.16771 • Published • 17
PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Paper • 2404.16022 • Published • 23
FlowMind: Automatic Workflow Generation with LLMs
Paper • 2404.13050 • Published • 34
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis
Paper • 2404.13686 • Published • 28
Dynamic Typography: Bringing Words to Life
Paper • 2404.11614 • Published • 45
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Paper • 2404.12253 • Published • 55
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Paper • 2404.07987 • Published • 47
Rho-1: Not All Tokens Are What You Need
Paper • 2404.07965 • Published • 89
RULER: What's the Real Context Size of Your Long-Context Language Models?
Paper • 2404.06654 • Published • 35
ByteEdit: Boost, Comply and Accelerate Generative Image Editing
Paper • 2404.04860 • Published • 25
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing
Paper • 2404.05717 • Published • 25
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
Paper • 2404.05014 • Published • 33
SpatialTracker: Tracking Any 2D Pixels in 3D Space
Paper • 2404.04319 • Published • 24
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 61
Stream of Search (SoS): Learning to Search in Language
Paper • 2404.03683 • Published • 30
Social Skill Training with Large Language Models
Paper • 2404.04204 • Published • 15
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Paper • 2404.02905 • Published • 67
Advancing LLM Reasoning Generalists with Preference Trees
Paper • 2404.02078 • Published • 44
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
Paper • 2405.01434 • Published • 54
MLCM: Multistep Consistency Distillation of Latent Diffusion Model
Paper • 2406.05768 • Published • 10
Paper • 2406.09414 • Published • 96
RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network
Paper • 2406.18284 • Published • 19
GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars
Paper • 2408.13674 • Published • 18
Click2Mask: Local Editing with Dynamic Mask Generation
Paper • 2409.08272 • Published • 5
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance
Paper • 2412.05355 • Published • 7
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
Paper • 2412.06781 • Published • 19
PanoDreamer: 3D Panorama Synthesis from a Single Image
Paper • 2412.04827 • Published • 10