zzfive's Collections
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens • Paper • 2401.09985 • Published • 15
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects • Paper • 2401.09962 • Published • 8
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution • Paper • 2401.10404 • Published • 10
ActAnywhere: Subject-Aware Video Background Generation • Paper • 2401.10822 • Published • 13
Lumiere: A Space-Time Diffusion Model for Video Generation • Paper • 2401.12945 • Published • 86
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning • Paper • 2402.00769 • Published • 22
VideoPrism: A Foundational Visual Encoder for Video Understanding • Paper • 2402.13217 • Published • 23
Video ReCap: Recursive Captioning of Hour-Long Videos • Paper • 2402.13250 • Published • 25
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis • Paper • 2402.14797 • Published • 20
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models • Paper • 2402.17177 • Published • 88
Sora Generates Videos with Stunning Geometrical Consistency • Paper • 2402.17403 • Published • 16
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners • Paper • 2402.17723 • Published • 16
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers • Paper • 2402.19479 • Published • 32
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models • Paper • 2403.03100 • Published • 34
Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation • Paper • 2403.02827 • Published • 6
Video Editing via Factorized Diffusion Distillation • Paper • 2403.09334 • Published • 21
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding • Paper • 2403.09626 • Published • 13
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations • Paper • 2108.01073 • Published • 7
AnimateDiff-Lightning: Cross-Model Diffusion Distillation • Paper • 2403.12706 • Published • 17
Streaming Dense Video Captioning • Paper • 2404.01297 • Published • 11
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization • Paper • 2404.09956 • Published • 11
MotionMaster: Training-free Camera Motion Transfer For Video Generation • Paper • 2404.15789 • Published • 10
LLM-AD: Large Language Model based Audio Description System • Paper • 2405.00983 • Published • 16
FIFO-Diffusion: Generating Infinite Videos from Text without Training • Paper • 2405.11473 • Published • 53
ReVideo: Remake a Video with Motion and Content Control • Paper • 2405.13865 • Published • 23
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation • Paper • 2405.14598 • Published • 11
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition • Paper • 2405.15216 • Published • 12
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models • Paper • 2405.16537 • Published • 16
Looking Backward: Streaming Video-to-Video Translation with Feature Banks • Paper • 2405.15757 • Published • 14
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer • Paper • 2405.17405 • Published • 14
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control • Paper • 2405.17414 • Published • 10
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning • Paper • 2405.18386 • Published • 20
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback • Paper • 2405.18750 • Published • 21
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture • Paper • 2405.18991 • Published • 12
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model • Paper • 2405.20222 • Published • 10
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark • Paper • 2405.19707 • Published • 6
Learning Temporally Consistent Video Depth from Video Diffusion Priors • Paper • 2406.01493 • Published • 18
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation • Paper • 2406.00908 • Published • 12
Searching Priors Makes Text-to-Video Synthesis Better • Paper • 2406.03215 • Published • 11
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions • Paper • 2406.04325 • Published • 72
SF-V: Single Forward Video Generation Model • Paper • 2406.04324 • Published • 23
VideoTetris: Towards Compositional Text-to-Video Generation • Paper • 2406.04277 • Published • 23
MotionClone: Training-Free Motion Cloning for Controllable Video Generation • Paper • 2406.05338 • Published • 39
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing • Paper • 2406.06523 • Published • 50
Hierarchical Patch Diffusion Models for High-Resolution Video Generation • Paper • 2406.07792 • Published • 13
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation • Paper • 2406.07686 • Published • 14
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation • Paper • 2406.08656 • Published • 7
Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality • Paper • 2406.08845 • Published • 8
ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning • Paper • 2406.14130 • Published • 10
MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation • Paper • 2406.15252 • Published • 14
Video-Infinity: Distributed Long Video Generation • Paper • 2406.16260 • Published • 28
DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models • Paper • 2407.01519 • Published • 22
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix • Paper • 2407.00367 • Published • 9
VIMI: Grounding Video Generation through Multi-modal Instruction • Paper • 2407.06304 • Published • 10
VEnhancer: Generative Space-Time Enhancement for Video Generation • Paper • 2407.07667 • Published • 14
Still-Moving: Customized Video Generation without Customized Video Data • Paper • 2407.08674 • Published • 12
CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation • Paper • 2407.06188 • Published • 1
TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models • Paper • 2407.09012 • Published • 9
Paper • 2407.09533 • Published • 6
Noise Calibration: Plug-and-play Content-Preserving Video Enhancement using Pre-trained Video Diffusion Models • Paper • 2407.10285 • Published • 4
VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control • Paper • 2407.12781 • Published • 13
Towards Understanding Unsafe Video Generation • Paper • 2407.12581 • Published
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion • Paper • 2407.13759 • Published • 17
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models • Paper • 2407.15642 • Published • 11
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence • Paper • 2407.16655 • Published • 30
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation • Paper • 2407.14505 • Published • 26
FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention • Paper • 2407.19918 • Published • 49
Tora: Trajectory-oriented Diffusion Transformer for Video Generation • Paper • 2407.21705 • Published • 27
Fine-gained Zero-shot Video Sampling • Paper • 2407.21475 • Published • 6
Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion • Paper • 2408.00458 • Published • 11
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model • Paper • 2408.00762 • Published • 9
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation • Paper • 2408.02629 • Published • 13
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer • Paper • 2408.03284 • Published • 10
Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics • Paper • 2408.04631 • Published • 8
Kalman-Inspired Feature Propagation for Video Face Super-Resolution • Paper • 2408.05205 • Published • 8
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer • Paper • 2408.06072 • Published • 37
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance • Paper • 2408.08189 • Published • 15
Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data • Paper • 2408.10119 • Published • 16
TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models • Paper • 2408.11318 • Published • 55
TrackGo: A Flexible and Efficient Method for Controllable Video Generation • Paper • 2408.11475 • Published • 17
Real-Time Video Generation with Pyramid Attention Broadcast • Paper • 2408.12588 • Published • 15
CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities • Paper • 2408.13239 • Published • 10
Training-free Long Video Generation with Chain of Diffusion Model Experts • Paper • 2408.13423 • Published • 22
TVG: A Training-free Transition Video Generation Method with Diffusion Models • Paper • 2408.13413 • Published • 14
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation • Paper • 2408.15239 • Published • 29
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model • Paper • 2409.01199 • Published • 13
Follow-Your-Canvas: Higher-Resolution Video Outpainting with Extensive Content Generation • Paper • 2409.01055 • Published • 6
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency • Paper • 2409.02634 • Published • 90
OSV: One Step is Enough for High-Quality Image to Video Generation • Paper • 2409.11367 • Published • 13
Towards Diverse and Efficient Audio Captioning via Diffusion Models • Paper • 2409.09401 • Published • 6
LVCD: Reference-based Lineart Video Colorization with Diffusion Models • Paper • 2409.12960 • Published • 24
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation • Paper • 2409.12532 • Published • 5
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling • Paper • 2409.16160 • Published • 32
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation • Paper • 2409.18964 • Published • 26
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide • Paper • 2410.04364 • Published • 28
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark • Paper • 2410.03051 • Published • 4
Pyramidal Flow Matching for Efficient Video Generative Modeling • Paper • 2410.05954 • Published • 38
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design • Paper • 2410.05677 • Published • 14
Loong: Generating Minute-level Long Videos with Autoregressive Language Models • Paper • 2410.02757 • Published • 36
Animate-X: Universal Character Image Animation with Enhanced Motion Representation • Paper • 2410.10306 • Published • 54
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention • Paper • 2410.10774 • Published • 25
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions • Paper • 2410.10816 • Published • 20
Movie Gen: A Cast of Media Foundation Models • Paper • 2410.13720 • Published • 90
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control • Paper • 2410.13830 • Published • 24
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding • Paper • 2410.17434 • Published • 25
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality • Paper • 2410.19355 • Published • 23
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale • Paper • 2410.20280 • Published • 23
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation • Paper • 2410.23277 • Published • 9
Fashion-VDM: Video Diffusion Model for Virtual Try-On • Paper • 2411.00225 • Published • 9
Adaptive Caching for Faster Video Generation with Diffusion Transformers • Paper • 2411.02397 • Published • 23
Motion Control for Enhanced Complex Action Video Generation • Paper • 2411.08328 • Published • 5
AnimateAnything: Consistent and Controllable Animation for Video Generation • Paper • 2411.10836 • Published • 23
StableV2V: Stablizing Shape Consistency in Video-to-Video Editing • Paper • 2411.11045 • Published • 11
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations • Paper • 2411.10818 • Published • 24
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models • Paper • 2411.13503 • Published • 30
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control • Paper • 2411.13807 • Published • 11
Efficient Long Video Tokenization via Coordinated-based Patch Reconstruction • Paper • 2411.14762 • Published • 11
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement • Paper • 2411.15115 • Published • 9
DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation • Paper • 2411.16657 • Published • 17
AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation • Paper • 2411.17383 • Published • 6
Identity-Preserving Text-to-Video Generation by Frequency Decomposition • Paper • 2411.17440 • Published • 35
Free^2Guide: Gradient-Free Path Integral Control for Enhancing Text-to-Video Generation with Large Vision-Language Models • Paper • 2411.17041 • Published • 12
Video Depth without Video Models • Paper • 2411.19189 • Published • 33
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model • Paper • 2411.19108 • Published • 17
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling • Paper • 2411.18664 • Published • 23
AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers • Paper • 2411.18673 • Published • 8
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation • Paper • 2412.00927 • Published • 26
Open-Sora Plan: Open-Source Large Video Generation Model • Paper • 2412.00131 • Published • 32
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video • Paper • 2411.18671 • Published • 20
Paper • 2411.18933 • Published • 16
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model • Paper • 2411.17459 • Published • 10
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation • Paper • 2412.01316 • Published • 8
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation • Paper • 2412.02259 • Published • 59
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images • Paper • 2412.03517 • Published • 18
Mimir: Improving Video Diffusion Models for Precise Text Understanding • Paper • 2412.03085 • Published • 12
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment • Paper • 2412.04814 • Published • 45
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration • Paper • 2412.04440 • Published • 19
Mind the Time: Temporally-Controlled Multi-Event Video Generation • Paper • 2412.05263 • Published • 10
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation • Paper • 2412.04432 • Published • 14
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance • Paper • 2412.05355 • Published • 7
STIV: Scalable Text and Image Conditioned Video Generation • Paper • 2412.07730 • Published • 70
Paper • 2412.07583 • Published • 19
MoViE: Mobile Diffusion for Video Editing • Paper • 2412.06578 • Published • 18
Video Motion Transfer with Diffusion Transformers • Paper • 2412.07776 • Published • 17
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints • Paper • 2412.07760 • Published • 50
StyleMaster: Stylize Your Video with Artistic Generation and Translation • Paper • 2412.07744 • Published • 19
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation • Paper • 2412.06016 • Published • 20
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation • Paper • 2412.09349 • Published • 8
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption • Paper • 2412.09283 • Published • 19
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity • Paper • 2412.09856 • Published • 9
VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping • Paper • 2412.11279 • Published • 12
SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner • Paper • 2412.10533 • Published • 5
MIVE: New Design and Benchmark for Multi-Instance Video Editing • Paper • 2412.12877 • Published • 4
AniDoc: Animation Creation Made Easier • Paper • 2412.14173 • Published • 49
Autoregressive Video Generation without Vector Quantization • Paper • 2412.14169 • Published • 14
VidTok: A Versatile and Open-Source Video Tokenizer • Paper • 2412.13061 • Published • 8
Parallelized Autoregressive Visual Generation • Paper • 2412.15119 • Published • 49
TRecViT: A Recurrent Video Transformer • Paper • 2412.14294 • Published • 12
Large Motion Video Autoencoding with Cross-modal Video VAE • Paper • 2412.17805 • Published • 23
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation • Paper • 2412.18597 • Published • 19
MotiF: Making Text Count in Image Animation with Motion Focal Loss • Paper • 2412.16153 • Published • 6
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models • Paper • 2412.19645 • Published • 13