RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing
Abstract
Although large-scale text-to-image generative models have shown promising performance in synthesizing high-quality images, directly applying these models to image editing remains a significant challenge. This challenge is further amplified in video editing due to the additional dimension of time. Editing real videos is especially demanding, as it requires maintaining a stable semantic layout across frames while executing localized edits precisely, without disrupting the existing background. In this paper, we propose RealCraft, an attention-control-based method for zero-shot editing of real videos. By combining object-centric manipulation of the cross-attention between prompts and frames with spatial-temporal attention within the frames, we achieve precise shape-wise editing along with enhanced temporal consistency. Our method works directly with Stable Diffusion and requires no additional localized information. We showcase this zero-shot attention-control-based method across a range of videos, demonstrating localized, high-fidelity, shape-precise and time-consistent editing for videos of various lengths, up to 64 frames.
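The abstract describes two attention controls: restricting the cross-attention of the edited prompt token to the target object, and letting self-attention span all frames for temporal consistency. The sketch below is our own illustration in plain PyTorch, not the authors' released code; `edit_mask`, `edit_token_idx`, the tensor shapes, and the masking scheme are assumptions made for illustration only.

```python
# Minimal sketch (assumed, not the authors' implementation) of the two
# attention manipulations described in the abstract.
import torch

def masked_cross_attention(q, k, v, edit_mask, edit_token_idx, scale):
    """Cross-attention between frame features q and prompt tokens k/v,
    where the edited token may only attend inside the object mask.
    q: (B, N_pixels, d)  k, v: (B, N_tokens, d)  edit_mask: (B, N_pixels) in {0, 1}
    """
    scores = torch.einsum("bnd,bmd->bnm", q, k) * scale        # (B, N_pixels, N_tokens)
    # Suppress the edited token's influence outside the masked object region.
    bias = torch.zeros_like(scores)
    bias[..., edit_token_idx] = (edit_mask.float() - 1.0) * 1e4  # -1e4 outside the mask
    attn = (scores + bias).softmax(dim=-1)
    return torch.einsum("bnm,bmd->bnd", attn, v)

def spatial_temporal_self_attention(q, k, v, num_frames, scale):
    """Self-attention in which each frame's queries attend to keys/values
    gathered from all frames, encouraging temporal consistency.
    q, k, v: (B*F, N_pixels, d) with F = num_frames.
    """
    bf, n, d = k.shape
    b = bf // num_frames
    # Concatenate keys/values across the frame axis so every frame sees all frames.
    k_all = k.reshape(b, num_frames * n, d).repeat_interleave(num_frames, dim=0)
    v_all = v.reshape(b, num_frames * n, d).repeat_interleave(num_frames, dim=0)
    attn = (torch.einsum("bnd,bmd->bnm", q, k_all) * scale).softmax(dim=-1)
    return torch.einsum("bnm,bmd->bnd", attn, v_all)
```

In a diffusion-editing pipeline, functions like these would replace the corresponding attention computations inside the denoising U-Net during editing; how the object mask and edited-token index are obtained is not specified by the abstract.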