CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Abstract
Recent advancements in text-to-image generative systems have been largely driven by diffusion models. However, single-stage text-to-image diffusion models still face challenges in computational efficiency and the refinement of image details. To address these issues, we propose CogView3, a cascaded framework that enhances the performance of text-to-image diffusion. CogView3 is the first model to implement relay diffusion for text-to-image generation: it first creates low-resolution images and then applies relay-based super-resolution. This approach not only yields competitive text-to-image outputs but also greatly reduces both training and inference costs. Our experimental results demonstrate that CogView3 outperforms SDXL, the current state-of-the-art open-source text-to-image diffusion model, by 77.0% in human evaluations, while requiring only about 1/2 of the inference time. The distilled variant of CogView3 achieves comparable performance while using only 1/10 of the inference time required by SDXL.
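The two-stage flow described in the abstract can be illustrated with a toy sketch. This is not the CogView3 implementation: `base_stage`, `toy_denoise_step`, `relay_generate`, and all parameter values are hypothetical placeholders, and the "denoiser" simply pulls the sample back toward the upsampled image. The sketch only shows the control flow of a cascade in which the second stage starts from a noised, upsampled first-stage output rather than from pure noise.

```python
import numpy as np

def base_stage(rng, size=8):
    """Hypothetical stand-in for a low-resolution generator.
    Returns a (size, size, 3) image with values in [0, 1)."""
    return rng.random((size, size, 3))

def upsample_nearest(img, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def toy_denoise_step(x, guide, strength=0.5):
    """Hypothetical stand-in for a learned denoiser.
    It only moves the sample toward the upsampled guide."""
    return x + strength * (guide - x)

def relay_generate(seed=0, low_res=8, factor=4, noise_level=0.3, steps=4):
    rng = np.random.default_rng(seed)
    # Stage 1: produce a low-resolution image.
    low = base_stage(rng, low_res)
    # Stage 2 starts from the upsampled stage-1 output plus noise,
    # so it needs only a few refinement steps (assumed values).
    guide = upsample_nearest(low, factor)
    x = guide + noise_level * rng.standard_normal(guide.shape)
    for _ in range(steps):
        x = toy_denoise_step(x, guide)
    return low, x

low, high = relay_generate()
print(low.shape, high.shape)
```

Because the second stage begins partway along the noise schedule, it runs fewer steps than a stage that starts from pure noise, which is the intuition behind the reduced inference cost claimed above.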
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion (2024)
- UniVG: Towards UNIfied-modal Video Generation (2024)
- ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models (2024)
- Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation (2024)
- Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation (2024)