arxiv:2409.07756

DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing

Published on Sep 12, 2024

Authors:

Abstract

Diffusion Transformers (DiTs) have recently attracted significant interest from both industry and academia due to their enhanced capabilities in visual generation, surpassing the performance of traditional diffusion models that employ U-Net. However, the improved performance of DiTs comes at the expense of higher parameter counts and implementation costs, which significantly limits their deployment on resource-constrained devices like mobile phones. We propose DiTAS, a data-free post-training quantization (PTQ) method for efficient DiT inference. DiTAS relies on the proposed temporal-aggregated smoothing techniques to mitigate the impact of the channel-wise outliers within the input activations, leading to much lower quantization error under extremely low bitwidth. To further enhance the performance of the quantized DiT, we adopt the layer-wise grid search strategy to optimize the smoothing factor. Experimental results demonstrate that our approach enables 4-bit weight, 8-bit activation (W4A8) quantization for DiTs while maintaining comparable performance as the full-precision model.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2409.07756 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2409.07756 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2409.07756 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.