---
license: mit
---

<div align="center">
<h1>🚀 CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient</h1>
</div>

> **Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient**
>
> [Zigeng Chen](https://github.com/czg1225), [Xinyin Ma](https://horseee.github.io/), [Gongfan Fang](https://fangggf.github.io/), [Xinchao Wang](https://sites.google.com/site/sitexinchaowang/)
>
> [Learning and Vision Lab](http://lv-nus.org/), National University of Singapore
>
> 🥯 [[Paper]](https://arxiv.org/abs/2411.17787) 🎄 [[Project Page]](https://czg1225.github.io/CoDe_page/) 💻 [[GitHub]](https://github.com/czg1225/CoDe)

<div align="center">
<img src="intro.png" width="100%">
<br>
<em>
We partition the multi-scale inference process into a seamless collaboration between a large model and a small model.
</em>
</div>

<br>

|
<div align="center">
<img src="teaser.png" width="90%">
<br>
<em>
1.7× speedup and roughly 50% lower memory consumption on ImageNet 256×256 generation. Top: original VAR-d30; Bottom: CoDe with N=8. Speed measurements exclude the VAE decoder.
</em>
</div>

<br>

|
## 💡 Introduction

We propose Collaborative Decoding (CoDe), a novel decoding strategy tailored to the VAR framework. CoDe capitalizes on two critical observations: the substantially reduced parameter demands at larger scales and the exclusive generation patterns across different scales. Based on these insights, we partition the multi-scale inference process into a seamless collaboration between a large model and a small model. This collaboration yields remarkable efficiency with minimal impact on quality: CoDe achieves a 1.7× speedup, cuts memory usage by around 50%, and preserves image quality with only a negligible FID increase, from 1.95 to 1.98. When the number of drafting steps is further decreased, CoDe reaches an impressive 2.9× acceleration, generating over 41 images/s at 256×256 resolution on a single NVIDIA 4090 GPU while maintaining a commendable FID of 2.27.
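As a toy illustration of this partition (a minimal sketch, not our actual implementation: `large_model_step` and `small_model_step` are hypothetical stand-ins for the two VAR transformers, and we assume VAR's standard 10-scale token-map schedule), the large model drafts the first N coarse scales and the small model handles the remaining fine scales:

```python
# VAR's multi-scale token-map schedule: side lengths of each square token map.
SCALES = [1, 2, 3, 4, 5, 6, 8, 10, 13, 16]  # 680 tokens in total

def large_model_step(scale):
    # Hypothetical stand-in: the large model predicts the token map at `scale`.
    return f"large@{scale}x{scale}"

def small_model_step(scale):
    # Hypothetical stand-in: the small model predicts the token map at `scale`.
    return f"small@{scale}x{scale}"

def collaborative_decode(drafting_steps=8):
    """Large model drafts the first `drafting_steps` (coarse) scales;
    the small model generates the remaining (fine) scales."""
    outputs = []
    for i, scale in enumerate(SCALES):
        step = large_model_step if i < drafting_steps else small_model_step
        outputs.append(step(scale))
    return outputs

token_maps = collaborative_decode(drafting_steps=8)  # the CoDe N=8 setting
```

Under this schedule, with N=8 the small model produces the last two scales, i.e. (13² + 16²) / 680 ≈ 62% of all tokens, which is why offloading the largest scales to a small model yields most of the speedup.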

![figure](curve.png)

![figure](frame.png)