---
license: apache-2.0
pipeline_tag: text-to-video
---
# RepVideo: Rethinking Cross-Layer Representation for Video Generation
<!-- <p align="center" width="100%">
<img src="ISEKAI_overview.png" width="80%" height="80%">
</p> -->
<div class="is-size-5 publication-authors" align="center">
<!-- Paper authors -->
<span class="author-block">
<a href="https://chenyangsi.top/" target="_blank">Chenyang Si</a><sup>1†</sup>,</span>
<span class="author-block">
<a href="https://scholar.google.com/citations?user=ORlELG8AAAAJ" target="_blank">Weichen Fan</a><sup>1†</sup>,</span>
<span class="author-block">
<a href="https://scholar.google.com/citations?user=FkkaUgwAAAAJ&hl=en" target="_blank">Zhengyao Lv</a><sup>2</sup>,</span>
<span class="author-block">
<a href="https://ziqihuangg.github.io/" target="_blank">Ziqi Huang</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://mmlab.siat.ac.cn/yuqiao" target="_blank">Yu Qiao</a><sup>2</sup>,</span>
<span class="author-block">
<a href="https://liuziwei7.github.io/" target="_blank">Ziwei Liu</a><sup>1‡</sup>
</span>
</div>
<div class="is-size-5 publication-authors" align="center">
<span class="author-block">S-Lab, Nanyang Technological University<sup>1</sup> Shanghai Artificial Intelligence Laboratory <sup>2</sup> </span>
<span class="eql-cntrb"><small><br><sup>†</sup>Equal contribution. <sup>‡</sup>Corresponding author.</small></span>
</div>
<div align="center">
<a href="https://arxiv.org/pdf/2501.08994">Paper</a> |
<a href="https://vchitect.github.io/RepVid-Webpage/">Project Page</a>
</div>
<!-- <p align="center">
Join our <a href="https://github.com/Vchitect/RepVideo/tree/master/assets/channel/lark.jpeg" target="_blank">Lark</a> and <a href="https://discord.gg/aJAbn9sN" target="_blank">Discord</a>
</p> -->
---
![](https://img.shields.io/badge/RepVideo-v0.1-darkcyan)
[![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FVchitect%2FRepVideo&count_bg=%23BDC4B7&title_bg=%2342C4A8&icon=octopusdeploy.svg&icon_color=%23E7E7E7&title=visitors&edge_flat=true)](https://hits.seeyoufarm.com)
[![Generic badge](https://img.shields.io/badge/Checkpoint-red.svg)](https://huggingface.co/Vchitect/RepVideo)
[![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Farxiv.org%2Fpdf%2F2501.08994&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=Paper&edge_flat=false)](https://hits.seeyoufarm.com)
[![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FVchitect%2FRepVid-Webpage&count_bg=%23BE4C4C&title_bg=%235E5D64&icon=&icon_color=%23E7E7E7&title=Page&edge_flat=false)](https://hits.seeyoufarm.com)
## :astonished: Gallery
<table class="center">
<tr>
<td><img src="assets/1.gif"> </td>
<td><img src="assets/2.gif"> </td>
<td><img src="assets/3.gif"> </td>
</tr>
<tr>
<td><img src="assets/4.gif"> </td>
<td><img src="assets/5.gif"> </td>
<td><img src="assets/6.gif"> </td>
</tr>
<tr>
<td><img src="assets/7.gif"> </td>
<td><img src="assets/8.gif"> </td>
<td><img src="assets/9.gif"> </td>
</tr>
<tr>
<td><img src="assets/10.gif"> </td>
<td><img src="assets/11.gif"> </td>
<td><img src="assets/12.gif"> </td>
</tr>
</table>
## Installation
### 1. Create a conda environment and download models
```bash
conda create -n RepVid python=3.10
conda activate RepVid
pip install -r requirements.txt

# Download the T5 text encoder and tokenizer files from CogVideoX-2b
mkdir ckpt
cd ckpt
mkdir t5-v1_1-xxl
cd t5-v1_1-xxl
wget https://huggingface.co/THUDM/CogVideoX-2b/resolve/main/text_encoder/config.json
wget https://huggingface.co/THUDM/CogVideoX-2b/resolve/main/text_encoder/model-00001-of-00002.safetensors
wget https://huggingface.co/THUDM/CogVideoX-2b/resolve/main/text_encoder/model-00002-of-00002.safetensors
wget https://huggingface.co/THUDM/CogVideoX-2b/resolve/main/text_encoder/model.safetensors.index.json
wget https://huggingface.co/THUDM/CogVideoX-2b/resolve/main/tokenizer/added_tokens.json
wget https://huggingface.co/THUDM/CogVideoX-2b/resolve/main/tokenizer/special_tokens_map.json
wget https://huggingface.co/THUDM/CogVideoX-2b/resolve/main/tokenizer/spiece.model
wget https://huggingface.co/THUDM/CogVideoX-2b/resolve/main/tokenizer/tokenizer_config.json
cd ../

# Download and unpack the VAE
mkdir vae
wget -O vae.zip 'https://cloud.tsinghua.edu.cn/f/fdba7608a49c463ba754/?dl=1'
unzip vae.zip
```
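After these steps, the `ckpt/` directory should look roughly as follows. The layout of the unpacked VAE archive is an assumption; adjust the paths if your extraction differs:

```
ckpt/
├── t5-v1_1-xxl/
│   ├── config.json
│   ├── model-00001-of-00002.safetensors
│   ├── model-00002-of-00002.safetensors
│   ├── model.safetensors.index.json
│   ├── added_tokens.json
│   ├── special_tokens_map.json
│   ├── spiece.model
│   └── tokenizer_config.json
└── vae/
    └── ...   (contents of vae.zip)
```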
## Inference
```bash
cd sat
bash run.sh
```
## BibTeX
```bibtex
@article{si2025RepVideo,
  title={RepVideo: Rethinking Cross-Layer Representation for Video Generation},
  author={Si, Chenyang and Fan, Weichen and Lv, Zhengyao and Huang, Ziqi and Qiao, Yu and Liu, Ziwei},
  journal={arXiv preprint arXiv:2501.08994},
  year={2025}
}
```
## License
This code is released under the Apache-2.0 license. The framework is fully open for academic research and free for commercial use.
## Disclaimer
We disclaim responsibility for user-generated content. The model was not trained to realistically represent specific people or events, so generating such content is beyond its capabilities. Using the model to produce pornographic, violent, or gory material, or content that demeans or harms people, their environment, culture, or religion, is prohibited. Users are solely liable for their actions. The project contributors are not legally affiliated with, and are not accountable for, users' behavior. Please use the generative model responsibly, adhering to ethical and legal standards.