bubbliiiing committed
Commit 8f9e423 · 1 Parent(s): 3b485b4
Update Readme

Files changed:
- README.md +90 -23
- README_en.md +90 -23
README.md
CHANGED
@@ -33,6 +33,8 @@ tasks:

😊 Welcome!

+[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-yellow)](https://huggingface.co/spaces/alibaba-pai/CogVideoX-Fun-5b)
+
[English](./README_en.md) | 简体中文

# Table of Contents
@@ -52,6 +54,7 @@ CogVideoX-Fun is a pipeline modified from the CogVideoX structure; it is a
We will gradually support quick launch from different platforms; see [Quick Start](#快速启动).

What's New:
+- Retrained the i2v model and added noise so that generated videos have a larger range of motion. Uploaded the control-model training code and the Control model. [ 2024.09.29 ]
- Code created! Now supports Windows and Linux. Supports 2b and 5b video generation at any resolution from 256x256x49 up to 1024x1024x49. [ 2024.09.18 ]

Feature Overview:
@@ -95,10 +98,10 @@ cd CogVideoX-Fun
mkdir models/Diffusion_Transformer
mkdir models/Personalized_Model

-wget https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/Diffusion_Transformer/CogVideoX-Fun-2b-InP.tar.gz -O models/Diffusion_Transformer/CogVideoX-Fun-2b-InP.tar.gz
+wget https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/Diffusion_Transformer/CogVideoX-Fun-V1.1-2b-InP.tar.gz -O models/Diffusion_Transformer/CogVideoX-Fun-V1.1-2b-InP.tar.gz

cd models/Diffusion_Transformer/
-tar -xvf CogVideoX-Fun-2b-InP.tar.gz
+tar -xvf CogVideoX-Fun-V1.1-2b-InP.tar.gz
cd ../../

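Note: the V1.1 weights added in this hunk can also be pulled directly from the Hugging Face repos listed in the model zoo further down. This is a minimal sketch using huggingface-cli, not the commands from the README itself; it assumes the CLI is installed and that the target directory should mirror the models/ layout shown below (adjust the repo id for other variants):

```shell
# Minimal sketch: fetch the V1.1 2b InP weights from Hugging Face instead of the OSS mirror.
# Assumes `pip install -U "huggingface_hub[cli]"` has been run; the repo id comes from the
# model-zoo links in this README, and the local directory mirrors the expected models/ tree.
huggingface-cli download alibaba-pai/CogVideoX-Fun-V1.1-2b-InP \
  --local-dir models/Diffusion_Transformer/CogVideoX-Fun-V1.1-2b-InP
```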
@@ -130,8 +133,8 @@ Detailed information for Linux:
```
📦 models/
├── 📂 Diffusion_Transformer/
-│   ├── 📂 CogVideoX-Fun-2b-InP/
-│   └── 📂 CogVideoX-Fun-5b-InP/
+│   ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
+│   └── 📂 CogVideoX-Fun-V1.1-5b-InP/
├── 📂 Personalized_Model/
│   └── your trained transformer model / your trained lora model (for UI load)
```
@@ -139,42 +142,43 @@
# Video Result
All results shown were obtained with image-to-video generation.

-### CogVideoX-Fun-5B
+### CogVideoX-Fun-V1.1-5B

Resolution-1024

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/34e7ec8f-293e-4655-bb14-5e1ee476f788" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/7809c64f-eb8c-48a9-8bdc-ca9261fd5434" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/8e76aaa4-c602-44ac-bcb4-8b24b72c386c" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/19dba894-7c35-4f25-b15c-384167ab3b03" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

+
Resolution-768

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/0bc339b9-455b-44fd-8917-80272d702737" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/70a043b9-6721-4bd9-be47-78b7ec5c27e9" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/d5dd6c09-14f3-40f8-8b6d-91e26519b8ac" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/9327e8bc-4f17-46b0-b50d-38c250a9483a" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>
@@ -184,41 +188,92 @@ Resolution-512
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/ef407030-8062-454d-aba3-131c21e6b58c" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/7610f49e-38b6-4214-aa48-723ae4d1b07e" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/1fff0567-1e15-415c-941e-53ee8ae2c841" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/bcec48da-b91b-43a0-9d50-cf026e00fa4f" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

-### CogVideoX-Fun-
+### CogVideoX-Fun-V1.1-5B-Pose
+
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+<tr>
+<td>
+Resolution-512
+</td>
+<td>
+Resolution-768
+</td>
+<td>
+Resolution-1024
+</td>
+<tr>
+<td>
+<video src="https://github.com/user-attachments/assets/a746df51-9eb7-4446-bee5-2ee30285c143" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/db295245-e6aa-43be-8c81-32cb411f1473" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/ec9875b2-fde0-48e1-ab7e-490cee51ef40" width="100%" controls autoplay loop></video>
+</td>
+</tr>
+</table>
+
+### CogVideoX-Fun-V1.1-2B

Resolution-768

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/03235dea-980e-4fc5-9c41-e40a5bc1b6d0" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/f7302648-5017-47db-bdeb-4d893e620b37" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/cbadf411-28fa-4b87-813d-da63ff481904" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/87cc9d0b-b6fe-4d2d-b447-174513d169ab" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

+### CogVideoX-Fun-V1.1-2B-Pose
+
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+<tr>
+<td>
+Resolution-512
+</td>
+<td>
+Resolution-768
+</td>
+<td>
+Resolution-1024
+</td>
+<tr>
+<td>
+<video src="https://github.com/user-attachments/assets/487bcd7b-1b7f-4bb4-95b5-96a6b6548b3e" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/2710fd18-8489-46e4-8086-c237309ae7f6" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/b79513db-7747-4512-b86c-94f9ca447fe2" width="100%" controls autoplay loop></video>
+</td>
+</tr>
+</table>

# How to Use

@@ -318,6 +373,18 @@ sh scripts/train.sh
For details on parameter settings, see [Readme Train](scripts/README_TRAIN.md) and [Readme Lora](scripts/README_TRAIN_LORA.md)

# Model Zoo
+
+V1.1:
+
+| Name | Storage Space | Hugging Face | Model Scope | Description |
+|--|--|--|--|--|
+| CogVideoX-Fun-V1.1-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-2b-InP) | Official image-to-video weights. Noise has been added, and the motion amplitude is larger than in V1.0. Supports multi-resolution (512, 768, 1024, 1280) video prediction, trained with 49 frames at 8 frames per second. |
+| CogVideoX-Fun-V1.1-5b-InP.tar.gz | Before extraction: 16.0 GB / After extraction: 20.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-InP) | Official image-to-video weights. Noise has been added, and the motion amplitude is larger than in V1.0. Supports multi-resolution (512, 768, 1024, 1280) video prediction, trained with 49 frames at 8 frames per second. |
+| CogVideoX-Fun-V1.1-2b-Pose.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-2b-Pose) | Official pose-controlled video weights. Supports multi-resolution (512, 768, 1024, 1280) video prediction, trained with 49 frames at 8 frames per second. |
+| CogVideoX-Fun-V1.1-5b-Pose.tar.gz | Before extraction: 16.0 GB / After extraction: 20.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-Pose) | Official pose-controlled video weights. Supports multi-resolution (512, 768, 1024, 1280) video prediction, trained with 49 frames at 8 frames per second. |
+
+V1.0:
+
| Name | Storage Space | Hugging Face | Model Scope | Description |
|--|--|--|--|--|
| CogVideoX-Fun-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-2b-InP) | Official image-to-video weights. Supports multi-resolution (512, 768, 1024, 1280) video prediction, trained with 49 frames at 8 frames per second. |
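As a quick sanity check after downloading one of the archives above, the extracted folder size can be compared against the "Storage Space" column. This is only a minimal sketch; the path assumes the models/ layout shown in the installation section:

```shell
# Rough sanity check: the extracted folder should be close to the "after extraction"
# size listed in the model-zoo table (path assumes the models/ layout used earlier).
du -sh models/Diffusion_Transformer/CogVideoX-Fun-V1.1-2b-InP
```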
@@ -335,4 +402,4 @@ sh scripts/train.sh

The CogVideoX-2B model (including its corresponding Transformer and VAE modules) is released under the [Apache 2.0 License](LICENSE).

-The CogVideoX-5B model (Transformer module) is released under the [CogVideoX License](https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE).
+The CogVideoX-5B model (Transformer module) is released under the [CogVideoX License](https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE).
README_en.md
CHANGED
@@ -23,6 +23,7 @@ CogVideoX-Fun is a modified pipeline based on the CogVideoX structure, designed
We will gradually support quick launch from different platforms; refer to [Quick Start](#quick-start).

What's New:
+- Retrained the i2v model and added noise to increase the motion amplitude of the videos. Uploaded the control-model training code and the control model. [ 2024.09.29 ]
- Code created! Now supports Windows and Linux. Supports the 2b and 5b models and video generation at any resolution from 256x256x49 to 1024x1024x49. [ 2024.09.18 ]

Function:
@@ -68,10 +69,10 @@ cd CogVideoX-Fun
mkdir models/Diffusion_Transformer
mkdir models/Personalized_Model

-wget https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/Diffusion_Transformer/CogVideoX-Fun-2b-InP.tar.gz -O models/Diffusion_Transformer/CogVideoX-Fun-2b-InP.tar.gz
+wget https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/Diffusion_Transformer/CogVideoX-Fun-V1.1-2b-InP.tar.gz -O models/Diffusion_Transformer/CogVideoX-Fun-V1.1-2b-InP.tar.gz

cd models/Diffusion_Transformer/
-tar -xvf CogVideoX-Fun-2b-InP.tar.gz
+tar -xvf CogVideoX-Fun-V1.1-2b-InP.tar.gz
cd ../../

@@ -103,8 +104,8 @@ It is best to place the [weights](#model-zoo) along the specified path:
```
📦 models/
├── 📂 Diffusion_Transformer/
-│   ├── 📂 CogVideoX-Fun-2b-InP/
-│   └── 📂 CogVideoX-Fun-5b-InP/
+│   ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
+│   └── 📂 CogVideoX-Fun-V1.1-5b-InP/
├── 📂 Personalized_Model/
│   └── your trained transformer model / your trained lora model (for UI load)
```
@@ -112,42 +113,43 @@
# Video Result
The results displayed are all generated from images (image-to-video).

-### CogVideoX-Fun-5B
+### CogVideoX-Fun-V1.1-5B

Resolution-1024

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/34e7ec8f-293e-4655-bb14-5e1ee476f788" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/7809c64f-eb8c-48a9-8bdc-ca9261fd5434" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/8e76aaa4-c602-44ac-bcb4-8b24b72c386c" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/19dba894-7c35-4f25-b15c-384167ab3b03" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

+
Resolution-768

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/0bc339b9-455b-44fd-8917-80272d702737" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/70a043b9-6721-4bd9-be47-78b7ec5c27e9" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/d5dd6c09-14f3-40f8-8b6d-91e26519b8ac" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/9327e8bc-4f17-46b0-b50d-38c250a9483a" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>
@@ -157,35 +159,89 @@ Resolution-512
<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/ef407030-8062-454d-aba3-131c21e6b58c" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/7610f49e-38b6-4214-aa48-723ae4d1b07e" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/1fff0567-1e15-415c-941e-53ee8ae2c841" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/bcec48da-b91b-43a0-9d50-cf026e00fa4f" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>

-### CogVideoX-Fun-
+### CogVideoX-Fun-V1.1-5B-Pose

<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
<tr>
<td>
-
+Resolution-512
+</td>
+<td>
+Resolution-768
+</td>
+<td>
+Resolution-1024
+</td>
+<tr>
+<td>
+<video src="https://github.com/user-attachments/assets/a746df51-9eb7-4446-bee5-2ee30285c143" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/db295245-e6aa-43be-8c81-32cb411f1473" width="100%" controls autoplay loop></video>
</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/ec9875b2-fde0-48e1-ab7e-490cee51ef40" width="100%" controls autoplay loop></video>
</td>
+</tr>
+</table>
+
+### CogVideoX-Fun-V1.1-2B
+
+Resolution-768
+
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+<tr>
+<td>
+<video src="https://github.com/user-attachments/assets/03235dea-980e-4fc5-9c41-e40a5bc1b6d0" width="100%" controls autoplay loop></video>
+</td>
<td>
-<video src="https://github.com/user-attachments/assets/
+<video src="https://github.com/user-attachments/assets/f7302648-5017-47db-bdeb-4d893e620b37" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/cbadf411-28fa-4b87-813d-da63ff481904" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/87cc9d0b-b6fe-4d2d-b447-174513d169ab" width="100%" controls autoplay loop></video>
+</td>
+</tr>
+</table>
+
+### CogVideoX-Fun-V1.1-2B-Pose
+
+<table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
+<tr>
+<td>
+Resolution-512
+</td>
+<td>
+Resolution-768
+</td>
+<td>
+Resolution-1024
+</td>
+<tr>
+<td>
+<video src="https://github.com/user-attachments/assets/487bcd7b-1b7f-4bb4-95b5-96a6b6548b3e" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/2710fd18-8489-46e4-8086-c237309ae7f6" width="100%" controls autoplay loop></video>
+</td>
+<td>
+<video src="https://github.com/user-attachments/assets/b79513db-7747-4512-b86c-94f9ca447fe2" width="100%" controls autoplay loop></video>
</td>
</tr>
</table>
@@ -283,11 +339,22 @@ Then, we run scripts/train.sh.
sh scripts/train.sh
```

-For details on setting some parameters, please refer to [Readme Train](scripts/README_TRAIN.md)
+For details on setting some parameters, please refer to [Readme Train](scripts/README_TRAIN.md), [Readme Lora](scripts/README_TRAIN_LORA.md) and [Readme Control](scripts/README_TRAIN_CONTROL.md).


# Model zoo

+V1.1:
+
+| Name | Storage Space | Hugging Face | Model Scope | Description |
+|--|--|--|--|--|
+| CogVideoX-Fun-V1.1-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-2b-InP) | Our official image-to-video model is capable of predicting videos at multiple resolutions (512, 768, 1024, 1280) and has been trained on 49 frames at a rate of 8 frames per second. Noise has been added to the reference image, and the amplitude of motion is greater compared to V1.0. |
+| CogVideoX-Fun-V1.1-5b-InP.tar.gz | Before extraction: 16.0 GB / After extraction: 20.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-InP) | Our official image-to-video model is capable of predicting videos at multiple resolutions (512, 768, 1024, 1280) and has been trained on 49 frames at a rate of 8 frames per second. Noise has been added to the reference image, and the amplitude of motion is greater compared to V1.0. |
+| CogVideoX-Fun-V1.1-2b-Pose.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-2b-Pose) | Our official pose-control video model is capable of predicting videos at multiple resolutions (512, 768, 1024, 1280) and has been trained on 49 frames at a rate of 8 frames per second. |
+| CogVideoX-Fun-V1.1-5b-Pose.tar.gz | Before extraction: 16.0 GB / After extraction: 20.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-Pose) | Our official pose-control video model is capable of predicting videos at multiple resolutions (512, 768, 1024, 1280) and has been trained on 49 frames at a rate of 8 frames per second. |
+
+V1.0:
+
| Name | Storage Space | Hugging Face | Model Scope | Description |
|--|--|--|--|--|
| CogVideoX-Fun-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-2b-InP) | Our official image-to-video model is capable of predicting videos at multiple resolutions (512, 768, 1024, 1280) and has been trained on 49 frames at a rate of 8 frames per second. |