bubbliiiing committed on
Commit
8f9e423
·
1 Parent(s): 3b485b4

Update Readme

Files changed (2)
  1. README.md +90 -23
  2. README_en.md +90 -23
README.md CHANGED
@@ -33,6 +33,8 @@ tasks:
33
 
34
  😊 Welcome!
35
 
36
  [English](./README_en.md) | 简体中文
37
 
38
  # Table of Contents
@@ -52,6 +54,7 @@ CogVideoX-Fun is a pipeline modified from the CogVideoX structure; it is a
52
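  We will gradually support quick start-up from different platforms; see [Quick Start](#快速启动).
53
 
54
  What's New:
55
  - Initial code release! Windows and Linux are now supported. The 2b and 5b models support video generation at any resolution from 256x256x49 to 1024x1024x49. [ 2024.09.18 ]
56
 
57
  Feature overview: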
@@ -95,10 +98,10 @@ cd CogVideoX-Fun
95
  mkdir models/Diffusion_Transformer
96
  mkdir models/Personalized_Model
97
 
98
- wget https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/Diffusion_Transformer/CogVideoX-Fun-2b-InP.tar.gz -O models/Diffusion_Transformer/CogVideoX-Fun-2b-InP.tar.gz
99
 
100
  cd models/Diffusion_Transformer/
101
- tar -xvf CogVideoX-Fun-2b-InP.tar.gz
102
  cd ../../
103
  ```
104
 
@@ -130,8 +133,8 @@ Details for Linux:
130
  ```
131
  📦 models/
132
  ├── 📂 Diffusion_Transformer/
133
- │ ├── 📂 CogVideoX-Fun-2b-InP/
134
- │ └── 📂 CogVideoX-Fun-5b-InP/
135
  ├── 📂 Personalized_Model/
136
  │ └── your trained transformer model / your trained lora model (for UI load)
137
  ```
@@ -139,42 +142,43 @@ Details for Linux:
139
  # Video Results
140
  All results shown were generated by image-to-video.
141
 
142
- ### CogVideoX-Fun-5B
143
 
144
  Resolution-1024
145
 
146
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
147
  <tr>
148
  <td>
149
- <video src="https://github.com/user-attachments/assets/ec749326-b529-453f-a4b4-f587875dff64" width="100%" controls autoplay loop></video>
150
  </td>
151
  <td>
152
- <video src="https://github.com/user-attachments/assets/84df4178-f493-4aa8-a888-d2020338da82" width="100%" controls autoplay loop></video>
153
  </td>
154
  <td>
155
- <video src="https://github.com/user-attachments/assets/c66c139d-94d3-4930-985b-60e3e0600d8f" width="100%" controls autoplay loop></video>
156
  </td>
157
  <td>
158
- <video src="https://github.com/user-attachments/assets/647c0e0c-28d6-473e-b4eb-a30197dddefc" width="100%" controls autoplay loop></video>
159
  </td>
160
  </tr>
161
  </table>
162
 
163
  Resolution-768
164
 
165
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
166
  <tr>
167
  <td>
168
- <video src="https://github.com/user-attachments/assets/647d45b0-4253-4438-baf3-f692789bde78" width="100%" controls autoplay loop></video>
169
  </td>
170
  <td>
171
- <video src="https://github.com/user-attachments/assets/e5a5a948-5c34-445d-9446-324a666a6a33" width="100%" controls autoplay loop></video>
172
  </td>
173
  <td>
174
- <video src="https://github.com/user-attachments/assets/0e605797-4a86-4e0c-8589-40ed686d97a4" width="100%" controls autoplay loop></video>
175
  </td>
176
  <td>
177
- <video src="https://github.com/user-attachments/assets/5356bf79-0a3b-4caf-ac31-2d796e20e429" width="100%" controls autoplay loop></video>
178
  </td>
179
  </tr>
180
  </table>
@@ -184,41 +188,92 @@ Resolution-512
184
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
185
  <tr>
186
  <td>
187
- <video src="https://github.com/user-attachments/assets/5a9f3457-fe82-4082-8494-d8f4f8db75e9" width="100%" controls autoplay loop></video>
188
  </td>
189
  <td>
190
- <video src="https://github.com/user-attachments/assets/ca6874b8-41d1-4f02-bee3-4fc886f309ad" width="100%" controls autoplay loop></video>
191
  </td>
192
  <td>
193
- <video src="https://github.com/user-attachments/assets/9216b348-2c80-4eab-9c1c-dd3a54b7ea1e" width="100%" controls autoplay loop></video>
194
  </td>
195
  <td>
196
- <video src="https://github.com/user-attachments/assets/e99ec495-655f-44d8-afa7-3ad0a14f9975" width="100%" controls autoplay loop></video>
197
  </td>
198
  </tr>
199
  </table>
200
 
201
- ### CogVideoX-Fun-2B
202
 
203
  Resolution-768
204
 
205
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
206
  <tr>
207
  <td>
208
- <video src="https://github.com/user-attachments/assets/d329b4d4-f08f-4e77-887e-049cfc93a908" width="100%" controls autoplay loop></video>
209
  </td>
210
  <td>
211
- <video src="https://github.com/user-attachments/assets/dd7fa2d5-9871-436c-ae5a-44f1494c9c9f" width="100%" controls autoplay loop></video>
212
  </td>
213
  <td>
214
- <video src="https://github.com/user-attachments/assets/c24a2fa2-2fe3-4277-aa9f-e812a2cf0a4e" width="100%" controls autoplay loop></video>
215
  </td>
216
  <td>
217
- <video src="https://github.com/user-attachments/assets/573edac3-8bd0-4e95-82df-bcfdcba9a73f" width="100%" controls autoplay loop></video>
218
  </td>
219
  </tr>
220
  </table>
221
 
222
 
223
  # How to Use
224
 
@@ -318,6 +373,18 @@ sh scripts/train.sh
318
  For details on some parameter settings, see [Readme Train](scripts/README_TRAIN.md) and [Readme Lora](scripts/README_TRAIN_LORA.md).
319
 
320
  # Model Zoo
321
  | Name | Storage Space | Hugging Face | Model Scope | Description |
322
  |--|--|--|--|--|
323
  | CogVideoX-Fun-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-2b-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained on 49 frames at 8 frames per second. |
@@ -335,4 +402,4 @@ sh scripts/train.sh
335
 
336
  The CogVideoX-2B model (including its corresponding Transformers module and VAE module) is released under the [Apache 2.0 License](LICENSE).
337
 
338
- The CogVideoX-5B model (Transformer module) is released under the [CogVideoX License](https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE).
 
33
 
34
  😊 Welcome!
35
 
36
+ [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-yellow)](https://huggingface.co/spaces/alibaba-pai/CogVideoX-Fun-5b)
37
+
38
  [English](./README_en.md) | 简体中文
39
 
40
  # Table of Contents
 
54
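  We will gradually support quick start-up from different platforms; see [Quick Start](#快速启动).
55
 
56
  What's New:
57
+ - Retrained the i2v model with added noise so that generated videos have a larger motion amplitude. Uploaded the control model training code and the Control model. [ 2024.09.29 ]
58
  - Initial code release! Windows and Linux are now supported. The 2b and 5b models support video generation at any resolution from 256x256x49 to 1024x1024x49. [ 2024.09.18 ]
59
 
60
  Feature overview: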
 
98
  mkdir models/Diffusion_Transformer
99
  mkdir models/Personalized_Model
100
 
101
+ wget https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/Diffusion_Transformer/CogVideoX-Fun-V1.1-2b-InP.tar.gz -O models/Diffusion_Transformer/CogVideoX-Fun-V1.1-2b-InP.tar.gz
102
 
103
  cd models/Diffusion_Transformer/
104
+ tar -xvf CogVideoX-Fun-V1.1-2b-InP.tar.gz
105
  cd ../../
106
  ```
107
 
 
133
  ```
134
  📦 models/
135
  ├── 📂 Diffusion_Transformer/
136
+ │ ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
137
+ │ └── 📂 CogVideoX-Fun-V1.1-5b-InP/
138
  ├── 📂 Personalized_Model/
139
  │ └── your trained transformer model / your trained lora model (for UI load)
140
  ```
 
142
  # Video Results
143
  All results shown were generated by image-to-video.
144
 
145
+ ### CogVideoX-Fun-V1.1-5B
146
 
147
  Resolution-1024
148
 
149
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
150
  <tr>
151
  <td>
152
+ <video src="https://github.com/user-attachments/assets/34e7ec8f-293e-4655-bb14-5e1ee476f788" width="100%" controls autoplay loop></video>
153
  </td>
154
  <td>
155
+ <video src="https://github.com/user-attachments/assets/7809c64f-eb8c-48a9-8bdc-ca9261fd5434" width="100%" controls autoplay loop></video>
156
  </td>
157
  <td>
158
+ <video src="https://github.com/user-attachments/assets/8e76aaa4-c602-44ac-bcb4-8b24b72c386c" width="100%" controls autoplay loop></video>
159
  </td>
160
  <td>
161
+ <video src="https://github.com/user-attachments/assets/19dba894-7c35-4f25-b15c-384167ab3b03" width="100%" controls autoplay loop></video>
162
  </td>
163
  </tr>
164
  </table>
165
 
166
+
167
  Resolution-768
168
 
169
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
170
  <tr>
171
  <td>
172
+ <video src="https://github.com/user-attachments/assets/0bc339b9-455b-44fd-8917-80272d702737" width="100%" controls autoplay loop></video>
173
  </td>
174
  <td>
175
+ <video src="https://github.com/user-attachments/assets/70a043b9-6721-4bd9-be47-78b7ec5c27e9" width="100%" controls autoplay loop></video>
176
  </td>
177
  <td>
178
+ <video src="https://github.com/user-attachments/assets/d5dd6c09-14f3-40f8-8b6d-91e26519b8ac" width="100%" controls autoplay loop></video>
179
  </td>
180
  <td>
181
+ <video src="https://github.com/user-attachments/assets/9327e8bc-4f17-46b0-b50d-38c250a9483a" width="100%" controls autoplay loop></video>
182
  </td>
183
  </tr>
184
  </table>
 
188
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
189
  <tr>
190
  <td>
191
+ <video src="https://github.com/user-attachments/assets/ef407030-8062-454d-aba3-131c21e6b58c" width="100%" controls autoplay loop></video>
192
  </td>
193
  <td>
194
+ <video src="https://github.com/user-attachments/assets/7610f49e-38b6-4214-aa48-723ae4d1b07e" width="100%" controls autoplay loop></video>
195
  </td>
196
  <td>
197
+ <video src="https://github.com/user-attachments/assets/1fff0567-1e15-415c-941e-53ee8ae2c841" width="100%" controls autoplay loop></video>
198
  </td>
199
  <td>
200
+ <video src="https://github.com/user-attachments/assets/bcec48da-b91b-43a0-9d50-cf026e00fa4f" width="100%" controls autoplay loop></video>
201
  </td>
202
  </tr>
203
  </table>
204
 
205
+ ### CogVideoX-Fun-V1.1-5B-Pose
206
+
207
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
208
+ <tr>
209
+ <td>
210
+ Resolution-512
211
+ </td>
212
+ <td>
213
+ Resolution-768
214
+ </td>
215
+ <td>
216
+ Resolution-1024
217
+ </td>
218
+ </tr><tr>
219
+ <td>
220
+ <video src="https://github.com/user-attachments/assets/a746df51-9eb7-4446-bee5-2ee30285c143" width="100%" controls autoplay loop></video>
221
+ </td>
222
+ <td>
223
+ <video src="https://github.com/user-attachments/assets/db295245-e6aa-43be-8c81-32cb411f1473" width="100%" controls autoplay loop></video>
224
+ </td>
225
+ <td>
226
+ <video src="https://github.com/user-attachments/assets/ec9875b2-fde0-48e1-ab7e-490cee51ef40" width="100%" controls autoplay loop></video>
227
+ </td>
228
+ </tr>
229
+ </table>
230
+
231
+ ### CogVideoX-Fun-V1.1-2B
232
 
233
  Resolution-768
234
 
235
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
236
  <tr>
237
  <td>
238
+ <video src="https://github.com/user-attachments/assets/03235dea-980e-4fc5-9c41-e40a5bc1b6d0" width="100%" controls autoplay loop></video>
239
  </td>
240
  <td>
241
+ <video src="https://github.com/user-attachments/assets/f7302648-5017-47db-bdeb-4d893e620b37" width="100%" controls autoplay loop></video>
242
  </td>
243
  <td>
244
+ <video src="https://github.com/user-attachments/assets/cbadf411-28fa-4b87-813d-da63ff481904" width="100%" controls autoplay loop></video>
245
  </td>
246
  <td>
247
+ <video src="https://github.com/user-attachments/assets/87cc9d0b-b6fe-4d2d-b447-174513d169ab" width="100%" controls autoplay loop></video>
248
  </td>
249
  </tr>
250
  </table>
251
 
252
+ ### CogVideoX-Fun-V1.1-2B-Pose
253
+
254
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
255
+ <tr>
256
+ <td>
257
+ Resolution-512
258
+ </td>
259
+ <td>
260
+ Resolution-768
261
+ </td>
262
+ <td>
263
+ Resolution-1024
264
+ </td>
265
+ </tr><tr>
266
+ <td>
267
+ <video src="https://github.com/user-attachments/assets/487bcd7b-1b7f-4bb4-95b5-96a6b6548b3e" width="100%" controls autoplay loop></video>
268
+ </td>
269
+ <td>
270
+ <video src="https://github.com/user-attachments/assets/2710fd18-8489-46e4-8086-c237309ae7f6" width="100%" controls autoplay loop></video>
271
+ </td>
272
+ <td>
273
+ <video src="https://github.com/user-attachments/assets/b79513db-7747-4512-b86c-94f9ca447fe2" width="100%" controls autoplay loop></video>
274
+ </td>
275
+ </tr>
276
+ </table>
277
 
278
  # How to Use
279
 
 
373
  For details on some parameter settings, see [Readme Train](scripts/README_TRAIN.md) and [Readme Lora](scripts/README_TRAIN_LORA.md).
374
 
375
  # Model Zoo
376
+
377
+ V1.1:
378
+
379
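+ | Name | Storage Space | Hugging Face | Model Scope | Description |
380
+ |--|--|--|--|--|
381
+ | CogVideoX-Fun-V1.1-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-2b-InP) | Official image-to-video weights. Noise has been added, and the motion amplitude is greater than in V1.0. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained on 49 frames at 8 frames per second. |
382
+ | CogVideoX-Fun-V1.1-5b-InP.tar.gz | Before extraction: 16.0 GB / After extraction: 20.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-InP) | Official image-to-video weights. Noise has been added, and the motion amplitude is greater than in V1.0. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained on 49 frames at 8 frames per second. |
383
+ | CogVideoX-Fun-V1.1-2b-Pose.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-2b-Pose) | Official pose-controlled video generation weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained on 49 frames at 8 frames per second. |
384
+ | CogVideoX-Fun-V1.1-5b-Pose.tar.gz | Before extraction: 16.0 GB / After extraction: 20.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-Pose) | Official pose-controlled video generation weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained on 49 frames at 8 frames per second. |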
385
+
386
+ V1.0:
387
+
388
  | Name | Storage Space | Hugging Face | Model Scope | Description |
389
  |--|--|--|--|--|
390
  | CogVideoX-Fun-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-2b-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained on 49 frames at 8 frames per second. |
 
402
 
403
  The CogVideoX-2B model (including its corresponding Transformers module and VAE module) is released under the [Apache 2.0 License](LICENSE).
404
 
405
+ The CogVideoX-5B model (Transformer module) is released under the [CogVideoX License](https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE).
README_en.md CHANGED
@@ -23,6 +23,7 @@ CogVideoX-Fun is a modified pipeline based on the CogVideoX structure, designed
23
  We will gradually support quick start-up from different platforms; see [Quick Start](#quick-start).
24
 
25
  What's New:
26
  - Initial code release! Now supporting Windows and Linux. Supports 2b and 5b models. Supports video generation at any resolution from 256x256x49 to 1024x1024x49. [ 2024.09.18 ]
27
 
28
  Function:
@@ -68,10 +69,10 @@ cd CogVideoX-Fun
68
  mkdir models/Diffusion_Transformer
69
  mkdir models/Personalized_Model
70
 
71
- wget https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/Diffusion_Transformer/CogVideoX-Fun-2b-InP.tar.gz -O models/Diffusion_Transformer/CogVideoX-Fun-2b-InP.tar.gz
72
 
73
  cd models/Diffusion_Transformer/
74
- tar -xvf CogVideoX-Fun-2b-InP.tar.gz
75
  cd ../../
76
  ```
77
 
@@ -103,8 +104,8 @@ We recommend placing the [weights](#model-zoo) along the specified path:
103
  ```
104
  📦 models/
105
  ├── 📂 Diffusion_Transformer/
106
- │ ├── 📂 CogVideoX-Fun-2b-InP/
107
- │ └── 📂 CogVideoX-Fun-5b-InP/
108
  ├── 📂 Personalized_Model/
109
  │ └── your trained transformer model / your trained lora model (for UI load)
110
  ```
@@ -112,42 +113,43 @@ We recommend placing the [weights](#model-zoo) along the specified path:
112
  # Video Result
113
  All results shown were generated by image-to-video.
114
 
115
- ### CogVideoX-Fun-5B
116
 
117
  Resolution-1024
118
 
119
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
120
  <tr>
121
  <td>
122
- <video src="https://github.com/user-attachments/assets/ec749326-b529-453f-a4b4-f587875dff64" width="100%" controls autoplay loop></video>
123
  </td>
124
  <td>
125
- <video src="https://github.com/user-attachments/assets/84df4178-f493-4aa8-a888-d2020338da82" width="100%" controls autoplay loop></video>
126
  </td>
127
  <td>
128
- <video src="https://github.com/user-attachments/assets/c66c139d-94d3-4930-985b-60e3e0600d8f" width="100%" controls autoplay loop></video>
129
  </td>
130
  <td>
131
- <video src="https://github.com/user-attachments/assets/647c0e0c-28d6-473e-b4eb-a30197dddefc" width="100%" controls autoplay loop></video>
132
  </td>
133
  </tr>
134
  </table>
135
 
136
  Resolution-768
137
 
138
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
139
  <tr>
140
  <td>
141
- <video src="https://github.com/user-attachments/assets/647d45b0-4253-4438-baf3-f692789bde78" width="100%" controls autoplay loop></video>
142
  </td>
143
  <td>
144
- <video src="https://github.com/user-attachments/assets/e5a5a948-5c34-445d-9446-324a666a6a33" width="100%" controls autoplay loop></video>
145
  </td>
146
  <td>
147
- <video src="https://github.com/user-attachments/assets/0e605797-4a86-4e0c-8589-40ed686d97a4" width="100%" controls autoplay loop></video>
148
  </td>
149
  <td>
150
- <video src="https://github.com/user-attachments/assets/5356bf79-0a3b-4caf-ac31-2d796e20e429" width="100%" controls autoplay loop></video>
151
  </td>
152
  </tr>
153
  </table>
@@ -157,35 +159,89 @@ Resolution-512
157
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
158
  <tr>
159
  <td>
160
- <video src="https://github.com/user-attachments/assets/5a9f3457-fe82-4082-8494-d8f4f8db75e9" width="100%" controls autoplay loop></video>
161
  </td>
162
  <td>
163
- <video src="https://github.com/user-attachments/assets/ca6874b8-41d1-4f02-bee3-4fc886f309ad" width="100%" controls autoplay loop></video>
164
  </td>
165
  <td>
166
- <video src="https://github.com/user-attachments/assets/9216b348-2c80-4eab-9c1c-dd3a54b7ea1e" width="100%" controls autoplay loop></video>
167
  </td>
168
  <td>
169
- <video src="https://github.com/user-attachments/assets/e99ec495-655f-44d8-afa7-3ad0a14f9975" width="100%" controls autoplay loop></video>
170
  </td>
171
  </tr>
172
  </table>
173
 
174
- ### CogVideoX-Fun-2B
175
 
176
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
177
  <tr>
178
  <td>
179
- <video src="https://github.com/user-attachments/assets/d329b4d4-f08f-4e77-887e-049cfc93a908" width="100%" controls autoplay loop></video>
180
  </td>
181
  <td>
182
- <video src="https://github.com/user-attachments/assets/dd7fa2d5-9871-436c-ae5a-44f1494c9c9f" width="100%" controls autoplay loop></video>
183
  </td>
184
  <td>
185
- <video src="https://github.com/user-attachments/assets/c24a2fa2-2fe3-4277-aa9f-e812a2cf0a4e" width="100%" controls autoplay loop></video>
186
  </td>
187
  <td>
188
- <video src="https://github.com/user-attachments/assets/573edac3-8bd0-4e95-82df-bcfdcba9a73f" width="100%" controls autoplay loop></video>
189
  </td>
190
  </tr>
191
  </table>
@@ -283,11 +339,22 @@ Then, we run scripts/train.sh.
283
  sh scripts/train.sh
284
  ```
285
 
286
- For details on setting some parameters, please refer to [Readme Train](scripts/README_TRAIN.md) and [Readme Lora](scripts/README_TRAIN_LORA.md).
287
 
288
 
289
  # Model zoo
290
 
291
  | Name | Storage Space | Hugging Face | Model Scope | Description |
292
  |--|--|--|--|--|
293
  | CogVideoX-Fun-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-2b-InP) | Our official image-to-video model can predict videos at multiple resolutions (512, 768, 1024, 1280) and was trained on 49 frames at 8 frames per second. |
 
23
  We will gradually support quick start-up from different platforms; see [Quick Start](#quick-start).
24
 
25
  What's New:
26
+ - Retrained the i2v model with added noise to increase the motion amplitude of generated videos. Uploaded the control model training code and the Control model. [ 2024.09.29 ]
27
  - Initial code release! Now supporting Windows and Linux. Supports 2b and 5b models. Supports video generation at any resolution from 256x256x49 to 1024x1024x49. [ 2024.09.18 ]
28
 
29
  Function:
 
69
  mkdir models/Diffusion_Transformer
70
  mkdir models/Personalized_Model
71
 
72
+ wget https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/Diffusion_Transformer/CogVideoX-Fun-V1.1-2b-InP.tar.gz -O models/Diffusion_Transformer/CogVideoX-Fun-V1.1-2b-InP.tar.gz
73
 
74
  cd models/Diffusion_Transformer/
75
+ tar -xvf CogVideoX-Fun-V1.1-2b-InP.tar.gz
76
  cd ../../
77
  ```
78
 
 
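The download step above (`wget` followed by `tar -xvf`) can also be scripted. The following is a minimal Python sketch, not part of the commit: the helper names `archive_url` and `fetch_and_extract` are hypothetical, the base URL is copied from the `wget` command, and it assumes the archive unpacks into a folder named after the model, as the directory layout in the README suggests.

```python
import tarfile
import urllib.request
from pathlib import Path

# Base URL copied from the wget command in the README.
BASE_URL = (
    "https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/"
    "cogvideox_fun/Diffusion_Transformer"
)


def archive_url(model_name: str) -> str:
    """Build the download URL for a model archive (hypothetical helper)."""
    return f"{BASE_URL}/{model_name}.tar.gz"


def fetch_and_extract(
    model_name: str, root: Path = Path("models/Diffusion_Transformer")
) -> Path:
    """Download <model_name>.tar.gz into root (if absent) and unpack it there."""
    root.mkdir(parents=True, exist_ok=True)
    archive = root / f"{model_name}.tar.gz"
    if not archive.exists():
        # Skipped when the archive was already downloaded.
        urllib.request.urlretrieve(archive_url(model_name), str(archive))
    # "r:*" lets tarfile detect the compression itself.
    with tarfile.open(archive, "r:*") as tar:
        if hasattr(tarfile, "data_filter"):
            # Safer extraction on Python versions that support filters.
            tar.extractall(root, filter="data")
        else:
            tar.extractall(root)
    # Assumption: the archive contains a top-level folder named model_name.
    return root / model_name
```

Calling `fetch_and_extract("CogVideoX-Fun-V1.1-2b-InP")` from the repository root would mirror the shell commands; the download is several gigabytes, so the function is deliberately not invoked at import time.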
104
  ```
105
  📦 models/
106
  ├── 📂 Diffusion_Transformer/
107
+ │ ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
108
+ │ └── 📂 CogVideoX-Fun-V1.1-5b-InP/
109
  ├── 📂 Personalized_Model/
110
  │ └── your trained transformer model / your trained lora model (for UI load)
111
  ```
 
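Since this commit renames the weight folders to their V1.1 names, a quick check that the local `models/` tree matches the layout above can catch stale paths. This is a hedged sketch: `EXPECTED_DIRS` and `missing_model_dirs` are hypothetical names, and the folder list is taken only from the tree shown in the README.

```python
from pathlib import Path
from typing import List

# Folder names taken from the directory tree in the README (V1.1 layout).
EXPECTED_DIRS = [
    "Diffusion_Transformer/CogVideoX-Fun-V1.1-2b-InP",
    "Diffusion_Transformer/CogVideoX-Fun-V1.1-5b-InP",
    "Personalized_Model",
]


def missing_model_dirs(models_root: Path) -> List[str]:
    """Return the expected sub-folders that do not exist under models_root."""
    return [d for d in EXPECTED_DIRS if not (models_root / d).is_dir()]
```

An empty list from `missing_model_dirs(Path("models"))` means every folder in the tree is present; only the weights you actually intend to use need to exist, so a missing 5b folder is not necessarily an error.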
113
  # Video Result
114
  All results shown were generated by image-to-video.
115
 
116
+ ### CogVideoX-Fun-V1.1-5B
117
 
118
  Resolution-1024
119
 
120
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
121
  <tr>
122
  <td>
123
+ <video src="https://github.com/user-attachments/assets/34e7ec8f-293e-4655-bb14-5e1ee476f788" width="100%" controls autoplay loop></video>
124
  </td>
125
  <td>
126
+ <video src="https://github.com/user-attachments/assets/7809c64f-eb8c-48a9-8bdc-ca9261fd5434" width="100%" controls autoplay loop></video>
127
  </td>
128
  <td>
129
+ <video src="https://github.com/user-attachments/assets/8e76aaa4-c602-44ac-bcb4-8b24b72c386c" width="100%" controls autoplay loop></video>
130
  </td>
131
  <td>
132
+ <video src="https://github.com/user-attachments/assets/19dba894-7c35-4f25-b15c-384167ab3b03" width="100%" controls autoplay loop></video>
133
  </td>
134
  </tr>
135
  </table>
136
 
137
+
138
  Resolution-768
139
 
140
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
141
  <tr>
142
  <td>
143
+ <video src="https://github.com/user-attachments/assets/0bc339b9-455b-44fd-8917-80272d702737" width="100%" controls autoplay loop></video>
144
  </td>
145
  <td>
146
+ <video src="https://github.com/user-attachments/assets/70a043b9-6721-4bd9-be47-78b7ec5c27e9" width="100%" controls autoplay loop></video>
147
  </td>
148
  <td>
149
+ <video src="https://github.com/user-attachments/assets/d5dd6c09-14f3-40f8-8b6d-91e26519b8ac" width="100%" controls autoplay loop></video>
150
  </td>
151
  <td>
152
+ <video src="https://github.com/user-attachments/assets/9327e8bc-4f17-46b0-b50d-38c250a9483a" width="100%" controls autoplay loop></video>
153
  </td>
154
  </tr>
155
  </table>
 
159
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
160
  <tr>
161
  <td>
162
+ <video src="https://github.com/user-attachments/assets/ef407030-8062-454d-aba3-131c21e6b58c" width="100%" controls autoplay loop></video>
163
  </td>
164
  <td>
165
+ <video src="https://github.com/user-attachments/assets/7610f49e-38b6-4214-aa48-723ae4d1b07e" width="100%" controls autoplay loop></video>
166
  </td>
167
  <td>
168
+ <video src="https://github.com/user-attachments/assets/1fff0567-1e15-415c-941e-53ee8ae2c841" width="100%" controls autoplay loop></video>
169
  </td>
170
  <td>
171
+ <video src="https://github.com/user-attachments/assets/bcec48da-b91b-43a0-9d50-cf026e00fa4f" width="100%" controls autoplay loop></video>
172
  </td>
173
  </tr>
174
  </table>
175
 
176
+ ### CogVideoX-Fun-V1.1-5B-Pose
177
 
178
  <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
179
  <tr>
180
  <td>
181
+ Resolution-512
182
+ </td>
183
+ <td>
184
+ Resolution-768
185
+ </td>
186
+ <td>
187
+ Resolution-1024
188
+ </td>
189
+ </tr><tr>
190
+ <td>
191
+ <video src="https://github.com/user-attachments/assets/a746df51-9eb7-4446-bee5-2ee30285c143" width="100%" controls autoplay loop></video>
192
  </td>
193
  <td>
194
+ <video src="https://github.com/user-attachments/assets/db295245-e6aa-43be-8c81-32cb411f1473" width="100%" controls autoplay loop></video>
195
  </td>
196
  <td>
197
+ <video src="https://github.com/user-attachments/assets/ec9875b2-fde0-48e1-ab7e-490cee51ef40" width="100%" controls autoplay loop></video>
198
  </td>
199
+ </tr>
200
+ </table>
201
+
202
+ ### CogVideoX-Fun-V1.1-2B
203
+
204
+ Resolution-768
205
+
206
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
207
+ <tr>
208
+ <td>
209
+ <video src="https://github.com/user-attachments/assets/03235dea-980e-4fc5-9c41-e40a5bc1b6d0" width="100%" controls autoplay loop></video>
210
+ </td>
211
  <td>
212
+ <video src="https://github.com/user-attachments/assets/f7302648-5017-47db-bdeb-4d893e620b37" width="100%" controls autoplay loop></video>
213
+ </td>
214
+ <td>
215
+ <video src="https://github.com/user-attachments/assets/cbadf411-28fa-4b87-813d-da63ff481904" width="100%" controls autoplay loop></video>
216
+ </td>
217
+ <td>
218
+ <video src="https://github.com/user-attachments/assets/87cc9d0b-b6fe-4d2d-b447-174513d169ab" width="100%" controls autoplay loop></video>
219
+ </td>
220
+ </tr>
221
+ </table>
222
+
223
+ ### CogVideoX-Fun-V1.1-2B-Pose
224
+
225
+ <table border="0" style="width: 100%; text-align: left; margin-top: 20px;">
226
+ <tr>
227
+ <td>
228
+ Resolution-512
229
+ </td>
230
+ <td>
231
+ Resolution-768
232
+ </td>
233
+ <td>
234
+ Resolution-1024
235
+ </td>
236
+ </tr><tr>
237
+ <td>
238
+ <video src="https://github.com/user-attachments/assets/487bcd7b-1b7f-4bb4-95b5-96a6b6548b3e" width="100%" controls autoplay loop></video>
239
+ </td>
240
+ <td>
241
+ <video src="https://github.com/user-attachments/assets/2710fd18-8489-46e4-8086-c237309ae7f6" width="100%" controls autoplay loop></video>
242
+ </td>
243
+ <td>
244
+ <video src="https://github.com/user-attachments/assets/b79513db-7747-4512-b86c-94f9ca447fe2" width="100%" controls autoplay loop></video>
245
  </td>
246
  </tr>
247
  </table>
 
339
  sh scripts/train.sh
340
  ```
341
 
342
+ For details on setting some parameters, please refer to [Readme Train](scripts/README_TRAIN.md), [Readme Lora](scripts/README_TRAIN_LORA.md) and [Readme Control](scripts/README_TRAIN_CONTROL.md).
343
 
344
 
345
  # Model zoo
346
 
347
+ V1.1:
348
+
349
+ | Name | Storage Space | Hugging Face | Model Scope | Description |
350
+ |--|--|--|--|--|
351
+ | CogVideoX-Fun-V1.1-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-2b-InP) | Our official image-to-video model can predict videos at multiple resolutions (512, 768, 1024, 1280) and was trained on 49 frames at 8 frames per second. Noise has been added to the reference image, and the motion amplitude is greater than in V1.0. |
352
+ | CogVideoX-Fun-V1.1-5b-InP.tar.gz | Before extraction: 16.0 GB / After extraction: 20.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-InP) | Our official image-to-video model can predict videos at multiple resolutions (512, 768, 1024, 1280) and was trained on 49 frames at 8 frames per second. Noise has been added to the reference image, and the motion amplitude is greater than in V1.0. |
353
+ | CogVideoX-Fun-V1.1-2b-Pose.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-2b-Pose) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-2b-Pose) | Our official pose-control video model can predict videos at multiple resolutions (512, 768, 1024, 1280) and was trained on 49 frames at 8 frames per second. |
354
+ | CogVideoX-Fun-V1.1-5b-Pose.tar.gz | Before extraction: 16.0 GB / After extraction: 20.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-Pose) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-Pose) | Our official pose-control video model can predict videos at multiple resolutions (512, 768, 1024, 1280) and was trained on 49 frames at 8 frames per second. |
355
+
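The storage column of the V1.1 table implies a peak disk requirement while an archive and its extracted folder coexist: roughly 9.7 + 13.0 = 22.7 GB for the 2b weights and 16.0 + 20.0 = 36.0 GB for the 5b weights. A small sketch of that arithmetic follows; the names `ARCHIVE_SIZES_GB`, `peak_space_gb` and `has_room` are hypothetical, and it assumes the table's figures are decimal gigabytes.

```python
import shutil

# (before extraction, after extraction) in GB, copied from the V1.1 table.
ARCHIVE_SIZES_GB = {
    "CogVideoX-Fun-V1.1-2b-InP": (9.7, 13.0),
    "CogVideoX-Fun-V1.1-5b-InP": (16.0, 20.0),
    "CogVideoX-Fun-V1.1-2b-Pose": (9.7, 13.0),
    "CogVideoX-Fun-V1.1-5b-Pose": (16.0, 20.0),
}


def peak_space_gb(model_name: str) -> float:
    """Space needed while the archive and the extracted folder both exist."""
    packed, unpacked = ARCHIVE_SIZES_GB[model_name]
    return packed + unpacked


def has_room(model_name: str, path: str = ".") -> bool:
    """Check free space at path against the peak requirement (decimal GB assumed)."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= peak_space_gb(model_name)
```

Deleting the `.tar.gz` after extraction brings the footprint back down to the "after extraction" figure.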
356
+ V1.0:
357
+
358
  | Name | Storage Space | Hugging Face | Model Scope | Description |
359
  |--|--|--|--|--|
360
  | CogVideoX-Fun-2b-InP.tar.gz | Before extraction: 9.7 GB / After extraction: 13.0 GB | [🤗Link](https://huggingface.co/alibaba-pai/CogVideoX-Fun-2b-InP) | [😄Link](https://modelscope.cn/models/PAI/CogVideoX-Fun-2b-InP) | Our official image-to-video model can predict videos at multiple resolutions (512, 768, 1024, 1280) and was trained on 49 frames at 8 frames per second. |