---
language:
- "en"
tags:
- video
license: apache-2.0
pipeline_tag: text-to-video
library_name: diffusers
---
<p align="center">
<img src="assets/logo.jpg" height=30>
</p>
# FastMochi Model Card
## Model Details
<div align="center">
<table style="margin-left: auto; margin-right: auto; border: none;">
<tr>
<td>
<img src="assets/mochi-demo.gif" width="640" alt="Mochi Demo">
</td>
</tr>
<tr>
<td style="text-align:center;">
Get an 8X diffusion speedup for Mochi with FastVideo
</td>
</tr>
</table>
</div>
FastMochi is an accelerated [Mochi](https://huggingface.co/genmo/mochi-1-preview) model. It can sample high-quality videos in 8 diffusion steps, which is roughly an 8X speedup over the original Mochi with 64 steps.
- **Developed by**: [Hao AI Lab](https://hao-ai-lab.github.io/)
- **License**: Apache-2.0
- **Distilled from**: [Mochi](https://huggingface.co/genmo/mochi-1-preview)
- **Github Repository**: https://github.com/hao-ai-lab/FastVideo
## Usage
- Clone the [FastVideo](https://github.com/hao-ai-lab/FastVideo) repository and follow the inference instructions in its README.
- You can also run FastMochi using the official [Mochi repository](https://github.com/genmoai/mochi) with the script below and these [compatible weights](https://huggingface.co/FastVideo/FastMochi).
<details>
<summary>Code</summary>
```python
import os

from genmo.lib.utils import save_video
from genmo.mochi_preview.pipelines import (
    DecoderModelFactory,
    DitModelFactory,
    MochiMultiGPUPipeline,
    T5ModelFactory,
    linear_quadratic_schedule,
)

# Read prompts line by line from prompt.txt
with open("prompt.txt", "r") as f:
    prompts = [line.rstrip() for line in f]

# Build the multi-GPU pipeline from the downloaded FastMochi weights
pipeline = MochiMultiGPUPipeline(
    text_encoder_factory=T5ModelFactory(),
    world_size=4,
    dit_factory=DitModelFactory(
        model_path="weights/dit.safetensors", model_dtype="bf16"
    ),
    decoder_factory=DecoderModelFactory(
        model_path="weights/decoder.safetensors",
    ),
)

output_dir = "outputs"
os.makedirs(output_dir, exist_ok=True)

# Sample one video per prompt with 8 diffusion steps
for i, prompt in enumerate(prompts):
    video = pipeline(
        height=480,
        width=848,
        num_frames=163,
        num_inference_steps=8,
        sigma_schedule=linear_quadratic_schedule(8, 0.1, 6),
        cfg_schedule=[1.5] * 8,
        batch_cfg=False,
        prompt=prompt,
        negative_prompt="",
        seed=12345,
    )[0]
    save_video(video, f"{output_dir}/output_{i}.mp4")
```
</details>
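To get a feel for what `linear_quadratic_schedule(8, 0.1, 6)` in the script above might produce, here is a minimal standalone sketch. It is not the `genmo` implementation. It assumes the three arguments mean the number of steps, the noise level reached at the end of the linear part, and the number of linear steps, and the function name `sketch_linear_quadratic_schedule` is hypothetical. Check the Mochi source for the actual behavior.

```python
def sketch_linear_quadratic_schedule(num_steps, threshold_noise, linear_steps):
    """Hypothetical re-derivation, NOT the genmo implementation.

    Returns num_steps + 1 sigma values that decrease from 1.0 to 0.0.
    """
    if not 0 < linear_steps < num_steps:
        raise ValueError("linear_steps must be between 1 and num_steps - 1")

    # Linear part: the noise level rises from 0 towards threshold_noise.
    slope = threshold_noise / linear_steps
    noise = [slope * i for i in range(linear_steps)]

    # Quadratic part: continue from (linear_steps, threshold_noise) with the
    # same slope, and pick the curvature so the curve reaches 1.0 at num_steps.
    remaining = num_steps - linear_steps
    curvature = (1.0 - threshold_noise - slope * remaining) / remaining**2
    for i in range(linear_steps, num_steps):
        d = i - linear_steps
        noise.append(threshold_noise + slope * d + curvature * d**2)
    noise.append(1.0)

    # Sampling starts from pure noise, so flip to sigma = 1 - noise.
    return [1.0 - x for x in noise]


sigmas = sketch_linear_quadratic_schedule(8, 0.1, 6)
print([round(s, 3) for s in sigmas])
```

Under these assumptions the schedule takes six small steps from sigma 1.0 down to 0.9, then two large steps down to 0.0, so most of the 8 steps are spent at high noise levels.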
## Training details
FastMochi is consistency distilled on the [MixKit](https://huggingface.co/datasets/LanguageBind/Open-Sora-Plan-v1.1.0/tree/main) dataset with the following hyperparameters:
- Batch size: 32
- Resolution: 480x848
- Number of frames: 169
- Train steps: 128
- GPUs: 16
- LR: 1e-6
- Loss: huber
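The list above names a Huber loss but does not say which variant or threshold was used. As a rough illustration only, the sketch below compares the textbook Huber loss with the smooth pseudo-Huber variant on a scalar residual. The `delta` value of 1.0 is an assumption made for the example, not a training setting from this card.

```python
import math


def huber(residual, delta=1.0):
    """Textbook Huber loss: quadratic near zero, linear in the tails."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


def pseudo_huber(residual, delta=1.0):
    """Smooth approximation of the Huber loss, differentiable everywhere."""
    return delta**2 * (math.sqrt(1.0 + (residual / delta) ** 2) - 1.0)


# Compare against plain squared error (0.5 * r^2) for small and large residuals.
for r in (0.1, 1.0, 5.0):
    print(r, 0.5 * r * r, huber(r), pseudo_huber(r))
```

Both variants grow linearly for large residuals, so outlier samples contribute less to the gradient than they would under a squared-error loss.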
## Evaluation
We provide some qualitative comparisons between FastMochi with 8-step inference and the original Mochi with 8-step inference:
| FastMochi 8 steps | Mochi 8 steps |
| --- | --- |
| ![FastMochi 8 step](assets/distilled/1.gif) | ![Mochi 8 step](assets/undistilled/1.gif) |
| ![FastMochi 8 step](assets/distilled/2.gif) | ![Mochi 8 step](assets/undistilled/2.gif) |
| ![FastMochi 8 step](assets/distilled/3.gif) | ![Mochi 8 step](assets/undistilled/3.gif) |
| ![FastMochi 8 step](assets/distilled/4.gif) | ![Mochi 8 step](assets/undistilled/4.gif) |