metadata
license: apache-2.0
base_model:
  - genmo/mochi-1-preview
pipeline_tag: text-to-video
tags:
  - infinite zoom
  - art style
  - mochi
  - diffusion
widget:
  - text: >-
      Human fingers pinching to zoom on an infinite zoom canvas, a detailed
      cityscape at night, zoom focuses on a can, all surface around it is made
      of liquid and objects swimming in it.
    output:
      url: samples/4_1800.mp4
  - text: >-
      Human fingers pinching to zoom on an infinite zoom canvas, spaceship going
      through space.
    output:
      url: samples/5_2000.mp4
  - text: >-
      Human fingers pinching to zoom on an infinite zoom canvas, orange cat in
      the middle of a canvas, looking upward.
    output:
      url: samples/6_2000.mp4
Fine-Tuning Mochi Text-to-Video: InfiniteZoom-Mochi
This project fine-tunes the Mochi text-to-video model with LoRA (Low-Rank Adaptation) so that it generates videos in the infinite zoom art style.
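For readers new to LoRA, the idea can be sketched in a few lines: the pretrained weight matrix stays frozen, and a trainable low-rank product, scaled by `alpha / rank`, is added to its output. The snippet below is a minimal pure-Python illustration, not the trainer's implementation; the class name `LoRALinear`, the initialization scheme, and the tiny matrix sizes are assumptions chosen for readability.

```python
import random


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply an (n, k) matrix by a (k, m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical illustration of a linear layer with a LoRA adapter."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weight, (out_dim, in_dim)
        self.scale = alpha / rank     # rank 16 with alpha 16 gives a scale of 1.0
        # A starts small and random, B starts at zero, so the adapter is
        # initially a no-op and the layer reproduces the pretrained output.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        """x has shape (batch, in_dim); returns (batch, out_dim)."""
        base = matmul(x, transpose(self.weight))   # frozen path
        down = matmul(x, transpose(self.A))        # project to rank dimensions
        up = matmul(down, transpose(self.B))       # project back to out_dim
        return [[b + self.scale * u for b, u in zip(b_row, u_row)]
                for b_row, u_row in zip(base, up)]


layer = LoRALinear(weight=[[1.0, 2.0], [3.0, 4.0]], rank=16, alpha=16)
print(layer.forward([[1.0, 1.0]]))
```

Only `A` and `B` would receive gradients during fine-tuning, which is why the adapter is much smaller than the base model.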
Training Details
- Model Base: genmo/mochi-1-preview
- Fine-Tuning Dataset: 23 short video clips in the infinite zoom art style, with accompanying .txt descriptions
- Training Hardware: H100 GPU
- Training Duration: 2 hours
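Before training on a dataset like the one listed above, it can help to confirm that every clip has a description file. The sketch below assumes each video sits next to a `.txt` file with the same base name (for example `clip_01.mp4` and `clip_01.txt`); that pairing convention, the function name `find_missing_captions`, and the list of video suffixes are assumptions, not details stated in this card.

```python
from pathlib import Path

# Assumed set of video extensions; adjust to match the actual dataset.
VIDEO_SUFFIXES = {".mp4", ".mov"}


def find_missing_captions(data_dir):
    """Return names of videos that lack a non-empty sibling .txt description."""
    data_dir = Path(data_dir)
    missing = []
    for video in sorted(data_dir.iterdir()):
        if video.suffix.lower() not in VIDEO_SUFFIXES:
            continue
        caption = video.with_suffix(".txt")
        # A caption counts as missing if the file is absent or only whitespace.
        if not caption.is_file() or not caption.read_text(encoding="utf-8").strip():
            missing.append(video.name)
    return missing
```

Calling `find_missing_captions("videos")` on a hypothetical `videos` folder returns an empty list when every clip is captioned.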
lora.yaml:
init_checkpoint_path: /weights/dit.safetensors
checkpoint_dir: /finetunes/my_mochi_lora
train_data_dir: /videos_prepared
attention_mode: sdpa
single_video_mode: false # Useful for debugging whether your model can learn a single video
# You only need this if you're using wandb
wandb:
  # project: mochi_1_lora
  # name: ${checkpoint_dir}
  # group: null
optimizer:
  lr: 2e-4
  weight_decay: 0.01
model:
  type: lora
  kwargs:
    # Apply LoRA to the QKV projection and the output projection of the attention block.
    qkv_proj_lora_rank: 16
    qkv_proj_lora_alpha: 16
    qkv_proj_lora_dropout: 0.
    out_proj_lora_rank: 16
    out_proj_lora_alpha: 16
    out_proj_lora_dropout: 0.
training:
  model_dtype: bf16
  warmup_steps: 200
  num_qkv_checkpoint: 48
  num_ff_checkpoint: 48
  num_post_attn_checkpoint: 48
  num_steps: 2000
  save_interval: 200
  caption_dropout: 0.1
  grad_clip: 0.0
  save_safetensors: true
# Used for generating samples during training to monitor progress ...
sample:
  interval: 200
  output_dir: ${checkpoint_dir}/samples
  decoder_path: /weights/decoder.safetensors
  prompts:
    - Human fingers pinching to zoom on an infinite zoom canvas, a vast desert landscape stretches into the horizon. At the center, a giant hourglass sits, its glass exterior glinting in the sunlight. The zoom begins within the hourglass, revealing cascading grains of sand, each grain transitioning into a crystalline snowflake, leading to a frozen tundra as the scene deepens further.
    - Human fingers pinching to zoom on an infinite zoom canvas, a colossal tree rises from a lush forest, its bark covered with intricate carvings of stories. The zoom focuses on one carving, which transforms into a vibrant painting of a village. Zooming further, the village reveals bustling streets, where a single doorway becomes the entry to a glowing cosmos.
    - Human fingers pinching to zoom on an infinite zoom canvas, a tranquil ocean surface reflects the twilight sky. The zoom begins within a whirlpool, diving into vibrant coral reefs teeming with marine life. A single pearl on the ocean floor becomes the focus, transitioning into a marble palace with intricate golden inlays as the zoom continues seamlessly.
    - Human fingers pinching to zoom on an infinite zoom canvas, a glowing campfire crackles in a dense, dark forest. The zoom begins in the heart of the fire, revealing swirling embers that transition into galaxies of stars. The zoom then centers on a lone star, which transforms into a lantern hanging in a cozy mountain cabin, seamlessly revealing new layers.
    - Human fingers pinching to zoom on an infinite zoom canvas, a detailed cityscape at night, illuminated by neon lights and bustling with activity. The zoom focuses on a lit billboard advertising a soda can, transitioning into the sparkling surface of the liquid. As the zoom deepens, microscopic bubbles transform into entire ecosystems of floating islands within the soda.
  seed: 12345
  kwargs:
    height: 480
    width: 848
    num_frames: 37
    num_inference_steps: 64
    sigma_schedule_python_code: "linear_quadratic_schedule(64, 0.025)"
    cfg_schedule_python_code: "[6.0] * 64"
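The two schedule strings at the end of the config appear to be Python expressions: one builds the noise (sigma) schedule for the 64 sampling steps, the other a constant guidance scale of 6.0 per step. The sketch below is a reconstruction of what a function named `linear_quadratic_schedule` plausibly computes, namely a schedule that moves linearly up to a threshold over the first half of the steps and quadratically over the rest. The body, the `linear_steps` parameter, and its default are assumptions and may differ from the helper the trainer actually calls.

```python
def linear_quadratic_schedule(num_steps, threshold_noise, linear_steps=None):
    """Sketch of a linear-then-quadratic sigma schedule (assumed behavior).

    Returns num_steps + 1 values that decrease from 1.0 to 0.0.
    """
    if linear_steps is None:
        linear_steps = num_steps // 2  # assumed default: half the steps are linear
    quadratic_steps = num_steps - linear_steps

    # Linear segment: rises from 0 toward threshold_noise.
    linear_part = [i * threshold_noise / linear_steps for i in range(linear_steps)]

    # Quadratic segment: coefficients chosen so the curve equals threshold_noise
    # at i == linear_steps and would reach 1.0 at i == num_steps.
    step_diff = linear_steps - threshold_noise * num_steps
    quadratic_coef = step_diff / (linear_steps * quadratic_steps ** 2)
    linear_coef = threshold_noise / linear_steps - 2 * step_diff / (quadratic_steps ** 2)
    const = quadratic_coef * (linear_steps ** 2)
    quadratic_part = [
        quadratic_coef * (i ** 2) + linear_coef * i + const
        for i in range(linear_steps, num_steps)
    ]

    rising = linear_part + quadratic_part + [1.0]
    # Flip so the schedule starts at full noise (1.0) and ends at 0.0.
    return [1.0 - x for x in rising]


sigma_schedule = linear_quadratic_schedule(64, 0.025)
cfg_schedule = [6.0] * 64  # same guidance scale at every sampling step

print(len(sigma_schedule), sigma_schedule[0], sigma_schedule[-1])
```

With these settings the first 32 steps remove only 2.5% of the noise range in total, and the remaining 32 steps cover the rest.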