---
license: apache-2.0
base_model:
- genmo/mochi-1-preview
pipeline_tag: text-to-video
tags:
- infinite zoom
- art style
- mochi
- diffusion
widget:
- text: Human fingers pinching to zoom on an infinite zoom canvas, a detailed cityscape at night, zoom focuses on a can, all surface around it is made of liquid and objects swimming in it.
  output:
    url: samples/4_1800.mp4
- text: Human fingers pinching to zoom on an infinite zoom canvas, spaceship going through space.
  output:
    url: samples/5_2000.mp4
- text: Human fingers pinching to zoom on an infinite zoom canvas, orange cat in the middle of a canvas, looking upward.
  output:
    url: samples/6_2000.mp4
---

# Fine-Tuning Mochi Text-to-Video: InfiniteZoom-Mochi

This project demonstrates fine-tuning the **Mochi Text-to-Video** model with LoRA (Low-Rank Adaptation), focusing on the **infinite zoom art style**.

## Training Details

- **Model Base**: [genmo/mochi-1-preview](https://huggingface.co/genmo/mochi-1-preview)
- **Fine-Tuning Dataset**: 23 short video clips in the infinite zoom art style, with `.txt` descriptions
- **Training Hardware**: H100 GPU
- **Training Duration**: 2 hours

## lora.yaml:

```
init_checkpoint_path: /weights/dit.safetensors
checkpoint_dir: /finetunes/my_mochi_lora
train_data_dir: /videos_prepared
attention_mode: sdpa
single_video_mode: false # Useful for debugging whether your model can learn a single video

# You only need this if you're using wandb
wandb:
  # project: mochi_1_lora
  # name: ${checkpoint_dir}
  # group: null

optimizer:
  lr: 2e-4
  weight_decay: 0.01

model:
  type: lora
  kwargs:
    # Apply LoRA to the QKV projection and the output projection of the attention block.
    qkv_proj_lora_rank: 16
    qkv_proj_lora_alpha: 16
    qkv_proj_lora_dropout: 0.
    out_proj_lora_rank: 16
    out_proj_lora_alpha: 16
    out_proj_lora_dropout: 0.

training:
  model_dtype: bf16
  warmup_steps: 200
  num_qkv_checkpoint: 48
  num_ff_checkpoint: 48
  num_post_attn_checkpoint: 48
  num_steps: 2000
  save_interval: 200
  caption_dropout: 0.1
  grad_clip: 0.0
  save_safetensors: true

# Used for generating samples during training to monitor progress ...
sample:
  interval: 200
  output_dir: ${checkpoint_dir}/samples
  decoder_path: /weights/decoder.safetensors
  prompts:
    - Human fingers pinching to zoom on an infinite zoom canvas, a vast desert landscape stretches into the horizon. At the center, a giant hourglass sits, its glass exterior glinting in the sunlight. The zoom begins within the hourglass, revealing cascading grains of sand, each grain transitioning into a crystalline snowflake, leading to a frozen tundra as the scene deepens further.
    - Human fingers pinching to zoom on an infinite zoom canvas, a colossal tree rises from a lush forest, its bark covered with intricate carvings of stories. The zoom focuses on one carving, which transforms into a vibrant painting of a village. Zooming further, the village reveals bustling streets, where a single doorway becomes the entry to a glowing cosmos.
    - Human fingers pinching to zoom on an infinite zoom canvas, a tranquil ocean surface reflects the twilight sky. The zoom begins within a whirlpool, diving into vibrant coral reefs teeming with marine life. A single pearl on the ocean floor becomes the focus, transitioning into a marble palace with intricate golden inlays as the zoom continues seamlessly.
    - Human fingers pinching to zoom on an infinite zoom canvas, a glowing campfire crackles in a dense, dark forest. The zoom begins in the heart of the fire, revealing swirling embers that transition into galaxies of stars. The zoom then centers on a lone star, which transforms into a lantern hanging in a cozy mountain cabin, seamlessly revealing new layers.
    - Human fingers pinching to zoom on an infinite zoom canvas, a detailed cityscape at night, illuminated by neon lights and bustling with activity. The zoom focuses on a lit billboard advertising a soda can, transitioning into the sparkling surface of the liquid. As the zoom deepens, microscopic bubbles transform into entire ecosystems of floating islands within the soda.
  seed: 12345
  kwargs:
    height: 480
    width: 848
    num_frames: 37
    num_inference_steps: 64
    sigma_schedule_python_code: "linear_quadratic_schedule(64, 0.025)"
    cfg_schedule_python_code: "[6.0] * 64"
```
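
The `*_lora_rank` and `*_lora_alpha` values in the config can be read through the conventional LoRA formulation, where a frozen weight `W` is augmented by a low-rank update scaled by `alpha / rank`. The following is a minimal pure-Python sketch under that assumption. It is not taken from the Mochi training code: the helper names and the tiny example matrices are hypothetical and only illustrate how rank and alpha interact.

```python
# Minimal LoRA sketch, assuming the conventional alpha / rank scaling.
# All helper names here are hypothetical and not part of the Mochi code base.

def lora_scale(rank, alpha):
    """Scaling factor applied to the low-rank update."""
    return alpha / rank


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters for one adapted linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_linear(x, W, A, B, rank, alpha):
    """Compute y = W x + (alpha / rank) * B (A x) with W kept frozen."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = lora_scale(rank, alpha)
    return [b + scale * u for b, u in zip(base, update)]


# With rank 16 and alpha 16, as in the config above, the update is unscaled.
config_scale = lora_scale(rank=16, alpha=16)

# Tiny illustrative layer (dimensions chosen for readability, not from Mochi).
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight
A = [[1.0, 1.0]]               # rank-1 down projection
B = [[0.5], [0.0]]             # rank-1 up projection
y = lora_linear([2.0, 3.0], W, A, B, rank=1, alpha=1)
```

Under this formulation, setting alpha equal to rank keeps the update at its learned magnitude, and a zero-initialized `B` leaves the base model's output unchanged at the start of training.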