NitroFusion
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou, Yi-Zhe Song
News
- 04 Dec 2024: Paper is released on arXiv, and the project page is now public.
- 30 Nov 2024: Our single-step text-to-image demo is publicly available on π€ Hugging Face Space.
- 29 Nov 2024: Released two checkpoints: NitroSD-Realism and NitroSD-Vibrant.
Online Demos
NitroFusion single-step Text-to-Image demo hosted on π€ Hugging Face Space
Model Overview
nitrosd-realism_unet.safetensors
: Produces photorealistic images with fine details.nitrosd-vibrant_unet.safetensors
: Offers vibrant, saturated color characteristics.- Both models support 1 to 4 inference steps.
Usage
First, we need to implement the scheduler with timestep shift for multi-step inference:
from diffusers import LCMScheduler
class TimestepShiftLCMScheduler(LCMScheduler):
def __init__(self, *args, shifted_timestep=250, **kwargs):
super().__init__(*args, **kwargs)
self.register_to_config(shifted_timestep=shifted_timestep)
def set_timesteps(self, *args, **kwargs):
super().set_timesteps(*args, **kwargs)
self.origin_timesteps = self.timesteps.clone()
self.shifted_timesteps = (self.timesteps * self.config.shifted_timestep / self.config.num_train_timesteps).long()
self.timesteps = self.shifted_timesteps
def step(self, model_output, timestep, sample, generator=None, return_dict=True):
if self.step_index is None:
self._init_step_index(timestep)
self.timesteps = self.origin_timesteps
output = super().step(model_output, timestep, sample, generator, return_dict)
self.timesteps = self.shifted_timesteps
return output
We can then utilize the diffuser pipeline:
import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
# Load model.
base_model_id = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ChenDY/NitroFusion"
# NitroSD-Realism
ckpt = "nitrosd-realism_unet.safetensors"
unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
scheduler = TimestepShiftLCMScheduler.from_pretrained(base_model_id, subfolder="scheduler", shifted_timestep=250)
scheduler.config.original_inference_steps = 4
# # NitroSD-Vibrant
# ckpt = "nitrosd-vibrant_unet.safetensors"
# unet = UNet2DConditionModel.from_config(base_model_id, subfolder="unet").to("cuda", torch.float16)
# unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
# scheduler = TimestepShiftLCMScheduler.from_pretrained(base_model_id, subfolder="scheduler", shifted_timestep=500)
# scheduler.config.original_inference_steps = 4
pipe = DiffusionPipeline.from_pretrained(
base_model_id,
unet=unet,
scheduler=scheduler,
torch_dtype=torch.float16,
variant="fp16",
).to("cuda")
prompt = "a photo of a cat"
image = pipe(
prompt=prompt,
num_inference_steps=1, # NotroSD-Realism and -Vibrant both support 1 - 4 inference steps.
guidance_scale=0,
).images[0]
License
NitroSD-Realism is released under cc-by-nc-4.0, following its base model DMD2.
NitroSD-Vibrant is released under openrail++.
- Downloads last month
- 3,394
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.