Stable Diffusion

Stable Diffusion is a text-to-image latent diffusion model. Check out this blog post for more information.

How to generate images?

To generate images with Stable Diffusion on Gaudi, you need to instantiate two objects:

  • A pipeline with GaudiStableDiffusionPipeline. This pipeline supports text-to-image generation.
  • A scheduler with GaudiDDIMScheduler. This scheduler has been optimized for Gaudi.

When initializing the pipeline, you have to specify use_habana=True to deploy it on HPUs. Furthermore, to get the fastest possible generations you should enable HPU graphs with use_hpu_graphs=True. Finally, you will need to specify a Gaudi configuration which can be downloaded from the Hugging Face Hub.

from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "CompVis/stable-diffusion-v1-4"

scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")

pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)
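
The same three Gaudi-specific arguments are passed to every pipeline on this page, so it can be convenient to build them once. The sketch below is only an illustration: gaudi_pipeline_kwargs is a hypothetical helper, not part of Optimum Habana, and it simply returns the keyword arguments shown above.

```python
def gaudi_pipeline_kwargs(gaudi_config="Habana/stable-diffusion", use_hpu_graphs=True):
    """Build the Gaudi-specific keyword arguments used on this page.

    Hypothetical helper (not part of Optimum Habana): it only groups the
    arguments that the snippets on this page pass to from_pretrained.
    """
    return {
        "use_habana": True,                # deploy the pipeline on HPUs
        "use_hpu_graphs": use_hpu_graphs,  # enable HPU graphs for faster generation
        "gaudi_config": gaudi_config,      # Gaudi configuration from the Hugging Face Hub
    }


kwargs = gaudi_pipeline_kwargs()
print(kwargs)
```

The resulting dictionary could then be unpacked into the call, for example GaudiStableDiffusionPipeline.from_pretrained(model_name, scheduler=scheduler, **kwargs).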

You can then call the pipeline to generate images from one or several prompts:

outputs = pipeline(
    prompt=["High quality photo of an astronaut riding a horse in space", "Face of a yellow cat, high resolution, sitting on a park bench"],
    num_images_per_prompt=10,
    batch_size=4,
    output_type="pil",
)
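
To get a feel for how num_images_per_prompt and batch_size interact, the arithmetic can be sketched as follows. This assumes that the pipeline generates num_images_per_prompt images for each prompt and processes them in chunks of batch_size, padding the last chunk when the total is not a multiple of batch_size. plan_batches is a hypothetical helper written for this illustration, not a library function.

```python
import math


def plan_batches(num_prompts, num_images_per_prompt, batch_size):
    """Return (total_images, num_batches, padded_slots) for one pipeline call.

    Assumption: images are generated in chunks of batch_size and the last
    chunk is padded when it is not full.
    """
    if min(num_prompts, num_images_per_prompt, batch_size) < 1:
        raise ValueError("all arguments must be positive integers")
    total_images = num_prompts * num_images_per_prompt
    num_batches = math.ceil(total_images / batch_size)
    padded_slots = num_batches * batch_size - total_images
    return total_images, num_batches, padded_slots


# Values from the call above: 2 prompts, 10 images per prompt, batch size 4.
print(plan_batches(2, 10, 4))  # (20, 5, 0)
```

Choosing a batch_size that divides the total number of images avoids spending compute on padded slots.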

Generated images can be returned as either PIL images or NumPy arrays, depending on the output_type option.

Check out the example provided in the official GitHub repository.

Stable Diffusion 2

Stable Diffusion 2 can be used with the exact same classes. Here is an example:

from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "stabilityai/stable-diffusion-2-1"

scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")

pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion-2",
)

outputs = pipeline(
    ["An image of a squirrel in Picasso style"],
    num_images_per_prompt=10,
    batch_size=2,
    height=768,
    width=768,
)

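There are two different checkpoints for Stable Diffusion 2:

  • stabilityai/stable-diffusion-2-1 for generating 768x768 images.
  • stabilityai/stable-diffusion-2-1-base for generating 512x512 images.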

Super-resolution

The Stable Diffusion upscaler diffusion model was created by the researchers and engineers from CompVis, Stability AI, and LAION. It is used to enhance the resolution of input images by a factor of 4.

See here for more information.

How to upscale low-resolution images?

To upscale low-resolution images with Stable Diffusion Upscale on Gaudi, you need to instantiate two objects:

  • A pipeline with GaudiStableDiffusionUpscalePipeline.
  • A scheduler with GaudiDDIMScheduler. This scheduler has been optimized for Gaudi.

When initializing the pipeline, you have to specify use_habana=True to deploy it on HPUs. Furthermore, to get the fastest possible generations you should enable HPU graphs with use_hpu_graphs=True. Finally, you will need to specify a Gaudi configuration which can be downloaded from the Hugging Face Hub.

import requests
from io import BytesIO
from optimum.habana.diffusers import (
    GaudiDDIMScheduler,
    GaudiStableDiffusionUpscalePipeline,
)
from optimum.habana.utils import set_seed
from PIL import Image

set_seed(42)

model_name_upscale = "stabilityai/stable-diffusion-x4-upscaler"
scheduler = GaudiDDIMScheduler.from_pretrained(model_name_upscale, subfolder="scheduler")
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
response = requests.get(url)
low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
low_res_img = low_res_img.resize((128, 128))
low_res_img.save("low_res_cat.png")
prompt = "a white cat"

pipeline = GaudiStableDiffusionUpscalePipeline.from_pretrained(
    model_name_upscale,
    scheduler=scheduler,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)
upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
upscaled_image.save("upsampled_cat.png")
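
As a quick sanity check on the result, the expected output size can be computed from the input size. The sketch below assumes that the upscaler multiplies both width and height by exactly 4, as described above. upscaled_size is a hypothetical helper for this illustration.

```python
def upscaled_size(size, factor=4):
    """Return the (width, height) expected after upscaling by `factor`.

    Assumption: the upscaler scales both dimensions by the same integer factor.
    """
    width, height = size
    return width * factor, height * factor


# The example above resizes the input image to 128x128 before upscaling.
print(upscaled_size((128, 128)))  # (512, 512)
```

With a PIL image, the same check could be written as comparing upscaled_image.size with upscaled_size(low_res_img.size).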

Tips

To accelerate your Stable Diffusion pipeline, you can run it in full bfloat16 precision. This will also save memory. You just need to pass torch_dtype=torch.bfloat16 to from_pretrained when instantiating your pipeline. Here is how to do it:

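import torch

from optimum.habana.diffusers import GaudiDDIMScheduler, GaudiStableDiffusionPipeline

model_name = "CompVis/stable-diffusion-v1-4"

scheduler = GaudiDDIMScheduler.from_pretrained(model_name, subfolder="scheduler")

pipeline = GaudiStableDiffusionPipeline.from_pretrained(
    model_name,
    scheduler=scheduler,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
    torch_dtype=torch.bfloat16,  # run the whole pipeline in bfloat16
)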
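
The memory saving comes from the element size: a bfloat16 value takes 2 bytes, whereas a float32 value takes 4. The rough estimate below covers model weights only (not activations or HPU graph overhead), and the parameter count is a made-up round number used purely for illustration, not the size of any particular checkpoint.

```python
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2}


def weights_size_bytes(num_parameters, dtype):
    """Estimate the memory taken by model weights stored in `dtype`."""
    return num_parameters * BYTES_PER_ELEMENT[dtype]


# Hypothetical parameter count, chosen only to make the arithmetic easy to read.
num_parameters = 1_000_000_000

fp32_bytes = weights_size_bytes(num_parameters, "float32")
bf16_bytes = weights_size_bytes(num_parameters, "bfloat16")
print(fp32_bytes / bf16_bytes)  # 2.0
```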

Textual Inversion Fine-Tuning

Textual Inversion is a method to personalize text-to-image models like Stable Diffusion on your own images using just 3-5 examples.

An example script that implements this training method is available here.
