---
base_model: stabilityai/stable-diffusion-3-medium-diffusers
library_name: diffusers
license: openrail++
tags:
- text-to-image
- diffusers-training
- diffusers
- lora
- template:sd-lora
- stable-diffusion-3
- stable-diffusion-3-diffusers
- adapters
- LoRA
- biological structures
- science
- materiomics
- bio-inspired
- materials science
instance_prompt: <leaf microstructure>
widget: []
---
# Stable Diffusion 3 Medium Fine-tuned with Leaf Microstructure Images
DreamBooth is an advanced technique designed for fine-tuning text-to-image diffusion models to generate personalized images of specific subjects. By leveraging a few reference images (typically around five), DreamBooth integrates unique visual features of the subject into the model's output domain.
This is achieved by binding a unique identifier "\<..IDENTIFIER..\>", such as \<leaf microstructure\> in this work, to the subject. An optional class-specific prior preservation loss can be used to maintain high fidelity and contextual diversity. The result is a model capable of synthesizing novel, photorealistic images of the subject in various scenes, poses, and lighting conditions, guided by text prompts. In this project, DreamBooth has been applied to render images with specific biological patterns, making it ideal for applications in materials science and engineering where accurate representation of biological material microstructures is crucial.
For example, an original prompt might be: "a vase with intricate patterns, high quality." With the fine-tuned model, using the unique identifier, the prompt becomes: "a vase that resembles a \<leaf microstructure\>, high quality." This allows the model to generate images that specifically incorporate the desired biological pattern.
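Schematically:
```raw
base prompt:           a vase with intricate patterns, high quality
with the identifier:   a vase that resembles a <leaf microstructure>, high quality
```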
## Model description
These are LoRA adaptation weights for stabilityai/stable-diffusion-3-medium-diffusers.
## Trigger keywords
The following images were used during fine-tuning with the keyword \<leaf microstructure\>:
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F623ce1c6b66fedf374859fe7%2FsI_exTnLy6AtOFDX1-7eq.png%3C%2Fspan%3E)
You should use \<leaf microstructure\> to trigger the image generation.
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https://huggingface.co/lamm-mit/stable-diffusion-3-medium-leaf-inspired/blob/main/SD3_leaf_inspired_inference.ipynb)
## How to use
First, define some helper functions for saving images and assembling image grids:
```python
from diffusers import DiffusionPipeline
import torch
import os
from datetime import datetime
from PIL import Image

def generate_filename(base_name, extension=".png"):
    # Time-stamp file names so repeated runs do not overwrite earlier images
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{base_name}_{timestamp}{extension}"

def save_image(image, directory, base_name="image_grid"):
    filename = generate_filename(base_name)
    file_path = os.path.join(directory, filename)
    image.save(file_path)
    print(f"Image saved as {file_path}")

def image_grid(imgs, rows, cols, save=True, save_dir='generated_images', base_name="image_grid",
               save_individual_files=False):
    # Paste rows * cols images into a single grid image, optionally saving
    # the grid and/or each individual image to save_dir
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)

    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
        if save_individual_files:
            save_image(img, save_dir, base_name=base_name + f'_{i}-of-{len(imgs)}_')

    if save and save_dir:
        save_image(grid, save_dir, base_name)

    return grid
```
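These helpers can also be used on their own; for example, to tile previously saved images into a grid (a minimal sketch; the file names are hypothetical):
```python
from PIL import Image

# Load four previously generated images (hypothetical file names)
imgs = [Image.open(f"generated_images/sample_{i}.png") for i in range(4)]

# Tile them into a 2x2 grid and save the result into generated_images/
grid = image_grid(imgs, rows=2, cols=2, save=True, save_dir='generated_images')
```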
### Text-to-image
Load the base model, attach the LoRA weights, and generate images:
```python
repo_id_load = 'lamm-mit/stable-diffusion-3-medium-leaf-inspired'

# Load the base SD3 model, then attach the fine-tuned LoRA adapter
pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers",
                                             torch_dtype=torch.float16)
pipeline.load_lora_weights(repo_id_load)
pipeline = pipeline.to('cuda')

prompt = "a cube in the shape of a <leaf microstructure>"
negative_prompt = ""

num_samples = 3
num_rows = 3
n_steps = 75
guidance_scale = 15

all_images = []
for _ in range(num_rows):
    # Each call returns num_samples images for the same prompt
    images = pipeline(prompt, num_inference_steps=n_steps, num_images_per_prompt=num_samples,
                      guidance_scale=guidance_scale, negative_prompt=negative_prompt).images
    all_images.extend(images)

grid = image_grid(all_images, num_rows, num_samples,
                  save_individual_files=True,
                  save_dir='generated_images',
                  base_name="image_grid")
grid
```
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F623ce1c6b66fedf374859fe7%2Fqk5kRJJmetvhZ0ctltc3z.png%3C%2Fspan%3E)
### Image-to-image
We start with this image generated earlier:
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F623ce1c6b66fedf374859fe7%2FJYVEhq6yqVtG_MHup3rDb.png%3C%2Fspan%3E)
```python
import torch
from diffusers import StableDiffusion3Img2ImgPipeline
from diffusers.utils import load_image

# Load the dedicated image-to-image pipeline for SD3
pipeline = StableDiffusion3Img2ImgPipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers",
                                                           torch_dtype=torch.float16)
pipeline = pipeline.to('cuda')

# Starting image for the image-to-image transformation
init_image = load_image("https://huggingface.co/lamm-mit/stable-diffusion-3-medium-leaf-inspired/resolve/main/image_20240721_212111.png")

prompt = "Turn this image into a spider web."
negative_prompt = ""

n_steps = 20
guidance_scale = 25

image = pipeline(prompt, num_inference_steps=n_steps,
                 guidance_scale=guidance_scale,
                 negative_prompt=negative_prompt,
                 image=init_image,
                 ).images[0]

save_image(image, directory='generated_images', base_name="image_grid")
image
```
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F623ce1c6b66fedf374859fe7%2FkI-lx0UCFBErbdUIMn-cG.png%3C%2Fspan%3E)
## More examples
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F623ce1c6b66fedf374859fe7%2FXxOb6nKuYl4H2pYO-jVNi.png%3C%2Fspan%3E)
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F623ce1c6b66fedf374859fe7%2FTH1IZsPRMQssYIDHzIsYI.png%3C%2Fspan%3E)
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F623ce1c6b66fedf374859fe7%2F4EvwVf4l2-CvCKO8Ldg1N.png%3C%2Fspan%3E)
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F623ce1c6b66fedf374859fe7%2FqFBQG-smW5W75MBaNwZcH.png%3C%2Fspan%3E)
## Fine-tuning script
Download this script: [SD3 DreamBooth-LoRA_Fine-Tune.ipynb](https://huggingface.co/lamm-mit/stable-diffusion-3-medium-leaf-inspired/resolve/main/SD3_DreamBooth-LoRA_Fine-Tune.ipynb)
You need to create a local folder `leaf_concept_dir_SD3_12` and add the leaf images (provided in this repository, see the subfolder). The code will automatically download the training script. The training script can handle custom prompts associated with each image, which are generated here using BLIP.
For instance, the captions for the images used here are:
```raw
['<leaf microstructure>, a close up of a green plant with a lot of small holes',
'<leaf microstructure>, a close up of a leaf with a small insect on it',
'<leaf microstructure>, a close up of a plant with a lot of green leaves',
'<leaf microstructure>, a close up of a green plant with a yellow light',
'<leaf microstructure>, a close up of a green plant with a white center',
'<leaf microstructure>, arafed leaf with a white line on the center',
'<leaf microstructure>, a close up of a leaf with a yellow light shining through it',
'<leaf microstructure>, arafed image of a green plant with a yellow cross']
```
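Captions of this form can be generated with an off-the-shelf BLIP captioning model, prepending the trigger keyword to each caption (a minimal sketch assuming the `Salesforce/blip-image-captioning-base` checkpoint and the folder name used above):
```python
import os
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

instance_dir = "leaf_concept_dir_SD3_12"
trigger = "<leaf microstructure>"

captions = []
for fname in sorted(os.listdir(instance_dir)):
    image = Image.open(os.path.join(instance_dir, fname)).convert("RGB")
    inputs = processor(image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    caption = processor.decode(out[0], skip_special_tokens=True)
    captions.append(f"{trigger}, {caption}")

print(captions)
```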
The Parquet dataset generated during pre-calculation of the embeddings is stored in the folder `{data_df_path}`. It includes the image paths, embeddings, and a few other columns that are used by the training script.
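To check what was written, the Parquet file can be read back with pandas (a minimal sketch; the file name and column names are assumptions and depend on your run):
```python
import pandas as pd

# The file name below is a placeholder -- use the actual Parquet file in {data_df_path}
df = pd.read_parquet("data_df_path/embeddings.parquet")
print(df.columns.tolist())  # image paths, pre-computed embeddings, and auxiliary columns
print(df.head())
```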
Training then proceeds as follows:
```raw
accelerate launch train_dreambooth_lora_sd3_miniature.py \
--pretrained_model_name_or_path="{pretrained_model_name_or_path}" \
--instance_data_dir="{instance_data_dir}" \
--data_df_path="{instance_output_dir_embed}" \
--output_dir="{instance_output_dir}" \
--mixed_precision="fp16" \
--instance_prompt="{instance_prompt}" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--learning_rate=1e-4 \
--lr_scheduler="constant" \
--weighting_scheme="logit_normal" \
--lr_warmup_steps=0 \
--use_8bit_adam \
--max_train_steps=500 \
--checkpointing_steps=500 \
  --seed="3234290"
```
### With prior preservation and a more flexible training script
A training notebook with prior preservation, using a more flexible framework, is available here: [SD3_DreamBooth-LoRA_Fine-Tune-with-prior-preservation.ipynb](https://huggingface.co/lamm-mit/stable-diffusion-3-medium-leaf-inspired/resolve/main/SD3_DreamBooth-LoRA_Fine-Tune-with-prior-preservation.ipynb)
The notebook automatically downloads the training script `train_dreambooth_lora_sd3.py`, which is then invoked as:
```raw
accelerate launch train_dreambooth_lora_sd3.py \
--pretrained_model_name_or_path="{pretrained_model_name_or_path}" \
--dataset_name="lamm-mit/{instance_output_dir}_data" \
--caption_column='caption' \
--image_column='image' \
--instance_prompt="{instance_prompt}" \
--with_prior_preservation \
--prior_loss_weight=1.0 \
--output_dir="{instance_output_dir}" \
--class_data_dir="{class_data_dir}" \
--class_prompt="{class_prompt}" \
--num_class_images={num_class_images} \
--mixed_precision="fp16" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--learning_rate=1e-4 \
--lr_scheduler="constant" \
--weighting_scheme="logit_normal" \
--lr_warmup_steps=0 \
--use_8bit_adam \
--max_train_steps=500 \
--checkpointing_steps=500 \
--seed="3234290"
```
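Once training has finished, the resulting LoRA weights can be attached to the base model just like the published adapter above (a minimal sketch, assuming the local output directory from the training run):
```python
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-3-medium-diffusers",
                                             torch_dtype=torch.float16)
# Load the freshly trained adapter from the local output directory
pipeline.load_lora_weights("{instance_output_dir}")
pipeline = pipeline.to('cuda')

image = pipeline("a cube in the shape of a <leaf microstructure>",
                 num_inference_steps=75, guidance_scale=15).images[0]
```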
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F623ce1c6b66fedf374859fe7%2FPQrUWTt7S0l5S62zgjeNo.png%3C%2Fspan%3E)
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F623ce1c6b66fedf374859fe7%2F7oRedflOmvxOTgXbuRBrJ.png%3C%2Fspan%3E)
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F623ce1c6b66fedf374859fe7%2FGdQlPVZ2NKoPbO69O95wU.png%3C%2Fspan%3E)
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F623ce1c6b66fedf374859fe7%2FuSXtkG1CSfkhq9JHHeWvV.png%3C%2Fspan%3E)
![image/png](/static-proxy?url=https%3A%2F%2Fcdn-uploads.huggingface.co%2Fproduction%2Fuploads%2F623ce1c6b66fedf374859fe7%2FCLeDOghDw9q5WNNmAg2h9.png%3C%2Fspan%3E)