---
license: creativeml-openrail-m
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
inference:
  parameters:
    num_inference_steps: 50
    guidance_scale: 5.0
    eta: 1.0
widget:
- text: "a horse playing chess"
  example_title: horse + chess
- text: "a lion washing dishes"
  example_title: lion + dishes
- text: "a goat riding a bike"
  example_title: goat + bike
---
# ddpo-alignment
This model was finetuned from [Stable Diffusion v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) using [DDPO](https://arxiv.org/abs/2305.13301) and a reward function that uses [LLaVA](https://llava-vl.github.io/) to measure prompt-image alignment. See [the project website](https://rl-diffusion.github.io/) for more details.

The model was finetuned for 200 iterations with a batch size of 256 samples per iteration. During finetuning, we used prompts of the form "_a(n) \<animal\> \<activity\>_", with the animal and the activity drawn from the lists below, so use those for the best results. We also observed some generalization to other prompts, though it is limited.

Activities:
- washing dishes
- playing chess
- riding a bike

Animals:
- cat
- dog
- horse
- monkey
- rabbit
- zebra
- spider
- bird
- sheep
- deer
- cow
- goat
- lion
- tiger
- bear
- raccoon
- fox
- wolf
- lizard
- beetle
- ant
- butterfly
- fish
- shark
- whale
- dolphin
- squirrel
- mouse
- rat
- snake
- turtle
- frog
- chicken
- duck
- goose
- bee
- pig
- turkey
- fly
- llama
- camel
- bat
- gorilla
- hedgehog
- kangaroo
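
As a rough illustration of the prompt format described above, the sketch below builds "_a(n) \<animal\> \<activity\>_" prompts and passes one to a `diffusers` pipeline using the inference parameters from this card's metadata. The helper names `make_prompt` and `generate` are hypothetical, the repository id `kvablack/ddpo-alignment` is an assumption (substitute the id you actually load this model from), and the animal list is shortened for brevity.

```python
# Activities used during finetuning, and a shortened subset of the animals.
ACTIVITIES = ["washing dishes", "playing chess", "riding a bike"]
ANIMALS = ["cat", "dog", "horse", "lion", "goat", "ant"]


def make_prompt(animal: str, activity: str) -> str:
    """Build a prompt of the form "a(n) <animal> <activity>"."""
    # Simple vowel check; sufficient for the animals listed in this card.
    article = "an" if animal[0].lower() in "aeiou" else "a"
    return f"{article} {animal} {activity}"


def generate(prompt: str, model_id: str = "kvablack/ddpo-alignment"):
    """Generate one image. The default model_id is an assumption."""
    # Imported lazily so make_prompt works without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id)
    pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")
    # Same settings as the inference parameters in the card metadata.
    # eta is only used by DDIM-style schedulers; other schedulers ignore it.
    result = pipe(prompt, num_inference_steps=50, guidance_scale=5.0, eta=1.0)
    return result.images[0]
```

For example, `generate(make_prompt("lion", "washing dishes")).save("lion.png")` would write a single sample to disk, assuming the weights can be downloaded and enough memory is available.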