kvablack/ddpo-alignment


Kevin Black


---
license: creativeml-openrail-m
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
---

# ddpo-alignment

This model was finetuned from [Stable Diffusion v1-5](https://huggingface.co/runwayml/stable-diffusion-v1-5) using [DDPO](https://arxiv.org/abs/2305.13301) and a reward function that uses [LLaVA](https://llava-vl.github.io/) to measure prompt-image alignment. See [the project website](https://rl-diffusion.github.io/) for more details.
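Since the card lists `diffusers` as the library and the checkpoint derives from Stable Diffusion v1-5, it can presumably be loaded like any other Stable Diffusion pipeline. The following is a minimal sketch, not an official usage recipe: the repo id `kvablack/ddpo-alignment` is taken from the model page path, and the helper names `load_pipeline` and `generate`, the default device, and the dtype choice are assumptions made for illustration.

```python
# Repo id as shown in the model page path (assumed to be the loadable id).
MODEL_ID = "kvablack/ddpo-alignment"


def load_pipeline(model_id=MODEL_ID, device="cuda"):
    """Load the finetuned checkpoint as a standard diffusers pipeline (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require
    # torch or diffusers to be installed.
    import torch
    from diffusers import StableDiffusionPipeline

    # Half precision is a common choice on GPU; fall back to float32 elsewhere.
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    return pipe.to(device)


def generate(pipe, prompt):
    """Run one text-to-image call and return the first PIL image (hypothetical helper)."""
    return pipe(prompt).images[0]


# Example (downloads the weights, so it is left commented out):
# pipe = load_pipeline()
# generate(pipe, "a cat washing dishes").save("cat_washing_dishes.png")
```

Sampler settings such as the number of inference steps or the guidance scale are left at the pipeline defaults here, because the card does not state which values were used.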

The model was finetuned for 120 iterations with a batch size of 256 samples per iteration. During finetuning, we used prompts of the form "_a(n) \<animal\> \<activity\>_", with the animal and the activity drawn from the lists below, so prompts built from those lists should give the best results. We also observed limited generalization to other prompts.

Activities:
- washing dishes
- playing chess
- riding a bike

Animals:
- cat
- dog
- horse
- monkey
- rabbit
- zebra
- spider
- bird
- sheep
- deer
- cow
- goat
- lion
- tiger
- bear
- raccoon
- fox
- wolf
- lizard
- beetle
- ant
- butterfly
- fish
- shark
- whale
- dolphin
- squirrel
- mouse
- rat
- snake
- turtle
- frog
- chicken
- duck
- goose
- bee
- pig
- turkey
- fly
- llama
- camel
- bat
- gorilla
- hedgehog
- kangaroo
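
To make the "_a(n) \<animal\> \<activity\>_" template concrete, the sketch below builds prompts from the two lists above. It is an illustration only: the names `build_prompt` and `ALL_PROMPTS` are hypothetical, and the rule for choosing "a" versus "an" (first letter is a vowel) is an assumption that happens to work for these lists, not something the card specifies.

```python
import itertools

# Activities and animals copied from the lists above.
ACTIVITIES = ["washing dishes", "playing chess", "riding a bike"]
ANIMALS = [
    "cat", "dog", "horse", "monkey", "rabbit", "zebra", "spider", "bird",
    "sheep", "deer", "cow", "goat", "lion", "tiger", "bear", "raccoon",
    "fox", "wolf", "lizard", "beetle", "ant", "butterfly", "fish", "shark",
    "whale", "dolphin", "squirrel", "mouse", "rat", "snake", "turtle",
    "frog", "chicken", "duck", "goose", "bee", "pig", "turkey", "fly",
    "llama", "camel", "bat", "gorilla", "hedgehog", "kangaroo",
]


def build_prompt(animal, activity):
    """Fill the 'a(n) <animal> <activity>' template (hypothetical helper)."""
    # Assumed article rule: "an" before a leading vowel letter, "a" otherwise.
    article = "an" if animal[0].lower() in "aeiou" else "a"
    return f"{article} {animal} {activity}"


# Every animal/activity combination from the two lists.
ALL_PROMPTS = [
    build_prompt(animal, activity)
    for animal, activity in itertools.product(ANIMALS, ACTIVITIES)
]

print(len(ALL_PROMPTS))                       # 45 animals x 3 activities = 135
print(build_prompt("ant", "playing chess"))   # an ant playing chess
```

Prompts outside this set can be passed to the model in the same way, but as noted above, generalization beyond these animals and activities was limited.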