ddpo-alignment

This model was finetuned from Stable Diffusion v1-4 using DDPO and a reward function that uses LLaVA to measure prompt-image alignment. See the project website for more details.

The model was finetuned for 200 iterations with a batch size of 256 samples per iteration. During finetuning, we used prompts of the form: "a(n) <animal> <activity>". The animal and activity were drawn from the following lists, so prompts built from those lists should give the best results. We also observed some generalization to other prompts, though it was limited.

Activities:

  • washing dishes
  • playing chess
  • riding a bike

Animals:

  • cat
  • dog
  • horse
  • monkey
  • rabbit
  • zebra
  • spider
  • bird
  • sheep
  • deer
  • cow
  • goat
  • lion
  • tiger
  • bear
  • raccoon
  • fox
  • wolf
  • lizard
  • beetle
  • ant
  • butterfly
  • fish
  • shark
  • whale
  • dolphin
  • squirrel
  • mouse
  • rat
  • snake
  • turtle
  • frog
  • chicken
  • duck
  • goose
  • bee
  • pig
  • turkey
  • fly
  • llama
  • camel
  • bat
  • gorilla
  • hedgehog
  • kangaroo
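
As a minimal sketch, the training-style prompts can be generated from the two lists above. The names `build_prompt`, `ANIMALS`, `ACTIVITIES`, and `ALL_PROMPTS` are illustrative and not part of any released code, and the rule for choosing "a" versus "an" (use "an" before a vowel letter) is an assumption about how "a(n)" was resolved.

```python
from itertools import product

# Activities and animals copied from the lists in this model card.
ACTIVITIES = ["washing dishes", "playing chess", "riding a bike"]

ANIMALS = [
    "cat", "dog", "horse", "monkey", "rabbit", "zebra", "spider", "bird",
    "sheep", "deer", "cow", "goat", "lion", "tiger", "bear", "raccoon",
    "fox", "wolf", "lizard", "beetle", "ant", "butterfly", "fish", "shark",
    "whale", "dolphin", "squirrel", "mouse", "rat", "snake", "turtle",
    "frog", "chicken", "duck", "goose", "bee", "pig", "turkey", "fly",
    "llama", "camel", "bat", "gorilla", "hedgehog", "kangaroo",
]


def build_prompt(animal: str, activity: str) -> str:
    """Return a prompt of the form "a(n) <animal> <activity>".

    Assumption: "an" is used when the animal name starts with a vowel
    letter, and "a" otherwise.
    """
    article = "an" if animal[0].lower() in "aeiou" else "a"
    return f"{article} {animal} {activity}"


# Every animal/activity combination from the lists above.
ALL_PROMPTS = [
    build_prompt(animal, activity)
    for animal, activity in product(ANIMALS, ACTIVITIES)
]
```

Any string in `ALL_PROMPTS` can be passed as the text prompt to whatever Stable Diffusion pipeline is used to load this model.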