---
license: creativeml-openrail-m
language:
- en
library_name: diffusers
pipeline_tag: text-to-image
tags:
- stable-diffusion
- cvpr
- text-to-image
- image-generation
- compositionality
---

# 🧩 TokenCompose SD14 Model Card

## 🎬 CVPR 2024

[TokenCompose_SD14_B](https://mlpc-ucsd.github.io/TokenCompose/) is a [latent text-to-image diffusion model](https://arxiv.org/abs/2112.10752) finetuned from the [**Stable-Diffusion-v1-4**](https://huggingface.co/CompVis/stable-diffusion-v1-4) checkpoint at 512x512 resolution on the [VSR](https://github.com/cambridgeltl/visual-spatial-reasoning) split of [COCO image-caption pairs](https://cocodataset.org/#download) for 24,000 steps with a learning rate of 5e-6. In addition to the standard denoising loss, the training objective includes token-level grounding terms intended to enhance multi-category instance composition and photorealism. The `_A`/`_B` suffix distinguishes separate finetuning runs that share the configuration above.

# 📄 Paper

The paper is available on [arXiv](https://arxiv.org/abs/2312.03626).

# 🧨 Example Usage

We strongly recommend using the [🤗 Diffusers](https://github.com/huggingface/diffusers) library to run our model:

```python
import torch
from diffusers import StableDiffusionPipeline

model_id = "mlpc-lab/TokenCompose_SD14_B"
device = "cuda"

# Load the finetuned weights from the Hugging Face Hub in full precision
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float32)
pipe = pipe.to(device)

# Generate one image for a multi-category prompt and save it to disk
prompt = "A cat and a wine glass"
image = pipe(prompt).images[0]
image.save("cat_and_wine_glass.png")
```

# ⬆️ Improvements over SD14
Object Accuracy and the COCO and ADE20K MG2 to MG5 columns measure multi-category instance composition; the two FID columns measure photorealism; latency measures efficiency.

| Method | Object Accuracy | COCO MG2 | COCO MG3 | COCO MG4 | COCO MG5 | ADE20K MG2 | ADE20K MG3 | ADE20K MG4 | ADE20K MG5 | FID (COCO) | FID (Flickr30K) | Latency |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SD 1.4 | 29.86 | 90.72 ± 1.33 | 50.74 ± 0.89 | 11.68 ± 0.45 | 0.88 ± 0.21 | 89.81 ± 0.40 | 53.96 ± 1.14 | 16.52 ± 1.13 | 1.89 ± 0.34 | 20.88 | 71.46 | 7.54 ± 0.17 |
| TokenCompose (Ours) | 52.15 | 98.08 ± 0.40 | 76.16 ± 1.04 | 28.81 ± 0.95 | 3.28 ± 0.48 | 97.75 ± 0.34 | 76.93 ± 1.09 | 33.92 ± 1.47 | 6.21 ± 0.62 | 20.19 | 71.13 | 7.56 ± 0.14 |
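To read the table as gains over the SD 1.4 baseline, a small sketch like the following subtracts the SD 1.4 means from the TokenCompose means. It only restates the table: the standard deviations are dropped, and the dictionary names and keys are labels chosen here for illustration.

```python
# Mean scores copied from the comparison table (standard deviations omitted).
# Composition metrics: higher is better. FID and latency: lower is better.
SD14 = {
    "Object Accuracy": 29.86,
    "COCO MG2": 90.72, "COCO MG3": 50.74, "COCO MG4": 11.68, "COCO MG5": 0.88,
    "ADE20K MG2": 89.81, "ADE20K MG3": 53.96, "ADE20K MG4": 16.52, "ADE20K MG5": 1.89,
    "FID (COCO)": 20.88, "FID (Flickr30K)": 71.46,
    "Latency": 7.54,
}
TOKENCOMPOSE = {
    "Object Accuracy": 52.15,
    "COCO MG2": 98.08, "COCO MG3": 76.16, "COCO MG4": 28.81, "COCO MG5": 3.28,
    "ADE20K MG2": 97.75, "ADE20K MG3": 76.93, "ADE20K MG4": 33.92, "ADE20K MG5": 6.21,
    "FID (COCO)": 20.19, "FID (Flickr30K)": 71.13,
    "Latency": 7.56,
}


def absolute_gains(baseline, ours):
    """Return ours - baseline for every metric, rounded to two decimals."""
    return {name: round(ours[name] - baseline[name], 2) for name in baseline}


if __name__ == "__main__":
    for metric, delta in absolute_gains(SD14, TOKENCOMPOSE).items():
        print(f"{metric:>16}: {delta:+.2f}")
```

Positive deltas on the composition columns and negative deltas on FID both indicate improvement; the +0.02 latency difference is smaller than either reported standard deviation.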