DALL·E 3 Image prompt reverse-engineering
Pre-trained image-captioning model BLIP fine-tuned on a mixture of laion/dalle-3-dataset and semi-automatically gathered (image, prompt) data from DALL·E 3.
It takes a generated image as input and outputs a prompt that could plausibly have produced it; this prompt can then be used as a base to generate similar images.
⚠️ Disclaimer: This model is not intended for commercial use as the data it was trained on includes images generated by DALL·E 3. This is for educational purposes only.
Usage:
Loading the model and preprocessor:
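import torch
from transformers import BlipForConditionalGeneration, AutoProcessor
# Use a GPU when available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = BlipForConditionalGeneration.from_pretrained("dblasko/blip-dalle3-img2prompt").to(device)
processor = AutoProcessor.from_pretrained("dblasko/blip-dalle3-img2prompt")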
Inference example on an image from laion/dalle-3-dataset:
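from datasets import load_dataset
# Load only the first 1% of the training split to keep the download small for this toy example
dataset = load_dataset("laion/dalle-3-dataset", split="train[0%:1%]")
img_index = 0  # index of the example to caption (chosen here for illustration; any valid index works)
example = dataset[img_index]
image = example["image"]
caption = example["caption"]
# Preprocess the image and move the tensors to the same device as the model
inputs = processor(images=image, return_tensors="pt").to(device)
pixel_values = inputs.pixel_values
# Generate a caption of at most 50 tokens and decode it back to text
generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(f"Generated caption: {generated_caption}\nReal caption: {caption}")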
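The inference steps above can be wrapped in a small reusable function so that the same flow applies to any image, not only dataset samples. The following is a minimal sketch, not part of the model's own API: the name `generate_prompt` and its parameters are hypothetical, and it assumes `model` and `processor` are the objects loaded in the Usage section.

```python
from typing import Any


def generate_prompt(image: Any, model: Any, processor: Any,
                    device: str = "cpu", max_length: int = 50) -> str:
    """Return one candidate prompt for a single image.

    Hypothetical helper: `model` and `processor` are assumed to be the
    BLIP model and processor loaded as shown in the Usage section.
    """
    # Preprocess the image into tensors and move them to the target device.
    inputs = processor(images=image, return_tensors="pt").to(device)
    # Generate caption token ids, capped at max_length tokens.
    generated_ids = model.generate(pixel_values=inputs.pixel_values,
                                   max_length=max_length)
    # Decode the first (and only) sequence and trim surrounding whitespace.
    decoded = processor.batch_decode(generated_ids, skip_special_tokens=True)
    return decoded[0].strip()
```

With a local file, one possible call would be `generate_prompt(Image.open("my_image.png").convert("RGB"), model, processor, device)`, where `Image` comes from `PIL` and `my_image.png` is a placeholder path. Passing the model and processor as arguments keeps the helper free of global state, so it can be reused with a different checkpoint.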