yushihu committed on
Commit 64dae59 · 1 Parent(s): b21464e

Update README.md

  1. README.md +90 -0
README.md CHANGED
@@ -1,3 +1,93 @@
  ---
  license: apache-2.0
+ inference: false
+ pipeline_tag: text-generation
+ tags:
+ - text-generation-inference
+ - llama2
+ - text-to-image
+ datasets:
+ - TIFA
+ language:
+ - en
  ---
+ This is the text parsing and question generation model for the ICCV 2023 paper [TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering](https://arxiv.org/abs/2303.11897).
+
+ We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA). Specifically, given a text input, we automatically generate several question-answer pairs using a language model. We then calculate image faithfulness by checking whether existing VQA models can answer these questions using the generated image.
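The scoring step described above can be sketched as follows. This is a minimal illustration, assuming that faithfulness is reported as the fraction of generated questions the VQA model answers correctly. The function name `tifa_score` and the `vqa_answer` callable are hypothetical and are not taken from the TIFA codebase; the stub VQA model exists only to make the example runnable.

```python
from typing import Callable, Sequence, Tuple


def tifa_score(
    qa_pairs: Sequence[Tuple[str, str]],
    image: str,
    vqa_answer: Callable[[str, str], str],
) -> float:
    """Return the fraction of questions whose VQA answer matches the expected answer.

    qa_pairs   -- (question, expected_answer) pairs generated from the text input
    image      -- path or identifier of the generated image
    vqa_answer -- hypothetical callable: (image, question) -> answer string
    """
    if not qa_pairs:
        raise ValueError("need at least one question-answer pair")
    correct = sum(
        vqa_answer(image, question).strip().lower() == expected.strip().lower()
        for question, expected in qa_pairs
    )
    return correct / len(qa_pairs)


def stub_vqa(image: str, question: str) -> str:
    # Stand-in for a real VQA model: it answers "yes" to everything.
    return "yes"


pairs = [
    ("is there a rabbit?", "yes"),
    ("what color is the plane?", "red"),
]
print(tifa_score(pairs, "generated.png", stub_vqa))  # one of two answers matches: 0.5
```

In practice `vqa_answer` would wrap one of the VQA models shipped with the TIFA repository, and answer matching may need to be more forgiving than exact string equality.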
+
+ This fine-tuned LLaMA 2 model substitutes for the GPT-3 model used in the paper. It can parse an arbitrary prompt into visual entities, attributes, relations, etc., and generate question-answer tuples for each of them. See examples below.
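To make the idea of question-answer tuples concrete, here is a hand-written illustration of how the parsed elements of one caption might be represented. It is not actual model output: the class name, the field names, and the questions themselves are assumptions chosen for this sketch, and only the element categories (entity, attribute, relation) come from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class QuestionAnswerTuple:
    element: str       # the piece of the caption being checked
    element_type: str  # "entity", "attribute", or "relation"
    question: str
    answer: str


caption = "a blue rabbit and a red plane"

# Hand-written examples for illustration only, not produced by the model.
illustrative_tuples: List[QuestionAnswerTuple] = [
    QuestionAnswerTuple("rabbit", "entity", "is there a rabbit?", "yes"),
    QuestionAnswerTuple("plane", "entity", "is there a plane?", "yes"),
    QuestionAnswerTuple("blue", "attribute", "what color is the rabbit?", "blue"),
    QuestionAnswerTuple("red", "attribute", "what color is the plane?", "red"),
]

# Group questions by element type, e.g. to report faithfulness per category.
by_type: Dict[str, List[QuestionAnswerTuple]] = defaultdict(list)
for qa in illustrative_tuples:
    by_type[qa.element_type].append(qa)

for element_type, items in by_type.items():
    print(element_type, [qa.question for qa in items])
```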
+
+
+ # QuickStart
+
+ All code is from <https://github.com/Yushi-Hu/tifa>. Clone this repo to use this model together with the other modules (e.g. VQA) provided in TIFA.
+
+ Please follow the prompt format, which gives the best performance.
+
+
+ ```python
+ import torch
+ import transformers
+ from promptcap import PromptCap  # required by the captioning example further down
+
+ # prepare the LLaMA 2 model
+ # (local checkpoint path from the author's environment; replace it with the location of your copy)
+ model_name = "/gscratch/tial/yushihu/tifa-all/llama2/results/llama2/final_question_generation_checkpoint"
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model=model_name,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+
+ # prompt formatting
+
+
+
+ test_caption = "a blue rabbit and a red plane"
+
+
+
+
+ # load the PromptCap captioning model
+ model = PromptCap("vqascore/promptcap-coco-vqa")  # also supports OFA checkpoints, e.g. "OFA-Sys/ofa-large"
+
+ # move the captioning model to the GPU when one is available
+ if torch.cuda.is_available():
+     model.cuda()
+
+ prompt = "please describe this image according to the given question: what piece of clothing is this boy putting on?"
+ image = "glove_boy.jpeg"
+
+ print(model.caption(prompt, image))
+ ```
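Once a caption has been wrapped in the required prompt format, running it through the text-generation `pipeline` object created above could look like the sketch below. The helper name `generate_question_text` and the decoding settings are assumptions rather than values from the TIFA repository. The keyword arguments (`do_sample`, `max_new_tokens`, `return_full_text`) are standard options of the `transformers` text-generation pipeline.

```python
def generate_question_text(text_pipeline, formatted_prompt, max_new_tokens=512):
    """Run one already-formatted prompt through a text-generation pipeline.

    text_pipeline    -- a transformers "text-generation" pipeline (or any callable
                        with the same return shape)
    formatted_prompt -- the caption already wrapped in the model's prompt format
    """
    outputs = text_pipeline(
        formatted_prompt,
        do_sample=False,                # greedy decoding for reproducible questions
        max_new_tokens=max_new_tokens,  # assumed budget, tune as needed
        return_full_text=False,         # return only the continuation, not the prompt
    )
    # The pipeline returns a list with one dict per generated sequence.
    return outputs[0]["generated_text"].strip()
```

Greedy decoding is used here so that the same caption always yields the same questions; sampling would also work but makes evaluation runs harder to compare.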
+
+ To try generic captioning, just use "what does the image describe?"
+
+ ```python
+ prompt = "what does the image describe?"
+ image = "glove_boy.jpeg"
+
+ print(model.caption(prompt, image))
+ ```
+
+
+
+ PromptCap also supports taking OCR inputs:
+
+ ```python
+ prompt = "please describe this image according to the given question: what year was this taken?"
+ image = "dvds.jpg"
+ ocr = "yip AE Mht juor 02/14/2012"
+
+ print(model.caption(prompt, image, ocr))
+ ```
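The question-guided prompts in the examples above all share the same prefix, so a small helper can keep them consistent. This is a convenience sketch: the names `PROMPT_PREFIX` and `build_promptcap_prompt` are hypothetical and not part of the PromptCap package, while the prefix string itself is copied from the examples.

```python
PROMPT_PREFIX = "please describe this image according to the given question: "


def build_promptcap_prompt(question: str) -> str:
    """Wrap a question in the instruction prefix used by the examples above."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return PROMPT_PREFIX + question


print(build_promptcap_prompt("what year was this taken?"))
```

The generic captioning prompt ("what does the image describe?") is passed as-is in the example above, without this prefix.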
+
+
+
+
+ ## Bibtex
+ ```
+ @article{hu2022promptcap,
+   title={PromptCap: Prompt-Guided Task-Aware Image Captioning},
+   author={Hu, Yushi and Hua, Hang and Yang, Zhengyuan and Shi, Weijia and Smith, Noah A and Luo, Jiebo},
+   journal={arXiv preprint arXiv:2211.09699},
+   year={2022}
+ }
+ ```