VictorSanh
committed on
resolve conflicts
README.md
CHANGED
@@ -160,20 +160,43 @@ As opposed to Flamingo, we did not train IDEFICS on video-text pairs datasets, a
|
|
160 |
|
161 |
<img src="./assets/Figure_Evals_IDEFIX.png" width="55%">
|
162 |
|
163 |
-
|
164 |
-
|
165 |
-
|
166 |
-
|
|
167 |
-
| |
|
168 |
-
| |
|
169 |
-
| |
|
170 |
-
| | 32 | 66.0 | 58.0 | 37.0 | 52.6 | 86.1 | 116.5 | 106.3 | 78.9 | - | - | 54.3 | - | 68.0 | - |
|
171 |
<br>
|
172 |
-
| IDEFIX 9B | 0 | 50.9 | 38.4 | 25.9 | 35.5 | 25.4 | 46.0 | 36.8 | 27.3 |
|
173 |
-
| | 4 | 55.
|
174 |
-
| | 8 | 56.4 | 47.
|
175 |
-
| | 16 | 57.
|
176 |
-
| | 32 | 57.9 |
|
177 |
|
178 |
We also report results where the priming samples are selected to be similar (i.e. close in a vector space) to the queried instance.
|
179 |
|
|
|
160 |
|
161 |
<img src="./assets/Figure_Evals_IDEFIX.png" width="55%">
|
162 |
|
163 |
+
| Model | Shots | VQAv2 (OE VQA acc) | OKVQA (OE VQA acc) | TextVQA (OE VQA acc) | VizWiz (OE VQA acc) | TextCaps (CIDEr) | Coco (CIDEr) | NoCaps (CIDEr) | Flickr (CIDEr) | VisDial (NDCG) | HatefulMemes (ROC AUC) | ScienceQA (accuracy) | RenderedSST2 (accuracy) | Winoground (group (text/image)) |
|
164 |
+
|:-----------|--------:|---------------------:|---------------------:|-----------------------:|----------------------:|-------------------:|---------------:|-----------------:|-----------------:|-----------------:|-------------------------:|-----------------------:|--------------------------:|----------------------------------:|
|
165 |
+
| IDEFIX 80B | 0 | 60.0 | 45.2 | 30.9 | 36.0 | 56.8 | 91.8 | 65.0 | 53.7 | 48.8 | 60.6 | 68.9 | 60.5 | 8.0 (18.8/22.5) |
|
166 |
+
| | 4 | 63.6 | 52.4 | 34.4 | 40.4 | 72.7 | 110.3 | 99.6 | 73.7 | 48.4 | 57.8 | 58.9 | 66.6 | - |
|
167 |
+
| | 8 | 64.8 | 55.1 | 35.7 | 46.1 | 77.6 | 114.3 | 105.7 | 76.6 | 47.9 | 58.2 | - | 67.8 | - |
|
168 |
+
| | 16 | 65.4 | 56.8 | 36.3 | 48.3 | 81.4 | 116.6 | 107.0 | 80.1 | - | 55.8 | - | 67.7 | - |
|
169 |
+
| | 32 | 65.9 | 57.8 | 36.7 | 50.0 | 82.7 | 116.6 | 107.5 | 81.1 | - | 52.5 | - | 67.3 | - |
|
|
|
170 |
<br>
|
171 |
+
| IDEFIX 9B | 0 | 50.9 | 38.4 | 25.9 | 35.5 | 25.4 | 46.0 | 36.8 | 27.3 | 48.7 | 51.7 | 44.2 | 61.8 | 5.0 (16.8/20.8) |
|
172 |
+
| | 4 | 55.4 | 45.5 | 27.6 | 36.9 | 60.0 | 93.0 | 81.3 | 59.7 | 47.9 | 50.7 | 37.4 | 62.3 | - |
|
173 |
+
| | 8 | 56.4 | 47.7 | 27.5 | 40.4 | 63.2 | 97.0 | 86.8 | 61.9 | 47.6 | 51.0 | - | 66.3 | - |
|
174 |
+
| | 16 | 57.0 | 48.4 | 27.9 | 42.6 | 67.4 | 99.7 | 89.4 | 64.5 | - | 50.9 | - | 67.8 | - |
|
175 |
+
| | 32 | 57.9 | 49.6 | 28.3 | 43.7 | 68.1 | 98.0 | 90.5 | 64.4 | - | 49.8 | - | 67.0 | - |
|
176 |
+
|
177 |
+
Imagenet Evaluation:
|
178 |
+
| Model | Shots | Imagenet |
|
179 |
+
|:-----------|--------:|-----------:|
|
180 |
+
| IDEFIX 80B | 16, 1k support set | 65.4 |
|
181 |
+
| | 16, RICES 5k support set | 72.9 |
|
182 |
+
<br>
|
183 |
+
| IDEFIX 9B | 16, 1k support set | 53.5 |
|
184 |
+
| | 16, RICES 5k support set | 64.5 |
|
185 |
+
|
186 |
+
Fairness Evaluations:
|
187 |
+
| Model | Shots | FairFaceGender (accuracy) | FairFaceRace (accuracy) | FairFaceAge (accuracy) |
|
188 |
+
|:-----------|--------:|----------------------------:|--------------------------:|-------------------------:|
|
189 |
+
| IDEFIX 80B | 0 | 95.8 | 64.1 | 51.0 |
|
190 |
+
| | 4 | 95.2 | 48.8 | 50.6 |
|
191 |
+
| | 8 | 95.5 | 52.3 | 53.1 |
|
192 |
+
| | 16 | 95.7 | 47.6 | 52.8 |
|
193 |
+
| | 32 | 95.7 | 36.5 | 51.2 |
|
194 |
+
<br>
|
195 |
+
| IDEFIX 9B | 0 | 94.4 | 55.3 | 45.1 |
|
196 |
+
| | 4 | 93.9 | 35.3 | 44.3 |
|
197 |
+
| | 8 | 95.4 | 44.7 | 46.0 |
|
198 |
+
| | 16 | 95.8 | 43.0 | 46.1 |
|
199 |
+
| | 32 | 96.1 | 35.1 | 44.9 |
|
200 |
|
201 |
We also report results where the priming samples are selected to be similar (i.e. close in a vector space) to the queried instance.
|
202 |
|
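The context line above says that priming samples can be chosen to be similar to the queried instance, i.e. close in a vector space. A minimal sketch of that selection step follows. The diff does not say which embedding model or distance metric is used, so cosine similarity, the toy 2-D vectors, and the helper names `cosine_similarity` and `select_priming_samples` are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_priming_samples(query_vec, support_vecs, num_shots):
    """Return indices of the num_shots support vectors most similar to the query.

    Hypothetical helper: the metric (cosine) is an assumption, not taken from the README.
    Ties are broken by the lower index so the result is deterministic.
    """
    scored = [(cosine_similarity(query_vec, vec), idx)
              for idx, vec in enumerate(support_vecs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:num_shots]]


# Toy embeddings standing in for the support set (made-up values).
support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
shots = select_priming_samples([1.0, 0.05], support, num_shots=2)
print(shots)  # indices of the two nearest support examples
```

In a real evaluation the selected indices would pick the in-context examples placed before the query in the prompt. A linear scan like this is adequate for small support sets, while larger ones would usually call for an approximate nearest-neighbour index.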