Model,Open Source,Text Recognition,Text Referring,Text Spotting,Relation Extraction,Element Parsing,Mathematical Calculation,Visual Text Understanding,Knowledge Reasoning,Average Score,Link
Gemini-Pro,No,61.2,39.5,13.5,79.3,39.2,47.7,75.5,59.3,51.9,https://arxiv.org/abs/2312.11805
Qwen2-VL-8B,Yes,72.1,47.9,17.5,82.5,25.5,25.4,78.4,61.5,51.4,https://arxiv.org/abs/2409.12191
InternVL2.5-26B,Yes,65.6,26.1,1.6,86.9,36.2,37.4,78.3,62.9,49.4,https://arxiv.org/abs/2412.05271
GPT-4V,No,69.7,26.9,0.3,75.6,36.7,42.9,71.5,57.9,47.7,https://openai.com/index/gpt-4v-system-card/
InternVL2-26B,Yes,63.4,26.1,0,76.8,37.8,32.3,79.4,58.9,46.8,https://arxiv.org/abs/2312.14238
Step-1V,No,67.8,31.3,7.2,73.6,37.2,27.8,69.8,58.6,46.7,https://www.stepfun.com/#step1v
GPT-4o,No,61.2,26.7,0,77.5,36.3,43.4,71.1,55.5,46.5,https://arxiv.org/abs/2303.08774
Claude3.5-sonnet,No,62.2,28.4,1.3,56.6,37.8,40.8,73.5,60.9,45.2,https://www.anthropic.com/news/claude-3-5-sonnet
InternVL2.5-8B,Yes,59,25,1.4,77.5,35.1,29.4,75.3,57.2,45,https://arxiv.org/abs/2412.05271
GPT-4o-mini,No,57.9,23.3,0.6,70.8,31.5,38.8,65.9,55.1,43,https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/
GLM-4V-Plus,No,60.3,25.2,0,74.7,37.6,26.4,61.4,57.2,42.9,https://arxiv.org/abs/2406.12793
GLM-4V-9B,Yes,61.8,22.6,0,71.7,31.6,22.6,72.1,58.4,42.6,https://arxiv.org/abs/2406.12793
Ovis1.6-3B,Yes,59.2,14.3,0,65,32.1,29,69.8,56.8,40.8,https://arxiv.org/abs/2405.20797
MiniCPM-V-2.6,Yes,66.8,6,0.8,62,28.8,32.4,73.7,52.1,40.3,https://arxiv.org/abs/2408.01800
Pixtral-12B,Yes,48.9,21.6,0,66.3,35.5,29.8,66.9,53.7,40.3,https://arxiv.org/abs/2410.07073
InternVL2-8B,Yes,49.9,23.1,0.5,65.2,24.8,26.7,73.5,52.9,39.6,https://arxiv.org/abs/2312.14238
LLaVA-OV-7B,Yes,46,20.8,0.1,58.3,25.3,23.3,64.4,53,36.4,https://arxiv.org/abs/2408.03326
Cambrian-1-8B,Yes,45.3,21.5,0,53.6,19.2,19.5,63.5,55.5,34.7,https://arxiv.org/abs/2406.16860
Molmo-7B,Yes,52.4,21.3,0.1,45.5,7.6,28.5,65.3,55,34.5,https://arxiv.org/abs/2409.17146
Idefics3-8B,Yes,23.8,13.2,0,63.2,23.8,23,65.8,44.9,32.2,https://arxiv.org/abs/2408.12637
LLaVA-Next-8B,Yes,41.3,18.8,0,49.5,21.2,17.3,55.2,48.9,31.5,https://github.com/Darren-greenhand/LLaVA-Next
XComposer2-4KHD,Yes,45.1,21.8,0.1,15.9,11.7,15.7,66.8,45.9,27.9,https://arxiv.org/abs/2404.06512
Eagle-X5-7B,Yes,34.7,17.8,0,21.7,20.6,21.5,61,42.6,27.5,https://arxiv.org/abs/2408.15998
Deepseek-VL-7B,Yes,37.1,15.4,0,23.5,14.6,20.8,53.3,52.9,27.2,https://arxiv.org/abs/2403.05525
mPLUG-Owl3,Yes,41.6,14,0.6,24.4,10.9,11.1,52.2,46,25.1,https://arxiv.org/abs/2408.04840
TextMonkey,Yes,39.1,0.7,0,19,12.2,19,61.1,40.2,23.9,https://arxiv.org/abs/2403.04473
VILA1.5-8B,Yes,35.3,15.5,0,21.1,12.7,17.3,46.3,40.3,23.6,https://arxiv.org/abs/2412.04468
Qwen-VL-chat,Yes,34.5,4.1,0,25.9,14,13.8,55.7,39.5,23.4,https://arxiv.org/abs/2308.12966
Qwen-VL,Yes,34.6,7.5,0,18.2,20,8.1,57.2,41.1,23.3,https://arxiv.org/abs/2308.12966
Monkey,Yes,35.2,0,0,16.6,16.3,14.4,59.8,42.3,23.1,https://arxiv.org/abs/2311.06607
CogVLM-chat,Yes,50.9,0,0,0.2,8.4,15,58.1,41.7,21.8,https://arxiv.org/abs/2311.03079
DocOwl2,Yes,24,9.7,0,13.4,13.5,8.8,53.7,32,19.4,https://arxiv.org/abs/2409.03420
EMU2-chat,Yes,42.1,0.2,0,12.5,8.1,11.2,42.7,33.4,18.8,https://arxiv.org/abs/2312.13286
Janus-1.3B,Yes,46.1,0,0,0.2,14.5,13.5,36,39.1,18.7,https://arxiv.org/abs/2410.13848
Yi-VL-6B,Yes,28.9,2.9,0,9.7,12.9,15.8,36.1,32,17.3,https://arxiv.org/abs/2403.04652
TextHarmony,Yes,25.8,2.5,0,1.8,8.5,10.4,46.1,33.1,16,https://arxiv.org/abs/2407.16364
LLaVAR,Yes,37.3,0,0,1,9.9,12.3,34.6,27,15.3,https://arxiv.org/abs/2306.17107
UReader,Yes,22.4,0.1,0,0,9.2,7.9,41,29.1,13.7,https://arxiv.org/abs/2310.05126