vision language models
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models • Paper • 2409.17146 • 106
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution • Paper • 2409.12191 • 76
mistralai/Pixtral-12B-2409 • Image-Text-to-Text • 583
HuggingFaceTB/SmolVLM-Instruct • Image-Text-to-Text • 47.4k • 330
showlab/ShowUI-2B • 8.97k • 222
microsoft/Phi-3-vision-128k-instruct • Text Generation • 98.1k • 945
mtgv/MobileVLM_V2-1.7B • Text Generation • 1.7k • 24
mtgv/MobileVLM_V2-3B • Text Generation • 132 • 7
xtuner/llava-phi-3-mini • Image-Text-to-Text • 45 • 24
rhymes-ai/Aria • Image-Text-to-Text • 29.5k • 608
THUDM/glm-edge-v-2b • Image-Text-to-Text • 39.1k • 8
THUDM/glm-edge-v-5b • Image-Text-to-Text • 130 • 11
h2oai/h2ovl-mississippi-2b • Text Generation • 10.6k • 27
google/paligemma2-3b-pt-448 • Image-Text-to-Text • 10.9k • 40