multimodal - a Hyeonmin Collection

multimodal

updated 19 days ago

internlm/internlm-xcomposer2-vl-1_8b

Visual Question Answering • Updated Apr 9, 2024 • 212 • 17
openbmb/MiniCPM-V-2

Visual Question Answering • Updated 17 days ago • 6.27k • 442
llava-hf/llava-v1.6-mistral-7b-hf

Image-Text-to-Text • Updated 5 days ago • 373k • 250
Qwen/Qwen-VL-Chat

Text Generation • Updated Jan 25, 2024 • 25.7k • 350
Qwen/Qwen-VL

Text Generation • Updated Jan 25, 2024 • 36.7k • 224
openbmb/MiniCPM-Llama3-V-2_5

Image-Text-to-Text • Updated 17 days ago • 32.6k • 1.39k
microsoft/Phi-3-vision-128k-instruct

Text Generation • Updated Aug 20, 2024 • 145k • 944
OpenGVLab/InternVL2_5-78B

Image-Text-to-Text • Updated Dec 18, 2024 • 41.6k • 169
Qwen/Qwen2-VL-72B-Instruct

Image-Text-to-Text • Updated 20 days ago • 152k • 273
mistralai/Pixtral-12B-2409

Image-Text-to-Text • Updated Dec 26, 2024 • 593
llava-hf/llava-1.5-7b-hf

Image-Text-to-Text • Updated 5 days ago • 755k • 224
meta-llama/Llama-3.2-11B-Vision-Instruct

Image-Text-to-Text • Updated Dec 4, 2024 • 2.54M • 1.27k