- A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
  Paper • 2312.08578 • Published • 17
- ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
  Paper • 2312.08583 • Published • 9
- Vision-Language Models as a Source of Rewards
  Paper • 2312.09187 • Published • 12
- StemGen: A music generation model that listens
  Paper • 2312.08723 • Published • 48
Collections
Collections including paper arxiv:2311.16502

- Improved Baselines with Visual Instruction Tuning
  Paper • 2310.03744 • Published • 37
- DeepSeek-VL: Towards Real-World Vision-Language Understanding
  Paper • 2403.05525 • Published • 43
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
  Paper • 2308.12966 • Published • 8
- LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
  Paper • 2404.01331 • Published • 25

- BLINK: Multimodal Large Language Models Can See but Not Perceive
  Paper • 2404.12390 • Published • 26
- TextSquare: Scaling up Text-Centric Visual Instruction Tuning
  Paper • 2404.12803 • Published • 30
- Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
  Paper • 2404.13013 • Published • 31
- InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
  Paper • 2404.06512 • Published • 30

- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 189
- MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
  Paper • 2311.16502 • Published • 35
- BLINK: Multimodal Large Language Models Can See but Not Perceive
  Paper • 2404.12390 • Published • 26
- RULER: What's the Real Context Size of Your Long-Context Language Models?
  Paper • 2404.06654 • Published • 35

- Woodpecker: Hallucination Correction for Multimodal Large Language Models
  Paper • 2310.16045 • Published • 16
- SILC: Improving Vision Language Pretraining with Self-Distillation
  Paper • 2310.13355 • Published • 9
- To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
  Paper • 2311.07574 • Published • 15
- MyVLM: Personalizing VLMs for User-Specific Queries
  Paper • 2403.14599 • Published • 16

- CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
  Paper • 2401.03065 • Published • 11
- Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
  Paper • 2305.01210 • Published • 4
- AGIBench: A Multi-granularity, Multimodal, Human-referenced, Auto-scoring Benchmark for Large Language Models
  Paper • 2309.06495 • Published • 1
- MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
  Paper • 2311.16502 • Published • 35