Collections

Collections including paper arxiv:2311.06783

Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences

Paper • 2401.10529 • Published Jan 19, 2024 • 1
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

Paper • 2311.12793 • Published Nov 21, 2023 • 18
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper • 2311.06783 • Published Nov 12, 2023 • 27
SVIT: Scaling up Visual Instruction Tuning

Paper • 2307.04087 • Published Jul 9, 2023 • 6

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper • 2311.06783 • Published Nov 12, 2023 • 27
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning

Paper • 2311.07574 • Published Nov 13, 2023 • 15
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding

Paper • 2401.04575 • Published Jan 9, 2024 • 15
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Paper • 2402.00159 • Published Jan 31, 2024 • 62

LayoutPrompter: Awaken the Design Ability of Large Language Models

Paper • 2311.06495 • Published Nov 11, 2023 • 11
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper • 2311.06783 • Published Nov 12, 2023 • 27
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 49
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

Paper • 2311.04589 • Published Nov 8, 2023 • 19

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper • 2311.06783 • Published Nov 12, 2023 • 27
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4

Paper • 2311.07361 • Published Nov 13, 2023 • 13
GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 188
teknium/openhermes

Viewer • Updated Sep 7, 2023 • 243k • 421 • 205

Generative Multiple Modality

Random Field Augmentations for Self-Supervised Representation Learning

Paper • 2311.03629 • Published Nov 7, 2023 • 7
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

Paper • 2311.04589 • Published Nov 8, 2023 • 19
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs

Paper • 2311.04901 • Published Nov 8, 2023 • 8
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

Paper • 2311.06783 • Published Nov 12, 2023 • 27

OmnimatteRF: Robust Omnimatte with 3D Background Modeling

Paper • 2309.07749 • Published Sep 14, 2023 • 7
AudioSR: Versatile Audio Super-resolution at Scale

Paper • 2309.07314 • Published Sep 13, 2023 • 26
Generative Image Dynamics

Paper • 2309.07906 • Published Sep 14, 2023 • 53
MagiCapture: High-Resolution Multi-Concept Portrait Customization

Paper • 2309.06895 • Published Sep 13, 2023 • 27
