- Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models (Paper • 2406.17294 • Published • 11)
- OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding (Paper • 2406.19389 • Published • 53)
- EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model (Paper • 2406.20076 • Published • 9)
- PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation (Paper • 2407.02869 • Published • 18)
Collections including paper arxiv:2406.11832
- PaliGemma: A versatile 3B VLM for transfer (Paper • 2407.07726 • Published • 68)
- Vision language models are blind (Paper • 2407.06581 • Published • 83)
- CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging (Paper • 2407.07315 • Published • 6)
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision (Paper • 2407.06189 • Published • 26)
- OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding (Paper • 2406.19389 • Published • 53)
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale (Paper • 2406.17557 • Published • 90)
- RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs (Paper • 2407.02485 • Published • 5)
- Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems (Paper • 2407.01370 • Published • 86)
- Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models (Paper • 2406.17294 • Published • 11)
- TokenPacker: Efficient Visual Projector for Multimodal LLM (Paper • 2407.02392 • Published • 21)
- Understanding Alignment in Multimodal LLMs: A Comprehensive Study (Paper • 2407.02477 • Published • 22)
- InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output (Paper • 2407.03320 • Published • 93)
- Octo-planner: On-device Language Model for Planner-Action Agents (Paper • 2406.18082 • Published • 48)
- Adaptable Logical Control for Large Language Models (Paper • 2406.13892 • Published • 1)
- SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation (Paper • 2406.19215 • Published • 30)
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models (Paper • 2405.14831 • Published • 3)