- Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models (Paper • 2406.17294 • Published • 11)
- OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding (Paper • 2406.19389 • Published • 53)
- EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model (Paper • 2406.20076 • Published • 9)
- PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation (Paper • 2407.02869 • Published • 18)
Collections including paper arxiv:2406.11832
- PaliGemma: A versatile 3B VLM for transfer (Paper • 2407.07726 • Published • 68)
- Vision language models are blind (Paper • 2407.06581 • Published • 83)
- CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging (Paper • 2407.07315 • Published • 6)
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision (Paper • 2407.06189 • Published • 26)
- OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding (Paper • 2406.19389 • Published • 53)
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale (Paper • 2406.17557 • Published • 90)
- RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs (Paper • 2407.02485 • Published • 5)
- Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems (Paper • 2407.01370 • Published • 86)
- Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models (Paper • 2406.17294 • Published • 11)
- TokenPacker: Efficient Visual Projector for Multimodal LLM (Paper • 2407.02392 • Published • 21)
- Understanding Alignment in Multimodal LLMs: A Comprehensive Study (Paper • 2407.02477 • Published • 22)
- InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output (Paper • 2407.03320 • Published • 93)
- Octo-planner: On-device Language Model for Planner-Action Agents (Paper • 2406.18082 • Published • 48)
- Adaptable Logical Control for Large Language Models (Paper • 2406.13892 • Published • 1)
- SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation (Paper • 2406.19215 • Published • 30)
- HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models (Paper • 2405.14831 • Published • 3)