Improve Vision Language Model Chain-of-thought Reasoning Paper • 2410.16198 • Published Oct 21, 2024 • 22
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents Paper • 2310.11667 • Published Oct 18, 2023 • 3
A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest Paper • 2311.10614 • Published Nov 17, 2023
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward Paper • 2404.01258 • Published Apr 1, 2024 • 11
MMBench: Is Your Multi-modal Model an All-around Player? Paper • 2307.06281 • Published Jul 12, 2023 • 5
VBench: Comprehensive Benchmark Suite for Video Generative Models Paper • 2311.17982 • Published Nov 29, 2023 • 7
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy Paper • 2203.07845 • Published Mar 15, 2022
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models Paper • 2407.12772 • Published Jul 17, 2024 • 34
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models Paper • 2407.07895 • Published Jul 10, 2024 • 40