AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? Paper • 2412.02611 • Published Dec 3, 2024 • 23
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents Paper • 2410.10594 • Published Oct 14, 2024 • 24
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations Paper • 2305.14233 • Published May 23, 2023 • 6
Won't Get Fooled Again: Answering Questions with False Premises Paper • 2307.02394 • Published Jul 5, 2023
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework Paper • 2408.01262 • Published Aug 2, 2024 • 1
GUICourse: From General Vision Language Models to Versatile GUI Agents Paper • 2406.11317 • Published Jun 17, 2024 • 1
Post: Introducing GUICourse! By leveraging extensive OCR pretraining with grounding ability, we unlock the potential of parsing-free methods for GUIAgent.
Paper: GUICourse: From General Vision Language Models to Versatile GUI Agents (2406.11317)
GitHub repo: https://github.com/yiye3/GUICourse
Dataset: yiye2023/GUIAct / yiye2023/GUIChat / yiye2023/GUIEnv
Model: RhapsodyAI/minicpm-guidance / RhapsodyAI/qwen_vl_guidance
SEED-Bench-2: Benchmarking Multimodal Large Language Models Paper • 2311.17092 • Published Nov 28, 2023
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension Paper • 2404.16790 • Published Apr 25, 2024 • 7