Unifying Specialized Visual Encoders for Video Language Models Paper • 2501.01426 • Published 19 days ago • 21
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models Paper • 2406.12649 • Published Jun 18, 2024 • 16
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models Paper • 2406.11230 • Published Jun 17, 2024 • 34