LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning? • Article • Jul 25, 2024
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding • Paper • 2412.10302 • Published 21 days ago
Apollo: An Exploration of Video Understanding in Large Multimodal Models • Paper • 2412.10360 • Published 21 days ago
Powerful ASR + diarization + speculative decoding with Hugging Face Inference Endpoints • Article • May 1, 2024
A failed experiment: Infini-Attention, and why we should keep trying? • Article • Aug 14, 2024
DEMO: French Spoken Language Understanding with the new speech resources from NAVER LABS Europe • Article by mzboito • Aug 28, 2024
Deep Learning over the Internet: Training Language Models Collaboratively • Article • Jul 15, 2021
Building and better understanding vision-language models: insights and future directions • Paper • 2408.12637 • Published Aug 22, 2024