VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Paper • 2412.01822 • Published • 14
Deep learning and machine learning for computer vision and multimedia; Multimodal deep learning; Integrating vision, speech, and language for AI; Multimodal object and motion detection/recognition; Inclusive human-machine teaming; Analysis of competency, interpretability, memorability, and robustness of deep learning models; Multimodal prompting with large-scale models