arxiv:2412.01822
Yong Man Ro
dwightro
AI & ML interests
Multimodal deep learning; integrating vision, speech, and language for AI; multimodal object and motion detection/recognition; inclusive human multimodal conversation; analysis of the competency, interpretability, memorability, and robustness of deep learning models; multimodal prompting with large-scale models; computer vision and multimedia
Recent Activity
authored a paper about 1 month ago
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
authored a paper about 1 month ago
Look Every Frame All at Once: Video-Ma$^2$mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing
Organizations
Papers: 14
Models: None public yet
Datasets: None public yet