flow2023's Collections
video mllm
updated
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper
•
2403.10517
•
Published
•
32
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
Paper
•
2403.11481
•
Published
•
12
VideoMamba: State Space Model for Efficient Video Understanding
Paper
•
2403.06977
•
Published
•
27
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
Paper
•
2403.01422
•
Published
•
26
Video as the New Language for Real-World Decision Making
Paper
•
2402.17139
•
Published
•
18
VideoPrism: A Foundational Visual Encoder for Video Understanding
Paper
•
2402.13217
•
Published
•
23
Memory Consolidation Enables Long-Context Video Understanding
Paper
•
2402.05861
•
Published
•
8
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Paper
•
2403.15377
•
Published
•
22
VidLA: Video-Language Alignment at Scale
Paper
•
2403.14870
•
Published
•
12
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Paper
•
2404.05726
•
Published
•
21
Koala: Key frame-conditioned long video-LLM
Paper
•
2404.04346
•
Published
•
6
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Paper
•
2404.03413
•
Published
•
25
Pegasus-v1 Technical Report
Paper
•
2404.14687
•
Published
•
30
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Paper
•
2404.16994
•
Published
•
35
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Paper
•
2406.04325
•
Published
•
73
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
Paper
•
2410.21264
•
Published
•
9