- LITA: Language Instructed Temporal-Localization Assistant
  Paper • 2403.19046 • Published • 19
- Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces
  Paper • 2403.20275 • Published • 9
- Condition-Aware Neural Network for Controlled Image Generation
  Paper • 2404.01143 • Published • 12
- CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
  Paper • 2404.03820 • Published • 25
Collections including paper arxiv:2403.19046
- Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
  Paper • 2403.09626 • Published • 14
- VideoAgent: Long-form Video Understanding with Large Language Model as Agent
  Paper • 2403.10517 • Published • 33
- VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
  Paper • 2403.13501 • Published • 9
- LITA: Language Instructed Temporal-Localization Assistant
  Paper • 2403.19046 • Published • 19
- Video as the New Language for Real-World Decision Making
  Paper • 2402.17139 • Published • 19
- Learning and Leveraging World Models in Visual Representation Learning
  Paper • 2403.00504 • Published • 32
- MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
  Paper • 2403.01422 • Published • 27
- VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
  Paper • 2403.05438 • Published • 19