Collections including paper arxiv:2403.19046

Papers - Nvidia

LITA: Language Instructed Temporal-Localization Assistant

Paper • 2403.19046 • Published Mar 27, 2024 • 19
Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces

Paper • 2403.20275 • Published Mar 29, 2024 • 9
Condition-Aware Neural Network for Controlled Image Generation

Paper • 2404.01143 • Published Apr 1, 2024 • 12
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues

Paper • 2404.03820 • Published Apr 4, 2024 • 25

Papers - Video - Training - Understanding Time

LITA: Language Instructed Temporal-Localization Assistant

Paper • 2403.19046 • Published Mar 27, 2024 • 19

Papers - Video - Encoders

LITA: Language Instructed Temporal-Localization Assistant

Paper • 2403.19046 • Published Mar 27, 2024 • 19
Zero-shot Prompt-based Video Encoder for Surgical Gesture Recognition

Paper • 2403.19786 • Published Mar 28, 2024 • 2

Papers - Video - Reasoning - Time of Events

LITA: Language Instructed Temporal-Localization Assistant

Paper • 2403.19046 • Published Mar 27, 2024 • 19

Vision Language Model

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27, 2024 • 47
LITA: Language Instructed Temporal-Localization Assistant

Paper • 2403.19046 • Published Mar 27, 2024 • 19

Papers - Video - Understanding

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Paper • 2403.09626 • Published Mar 14, 2024 • 14
VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15, 2024 • 33
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis

Paper • 2403.13501 • Published Mar 20, 2024 • 9
LITA: Language Instructed Temporal-Localization Assistant

Paper • 2403.19046 • Published Mar 27, 2024 • 19

Video as the New Language for Real-World Decision Making

Paper • 2402.17139 • Published Feb 27, 2024 • 19
Learning and Leveraging World Models in Visual Representation Learning

Paper • 2403.00504 • Published Mar 1, 2024 • 32
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies

Paper • 2403.01422 • Published Mar 3, 2024 • 27
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models

Paper • 2403.05438 • Published Mar 8, 2024 • 19
