- Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis
  Paper • 2401.09048 • Published • 10
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 17
- Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
  Paper • 2401.10891 • Published • 60
- Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild
  Paper • 2401.13627 • Published • 74
Collections
Collections including paper arxiv:2405.01434

- Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
  Paper • 2403.13745 • Published • 11
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
  Paper • 2405.01434 • Published • 54
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 109

- MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
  Paper • 2311.17049 • Published • 1
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 17
- A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
  Paper • 2303.17376 • Published
- Sigmoid Loss for Language Image Pre-Training
  Paper • 2303.15343 • Published • 6

- Mora: Enabling Generalist Video Generation via A Multi-Agent Framework
  Paper • 2403.13248 • Published • 78
- Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition
  Paper • 2403.14148 • Published • 19
- StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text
  Paper • 2403.14773 • Published • 10
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
  Paper • 2405.01434 • Published • 54

- Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
  Paper • 2405.03520 • Published • 1
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
  Paper • 2405.01434 • Published • 54
- Data-centric Artificial Intelligence: A Survey
  Paper • 2303.10158 • Published • 1

- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
  Paper • 2405.01434 • Published • 54
- TransPixar: Advancing Text-to-Video Generation with Transparency
  Paper • 2501.03006 • Published • 23
- CPA: Camera-pose-awareness Diffusion Transformer for Video Generation
  Paper • 2412.01429 • Published
- Ingredients: Blending Custom Photos with Video Diffusion Transformers
  Paper • 2501.01790 • Published • 8

- Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
  Paper • 2405.01535 • Published • 121
- StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
  Paper • 2405.01434 • Published • 54
- WildChat: 1M ChatGPT Interaction Logs in the Wild
  Paper • 2405.01470 • Published • 62
- A Careful Examination of Large Language Model Performance on Grade School Arithmetic
  Paper • 2405.00332 • Published • 32