- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
  Paper • 2403.01807 • Published • 8
- TripoSR: Fast 3D Object Reconstruction from a Single Image
  Paper • 2403.02151 • Published • 13
- OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
  Paper • 2403.01779 • Published • 29
- MagicClay: Sculpting Meshes With Generative Neural Fields
  Paper • 2403.02460 • Published • 8
Collections including paper arxiv:2404.13026
- Seamless Human Motion Composition with Blended Positional Encodings
  Paper • 2402.15509 • Published • 14
- TripoSR: Fast 3D Object Reconstruction from a Single Image
  Paper • 2403.02151 • Published • 13
- 3D-VLA: A 3D Vision-Language-Action Generative World Model
  Paper • 2403.09631 • Published • 8
- Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting
  Paper • 2403.09981 • Published • 7
- Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling
  Paper • 2401.15977 • Published • 38
- Lumiere: A Space-Time Diffusion Model for Video Generation
  Paper • 2401.12945 • Published • 86
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
  Paper • 2307.04725 • Published • 64
- Boximator: Generating Rich and Controllable Motions for Video Synthesis
  Paper • 2402.01566 • Published • 27
- GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
  Paper • 2401.04092 • Published • 21
- AToM: Amortized Text-to-Mesh using 2D Diffusion
  Paper • 2402.00867 • Published • 11
- Advances in 3D Generation: A Survey
  Paper • 2401.17807 • Published • 19
- SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
  Paper • 2401.09340 • Published • 20
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
  Paper • 2312.16862 • Published • 31
- Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
  Paper • 2312.17172 • Published • 28
- Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
  Paper • 2401.01974 • Published • 7
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
  Paper • 2401.01885 • Published • 28
- One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
  Paper • 2306.07967 • Published • 24
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
  Paper • 2306.07954 • Published • 112
- TryOnDiffusion: A Tale of Two UNets
  Paper • 2306.08276 • Published • 72
- Seeing the World through Your Eyes
  Paper • 2306.09348 • Published • 33