- PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
  Paper • 2402.08714 • Published • 12
- Data Engineering for Scaling Language Models to 128K Context
  Paper • 2402.10171 • Published • 24
- RLVF: Learning from Verbal Feedback without Overgeneralization
  Paper • 2402.10893 • Published • 11
- Coercing LLMs to do and reveal (almost) anything
  Paper • 2402.14020 • Published • 13

Collections including paper arxiv:2404.09967

- MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
  Paper • 2404.05014 • Published • 33
- Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
  Paper • 2404.09967 • Published • 21
- Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
  Paper • 2404.08197 • Published • 28

- Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
  Paper • 2310.03502 • Published • 78
- Transferable and Principled Efficiency for Open-Vocabulary Segmentation
  Paper • 2404.07448 • Published • 12
- Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
  Paper • 2404.07973 • Published • 31
- COCONut: Modernizing COCO Segmentation
  Paper • 2404.08639 • Published • 28

- CiaraRowles/TemporalDiff
  Text-to-Video • Updated • 171
- Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model
  Paper • 2404.09967 • Published • 21
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
  Paper • 2406.06525 • Published • 67

- On the Scalability of Diffusion-based Text-to-Image Generation
  Paper • 2404.02883 • Published • 18
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation
  Paper • 2404.02733 • Published • 21
- CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
  Paper • 2404.03653 • Published • 34
- ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
  Paper • 2404.07987 • Published • 47

- Adding Conditional Control to Text-to-Image Diffusion Models
  Paper • 2302.05543 • Published • 45
- LightIt: Illumination Modeling and Control for Diffusion Models
  Paper • 2403.10615 • Published • 17
- SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
  Paper • 2403.16627 • Published • 20
- DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion
  Paper • 2403.17237 • Published • 10

- Video as the New Language for Real-World Decision Making
  Paper • 2402.17139 • Published • 19
- Learning and Leveraging World Models in Visual Representation Learning
  Paper • 2403.00504 • Published • 32
- MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies
  Paper • 2403.01422 • Published • 27
- VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
  Paper • 2403.05438 • Published • 19

- Video as the New Language for Real-World Decision Making
  Paper • 2402.17139 • Published • 19
- VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
  Paper • 2310.19512 • Published • 15
- VideoMamba: State Space Model for Efficient Video Understanding
  Paper • 2403.06977 • Published • 27
- VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
  Paper • 2401.09047 • Published • 14