Collections
Discover the best community collections!
Collections including paper arxiv:2408.07009
-
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper • 2409.02097 • Published • 33 -
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper • 2409.11406 • Published • 26 -
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 121 -
Segment Anything with Multiple Modalities
Paper • 2408.09085 • Published • 21
-
Imagen 3
Paper • 2408.07009 • Published • 61 -
Generative Photomontage
Paper • 2408.07116 • Published • 20 -
ControlNeXt: Powerful and Efficient Control for Image and Video Generation
Paper • 2408.06070 • Published • 53 -
UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization
Paper • 2408.05939 • Published • 14
-
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Paper • 2408.06292 • Published • 118 -
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 64 -
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Paper • 2408.06327 • Published • 16 -
Imagen 3
Paper • 2408.07009 • Published • 61
-
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models
Paper • 2406.09416 • Published • 27 -
Wavelets Are All You Need for Autoregressive Image Generation
Paper • 2406.19997 • Published • 29 -
ViPer: Visual Personalization of Generative Models via Individual Preference Learning
Paper • 2407.17365 • Published • 11 -
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning
Paper • 2408.11001 • Published • 11
-
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Paper • 2311.17049 • Published • 1 -
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 14 -
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Paper • 2303.17376 • Published -
Sigmoid Loss for Language Image Pre-Training
Paper • 2303.15343 • Published • 6
-
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 82 -
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Paper • 2403.05530 • Published • 61 -
StarCoder: may the source be with you!
Paper • 2305.06161 • Published • 29 -
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • 2312.15166 • Published • 56
-
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Paper • 2403.06775 • Published • 3 -
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper • 2010.11929 • Published • 7 -
Data Incubation -- Synthesizing Missing Data for Handwriting Recognition
Paper • 2110.07040 • Published • 2 -
A Mixture of Expert Approach for Low-Cost Customization of Deep Neural Networks
Paper • 1811.00056 • Published • 2