PERSE: Personalized 3D Generative Avatars from A Single Portrait Paper • 2412.21206 • Published 7 days ago • 14
How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System? Paper • 2412.18495 • Published 13 days ago • 8
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Paper • 2411.08380 • Published Nov 13, 2024 • 25
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated 24 days ago • 143
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 254
Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering Paper • 2403.14554 • Published Mar 21, 2024 • 12
Kosmos-2: Grounding Multimodal Large Language Models to the World Paper • 2306.14824 • Published Jun 26, 2023 • 34
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 605
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27, 2024 • 88
GaussianObject: Just Taking Four Images to Get A High-Quality 3D Object with Gaussian Splatting Paper • 2402.10259 • Published Feb 15, 2024 • 13
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling Paper • 2402.12226 • Published Feb 19, 2024 • 41
FlashTex: Fast Relightable Mesh Texturing with LightControlNet Paper • 2402.13251 • Published Feb 20, 2024 • 13
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction Paper • 2402.12712 • Published Feb 20, 2024 • 17
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models Paper • 2402.13064 • Published Feb 20, 2024 • 47
Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance Paper • 2401.15687 • Published Jan 28, 2024 • 23