StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models Paper • 2306.07691 • Published Jun 13, 2023 • 5
Scaling Test-Time Compute with Open Models Collection • Models and datasets used in our blog post: https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute • 10 items • Updated about 2 hours ago • 19
Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published 14 days ago • 28
Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? Paper • 2307.14023 • Published Jul 26, 2023 • 1
Causal Diffusion Transformers for Generative Modeling Paper • 2412.12095 • Published 21 days ago • 23
A Touch, Vision, and Language Dataset for Multimodal Alignment Paper • 2402.13232 • Published Feb 20, 2024 • 14
Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features? Paper • 2402.00340 • Published Feb 1, 2024 • 1
Optimizing Byte-level Representation for End-to-end ASR Paper • 2406.09676 • Published Jun 14, 2024 • 1
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published Apr 8, 2024 • 82
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper • 2410.02073 • Published Oct 2, 2024 • 41
Computational Bottlenecks of Training Small-scale Large Language Models Paper • 2410.19456 • Published Oct 25, 2024 • 1
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling Paper • 2405.21048 • Published May 31, 2024 • 13
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum Paper • 2405.13226 • Published May 21, 2024 • 1
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities Paper • 2406.09406 • Published Jun 13, 2024 • 14
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21, 2024 • 43
Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody? Paper • 2410.24019 • Published Oct 31, 2024 • 1