Collections
Discover the best community collections!
Collections including paper arxiv:2401.13660
-
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 53 -
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Paper • 2401.10774 • Published • 54 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 145 -
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
Paper • 2401.12954 • Published • 29
-
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Paper • 2401.10774 • Published • 54 -
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
Paper • 2401.06761 • Published • 1 -
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Paper • 2401.02669 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 53
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 145 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 12 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 53 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 45
-
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper • 2401.02994 • Published • 49 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 53 -
Repeat After Me: Transformers are Better than State Space Models at Copying
Paper • 2402.01032 • Published • 22 -
BlackMamba: Mixture of Experts for State-Space Models
Paper • 2402.01771 • Published • 23
-
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 41 -
Qwen Technical Report
Paper • 2309.16609 • Published • 35 -
GPT-4 Technical Report
Paper • 2303.08774 • Published • 5 -
Gemini: A Family of Highly Capable Multimodal Models
Paper • 2312.11805 • Published • 44
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 16 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 9 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 11 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 47
-
The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Paper • 2311.10093 • Published • 56 -
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
Paper • 2311.12229 • Published • 26 -
Diffusion Model Alignment Using Direct Preference Optimization
Paper • 2311.12908 • Published • 47 -
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Paper • 2312.00845 • Published • 36
-
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Paper • 2309.14509 • Published • 17 -
LLM Augmented LLMs: Expanding Capabilities through Composition
Paper • 2401.02412 • Published • 36 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 44 -
Tuning Language Models by Proxy
Paper • 2401.08565 • Published • 21