-
Self-Play Preference Optimization for Language Model Alignment
Paper • 2405.00675 • Published • 25 -
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Paper • 2205.14135 • Published • 11 -
Attention Is All You Need
Paper • 1706.03762 • Published • 49 -
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Paper • 2307.08691 • Published • 8
Collections including paper arxiv:1706.03762
-
Attention Is All You Need
Paper • 1706.03762 • Published • 49 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 16 -
Universal Language Model Fine-tuning for Text Classification
Paper • 1801.06146 • Published • 6 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 12
-
The Impact of Depth and Width on Transformer Language Model Generalization
Paper • 2310.19956 • Published • 9 -
Retentive Network: A Successor to Transformer for Large Language Models
Paper • 2307.08621 • Published • 170 -
RWKV: Reinventing RNNs for the Transformer Era
Paper • 2305.13048 • Published • 15 -
Attention Is All You Need
Paper • 1706.03762 • Published • 49
-
Efficient Memory Management for Large Language Model Serving with PagedAttention
Paper • 2309.06180 • Published • 25 -
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
Paper • 2308.16137 • Published • 39 -
Scaling Transformer to 1M tokens and beyond with RMT
Paper • 2304.11062 • Published • 2 -
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Paper • 2309.14509 • Published • 17
-
sentence-transformers/all-mpnet-base-v2
Sentence Similarity • Updated • 19.4M • 945 -
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Paper • 1910.10683 • Published • 10 -
google-t5/t5-base
Translation • Updated • 1.91M • 655 -
Attention Is All You Need
Paper • 1706.03762 • Published • 49
-
Recurrent Neural Network Regularization
Paper • 1409.2329 • Published -
Pointer Networks
Paper • 1506.03134 • Published -
Order Matters: Sequence to sequence for sets
Paper • 1511.06391 • Published -
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
Paper • 1811.06965 • Published
-
Addition is All You Need for Energy-efficient Language Models
Paper • 2410.00907 • Published • 144 -
Emu3: Next-Token Prediction is All You Need
Paper • 2409.18869 • Published • 94 -
An accurate detection is not all you need to combat label noise in web-noisy datasets
Paper • 2407.05528 • Published • 3 -
Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP
Paper • 2407.00402 • Published • 22
-
Attention Is All You Need
Paper • 1706.03762 • Published • 49 -
Playing Atari with Deep Reinforcement Learning
Paper • 1312.5602 • Published -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 16 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 12