- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (arXiv:1810.04805, published Oct 11, 2018)
- Transformers Can Achieve Length Generalization But Not Robustly (arXiv:2402.09371, published Feb 14, 2024)
- A Thorough Examination of Decoding Methods in the Era of LLMs (arXiv:2402.06925, published Feb 10, 2024)
- Byte Latent Transformer: Patches Scale Better Than Tokens (arXiv:2412.09871, published Dec 2024)