-
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Paper • 2105.13626 • Published • 3 -
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 49 -
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
Paper • 2305.07185 • Published • 9 -
Byte-Level Recursive Convolutional Auto-Encoder for Text
Paper • 1802.01817 • Published
Collections
Discover the best community collections!
Collections including paper arxiv:2105.13626
-
The Impact of Depth and Width on Transformer Language Model Generalization
Paper • 2310.19956 • Published • 9 -
Retentive Network: A Successor to Transformer for Large Language Models
Paper • 2307.08621 • Published • 170 -
RWKV: Reinventing RNNs for the Transformer Era
Paper • 2305.13048 • Published • 15 -
Attention Is All You Need
Paper • 1706.03762 • Published • 50
-
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Paper • 2105.13626 • Published • 3 -
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper • 2412.09871 • Published • 85 -
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Paper • 2410.20771 • Published • 3
-
Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
Paper • 2402.14083 • Published • 47 -
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Paper • 2305.13245 • Published • 5 -
Training a T5 Using Lab-sized Resources
Paper • 2208.12097 • Published • 1 -
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
Paper • 2212.05055 • Published • 5
-
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
Paper • 2002.08155 • Published • 2 -
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise
Paper • 2212.11685 • Published • 2 -
ReNoise: Real Image Inversion Through Iterative Noising
Paper • 2403.14602 • Published • 19 -
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Paper • 2105.13626 • Published • 3
-
A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese
Paper • 2304.08999 • Published • 2 -
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 84 -
Robust Open-Vocabulary Translation from Visual Text Representations
Paper • 2104.08211 • Published • 1 -
Poro 34B and the Blessing of Multilinguality
Paper • 2404.01856 • Published • 13