Collections

Collections including paper arxiv:2105.13626

ByT5: Towards a token-free future with pre-trained byte-to-byte models

Paper • 2105.13626 • Published May 28, 2021 • 3
Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024 • 49
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

Paper • 2305.07185 • Published May 12, 2023 • 9
Byte-Level Recursive Convolutional Auto-Encoder for Text

Paper • 1802.01817 • Published Feb 6, 2018

LLM architecture

The Impact of Depth and Width on Transformer Language Model Generalization

Paper • 2310.19956 • Published Oct 30, 2023 • 9
Retentive Network: A Successor to Transformer for Large Language Models

Paper • 2307.08621 • Published Jul 17, 2023 • 170
RWKV: Reinventing RNNs for the Transformer Era

Paper • 2305.13048 • Published May 22, 2023 • 15
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 50

Papers - Training - Text - Vocabulary - SentencePiece

ByT5: Towards a token-free future with pre-trained byte-to-byte models

Paper • 2105.13626 • Published May 28, 2021 • 3

Papers - Encoders - Bytes - More Depth than Decoder

ByT5: Towards a token-free future with pre-trained byte-to-byte models

Paper • 2105.13626 • Published May 28, 2021 • 3
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published 26 days ago • 85

Papers - Training - Token Free - Bytes or Characters

ByT5: Towards a token-free future with pre-trained byte-to-byte models

Paper • 2105.13626 • Published May 28, 2021 • 3
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation

Paper • 2103.06874 • Published Mar 11, 2021 • 1

Papers - Training - Bytes - No Tokenizer

ByT5: Towards a token-free future with pre-trained byte-to-byte models

Paper • 2105.13626 • Published May 28, 2021 • 3
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published 26 days ago • 85
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

Paper • 2410.20771 • Published Oct 28, 2024 • 3

Papers - Multilingual - Benchmarks

HyperCLOVA X Technical Report

Paper • 2404.01954 • Published Apr 2, 2024 • 20
ByT5: Towards a token-free future with pre-trained byte-to-byte models

Paper • 2105.13626 • Published May 28, 2021 • 3
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published 26 days ago • 85

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

Paper • 2402.14083 • Published Feb 21, 2024 • 47
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Paper • 2305.13245 • Published May 22, 2023 • 5
Training a T5 Using Lab-sized Resources

Paper • 2208.12097 • Published Aug 25, 2022 • 1
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

Paper • 2212.05055 • Published Dec 9, 2022 • 5

Papers - Training - Synthetic Noise

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

Paper • 2002.08155 • Published Feb 19, 2020 • 2
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise

Paper • 2212.11685 • Published Dec 22, 2022 • 2
ReNoise: Real Image Inversion Through Iterative Noising

Paper • 2403.14602 • Published Mar 21, 2024 • 19
ByT5: Towards a token-free future with pre-trained byte-to-byte models

Paper • 2105.13626 • Published May 28, 2021 • 3

Papers - Multilingual

A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese

Paper • 2304.08999 • Published Apr 18, 2023 • 2
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Paper • 2309.09400 • Published Sep 17, 2023 • 84
Robust Open-Vocabulary Translation from Visual Text Representations

Paper • 2104.08211 • Published Apr 16, 2021 • 1
Poro 34B and the Blessing of Multilinguality

Paper • 2404.01856 • Published Apr 2, 2024 • 13
