Byte Latent Transformer: Patches Scale Better Than Tokens • Paper • 2412.09871 • Published 26 days ago • 85
SpiRit-LM: Interleaved Spoken and Written Language Model • Paper • 2402.05755 • Published Feb 8, 2024 • 13
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning • Paper • 2309.02591 • Published Sep 5, 2023 • 14
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants • Paper • 2308.16884 • Published Aug 31, 2023 • 8