mishig's Collections
fuck quadratic attention
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Paper • arXiv:2404.05892 • 32 upvotes
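The core change Eagle (RWKV-5) makes over RWKV-4 is a matrix-valued recurrent state, decayed per channel and updated with an outer product of key and value; Finch (RWKV-6) further makes the decay data-dependent. A minimal single-head sketch of that recurrence (the bonus term for the current token and all time-mixing are omitted; names and shapes here are my own):

```python
# Single-head sketch of the matrix-valued state recurrence behind
# Eagle/RWKV-5 (arXiv:2404.05892): the state is a (d x d) matrix,
# decayed per key channel by w and updated with an outer product
# k_t v_t^T, then read out by the receptance r_t. Finch/RWKV-6
# additionally makes w a function of the input (not shown).
import numpy as np

def rwkv5_head(r, k, v, w):
    """r, k, v: (N, d); w: (d,) per-channel decay in (0, 1)."""
    N, d = r.shape
    S = np.zeros((d, d))
    out = np.empty((N, d))
    for t in range(N):
        S = w[:, None] * S + np.outer(k[t], v[t])  # diag(w) @ S + k v^T
        out[t] = r[t] @ S                          # receptance readout
    return out

rng = np.random.default_rng(0)
N, d = 12, 8
y = rwkv5_head(rng.standard_normal((N, d)), rng.standard_normal((N, d)),
               rng.standard_normal((N, d)), np.full(d, 0.9))
print(y.shape)  # (12, 8)
```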
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • arXiv:2312.00752 • 138 upvotes
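Mamba's selectivity means the SSM parameters Δ, B, and C are functions of the input token, so the recurrence can decide per step what to store and what to forget. A naive single-channel sketch of one selective-SSM scan (the paper vectorizes over channels and uses a hardware-aware parallel scan; the projections below are toy stand-ins):

```python
# Naive sketch of a selective SSM step in the spirit of Mamba
# (arXiv:2312.00752): the step size Delta and the B, C projections
# depend on the input x_t, which is what "selective" means.
import numpy as np

rng = np.random.default_rng(0)
d_state = 16
A = -np.exp(rng.standard_normal(d_state))   # negative => stable decay
W_delta = rng.standard_normal()             # toy input projections (assumptions)
W_B = rng.standard_normal(d_state)
W_C = rng.standard_normal(d_state)

def selective_ssm(x):
    """x: (N,) single input channel. Returns y: (N,)."""
    h = np.zeros(d_state)
    y = np.empty_like(x)
    for t, xt in enumerate(x):
        delta = np.log1p(np.exp(W_delta * xt))  # softplus => positive step size
        B = W_B * xt                            # input-dependent write direction
        C = W_C * xt                            # input-dependent readout
        A_bar = np.exp(delta * A)               # zero-order-hold discretization of A
        h = A_bar * h + (delta * B) * xt        # first-order approximation of B_bar
        y[t] = C @ h
    return y

print(selective_ssm(rng.standard_normal(32)).shape)  # (32,)
```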
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Paper • arXiv:2404.07839 • 43 upvotes
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Paper • arXiv:2404.07143 • 104 upvotes
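Infini-attention bolts a compressive memory onto standard local attention: the memory is just a linear-attention state M (plus a normalizer z) that each segment reads from before writing its own keys and values into it. A sketch of the memory stream alone (σ is elu+1 as in linear attention; the local softmax path and the learned gate that mixes the two outputs are stubbed out):

```python
# Sketch of the compressive-memory half of Infini-attention
# (arXiv:2404.07143): segments stream through, each reading from the
# running memory written by previous segments, then updating it.
import numpy as np

def sigma(x):
    return np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1

def infini_memory_stream(segments_qkv, d_k, d_v):
    """segments_qkv: list of (Q, K, V) per segment, each (S, d)."""
    M = np.zeros((d_k, d_v))   # compressive memory
    z = np.zeros(d_k)          # normalization term
    outputs = []
    for Q, K, V in segments_qkv:
        # retrieve: linear-attention read against the accumulated memory
        A_mem = sigma(Q) @ M / (sigma(Q) @ z + 1e-6)[:, None]
        outputs.append(A_mem)  # the real model gates this with local attention
        # update: write this segment's keys/values into the memory
        M += sigma(K).T @ V
        z += sigma(K).sum(axis=0)
    return outputs

rng = np.random.default_rng(0)
segs = [tuple(rng.standard_normal((32, 8)) for _ in range(3)) for _ in range(4)]
print(infini_memory_stream(segs, 8, 8)[0].shape)  # (32, 8)
```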
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Paper • arXiv:2404.08801 • 64 upvotes
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • arXiv:2402.19427 • 52 upvotes
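Griffin (the architecture RecurrentGemma above is built on) alternates local sliding-window attention with its Real-Gated Linear Recurrent Unit, a diagonal recurrence whose decay is gated by the input. A sketch of the RG-LRU with random stand-in weights:

```python
# Sketch of Griffin's RG-LRU recurrence (arXiv:2402.19427):
# h_t = a_t * h_{t-1} + sqrt(1 - a_t^2) * (i_t * x_t), where
# a_t = a^(c * r_t), r_t is the recurrence gate, i_t the input gate,
# and a = sigmoid(Lambda) is a learned per-channel base decay.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rg_lru(x, W_r, W_i, Lam, c=8.0):
    """x: (N, d). Returns hidden states h: (N, d)."""
    a = sigmoid(Lam)                      # base decay in (0, 1)
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        r = sigmoid(x[t] @ W_r)           # recurrence gate
        i = sigmoid(x[t] @ W_i)           # input gate
        a_t = a ** (c * r)                # gated decay, still in (0, 1)
        h = a_t * h + np.sqrt(1.0 - a_t**2) * (i * x[t])
        out[t] = h
    return out

rng = np.random.default_rng(0)
d = 8
print(rg_lru(rng.standard_normal((16, d)), rng.standard_normal((d, d)),
             rng.standard_normal((d, d)), rng.standard_normal(d)).shape)
```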
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Paper • arXiv:2006.16236 • 3 upvotes
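This is the kernel trick behind much of this list: drop the softmax, write attention as φ(Q)(φ(K)ᵀV) with φ(x) = elu(x) + 1, and by associativity the causal case becomes an RNN over a running d×d state, giving O(N) time and constant memory per step. A minimal sketch:

```python
# Minimal sketch of causal linear attention from "Transformers are
# RNNs" (arXiv:2006.16236): softmax(QK^T)V is replaced by
# phi(Q) (phi(K)^T V) with phi(x) = elu(x) + 1, so generation reduces
# to updating a running (d_k x d_v) state and a normalizer.
import numpy as np

def elu_plus_one(x):
    return np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 > 0

def causal_linear_attention(Q, K, V):
    """Q, K: (N, d_k), V: (N, d_v). Returns (N, d_v) in O(N) time."""
    Qf, Kf = elu_plus_one(Q), elu_plus_one(K)
    S = np.zeros((Q.shape[1], V.shape[1]))  # running sum of k_i v_i^T
    z = np.zeros(Q.shape[1])                # running sum of k_i (normalizer)
    out = np.empty((Q.shape[0], V.shape[1]))
    for t in range(Q.shape[0]):
        S += np.outer(Kf[t], V[t])
        z += Kf[t]
        out[t] = Qf[t] @ S / (Qf[t] @ z + 1e-6)
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 8)) for _ in range(3))
print(causal_linear_attention(Q, K, V).shape)  # (16, 8)
```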
Scaling Transformer to 1M tokens and beyond with RMT
Paper • arXiv:2304.11062 • 2 upvotes
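RMT reaches millions of tokens through recurrence over segments: a handful of memory tokens are concatenated to each segment's input, the segment is processed by an ordinary fixed-window transformer, and the updated memory slots are carried into the next segment. A sketch with a stand-in encoder (the decoder variant also appends write slots at the segment's end, omitted here):

```python
# Sketch of the Recurrent Memory Transformer segment recurrence
# (arXiv:2304.11062). The encoder below is a stand-in; any fixed-window
# transformer block would take its place.
import numpy as np

def dummy_encoder(x):
    return np.tanh(x)  # stand-in for a transformer over one (T, d) segment

def rmt(long_input, num_mem=4, seg_len=64):
    """long_input: (N, d). Processes N tokens as O(N/seg_len) segments."""
    d = long_input.shape[1]
    mem = np.zeros((num_mem, d))            # memory tokens carried across segments
    outputs = []
    for s in range(0, long_input.shape[0], seg_len):
        seg = long_input[s:s + seg_len]
        h = dummy_encoder(np.concatenate([mem, seg], axis=0))
        mem = h[:num_mem]                   # updated memory slots
        outputs.append(h[num_mem:])         # ordinary token outputs
    return np.concatenate(outputs, axis=0)

rng = np.random.default_rng(0)
print(rmt(rng.standard_normal((256, 16))).shape)  # (256, 16)
```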
CoLT5: Faster Long-Range Transformers with Conditional Computation
Paper • arXiv:2303.09752 • 2 upvotes
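CoLT5's conditional computation gives every token a cheap "light" attention/feed-forward path and routes only a scored top-k subset of tokens through the expensive "heavy" path. A sketch of that routing pattern (the two branches are stand-in MLPs; the real model applies routing separately to attention and feed-forward layers):

```python
# Sketch of CoLT5-style conditional computation (arXiv:2303.09752):
# all tokens get the light branch; a top-k subset chosen by a router
# additionally gets the heavy branch, scaled by its routing score.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def colt5_layer(x, W_light, W_heavy, w_route, k=4):
    """x: (N, d). Routes the k highest-scoring tokens to the heavy path."""
    scores = x @ w_route                        # (N,) router scores
    topk = np.argsort(scores)[-k:]              # indices of routed tokens
    out = np.tanh(x @ W_light)                  # light branch: every token
    heavy = np.tanh(x[topk] @ W_heavy)          # heavy branch: routed tokens only
    out[topk] += sigmoid(scores[topk])[:, None] * heavy
    return out

rng = np.random.default_rng(0)
d = 16
print(colt5_layer(rng.standard_normal((32, d)), rng.standard_normal((d, d)),
                  rng.standard_normal((d, d)), rng.standard_normal(d)).shape)
```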
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Paper • arXiv:2402.04347 • 13 upvotes
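Hedgehog keeps the linear-attention form but learns the feature map, using an exponential of a trained projection so the resulting attention weights stay "spiky" and can be distilled to mimic softmax's. A sketch of the feature map plugged into a causal attention matrix (W is random here; the softmax-mimicry distillation loss and training loop are omitted):

```python
# Sketch of the Hedgehog idea (arXiv:2402.04347): a trainable,
# exponential feature map phi(x) = exp(W x) whose induced attention
# weights phi(q)^T phi(k) are trained to match teacher softmax weights.
import numpy as np

def hedgehog_phi(x, W):
    return np.exp(x @ W)  # learned projection + exponential activation

def linear_attention_weights(Q, K, W):
    """Returns the (N, N) causal attention matrix induced by phi."""
    Qf, Kf = hedgehog_phi(Q, W), hedgehog_phi(K, W)
    A = np.tril(Qf @ Kf.T)                      # causal mask, all entries >= 0
    return A / A.sum(axis=1, keepdims=True)     # row-normalize like softmax

rng = np.random.default_rng(0)
Q, K = rng.standard_normal((8, 4)), rng.standard_normal((8, 4))
W = rng.standard_normal((4, 4))
A = linear_attention_weights(Q, K, W)
print(np.allclose(A.sum(axis=1), 1.0))  # True: each row is a distribution
```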
The Illusion of State in State-Space Models
Paper • arXiv:2404.08819 • 1 upvote
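A counterpoint to the rest of the collection: the paper argues that SSMs, like transformers, sit in the complexity class TC⁰ and so cannot express genuine state tracking, with the S₅ word problem (composing a stream of permutations) as the canonical hard case. The task itself is trivial for a true recurrent state:

```python
# The state-tracking task from "The Illusion of State in State-Space
# Models" (arXiv:2404.08819), made concrete: maintain the running
# composition of a stream of permutations of {0..4}. A constant-size
# sequential state solves it; TC^0 models provably cannot for
# arbitrary lengths, the paper argues.
import numpy as np

def running_composition(perms):
    """perms: sequence of permutations of {0..4}; returns prefix compositions."""
    state = np.arange(5)          # identity permutation
    out = []
    for p in perms:
        state = state[p]          # fold the next permutation into the product
        out.append(state.copy())
    return out

rng = np.random.default_rng(0)
seq = [rng.permutation(5) for _ in range(6)]
print(running_composition(seq)[-1])  # composition of all six permutations
```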