admarcosai's Collections
Model Architectures
togethercomputer/StripedHyena-Hessian-7B
Text Generation
•
Updated
•
88
•
65
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Paper
•
2312.08618
•
Published
•
11
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Paper
•
2312.07987
•
Published
•
41
LLM360: Towards Fully Transparent Open-Source LLMs
Paper
•
2312.06550
•
Published
•
57
Cached Transformers: Improving Transformers with Differentiable Memory Cache
Paper
•
2312.12742
•
Published
•
12
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper
•
2312.15166
•
Published
•
56
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
Paper
•
2401.04658
•
Published
•
25
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Paper
•
2401.17377
•
Published
•
35
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Paper
•
2311.12351
•
Published
•
3
H2O-Danube-1.8B Technical Report
Paper
•
2401.16818
•
Published
•
17
TinyLlama: An Open-Source Small Language Model
Paper
•
2401.02385
•
Published
•
90
Learning and Leveraging World Models in Visual Representation Learning
Paper
•
2403.00504
•
Published
•
31