Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2309.08586

LLM architecture

The Impact of Depth and Width on Transformer Language Model Generalization

Paper • 2310.19956 • Published Oct 30, 2023 • 9
Retentive Network: A Successor to Transformer for Large Language Models

Paper • 2307.08621 • Published Jul 17, 2023 • 170
RWKV: Reinventing RNNs for the Transformer Era

Paper • 2305.13048 • Published May 22, 2023 • 15
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 50

Replacing softmax with ReLU in Vision Transformers

Paper • 2309.08586 • Published Sep 15, 2023 • 17
Softmax Bias Correction for Quantized Generative Models

Paper • 2309.01729 • Published Sep 4, 2023 • 1
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression

Paper • 2304.13276 • Published Apr 26, 2023 • 1
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing

Paper • 2306.12929 • Published Jun 22, 2023 • 12

Efficient Memory Management for Large Language Model Serving with PagedAttention

Paper • 2309.06180 • Published Sep 12, 2023 • 25
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models

Paper • 2308.16137 • Published Aug 30, 2023 • 39
Scaling Transformer to 1M tokens and beyond with RMT

Paper • 2304.11062 • Published Apr 19, 2023 • 2
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

Paper • 2309.14509 • Published Sep 25, 2023 • 17

Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 83
Baichuan 2: Open Large-scale Language Models

Paper • 2309.10305 • Published Sep 19, 2023 • 19
Chain-of-Verification Reduces Hallucination in Large Language Models

Paper • 2309.11495 • Published Sep 20, 2023 • 37
LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 65

Replacing softmax with ReLU in Vision Transformers

Paper • 2309.08586 • Published Sep 15, 2023 • 17
AstroLLaMA: Towards Specialized Foundation Models in Astronomy

Paper • 2309.06126 • Published Sep 12, 2023 • 16

Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

Paper • 2309.08532 • Published Sep 15, 2023 • 53
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale

Paper • 2309.06497 • Published Sep 12, 2023 • 4
MindAgent: Emergent Gaming Interaction

Paper • 2309.09971 • Published Sep 18, 2023 • 11
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages

Paper • 2309.09400 • Published Sep 17, 2023 • 84

My Papers of Interest

Self-Alignment with Instruction Backtranslation

Paper • 2308.06259 • Published Aug 11, 2023 • 41
ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation

Paper • 2308.03793 • Published Aug 4, 2023 • 10
From Sparse to Soft Mixtures of Experts

Paper • 2308.00951 • Published Aug 2, 2023 • 20
Revisiting DETR Pre-training for Object Detection

Paper • 2308.01300 • Published Aug 2, 2023 • 9

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs