Collections
Collections including paper arxiv:2410.20399
- The Impact of Positional Encoding on Length Generalization in Transformers
  Paper • 2305.19466 • Published • 2
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 161
- Round and Round We Go! What makes Rotary Positional Encodings useful?
  Paper • 2410.06205 • Published • 1
- ThunderKittens: Simple, Fast, and Adorable AI Kernels
  Paper • 2410.20399 • Published • 1

- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 23
- Scaling Laws of RoPE-based Extrapolation
  Paper • 2310.05209 • Published • 7
- Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
  Paper • 2404.12387 • Published • 39
- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
  Paper • 2404.14619 • Published • 127

- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 80
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 19
- Sequence Parallelism: Long Sequence Training from System Perspective
  Paper • 2105.13120 • Published • 5