- The Impact of Positional Encoding on Length Generalization in Transformers
  Paper • 2305.19466 • Published • 2
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2
- Position Prediction as an Effective Pretraining Strategy
  Paper • 2207.07611 • Published • 1
- Transformer Language Models without Positional Encodings Still Learn Positional Information
  Paper • 2203.16634 • Published • 5
Collections
Collections including paper arxiv:2305.19466
- Cure the headache of Transformers via Collinear Constrained Attention
  Paper • 2309.08646 • Published • 12
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 65
- PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
  Paper • 2309.10400 • Published • 26
- Dynamically Relative Position Encoding-Based Transformer for Automatic Code Edit
  Paper • 2205.13522 • Published • 1
- Diversity of Thought Improves Reasoning Abilities of Large Language Models
  Paper • 2310.07088 • Published • 5
- Reverse Chain: A Generic-Rule for LLMs to Master Multi-API Planning
  Paper • 2310.04474 • Published • 2
- Promptor: A Conversational and Autonomous Prompt Generation Agent for Intelligent Text Entry Techniques
  Paper • 2310.08101 • Published • 2
- Instance Needs More Care: Rewriting Prompts for Instances Yields Better Zero-Shot Performance
  Paper • 2310.02107 • Published • 3
- The Impact of Positional Encoding on Length Generalization in Transformers
  Paper • 2305.19466 • Published • 2
- Transformers Can Do Arithmetic with the Right Embeddings
  Paper • 2405.17399 • Published • 52
- Teaching Transformers Causal Reasoning through Axiomatic Training
  Paper • 2407.07612 • Published • 2
- Round and Round We Go! What makes Rotary Positional Encodings useful?
  Paper • 2410.06205 • Published • 1
- The Impact of Positional Encoding on Length Generalization in Transformers
  Paper • 2305.19466 • Published • 2
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 160
- Round and Round We Go! What makes Rotary Positional Encodings useful?
  Paper • 2410.06205 • Published • 1
- ThunderKittens: Simple, Fast, and Adorable AI Kernels
  Paper • 2410.20399 • Published • 1