-
The Impact of Positional Encoding on Length Generalization in Transformers
Paper • 2305.19466 • Published • 2 -
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
Paper • 2305.13571 • Published • 2 -
Position Prediction as an Effective Pretraining Strategy
Paper • 2207.07611 • Published • 1 -
Transformer Language Models without Positional Encodings Still Learn Positional Information
Paper • 2203.16634 • Published • 5
Collections
Discover the best community collections!
Collections including paper arxiv:2203.16634
-
The Impact of Depth and Width on Transformer Language Model Generalization
Paper • 2310.19956 • Published • 9 -
Retentive Network: A Successor to Transformer for Large Language Models
Paper • 2307.08621 • Published • 170 -
RWKV: Reinventing RNNs for the Transformer Era
Paper • 2305.13048 • Published • 15 -
Attention Is All You Need
Paper • 1706.03762 • Published • 50
-
Cure the headache of Transformers via Collinear Constrained Attention
Paper • 2309.08646 • Published • 12 -
YaRN: Efficient Context Window Extension of Large Language Models
Paper • 2309.00071 • Published • 65 -
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Paper • 2309.10400 • Published • 26 -
Dynamically Relative Position Encoding-Based Transformer for Automatic Code Edit
Paper • 2205.13522 • Published • 1
-
Length Generalization of Causal Transformers without Position Encoding
Paper • 2404.12224 • Published • 1 -
Transformer Language Models without Positional Encodings Still Learn Positional Information
Paper • 2203.16634 • Published • 5 -
Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
Paper • 2305.13571 • Published • 2 -
The Impact of Positional Encoding on Length Generalization in Transformers
Paper • 2305.19466 • Published • 2
-
Analyzing Transformers in Embedding Space
Paper • 2209.02535 • Published • 3 -
Prompt-to-Prompt Image Editing with Cross Attention Control
Paper • 2208.01626 • Published • 2 -
Dynamic Typography: Bringing Words to Life
Paper • 2404.11614 • Published • 44 -
Transformer Language Models without Positional Encodings Still Learn Positional Information
Paper • 2203.16634 • Published • 5
-
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 7 -
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
Paper • 1907.12461 • Published • 1 -
Transformer Language Models without Positional Encodings Still Learn Positional Information
Paper • 2203.16634 • Published • 5 -
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Paper • 2102.04664 • Published • 2
-
The Curious Case of Neural Text Degeneration
Paper • 1904.09751 • Published • 3 -
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper • 2404.01197 • Published • 30 -
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Paper • 1905.10044 • Published • 1 -
PIQA: Reasoning about Physical Commonsense in Natural Language
Paper • 1911.11641 • Published • 2
-
Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation
Paper • 2403.19319 • Published • 12 -
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Paper • 2404.01197 • Published • 30 -
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model
Paper • 2404.01331 • Published • 25 -
LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models
Paper • 2404.03118 • Published • 23
-
LIMA: Less Is More for Alignment
Paper • 2305.11206 • Published • 21 -
Garment3DGen: 3D Garment Stylization and Texture Generation
Paper • 2403.18816 • Published • 21 -
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Paper • 2403.18118 • Published • 10 -
The Unreasonable Ineffectiveness of the Deeper Layers
Paper • 2403.17887 • Published • 78
-
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
Paper • 2402.13753 • Published • 114 -
Data Engineering for Scaling Language Models to 128K Context
Paper • 2402.10171 • Published • 23 -
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration
Paper • 2402.11550 • Published • 16 -
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
Paper • 2401.07872 • Published • 2