sugatoray's Collections
Papers-Fundamentals
RoFormer: Enhanced Transformer with Rotary Position Embedding
Paper • 2104.09864 • Published • 11
Attention Is All You Need
Paper • 1706.03762 • Published • 49
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 60
Zero-Shot Tokenizer Transfer
Paper • 2405.07883 • Published • 5
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper • 2401.02994 • Published • 49
The Prompt Report: A Systematic Survey of Prompting Techniques
Paper • 2406.06608 • Published • 56
Extreme Compression of Large Language Models via Additive Quantization
Paper • 2401.06118 • Published • 12
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 75
HyperZ·Z·W Operator Connects Slow-Fast Networks for Full Context Interaction
Paper • 2401.17948 • Published • 2
Grokfast: Accelerated Grokking by Amplifying Slow Gradients
Paper • 2405.20233 • Published • 6
Stream of Search (SoS): Learning to Search in Language
Paper • 2404.03683 • Published • 29
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 18