jmagder's Collections
To read
Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)
Elucidating the Design Space of Diffusion-Based Generative Models (arXiv:2206.00364)
GLU Variants Improve Transformer (arXiv:2002.05202)
StarCoder 2 and The Stack v2: The Next Generation (arXiv:2402.19173)
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection (arXiv:2403.03507)
DocLLM: A layout-aware generative language model for multimodal document understanding (arXiv:2401.00908)
arXiv:2401.04088
Your Transformer is Secretly Linear (arXiv:2405.12250)
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report (arXiv:2405.00732)
Chameleon: Mixed-Modal Early-Fusion Foundation Models (arXiv:2405.09818)
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention (arXiv:2404.07143)
Design2Code: How Far Are We From Automating Front-End Engineering? (arXiv:2403.03163)
Gemma: Open Models Based on Gemini Research and Technology (arXiv:2403.08295)
Longformer: The Long-Document Transformer (arXiv:2004.05150)
WARP: On the Benefits of Weight Averaged Rewarded Policies (arXiv:2406.16768)
DeBERTa: Decoding-enhanced BERT with Disentangled Attention (arXiv:2006.03654)
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing (arXiv:2111.09543)
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arXiv:2305.18290)
The Prompt Report: A Systematic Survey of Prompting Techniques (arXiv:2406.06608)
SimPO: Simple Preference Optimization with a Reference-Free Reward (arXiv:2405.14734)
MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications (arXiv:2409.07314)
Qwen2.5-Coder Technical Report (arXiv:2409.12186)