tmarechaux's Collections
Theorical
Language Modeling Is Compression
Paper
•
2309.10668
•
Published
•
83
Small-scale proxies for large-scale Transformer training instabilities
Paper
•
2309.14322
•
Published
•
19
Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
Paper
•
2309.15129
•
Published
•
6
Vision Transformers Need Registers
Paper
•
2309.16588
•
Published
•
78
The Consensus Game: Language Model Generation via Equilibrium Search
Paper
•
2310.09139
•
Published
•
12
Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise
Paper
•
2212.11685
•
Published
•
2
Levels of AGI: Operationalizing Progress on the Path to AGI
Paper
•
2311.02462
•
Published
•
34
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper
•
2402.17764
•
Published
•
605
Scaling Instructable Agents Across Many Simulated Worlds
Paper
•
2404.10179
•
Published
•
27
Your Transformer is Secretly Linear
Paper
•
2405.12250
•
Published
•
149
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Paper
•
2407.01392
•
Published
•
39
softmax is not enough (for sharp out-of-distribution)
Paper
•
2410.01104
•
Published
•
1
Paper
•
2410.05258
•
Published
•
169
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
Paper
•
2410.02707
•
Published
•
48