Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 82
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 66
Adaptive Decoding via Latent Preference Optimization Paper • 2411.09661 • Published Nov 14, 2024 • 10
Thinking LLMs: General Instruction Following with Thought Generation Paper • 2410.10630 • Published Oct 14, 2024 • 18
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources Paper • 2409.08239 • Published Sep 12, 2024 • 16
Better Alignment with Instruction Back-and-Forth Translation Paper • 2408.04614 • Published Aug 8, 2024 • 14
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge Paper • 2407.19594 • Published Jul 28, 2024 • 20
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12, 2024 • 39
System 2 Attention (is something you might need too) Paper • 2311.11829 • Published Nov 20, 2023 • 39
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation Paper • 2310.15123 • Published Oct 23, 2023 • 7