Offline Reinforcement Learning for LLM Multi-Step Reasoning • Paper 2412.16145 • Published 17 days ago • 36
Training Large Language Models to Reason in a Continuous Latent Space • Paper 2412.06769 • Published 28 days ago • 66
Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking • Paper 2406.05673 • Published Jun 9, 2024 • 3
Pandora: Towards General World Model with Natural Language Actions and Video States • Paper 2406.09455 • Published Jun 12, 2024 • 15
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes • Paper 2306.13649 • Published Jun 23, 2023 • 17