-
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 8 -
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 104 -
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Paper • 2402.09320 • Published • 6 -
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 114
Collections
Discover the best community collections!
Collections including paper arxiv:2403.04642
-
Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 66 -
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper • 2411.04282 • Published • 32 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 36 -
MALT: Improving Reasoning with Multi-Agent LLM Training
Paper • 2412.01928 • Published • 40
-
Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs
Paper • 2407.00653 • Published • 11 -
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Paper • 2406.18629 • Published • 41 -
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
Paper • 2406.14562 • Published • 27 -
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
Paper • 2406.04271 • Published • 29
-
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
Paper • 2403.09629 • Published • 75 -
How Far Are We from Intelligent Visual Deductive Reasoning?
Paper • 2403.04732 • Published • 19 -
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 46 -
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
Paper • 2402.10963 • Published • 10
-
ORPO: Monolithic Preference Optimization without Reference Model
Paper • 2403.07691 • Published • 64 -
sDPO: Don't Use Your Data All at Once
Paper • 2403.19270 • Published • 40 -
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 46 -
Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper • 2404.07503 • Published • 29
-
Pix2Gif: Motion-Guided Diffusion for GIF Generation
Paper • 2403.04634 • Published • 14 -
StableDrag: Stable Dragging for Point-based Image Editing
Paper • 2403.04437 • Published • 25 -
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 46 -
Yi: Open Foundation Models by 01.AI
Paper • 2403.04652 • Published • 62
-
Teaching Large Language Models to Reason with Reinforcement Learning
Paper • 2403.04642 • Published • 46 -
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Paper • 2403.03950 • Published • 13 -
RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches
Paper • 2403.02709 • Published • 7