- RL Zero: Zero-Shot Language to Behaviors without any Supervision
  Paper • 2412.05718 • Published • 4
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
- Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning
  Paper • 2412.15797 • Published • 17
- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 37
Collections
Collections including paper arxiv:2501.05707
- Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
  Paper • 2410.22304 • Published • 17
- OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
  Paper • 2410.19609 • Published • 17
- Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
  Paper • 2411.00412 • Published • 9
- Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning
  Paper • 2410.02052 • Published • 9

- MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
  Paper • 2410.10563 • Published • 38
- Latent Action Pretraining from Videos
  Paper • 2410.11758 • Published • 2
- TVBench: Redesigning Video-Language Evaluation
  Paper • 2410.07752 • Published • 5
- Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
  Paper • 2501.03225 • Published • 6

- Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
  Paper • 2403.09629 • Published • 76
- Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
  Paper • 2408.07199 • Published • 21
- Let's Verify Step by Step
  Paper • 2305.20050 • Published • 10
- V-STaR: Training Verifiers for Self-Taught Reasoners
  Paper • 2402.06457 • Published • 9

- More Agents Is All You Need
  Paper • 2402.05120 • Published • 52
- OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
  Paper • 2402.07456 • Published • 43
- Generative Agents: Interactive Simulacra of Human Behavior
  Paper • 2304.03442 • Published • 12
- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
  Paper • 2310.04406 • Published • 8

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 146
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 114
- OS-Copilot: Towards Generalist Computer Agents with Self-Improvement
  Paper • 2402.07456 • Published • 43
- Learning From Mistakes Makes LLM Better Reasoner
  Paper • 2310.20689 • Published • 28

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 146
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 12
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 53
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 45

- DocGraphLM: Documental Graph Language Model for Information Extraction
  Paper • 2401.02823 • Published • 35
- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 62
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 181
- Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
  Paper • 2309.01131 • Published • 1