rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 9 days ago • 229
Learning an evolved mixture model for task-free continual learning Paper • 2207.05080 • Published Jul 11, 2022 • 1
EVOLvE: Evaluating and Optimizing LLMs For Exploration Paper • 2410.06238 • Published Oct 8, 2024 • 1
Smaller Language Models Are Better Instruction Evolvers Paper • 2412.11231 • Published Dec 15, 2024 • 27
VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding? Paper • 2404.05955 • Published Apr 9, 2024
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments Paper • 2406.04151 • Published Jun 6, 2024 • 19
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding Paper • 2401.04398 • Published Jan 9, 2024 • 22
EvoCodeBench: An Evolving Code Generation Benchmark with Domain-Specific Evaluations Paper • 2410.22821 • Published Oct 30, 2024 • 1
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches Paper • 2410.10870 • Published Oct 8, 2024 • 1
Generating and Evolving Reward Functions for Highway Driving with Large Language Models Paper • 2406.10540 • Published Jun 15, 2024 • 1
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution Paper • 2410.16256 • Published Oct 21, 2024 • 60
MUSCLE: A Model Update Strategy for Compatible LLM Evolution Paper • 2407.09435 • Published Jul 12, 2024 • 22
GAVEL: Generating Games Via Evolution and Language Models Paper • 2407.09388 • Published Jul 12, 2024 • 16
Reward Steering with Evolutionary Heuristics for Decoding-time Alignment Paper • 2406.15193 • Published Jun 21, 2024 • 14
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought Paper • 2501.04682 • Published 9 days ago • 83
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning Paper • 2501.03226 • Published 11 days ago • 35
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models Paper • 2501.00316 • Published 18 days ago • 22
Search-o1: Agentic Search-Enhanced Large Reasoning Models Paper • 2501.05366 • Published 8 days ago • 75
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics Paper • 2501.04686 • Published 9 days ago • 48
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection Paper • 2501.04575 • Published 9 days ago • 22
Efficiently Serving LLM Reasoning Programs with Certaindex Paper • 2412.20993 • Published 18 days ago • 34
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published 25 days ago • 37
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Paper • 2412.17256 • Published 26 days ago • 45
ShowUI: One Vision-Language-Action Model for GUI Visual Agent Paper • 2411.17465 • Published Nov 26, 2024 • 78
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 59
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials Paper • 2412.09605 • Published Dec 12, 2024 • 28
OmniManip: Towards General Robotic Manipulation via Object-Centric Interaction Primitives as Spatial Constraints Paper • 2501.03841 • Published 10 days ago • 49
Agents for self-driving laboratories applied to quantum computing Paper • 2412.07978 • Published Dec 10, 2024 • 1
Towards Scientific Discovery with Generative AI: Progress, Opportunities, and Challenges Paper • 2412.11427 • Published Dec 16, 2024 • 1
AEGIS: An Agent-based Framework for General Bug Reproduction from Issue Descriptions Paper • 2411.18015 • Published Nov 27, 2024 • 1
LLM4SR: A Survey on Large Language Models for Scientific Research Paper • 2501.04306 • Published 10 days ago • 33
Using Generative AI and Multi-Agents to Provide Automatic Feedback Paper • 2411.07407 • Published Nov 11, 2024 • 1
Designing Reliable Experiments with Generative Agent-Based Modeling: A Comprehensive Guide Using Concordia by Google DeepMind Paper • 2411.07038 • Published Nov 11, 2024 • 1
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 10 days ago • 77
A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops Paper • 2412.17149 • Published 26 days ago • 1
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains Paper • 2501.05707 • Published 8 days ago • 18
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation Paper • 2501.01895 • Published 14 days ago • 48
Understanding Self-Predictive Learning for Reinforcement Learning Paper • 2212.03319 • Published Dec 6, 2022
Grokfast: Accelerated Grokking by Amplifying Slow Gradients Paper • 2405.20233 • Published May 30, 2024 • 6
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers Paper • 2403.12943 • Published Mar 19, 2024 • 15
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 7 days ago • 54
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning Paper • 2406.09170 • Published Jun 13, 2024 • 26
Demystifying Domain-adaptive Post-training for Financial LLMs Paper • 2501.04961 • Published 9 days ago • 10
Enhancing Human-Like Responses in Large Language Models Paper • 2501.05032 • Published 9 days ago • 46
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published 4 days ago • 72
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 3 days ago • 254
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them Paper • 2501.08292 • Published 3 days ago • 16
PokerBench: Training Large Language Models to become Professional Poker Players Paper • 2501.08328 • Published 3 days ago • 13
Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding Paper • 2501.07888 • Published 4 days ago • 12
Potential and Perils of Large Language Models as Judges of Unstructured Textual Data Paper • 2501.08167 • Published 3 days ago • 6
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training Paper • 2501.06842 • Published 5 days ago • 14
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning Paper • 2501.06458 • Published 7 days ago • 29
Evaluating Sample Utility for Data Selection by Mimicking Model Weights Paper • 2501.06708 • Published 6 days ago • 5
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning Paper • 2501.06590 • Published 6 days ago • 7
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking Paper • 2501.09751 • Published 1 day ago • 28
RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation Paper • 2501.08617 • Published 3 days ago • 7
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published 1 day ago • 16
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models Paper • 2501.09686 • Published 1 day ago • 11
FAST: Efficient Action Tokenization for Vision-Language-Action Models Paper • 2501.09747 • Published 1 day ago • 11