How do multi-step agents work?

The ReAct framework (Yao et al., 2022) is currently the main approach to building agents.

The name combines two words, “Reason” and “Act.” Agents following this architecture solve their task in as many steps as needed. Each step consists of a reasoning phase followed by an action phase, in which the agent formulates the tool calls that bring it closer to solving the task at hand.

The ReAct process involves keeping a memory of past steps, so that each new step can build on earlier reasoning and tool results.
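The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the smolagents implementation: the names Step, scripted_model, TOOLS, and run_react are hypothetical, and a hard-coded function stands in for the LLM so the example runs without any model.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One ReAct step: a thought, a tool call, and the resulting observation."""
    thought: str
    tool: str
    args: dict
    observation: str = ""


def scripted_model(task: str, memory: list) -> Step:
    """Hypothetical stand-in for an LLM: picks the next step from task + memory."""
    if not memory:
        return Step("I need the sum first.", "add", {"a": 2, "b": 3})
    if len(memory) == 1:
        previous = int(memory[-1].observation)
        return Step("Now double the result.", "multiply", {"a": previous, "b": 2})
    return Step("I have the answer.", "final_answer", {"answer": memory[-1].observation})


# Hypothetical tools the agent is allowed to call.
TOOLS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


def run_react(task, model, tools, max_steps=5):
    """Alternate reasoning and acting until the model emits a final answer."""
    memory = []  # past steps, shown to the model at every iteration
    for _ in range(max_steps):
        step = model(task, memory)  # reasoning phase: decide what to do next
        if step.tool == "final_answer":
            return step.args["answer"], memory
        # action phase: execute the tool call and record the observation
        step.observation = str(tools[step.tool](**step.args))
        memory.append(step)
    raise RuntimeError("Reached max_steps without a final answer")


answer, memory = run_react("Compute (2 + 3) * 2", scripted_model, TOOLS)
```

The second step can only be formulated because the first step's observation is available in memory, which is why the memory of past steps matters.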

Read the “Open-source LLMs as LangChain Agents” blog post to learn more about multi-step agents.

Here is a video overview of how that works:

Figure: Framework of a ReAct agent

We implement two versions of multi-step agents:

ToolCallingAgent generates its tool calls as JSON in its output.
CodeAgent is a newer type of agent that generates its tool calls as blobs of code, which works particularly well for LLMs with strong coding performance.
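To make the difference between the two output formats concrete, here is a rough sketch of how each kind of tool call might be executed. It is an illustration only: get_weather, run_json_call, and run_code_call are hypothetical names, the model outputs are hard-coded strings, and the bare exec used here is not a safe sandbox and is not how smolagents executes code.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical tool returning a canned weather report."""
    return f"sunny in {city}"


TOOLS = {"get_weather": get_weather}

# JSON style: the model output names one tool and its arguments.
json_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'


def run_json_call(output: str, tools: dict):
    """Parse a JSON tool call and dispatch it to the matching tool."""
    call = json.loads(output)
    return tools[call["name"]](**call["arguments"])


# Code style: the model output is a code snippet that can combine several calls.
code_output = 'result = get_weather("Paris") + "; " + get_weather("Rome")'


def run_code_call(output: str, tools: dict):
    """Run a code snippet with the tools in scope (illustrative, NOT a sandbox)."""
    namespace = dict(tools)
    exec(output, {"__builtins__": {}}, namespace)
    return namespace["result"]


json_result = run_json_call(json_output, TOOLS)
code_result = run_code_call(code_output, TOOLS)
```

In this sketch the code-style output chains two tool calls and combines their results in a single step, whereas the JSON-style output shown here describes a single call.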

We also provide an option to run agents in a single step (one-shot): just pass single_step=True when launching the agent, as in agent.run(your_task, single_step=True).
