CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published 5 days ago • 42
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials Paper • 2412.09605 • Published 26 days ago • 26