SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories • Paper 2409.07440 • Published Sep 11, 2024
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents • Paper 2407.18901 • Published Jul 26, 2024