SciCode: A Research Coding Benchmark Curated by Scientists • Paper • 2407.13168 • Published Jul 18, 2024
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? • Paper • 2407.15711 • Published Jul 22, 2024
The Vision of Autonomic Computing: Can LLMs Make It a Reality? • Paper • 2407.14402 • Published Jul 19, 2024