Is there a human baseline reported for this benchmark?
There is no human baseline for the full set. However, as reported in the paper, we find that experienced programmers achieve a 97% pass rate on a sampled subset.