Contamination-free code evaluations with LiveCodeBench! 🖥️
LiveCodeBench is a new leaderboard that offers:
- holistic code evaluations, covering code generation, self-repair, code execution, and test output prediction
- my favorite feature: problem selection by publication date
This feature means you can average a model's scores only over problems published after its training data was collected, so those problems cannot have leaked into training. In other words... contamination-free code evals!
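To make the idea concrete, here is a minimal sketch of date-based filtering. It is not LiveCodeBench's actual code: the `ProblemResult` record, the `contamination_free_score` helper, and the sample dates are all hypothetical, and it assumes each problem has a known release date and a pass/fail result for the model being scored.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ProblemResult:
    # Hypothetical record: one benchmark problem and whether the model solved it
    problem_id: str
    release_date: date  # date the problem was first published
    passed: bool


def contamination_free_score(results, training_cutoff):
    """Average pass rate over problems released strictly after training_cutoff."""
    fresh = [r for r in results if r.release_date > training_cutoff]
    if not fresh:
        raise ValueError("no problems released after the cutoff")
    # True counts as 1 and False as 0, so the sum is the number of solved problems
    return sum(r.passed for r in fresh) / len(fresh)


results = [
    ProblemResult("p1", date(2023, 5, 1), True),   # released before the cutoff, excluded
    ProblemResult("p2", date(2023, 11, 3), False),
    ProblemResult("p3", date(2024, 1, 20), True),
]
print(contamination_free_score(results, date(2023, 9, 1)))  # 0.5
```

Moving the cutoff later shrinks the pool of "fresh" problems, which trades statistical confidence for stronger contamination guarantees.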
Check it out!
Blog: https://huggingface.co/blog/leaderboard-livecodebench
Leaderboard: livecodebench/leaderboard
Congrats to @StringChaos @minimario @xu3kev @kingh0730 and @FanjiaYan for the super cool leaderboard!