Is this a mistake?
@cakebabi It's overfit, so it performs poorly on general-knowledge questions outside the narrow selection of domains it was trained on. Anything that scores low on SimpleQA but high on MMLU is likely to be overfit. No, it is not a mistake.
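If it helps, here is a rough sketch of that rule of thumb as a check. The function name and both cutoffs are made up for illustration (they are assumptions, not established thresholds), and scores are assumed to be percentages on a 0-100 scale.

```python
def looks_overfit(mmlu: float, simpleqa: float,
                  mmlu_high: float = 70.0,
                  simpleqa_low: float = 10.0) -> bool:
    """Flag a model whose MMLU score is high while its SimpleQA score is low.

    Both cutoffs are hypothetical defaults chosen for illustration only;
    pick values that make sense for the models you are comparing.
    """
    return mmlu >= mmlu_high and simpleqa <= simpleqa_low


# Hypothetical scores, not measurements of any real model.
print(looks_overfit(mmlu=80.0, simpleqa=5.0))
print(looks_overfit(mmlu=80.0, simpleqa=40.0))
```

This only flags a suspicious gap between the two benchmarks; it does not prove overfitting on its own.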