task,metric,value,err,version
arc_challenge,acc,0.26535836177474403,0.012902554762313967,0
arc_challenge,acc_norm,0.295221843003413,0.013329750293382316,0
arc_easy,acc,0.5896464646464646,0.010093531255765457,0
arc_easy,acc_norm,0.5404040404040404,0.010226230740889027,0
boolq,acc,0.5859327217125382,0.008614932353134947,1
hellaswag,acc,0.40689105755825533,0.004902502514738602,0
hellaswag,acc_norm,0.5210117506472814,0.0049853735507751065,0
sciq,acc,0.851,0.011266140684632175,0
sciq,acc_norm,0.795,0.012772554096113132,0