Create README.md
## Task Performance Metrics

The following table reports evaluation results for each task, using accuracy (`acc`) and, where available, normalized accuracy (`acc_norm`). The Value column gives the score for the metric named in that row, and Stderr gives the standard error of that score. Rows with an empty Task cell belong to the task listed above them.

| **Task**       | **Version** | **Metric** | **Value** | **Stderr** |
|----------------|-------------|------------|-----------|------------|
| arc_challenge  | 0           | acc        | 0.4334    | ± 0.0145   |
|                |             | acc_norm   | 0.4394    | ± 0.0145   |
| arc_easy       | 0           | acc        | 0.6974    | ± 0.0094   |
|                |             | acc_norm   | 0.6170    | ± 0.0100   |
| boolq          | 1           | acc        | 0.8171    | ± 0.0068   |
| hellaswag      | 0           | acc        | 0.5770    | ± 0.0049   |
|                |             | acc_norm   | 0.7391    | ± 0.0044   |
| openbookqa     | 0           | acc        | 0.2800    | ± 0.0201   |
|                |             | acc_norm   | 0.3760    | ± 0.0217   |
| piqa           | 0           | acc        | 0.7797    | ± 0.0097   |
|                |             | acc_norm   | 0.7622    | ± 0.0099   |
| toxigen        | 0           | acc        | 0.4777    | ± 0.0163   |
|                |             | acc_norm   | 0.4340    | ± 0.0162   |
| winogrande     | 0           | acc        | 0.6322    | ± 0.0136   |
| gsm8k          | 0           | acc        | 0.0144    | ± 0.0033   |
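
To make the Stderr column easier to read, the sketch below turns a few rows of the table into approximate 95% confidence intervals. It assumes a normal approximation (value ± 1.96 × stderr), which is a common convention and not something this README specifies; the helper name `confidence_interval` is hypothetical. The approximation is rough for scores near 0 or 1, such as the gsm8k row, so the bounds are clamped to the valid range.

```python
# (task, metric, value, stderr) copied from the table above
ROWS = [
    ("arc_challenge", "acc_norm", 0.4394, 0.0145),
    ("hellaswag", "acc_norm", 0.7391, 0.0044),
    ("gsm8k", "acc", 0.0144, 0.0033),
]

# Assumption: two-sided 95% interval under a normal approximation.
Z_95 = 1.96


def confidence_interval(value, stderr, z=Z_95):
    """Return (lower, upper) bounds, clamped to the valid range [0, 1]."""
    lower = max(0.0, value - z * stderr)
    upper = min(1.0, value + z * stderr)
    return lower, upper


for task, metric, value, stderr in ROWS:
    lower, upper = confidence_interval(value, stderr)
    print(f"{task:<14} {metric:<9} {value:.4f}  95% CI [{lower:.4f}, {upper:.4f}]")
```

Two scores whose intervals overlap heavily, such as `acc` and `acc_norm` on arc_challenge, should not be read as meaningfully different under this approximation.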