SivilTaram
committed on
Update README.md
Update the first 16 models' performance
README.md
CHANGED
@@ -1,3 +1,38 @@
----
-license: mit
----
+---
+license: mit
+---
+
+
+| **Task / Model** | **model-index-1** | **model-index-2** | **model-index-3** | **model-index-4** | **model-index-5** | **model-index-6** | **model-index-7** | **model-index-8** |
+|--------------------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
+| **Social IQA** | 33.27 | 33.33 | 33.62 | 33.53 | 33.49 | 33.56 | 33.62 | 33.55 |
+| **HellaSwag** | 40.58 | 36.86 | 40.58 | 36.06 | 40.07 | 37.85 | 37.93 | 39.59 |
+| **PiQA** | 67.29 | 65.14 | 67.97 | 64.66 | 67.03 | 65.36 | 66.0 | 66.55 |
+| **OpenBookQA** | 28.63 | 27.87 | 29.33 | 29.1 | 29.23 | 28.33 | 29.13 | 28.73 |
+| **Lambada** | 29.17 | 26.86 | 31.55 | 27.11 | 29.16 | 28.92 | 31.53 | 30.92 |
+| **SciQ** | 80.68 | 79.98 | 81.05 | 80.8 | 82.4 | 79.88 | 78.67 | 79.7 |
+| **COPA** | 70.5 | 63.83 | 69.17 | 65.0 | 67.5 | 66.0 | 66.67 | 68.67 |
+| **RACE** | 29.47 | 30.0 | 32.11 | 28.82 | 31.13 | 30.06 | 29.9 | 30.75 |
+| **ARC Easy** | 50.03 | 48.72 | 50.01 | 46.64 | 51.06 | 47.46 | 46.75 | 48.39 |
+| **LogiQA** | 23.76 | 24.17 | 25.29 | 25.29 | 24.55 | 25.96 | 25.45 | 26.32 |
+| **QQP** | 55.71 | 55.9 | 54.84 | 56.52 | 54.01 | 56.34 | 52.35 | 54.2 |
+| **WinoGrande** | 51.54 | 51.59 | 51.39 | 50.91 | 53.13 | 52.26 | 51.26 | 51.45 |
+| **MultiRC** | 52.65 | 53.39 | 51.89 | 50.92 | 49.03 | 53.09 | 53.64 | 50.23 |
+| **Avg** | 47.18 | 45.97 | 47.60 | 45.80 | 47.06 | 46.54 | 46.38 | 46.85 |
+
+| **Task / Model** | **model-index-9** | **model-index-10** | **model-index-11** | **model-index-12** | **model-index-13** | **model-index-14** | **model-index-15** | **model-index-16** |
+|--------------------------|----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
+| **Social IQA** | 33.43 | 33.21 | 33.31 | 33.17 | 33.28 | 32.43 | 33.57 | 33.7 |
+| **HellaSwag** | 40.05 | 35.89 | 39.55 | 39.89 | 38.63 | 36.18 | 39.52 | 35.94 |
+| **PiQA** | 66.6 | 64.74 | 66.29 | 66.27 | 66.9 | 64.05 | 66.7 | 64.51 |
+| **OpenBookQA** | 28.87 | 26.6 | 29.33 | 28.73 | 29.4 | 27.87 | 29.67 | 27.83 |
+| **Lambada** | 31.39 | 27.37 | 30.32 | 30.31 | 31.38 | 26.25 | 29.86 | 26.95 |
+| **SciQ** | 81.1 | 79.12 | 79.97 | 82.85 | 79.42 | 81.4 | 81.38 | 81.23 |
+| **COPA** | 67.0 | 64.5 | 66.83 | 69.5 | 67.33 | 65.83 | 69.5 | 66.33 |
+| **RACE** | 30.57 | 29.63 | 30.49 | 30.85 | 30.35 | 28.66 | 31.21 | 29.57 |
+| **ARC Easy** | 50.66 | 47.74 | 47.47 | 50.18 | 49.92 | 49.52 | 50.73 | 48.65 |
+| **LogiQA** | 23.6 | 25.65 | 26.37 | 23.81 | 25.58 | 26.29 | 25.86 | 25.12 |
+| **QQP** | 54.89 | 54.79 | 54.2 | 55.23 | 53.69 | 57.09 | 53.95 | 54.24 |
+| **WinoGrande** | 50.83 | 51.84 | 51.05 | 51.83 | 52.12 | 52.0 | 51.01 | 51.82 |
+| **MultiRC** | 54.18 | 54.48 | 50.17 | 52.12 | 51.42 | 52.69 | 51.87 | 53.48 |
+| **Avg** | 47.17 | 45.81 | 46.57 | 47.29 | 46.88 | 46.17 | 47.30 | 46.11 |
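
The README does not say how the **Avg** row is derived. It appears to be the unweighted mean of the 13 task scores in each column. The sketch below checks that assumption for two columns, using scores copied from the first table; the names `scores` and `task_average` are illustrative and not part of the repository.

```python
# Per-task scores for two columns, copied from the table in row order
# (Social IQA, HellaSwag, PiQA, OpenBookQA, Lambada, SciQ, COPA, RACE,
#  ARC Easy, LogiQA, QQP, WinoGrande, MultiRC).
# Assumption: "Avg" is the plain mean of these 13 values.
scores = {
    "model-index-1": [33.27, 40.58, 67.29, 28.63, 29.17, 80.68, 70.5,
                      29.47, 50.03, 23.76, 55.71, 51.54, 52.65],
    "model-index-2": [33.33, 36.86, 65.14, 27.87, 26.86, 79.98, 63.83,
                      30.0, 48.72, 24.17, 55.9, 51.59, 53.39],
}


def task_average(values):
    """Return the unweighted mean of the scores, rounded to two decimals."""
    return round(sum(values) / len(values), 2)


averages = {name: task_average(vals) for name, vals in scores.items()}
print(averages)
```

For these two columns the computed means are 47.18 and 45.97, which match the reported **Avg** entries, so the unweighted-mean reading is at least consistent with the table.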