Update README.md
README.md
CHANGED
@@ -101,7 +101,6 @@ The table below summarizes evaluation results across mathematical tasks and orig
 | **Control LLM*** | 38.1 | 62.7 | **90.4**| 63.2 | 79.7 | 25.2 | **68.1**| 43.6 | **57.2** | **60.2** |
 
 ---
-
 ### Explanation:
 - **MH**: MathHard
 - **M**: Math
@@ -112,5 +111,4 @@ The table below summarizes evaluation results across mathematical tasks and orig
 - **MLU**: MMLU (Massive Multitask Language Understanding)
 - **MLUP**: MMLU Pro
 - **O-Avg**: Orginal Capability - Average across ARC, GPQA, MMLU, and MMLUP
-- **Overall**: Combined average across all tasks
-
+- **Overall**: Combined average across all tasks