Update README.md
README.md (changed)
@@ -98,6 +98,7 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
 | GPT-4 | -| RLHF |8.99| 95.28|

 ## Other benchmark:
+1. HuggingFace OpenLLM Leaderboard
 | Metric | Value |
 |-----------------------|---------------------------|
 | ARC (25-shot) | 47.0 |
@@ -108,9 +109,10 @@ The dataset is comprised of a mixture of open datasets large-scale datasets avai
 | GSM8K (5-shot) | 42.3 |


-
-Average: 35.26
+2. BigBench:

+- Average: 35.26
+- Details:

 | Task | Version | Metric | Value | Stderr |
 |-----------------------------------------------------|---------|-------------------------|-------|--------|