Update src/display/about.py

src/display/about.py (+22 -17)
@@ -55,27 +55,28 @@ For more information on the included benchmarks and instructions on evaluating y
 
 # Which evaluations are you running? how can people reproduce what you have?
 LLM_BENCHMARKS_TEXT = f"""
-
+The CzechBench evaluation suite is hosted on [GitHub](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench#readme).
+It is implemented on top of the popular [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) framework, which provides extensive model compatibility and efficient evaluation.
 
 All currently supported benchmarks are listed in the table below:
 
 | Dataset | Language | Task type | Metrics | Samples | Task ID |
 | ------------------------------------------------------------ | ----------------------------- | -------------------------- | -------------- | ------: | --------------- |
-| [AGREE](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/
-| [ANLI](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/
-| [ARC Challenge](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/
-| [ARC Easy](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/
-| [Belebele](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/
-| [CTKFacts](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/
-| [Czech News](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/
-| [Facebook Comments](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/
-| [GSM8K](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/
-| [Klokánek](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/
-| [Mall Reviews](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/
-| [MMLU](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/
-| [SQAD](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/
-| [Subjectivity](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/
-| [TruthfulQA](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/
+| [AGREE](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/agree_cs) | CS (Original) | Subject-verb agreement | Acc | 627 | agree_cs |
+| [ANLI](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/anli_cs) | CS (Translated) | Natural Language Inference | Acc, Macro F1 | 1200 | anli_cs |
+| [ARC Challenge](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/arc_cs) | CS (Translated) | Knowledge-Based QA | Acc | 1172 | arc_cs |
+| [ARC Easy](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/arc_cs) | CS (Translated) | Knowledge-Based QA | Acc | 2376 | arc_cs |
+| [Belebele](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/belebele_cs) | CS (Professional translation) | Reading Comprehension / QA | Acc | 895 | belebele_cs |
+| [CTKFacts](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/ctkfacts_cs) | CS (Original) | Natural Language Inference | Acc, Macro F1 | 558 | ctkfacts_cs |
+| [Czech News](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/czechnews_cs) | CS (Original) | News Topic Classification | Acc, Macro F1 | 1000 | czechnews_cs |
+| [Facebook Comments](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/fb_comments_cs) | CS (Original) | Sentiment Analysis | Acc, Macro F1 | 1000 | fb_comments_cs |
+| [GSM8K](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/gsm8k_cs) | CS (Translated) | Mathematical Inference | EM Acc | 1319 | gsm8k_cs |
+| [Klokánek](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/klokanek_cs) | CS (Original) | Math/Logical Inference | Acc | 808 | klokanek_cs |
+| [Mall Reviews](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/mall_reviews_cs) | CS (Original) | Sentiment Analysis | Acc, Macro F1 | 3000 | mall_reviews_cs |
+| [MMLU](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/mmlu_cs) | CS (Translated) | Knowledge-Based QA | Acc | 12408 | mmlu_cs |
+| [SQAD](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/sqad_cs) | CS (Original) | Reading Comprehension / QA | EM Acc, BoW F1 | 843 | sqad_cs |
+| [Subjectivity](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/subjectivity_cs) | CS (Original) | Subjectivity Analysis | Acc, Macro F1 | 2000 | subjectivity_cs |
+| [TruthfulQA](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/truthfulqa_cs) | CS (Translated) | Knowledge-Based QA | Acc | 813 | truthfulqa_cs |
 
 ## Evaluation Process
 
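The `Task ID` column holds the exact identifier the harness expects. As a minimal sketch of how those IDs are consumed, assuming the czechbench fork of the harness is installed (so the `*_cs` tasks are registered) and that it exposes the upstream `lm_eval.simple_evaluate` entry point; the model name is a placeholder:

```python
# Minimal sketch: run one CzechBench task through the harness's Python API.
# Assumes the czechbench fork of lm-evaluation-harness is installed, so the
# "*_cs" task IDs from the table above are registered in the task registry.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                       # Hugging Face backend
    model_args="pretrained=gpt2",     # placeholder model, swap in your own
    tasks=["arc_cs"],                 # Task ID from the table above
    batch_size=8,
)

# Per-task metrics (e.g. Acc) are keyed by task ID in the returned dict.
print(results["results"]["arc_cs"])
```

The CLI described under "Evaluation Process" below is the documented route; the Python entry point is merely a convenience for quick experiments.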
@@ -103,10 +104,14 @@ lm_eval --model hf \\
 --output_path $OUTPUT_PATH \\
 --apply_chat_template \\
 ```
+
+For advanced usage instructions, please consult the [CzechBench README on GitHub](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench#readme)
+or the official [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) documentation.
 
 
 ### 3. Upload results to Leaderboard
-
+Inside the `$OUTPUT_PATH` directory, you can find the file `results.json`.
+To submit your evaluation results to our leaderboard, please visit the "Submit here!" section above and upload your `results.json` file.
 
 """
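A hedged sketch of scripting the full CLI run shown in the hunk above: `--model`, `--model_args`, `--tasks`, and `--batch_size` are standard lm-evaluation-harness options rather than flags confirmed by this diff, and the model and output path are placeholders:

```python
# Sketch: scripting the lm_eval CLI run. Only --output_path and
# --apply_chat_template appear in the diff above; the remaining flags are
# standard lm-evaluation-harness options. Model and output path are
# placeholders.
import subprocess

subprocess.run(
    [
        "lm_eval",
        "--model", "hf",
        "--model_args", "pretrained=gpt2",   # placeholder model
        "--tasks", "arc_cs,mmlu_cs",         # Task IDs from the table
        "--batch_size", "8",
        "--output_path", "results",
        "--apply_chat_template",
    ],
    check=True,  # raise if the evaluation run fails
)
```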
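Before uploading, the file can be sanity-checked locally; a small sketch, assuming `results.json` follows the harness's usual layout with a top-level `"results"` mapping from task IDs to metric values:

```python
# Sketch: sanity-check results.json before uploading to the leaderboard.
# Assumes the harness's usual output layout: a top-level "results" dict
# mapping task IDs (e.g. "arc_cs") to their metric values.
import json
from pathlib import Path

output_path = Path("results")  # placeholder for $OUTPUT_PATH
with open(output_path / "results.json") as f:
    data = json.load(f)

for task_id, metrics in data["results"].items():
    print(task_id, metrics)
```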