Adamiros committed
Commit 041fe0a · verified · 1 Parent(s): ae10add

Update src/display/about.py

Files changed (1):
1. src/display/about.py (+22 -17)
src/display/about.py CHANGED
@@ -55,27 +55,28 @@ For more information on the included benchmarks and instructions on evaluating y
 
 # Which evaluations are you running? how can people reproduce what you have?
 LLM_BENCHMARKS_TEXT = f"""
-## Included benchmarks
+The CzechBench evaluation suite is hosted on [GitHub](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench#readme).
+It is implemented on top of the popular [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) framework, which provides extensive model compatibility and optimal evaluation efficiency.
 
 All currently supported benchmarks are listed in the table below:
 
 | Dataset | Language | Task type | Metrics | Samples | Task ID |
 | ------------------------------------------------------------ | ----------------------------- | -------------------------- | -------------- | ------: | --------------- |
-| [AGREE](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchagree_cs) | CS (Original) | Subject-verb agreement | Acc | 627 | agree_cs |
-| [ANLI](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchanli_cs) | CS (Translated) | Natural Language Inference | Acc, Macro F1 | 1200 | anli_cs |
-| [ARC Challenge](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbencharc_cs) | CS (Translated) | Knowledge-Based QA | Acc | 1172 | arc_cs |
-| [ARC Easy](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbencharc_cs) | CS (Translated) | Knowledge-Based QA | Acc | 2376 | arc_cs |
-| [Belebele](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchbelebele_cs) | CS (Professional translation) | Reading Comprehension / QA | Acc | 895 | belebele_cs |
-| [CTKFacts](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchctkfacts_cs) | CS (Original) | Natural Language Inference | Acc, Macro F1 | 558 | ctkfacts_cs |
-| [Czech News](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchczechnews_cs) | CS (Original) | News Topic Classification | Acc, Macro F1 | 1000 | czechnews_cs |
-| [Facebook Comments](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchfb_comments_cs) | CS (Original) | Sentiment Analysis | Acc, Macro F1 | 1000 | fb_comments_cs |
-| [GSM8K](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchgsm8k_cs) | CS (Translated) | Mathematical inference | EM Acc | 1319 | gsm8k_cs |
-| [Klokánek](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchklokanek_cs) | CS (Original) | Math/Logical Inference | Acc | 808 | klokanek_cs |
-| [Mall Reviews](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchmall_reviews_cs) | CS (Original) | Sentiment Analysis | Acc, Macro F1 | 3000 | mall_reviews_cs |
-| [MMLU](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchmmlu_cs) | CS (Translated) | Knowledge-Based QA | Acc | 12408 | mmlu_cs |
-| [SQAD](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchsqad_cs) | CS (Original) | Reading Comprehension / QA | EM Acc, BoW F1 | 843 | sqad_cs |
-| [Subjectivity](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchsubjectivity_cs) | CS (Original) | Subjectivity Analysis | Acc, Macro F1 | 2000 | subjectivity_cs |
-| [TruthfulQA](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbenchtruthfulqa_cs) | CS (Translated) | Knowledge-Based QA | Acc | 813 | truthfulqa_cs |
+| [AGREE](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/agree_cs) | CS (Original) | Subject-verb agreement | Acc | 627 | agree_cs |
+| [ANLI](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/anli_cs) | CS (Translated) | Natural Language Inference | Acc, Macro F1 | 1200 | anli_cs |
+| [ARC Challenge](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/arc_cs) | CS (Translated) | Knowledge-Based QA | Acc | 1172 | arc_cs |
+| [ARC Easy](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/arc_cs) | CS (Translated) | Knowledge-Based QA | Acc | 2376 | arc_cs |
+| [Belebele](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/belebele_cs) | CS (Professional translation) | Reading Comprehension / QA | Acc | 895 | belebele_cs |
+| [CTKFacts](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/ctkfacts_cs) | CS (Original) | Natural Language Inference | Acc, Macro F1 | 558 | ctkfacts_cs |
+| [Czech News](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/czechnews_cs) | CS (Original) | News Topic Classification | Acc, Macro F1 | 1000 | czechnews_cs |
+| [Facebook Comments](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/fb_comments_cs) | CS (Original) | Sentiment Analysis | Acc, Macro F1 | 1000 | fb_comments_cs |
+| [GSM8K](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/gsm8k_cs) | CS (Translated) | Mathematical inference | EM Acc | 1319 | gsm8k_cs |
+| [Klokánek](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/klokanek_cs) | CS (Original) | Math/Logical Inference | Acc | 808 | klokanek_cs |
+| [Mall Reviews](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/mall_reviews_cs) | CS (Original) | Sentiment Analysis | Acc, Macro F1 | 3000 | mall_reviews_cs |
+| [MMLU](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/mmlu_cs) | CS (Translated) | Knowledge-Based QA | Acc | 12408 | mmlu_cs |
+| [SQAD](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/sqad_cs) | CS (Original) | Reading Comprehension / QA | EM Acc, BoW F1 | 843 | sqad_cs |
+| [Subjectivity](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/subjectivity_cs) | CS (Original) | Subjectivity Analysis | Acc, Macro F1 | 2000 | subjectivity_cs |
+| [TruthfulQA](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench/truthfulqa_cs) | CS (Translated) | Knowledge-Based QA | Acc | 813 | truthfulqa_cs |
 
 ## Evaluation Process
 
@@ -103,10 +104,14 @@ lm_eval --model hf \\
   --output_path $OUTPUT_PATH \\
   --apply_chat_template \\
 ```
+
+For advanced usage instructions, please inspect the [CzechBench README on GitHub](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench#readme)
+or the official [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) documentation.
 
 
 ### 3. Upload results to Leaderboard
-in `$OUTPUT_PATH` directory you can find file `results.json` upload `result.json` to [CzechBench Leaderboard](https://huggingface.co/spaces/CIIRC-NLP/czechbench_leaderboard) on **Submit Here!** tab.
+Inside the `$OUTPUT_PATH` directory, you can find the file `results.json`.
+To submit your evaluation results to our leaderboard, please visit the "Submit here!" section above and upload your `results.json` file.
 
 """
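The Task ID column added in this revision spells out the reproduction path: any ID from the table can be passed to the harness, on the command line as shown in the diff or through the harness's Python API. Below is an illustrative sketch of the Python route, not part of this commit; it assumes the CzechBench fork of the Language Model Evaluation Harness is installed (e.g. `pip install -e .` in a clone of jirkoada/czechbench_eval_harness) so the `*_cs` tasks are registered, and the model name is a placeholder.

```python
# Sketch: driving one CzechBench task from Python instead of the CLI.
# Assumes the CzechBench fork of lm-evaluation-harness is installed so
# the *_cs tasks from the table above are available.
import json

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                    # HF transformers backend, as in the CLI example
    model_args="pretrained=gpt2",  # placeholder; swap in the model under test
    tasks=["arc_cs"],              # any Task ID from the table above
)

# Per-task metrics live under the "results" key of the returned dict.
print(json.dumps(results["results"], indent=2, ensure_ascii=False))
```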
 
 
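The rewritten upload step points readers at `results.json` inside `$OUTPUT_PATH`. A quick sanity check of that file before submitting it on the "Submit here!" tab might look like the following; the file's exact location and schema are assumptions based on typical harness output (newer harness versions nest a timestamped `results_*.json` per model), not something this commit guarantees.

```python
# Hypothetical pre-flight check: find the results file under $OUTPUT_PATH
# and print the per-task metrics before uploading to the leaderboard.
import json
from pathlib import Path

output_path = Path("outputs")  # placeholder for $OUTPUT_PATH

# The glob covers both flat results.json and nested results_<timestamp>.json.
results_file = next(output_path.rglob("results*.json"))
data = json.loads(results_file.read_text(encoding="utf-8"))

for task, metrics in data.get("results", {}).items():
    print(f"{task}: {metrics}")
```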