davidadamczyk
committed on
Commit
·
c2f28e6
1
Parent(s):
66b85dd
Update text
src/display/about.py +32 -3
src/display/about.py
CHANGED
@@ -39,12 +39,14 @@ TITLE = """<h1 align="center" id="space-title">🇨🇿 CzechBench Leaderboard</
|
|
39 |
# What does your leaderboard evaluate?
|
40 |
INTRODUCTION_TEXT = """
|
41 |
Czech-Bench is a collection of LLM benchmarks available for the Czech language. It currently consists of 15 Czech benchmarks, including new machine translations of the popular ARC, GSM8K, MMLU, and TruthfulQA datasets.
|
42 |
"""
|
43 |
|
44 |
# Which evaluations are you running? how can people reproduce what you have?
|
45 |
LLM_BENCHMARKS_TEXT = f"""
|
46 |
## Basic Information
|
47 |
-
The goal of this project is to provide a comprehensive and practical benchmark for evaluating Czech language models. This benchmark consists of 15 selected test tasks containing test data in the Czech language. It includes both original Czech datasets and machine translations of popular datasets such as ARC, GSM8K, MMLU, and TruthfulQA. A list of all datasets can be found at [
|
48 |
|
49 |
Key Features and Benefits:
|
50 |
- **Tailored for the Czech Language:** The benchmark includes both original Czech datasets and adapted versions of international datasets, ensuring relevant evaluation of model performance in the Czech context.
|
@@ -54,9 +56,36 @@ Key Features and Benefits:
|
|
54 |
|
55 |
By using this benchmark, you will gain deep insights into the strengths and weaknesses of your models, allowing you to better focus on key areas for optimization. This will not only improve the performance of your models but also enhance their real-world deployment in various Czech contexts.
|
56 |
|
57 |
|
58 |
-
|
59 |
-
|
60 |
|
61 |
"""
|
62 |
|
|
|
39 |
# What does your leaderboard evaluate?
|
40 |
INTRODUCTION_TEXT = """
|
41 |
Czech-Bench is a collection of LLM benchmarks available for the Czech language. It currently consists of 15 Czech benchmarks, including new machine translations of the popular ARC, GSM8K, MMLU, and TruthfulQA datasets.
|
42 |
+
|
43 |
+
Czech-Bench is developed by <a href="https://huggingface.co/CIIRC-NLP">CIIRC-NLP</a>.
|
44 |
"""
|
45 |
|
46 |
# Which evaluations are you running? how can people reproduce what you have?
|
47 |
LLM_BENCHMARKS_TEXT = f"""
|
48 |
## Basic Information
|
49 |
+
The goal of this project is to provide a comprehensive and practical benchmark for evaluating Czech language models. This benchmark consists of 15 selected test tasks containing test data in the Czech language. It includes both original Czech datasets and machine translations of popular datasets such as ARC, GSM8K, MMLU, and TruthfulQA. A list of all datasets can be found on [GitHub](https://github.com/jirkoada/czechbench_eval_harness/tree/main/lm_eval/tasks/czechbench#readme).
|
50 |
|
51 |
Key Features and Benefits:
|
52 |
- **Tailored for the Czech Language:** The benchmark includes both original Czech datasets and adapted versions of international datasets, ensuring relevant evaluation of model performance in the Czech context.
|
|
|
56 |
|
57 |
By using this benchmark, you will gain deep insights into the strengths and weaknesses of your models, allowing you to better focus on key areas for optimization. This will not only improve the performance of your models but also enhance their real-world deployment in various Czech contexts.
|
58 |
|
59 |
+
## Evaluation Process
|
60 |
+
|
61 |
+
### 1. Install CzechBench
|
62 |
+
```
|
63 |
+
git clone https://github.com/jirkoada/czechbench_eval_harness.git
|
64 |
+
cd czechbench_eval_harness
|
65 |
+
pip install -e ".[api]"
|
66 |
+
```
|
67 |
+
|
68 |
+
### 2. Run evaluation
|
69 |
+
* `export MODEL=your_model_name`, where `your_model_name` is the Hugging Face path of a public model, for example: `export MODEL=meta-llama/Meta-Llama-3.1-8B-Instruct`
|
70 |
+
* `export OUTPUT_PATH=my_output_path`, where `my_output_path` is the directory for the evaluation reports
|
71 |
+
|
72 |
+
|
73 |
+
Run the following command (you can adjust parameters such as `batch_size` or `device`):
|
74 |
+
```
|
75 |
+
lm_eval --model hf \\
|
76 |
+
--model_args pretrained=$MODEL \\
|
77 |
+
--tasks czechbench_tasks \\
|
78 |
+
--device cuda:0 \\
|
79 |
+
--batch_size 1 \\
|
80 |
+
--write_out \\
|
81 |
+
--log_samples \\
|
82 |
+
--output_path $OUTPUT_PATH \\
|
83 |
+
--apply_chat_template
|
84 |
+
```
|
85 |
+
|
86 |
|
87 |
+
### 3. Upload results to Leaderboard
|
88 |
+
In the `$OUTPUT_PATH` directory you will find the file `results.json`. Upload `results.json` to the [CzechBench Leaderboard](https://huggingface.co/spaces/CIIRC-NLP/czechbench_leaderboard) on the **Submit Here!** tab.
|
89 |
|
90 |
"""
|
91 |
|
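The run and upload steps added in this commit could also be scripted. The following is a minimal sketch, not part of the commit: the helper names `build_lm_eval_command` and `find_results_files` are hypothetical, the flags are copied from the command in the diff, and the recursive search is an assumption that the report may sit in a subdirectory of the output path or carry a suffix after `results`, depending on the harness version.

```python
from pathlib import Path


def build_lm_eval_command(model, output_path, device="cuda:0", batch_size=1):
    """Return the step-2 command as an argument list (flags copied from the diff)."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model}",
        "--tasks", "czechbench_tasks",
        "--device", device,
        "--batch_size", str(batch_size),
        "--write_out",
        "--log_samples",
        "--output_path", str(output_path),
        "--apply_chat_template",
    ]


def find_results_files(output_path):
    """List JSON files under output_path whose name starts with 'results'."""
    # Assumption: the report may be nested or suffixed, so search recursively
    # instead of expecting exactly <output_path>/results.json.
    return sorted(Path(output_path).rglob("results*.json"))
```

Passing the list to `subprocess.run(..., check=True)` would start the evaluation without shell quoting issues, and `find_results_files` can then point to the file to upload on the **Submit Here!** tab.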