Space: CIIRC-NLP/czechbench_leaderboard

Adamiros committed on Oct 11, 2024

Commit 0a6f522 (verified) · 1 Parent(s): 5cf1886

Update src/display/about.py

Files changed (1)

src/display/about.py +14 -7

@@ -48,13 +48,20 @@ currently consists of 15 individual tasks, leveraging pre-existing Czech dataset
 including ARC, GSM8K, MMLU, and TruthfulQA. This work is brought to you by CIIRC CTU and VSB Ostrava.
 Key Features and Benefits:
-- **Tailored for the Czech Language:** The benchmark includes both original Czech datasets and adapted versions of international datasets, ensuring relevant evaluation of model performance in the Czech context.
-- **Wide Range of Tasks:** It contains 15 different tasks that cover various aspects of language understanding and text generation, enabling a comprehensive assessment of the model's capabilities.
-- **Universal model support:** The universal text-to-text evaluation approach adopted in CzechBench allows for direct comparison of models with varying levels of internal access, including commercial APIs.
-- **Ease of Use:** The benchmark is designed to be easily integrated into your development process, saving time and resources during model testing and improvement.
-- **Up-to-date and Relevant:** We regularly update our datasets to reflect the latest findings and trends in language model development.
-By using CzechBench, you will gain deep insights into the strengths and weaknesses of your models, allowing you to better focus on key areas for optimization.
-This will not only improve the performance of your models but also enhance their real-world deployment in various Czech contexts.
+- **Tailored for the Czech Language:**
+CzechBench includes both original Czech datasets and adapted versions of international datasets, ensuring relevant evaluation of model performance in the Czech context.
+- **Wide Range of Tasks:**
+It contains 15 different tasks that cover various aspects of language understanding and text generation, enabling a comprehensive assessment of the model's capabilities.
+- **Bilingual performance analysis:**
+CzechBench also offers a parallel collection of 9 English tasks corresponding to the Czech versions included in the main suite.
+This allows for direct comparison of model performance across both languages with equivalent conditions in terms of prompt formulation and few-shot example selection.
+- **Universal model support:**
+The universal text-to-text evaluation approach adopted in CzechBench allows for direct comparison of models with varying levels of internal access, including commercial APIs.
+- **Ease of Use:**
+The benchmark is built upon a commonly used evaluation framework with wide support for state-of-the-art models and inference acceleration tools.
+- **Empowering decisions:**
+Whether you are a business looking for the best LLM solution to base your application on, or a research team trying to maximize the capabilities of the models they are developing,
+CzechBench will help you gain insights into particular strengths and weeknesses of individual models and better focus on key areas for optimization.
 Below, you can find the up-to-date loaderboard of models evaluated on CzechBench.
 For more information on the included benchmarks and instructions on evaluating your own models, please visit the "About" section below.