Update constants.py
constants.py  +5 -0
@@ -82,6 +82,11 @@ TABLE_INTRODUCTION = """In the table below, we summarize each task performance o
 We use accuracy (%) as the primary evaluation metric for each task.
 SEED-Bench-1 calculates the overall accuracy by dividing the total number of correct QA answers by the total number of QA questions.
 SEED-Bench-2 represents the overall accuracy using the average accuracy of each dimension.
+For the PPL evaluation method, we compute the loss for each answer candidate and select the candidate with the lowest loss. For details, please refer to [InternLM_Xcomposer_VL_interface](https://github.com/AILab-CVC/SEED-Bench/blob/387a067b6ba99ae5e8231f39ae2d2e453765765c/SEED-Bench-2/model/InternLM_Xcomposer_VL_interface.py#L74).
+For the PPL A/B/C/D evaluation method, please refer to [EVAL_SEED.md](https://github.com/QwenLM/Qwen-VL/blob/master/eval_mm/seed_bench/EVAL_SEED.md) for more information.
+For the Generate evaluation method, please refer to [Evaluation.md](https://github.com/haotian-liu/LLaVA/blob/main/docs/Evaluation.md#seed-bench) for details.
+NG indicates that the evaluation method is Not Given.
+If you have any questions, please feel free to contact us.
 """
 
 LEADERBORAD_INFO = """
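As context for the two aggregation rules kept in the unchanged lines above (SEED-Bench-1 divides total correct answers by total questions; SEED-Bench-2 averages the per-dimension accuracies), here is a minimal Python sketch of the difference. The dimension names and counts are made-up placeholders, not benchmark results.

```python
# Illustrative sketch (not part of this commit): two ways to aggregate an overall score.
per_dimension = {
    # dimension: (number of correct answers, number of questions) -- placeholder values
    "Scene Understanding": (90, 100),
    "Instance Identity":   (40, 50),
    "Action Recognition":  (10, 25),
}

# SEED-Bench-1 style: total correct divided by total questions (micro average).
total_correct = sum(c for c, _ in per_dimension.values())
total_questions = sum(n for _, n in per_dimension.values())
seed_bench_1_overall = 100.0 * total_correct / total_questions

# SEED-Bench-2 style: mean of each dimension's own accuracy (macro average).
seed_bench_2_overall = 100.0 * sum(c / n for c, n in per_dimension.values()) / len(per_dimension)

print(f"SEED-Bench-1 style overall: {seed_bench_1_overall:.1f}%")  # 80.0%
print(f"SEED-Bench-2 style overall: {seed_bench_2_overall:.1f}%")  # 70.0%
```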
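The PPL evaluation method described in the added text ranks answer candidates by language-modeling loss and picks the lowest. Below is a rough sketch of that idea, assuming a Hugging Face causal LM; the model name, prompt, and `candidate_loss` helper are illustrative placeholders, not the SEED-Bench implementation (see the linked InternLM_Xcomposer_VL_interface for the actual code).

```python
# Illustrative sketch (not part of this commit): score each candidate by its
# average cross-entropy loss and choose the candidate with the lowest loss.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def candidate_loss(question: str, candidate: str) -> float:
    """Average loss over the candidate tokens appended to the question."""
    prompt_ids = tokenizer(question, return_tensors="pt").input_ids
    full_ids = tokenizer(question + " " + candidate, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100                  # mask out the prompt tokens
    with torch.no_grad():
        out = model(full_ids, labels=labels)
    return out.loss.item()

question = "What color is the sky on a clear day? Answer:"
candidates = ["blue", "green", "red", "yellow"]
losses = [candidate_loss(question, c) for c in candidates]
prediction = candidates[losses.index(min(losses))]           # lowest loss wins
print(prediction)
```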