hexuan21 committed on
Commit
0c1ac5e
·
verified ·
1 Parent(s): 90867ec

Update README.md

Files changed (1)
  1. README.md +47 -2
README.md CHANGED
@@ -19,7 +19,52 @@ pipeline_tag: visual-question-answering
19
  ![MantisScore](https://tiger-ai-lab.github.io/MantisScore/static/images/teaser.png)
20
 
21
  ## Introduction
22
- - MantisScore is a video quality evaluation model, trained on VideoEval[VideoEval](https://huggingface.co/datasets/TIGER-Lab/VideoEval),
 
23
  a large video evaluation dataset with multi-aspect human scores.
24
 
25
- - MantisScore trained on
19
  ![MantisScore](https://tiger-ai-lab.github.io/MantisScore/static/images/teaser.png)
20
 
21
  ## Introduction
22
+ - MantisScore is a video quality evaluation model, built on [Mantis-8B-Idefics2](https://huggingface.co/TIGER-Lab/Mantis-8B-Idefics2) as the base model
23
+ and trained on [VideoEval](https://huggingface.co/datasets/TIGER-Lab/VideoEval),
24
  a large video evaluation dataset with multi-aspect human scores.
25
 
26
+ - MantisScore reaches a 75+ Spearman correlation with human ratings on VideoEval-test, surpassing all MLLM-prompting methods and feature-based metrics.
27
+
28
+ - MantisScore also beats the best baselines on three other benchmarks, EvalCrafter, GenAI-Bench and VBench, showing high alignment with human evaluations.
29
+
30
+ ## Performance
31
+ ### Evaluation Results on 4 Benchmarks
32
+
33
+ We test our video evaluation model MantisScore on VideoEval-test, EvalCrafter, GenAI-Bench and VBench.
34
+ For the first two benchmarks, we take the Spearman correlation between the model's output and human ratings,
35
+ averaged over all evaluation aspects, as the indicator.
36
+ For GenAI-Bench and VBench, which include human preference data among two or more videos,
37
+ we employ the model's output to predict preferences and use pairwise accuracy as the performance indicator.
38
+ | Metric | Final Sum Score | VideoEval-test | EvalCrafter | GenAI-Bench | VBench |
39
+ |------------------|----------------:|---------------:|------------:|------------:|-------:|
40
+ | MantisScore | | | | | |
41
+ | Gemini-1.5-Pro | 158.8 | 22.1 | 22.9 | 60.9 | 52.9 |
42
+ | Gemini-1.5-Flash | 157.5 | 20.8 | 17.3 | 67.1 | 52.3 |
43
+ | GPT-4o | 155.4 | 23.1 | 28.7 | 52.0 | 51.7 |
44
+ | CLIP-sim | 126.8 | 8.9 | 36.2 | 34.2 | 47.4 |
45
+ | DINO-sim | 121.3 | 7.5 | 32.1 | 38.5 | 43.3 |
46
+ | SSIM-sim | 118.0 | 13.4 | 26.9 | 34.1 | 43.5 |
47
+ | CLIP-Score | 114.4 | -7.2 | 21.7 | 45.0 | 54.9 |
48
+ | LLaVA-1.5-7B | 108.3 | 8.5 | 10.5 | 49.9 | 39.4 |
49
+ | LLaVA-1.6-7B | 93.3 | -3.1 | 13.2 | 44.5 | 38.7 |
50
+ | X-CLIP-Score | 92.9 | -1.9 | 13.3 | 41.4 | 40.1 |
51
+ | PIQE | 78.3 | -10.1 | -1.2 | 34.5 | 55.1 |
52
+ | BRISQUE | 75.9 | -20.3 | 3.9 | 38.5 | 53.7 |
53
+ | SSIM-dyn | 42.5 | -5.5 | -17.0 | 28.4 | 36.5 |
54
+ | MSE-dyn | 36.7 | -12.9 | -26.4 | 31.4 | 44.5 |
55
+
56
+
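The two indicators described above (Spearman correlation averaged over evaluation aspects, and pairwise accuracy on preference data) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the input layout (a dict mapping aspect name to per-video scores), the scaling by 100, and the choice to skip pairs that humans rate as tied are all assumptions made for the example.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (var_x * var_y) ** 0.5


def mean_spearman(model_scores, human_scores):
    """Spearman correlation (x100, assumed scale) averaged over aspects.

    Both arguments map an aspect name to a list of per-video scores.
    """
    rhos = [spearman(model_scores[a], human_scores[a]) for a in model_scores]
    return 100.0 * mean(rhos)


def pairwise_accuracy(model_scores, human_scores):
    """Share (x100, assumed scale) of video pairs that model and humans order alike.

    Pairs with tied human scores are skipped (an assumption of this sketch).
    """
    agree, total = 0, 0
    for i, j in combinations(range(len(model_scores)), 2):
        human_diff = human_scores[i] - human_scores[j]
        if human_diff == 0:
            continue
        total += 1
        if (model_scores[i] - model_scores[j]) * human_diff > 0:
            agree += 1
    return 100.0 * agree / total if total else float("nan")
```

For example, `mean_spearman({"a": [1, 2, 3]}, {"a": [2, 4, 9]})` evaluates a single hypothetical aspect whose model scores are perfectly rank-aligned with the human ratings.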
57
+ ## Usage
58
+ ### Installation
59
+ ```bash
60
+ pip install git+https://github.com/TIGER-AI-Lab/MantisScore.git
61
+ ```
62
+
63
+ ### Inference
64
+
65
+ ### Training
66
+ MantisScore is trained on
67
+
68
+ ### Evaluation
69
+
70
+ ## Citation