daviddongdong
committed on
Update README.md
README.md
CHANGED
@@ -41,7 +41,7 @@ This task allows for a more nuanced content retrieval, honing in on specific inf
 ## 3. Evaluation Set
 ### 3.1 Document Analysis
 
-**MMDocIR**
+**MMDocIR** evaluation set includes 313 long documents averaging 65.1 pages, categorized into ten main domains: research reports, administration&industry, tutorials&workshops, academic papers, brochures, financial reports, guidebooks, government documents, laws, and news articles.
 Different domains feature distinct distributions of multi-modal information. For instance, research reports, tutorials, workshops, and brochures predominantly contain images, whereas financial and industry documents are table-rich. In contrast, government and legal documents primarily comprise text. Overall, the modality distribution is: Text (60.4%), Image (18.8%), Table (16.7%), and other modalities (4.1%).
 
 ### 3.2 Question and Annotation Analysis