---
license: apache-2.0
datasets:
- sitloboi2012/rvl_cdip_small_dataset
- sitloboi2012/rvl_cdip_large_dataset
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: image-to-text
tags:
- DocumentAI
- ImageClassification
- Donut
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->

This model is a baseline for document classification on RVL-CDIP with Donut. It was trained on a small subset of RVL-CDIP (specifically, 100 images from the dataset).

## Model Details

The model uses Donut, implemented with `VisionEncoderDecoder` in Transformers, as the backbone for an end-to-end document classification task.

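As a rough illustration of the setup described above, the sketch below loads a Donut checkpoint through `DonutProcessor` and `VisionEncoderDecoderModel` and reads the predicted class from the generated sequence. The repository id `your-username/donut-rvl-cdip-baseline` is a hypothetical placeholder, and the `<s_rvlcdip>` task prompt and `<s_class><label/></s_class>` output format are assumptions borrowed from Naver's RVL-CDIP checkpoint; this model may use different tokens.

```python
import re

# Hypothetical repo id: replace with the actual Hub id of this model.
MODEL_ID = "your-username/donut-rvl-cdip-baseline"
# Assumed task prompt, borrowed from Naver's RVL-CDIP checkpoint.
TASK_PROMPT = "<s_rvlcdip>"


def parse_class(sequence):
    """Extract the label from a sequence such as '<s_class><invoice/></s_class>'."""
    match = re.search(r"<s_class>\s*<([^<>/]+)/>\s*</s_class>", sequence)
    return match.group(1) if match else None


def classify(image_path, model_id=MODEL_ID):
    # Imported lazily so parse_class can be used without these dependencies.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()

    # Encode the page image and the task prompt that starts decoding.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        TASK_PROMPT, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    with torch.no_grad():
        output_ids = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=32,  # a class label needs only a few tokens
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
        )

    # Keep special tokens when decoding: the label itself is one of them.
    sequence = processor.batch_decode(output_ids)[0]
    return parse_class(sequence)
```

Calling `classify("page.png")` would return a label string, or `None` if the generated sequence does not match the assumed format.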
### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

This model can be fine-tuned for document classification tasks in other domains, such as food documents or financial documents.

For further downstream fine-tuning, please refer to the original Donut model from Naver.

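When fine-tuning on a new label set, the decoder has to learn new class tokens. The sketch below shows one way to build target sequences and register the tokens, assuming the `<s_class><label/></s_class>` convention used by Naver's Donut classification setup. The helper names and the example labels (`menu`, `bank_statement`) are hypothetical and not part of this model.

```python
def label_token(label):
    # Each class is represented as a single special token, e.g. '<menu/>'.
    return f"<{label}/>"


def build_target(label, eos_token="</s>"):
    # Target sequence the decoder is trained to generate for one document.
    return f"<s_class>{label_token(label)}</s_class>{eos_token}"


def new_special_tokens(labels):
    # Wrapper tokens first, then one token per distinct label (sorted).
    tokens = ["<s_class>", "</s_class>"]
    tokens += [label_token(label) for label in sorted(set(labels))]
    return tokens


def register_tokens(processor, model, labels):
    # Assumes a DonutProcessor and a VisionEncoderDecoderModel from Transformers.
    added = processor.tokenizer.add_tokens(
        new_special_tokens(labels), special_tokens=True
    )
    if added > 0:
        # Grow the decoder embedding matrix to cover the new token ids.
        model.decoder.resize_token_embeddings(len(processor.tokenizer))
    return added
```

For example, `build_target("menu")` produces `<s_class><menu/></s_class></s>`, which would be tokenized and used as the decoder labels during training.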