sitloboi2012/donut-finetune-rvl-cdip

	---
	license: apache-2.0
	datasets:
	- sitloboi2012/rvl_cdip_small_dataset
	- sitloboi2012/rvl_cdip_large_dataset
	language:
	- en
	metrics:
	- accuracy
	library_name: transformers
	pipeline_tag: image-to-text
	tags:
	- DocumentAI
	- ImageClassification
	- Donut
	---
	# Model Card for donut-finetune-rvl-cdip

	This model is intended as a baseline for using Donut on RVL-CDIP. It was trained on a small-scale subset of RVL-CDIP (specifically, 100 images from the dataset).

	## Model Details

	The model uses Donut, implemented with VisionEncoderDecoder in Transformers, as the backbone for an end-to-end document classification task.
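
As a rough illustration of how a Donut checkpoint like this one might be used for classification, here is a minimal sketch. It assumes the checkpoint follows the conventions of Naver's RVL-CDIP Donut model: a `<s_rvlcdip>` task prompt and an output of the form `<s_class><label/></s_class>`. Neither is confirmed by this card, so check the tokenizer's special tokens first. `extract_class` and `classify` are hypothetical helper names.

```python
import re
from typing import Optional

# Assumption: the task start token used by Naver's RVL-CDIP Donut checkpoint.
# This fine-tuned model may have been trained with a different prompt.
TASK_PROMPT = "<s_rvlcdip>"


def extract_class(sequence: str) -> Optional[str]:
    """Pull the label out of '<s_class><label/></s_class>' (assumed output format)."""
    match = re.search(r"<s_class>\s*<([^/>]+)/>\s*</s_class>", sequence)
    return match.group(1) if match else None


def classify(image_path: str,
             model_id: str = "sitloboi2012/donut-finetune-rvl-cdip") -> Optional[str]:
    """Run one document image through the model and return the predicted label."""
    # Heavy imports are kept local so the parsing helper can be used on its own.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # The decoder is primed with the task prompt instead of a BOS token.
    decoder_input_ids = processor.tokenizer(
        TASK_PROMPT, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    with torch.no_grad():
        output_ids = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
        )

    sequence = processor.batch_decode(output_ids)[0]
    return extract_class(sequence)
```

Calling `classify("scan.png")` (a hypothetical local file) would download the checkpoint and return the label string, or `None` if the generated sequence does not match the assumed format.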

	### Downstream Use

	This model can be used as a starting point for fine-tuning on document classification tasks in other domains, such as food documents or financial documents.
	For fine-tuning on other downstream tasks, please refer to the original Donut model from Naver.
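
When fine-tuning on a new set of document classes, Donut-style training usually represents each label as a special token inside the target sequence. The sketch below shows one way to build those targets. It assumes the `<s_class><label/></s_class>` convention from the original Donut classification setup and an `</s>` end-of-sequence token. The helper names are hypothetical and are not part of any library.

```python
from typing import Iterable, List


def label_to_token(label: str) -> str:
    """Turn a human-readable label into a Donut-style leaf token."""
    # "Financial Report" -> "<financial_report/>"
    return "<" + "_".join(label.lower().split()) + "/>"


def build_target_sequence(label: str, eos_token: str = "</s>") -> str:
    """Build the decoder target text for one training example."""
    # Assumption: the tokenizer's EOS token is "</s>"; pass another one if not.
    return f"<s_class>{label_to_token(label)}</s_class>{eos_token}"


def new_special_tokens(labels: Iterable[str]) -> List[str]:
    """Collect the tokens that would need to be added to the tokenizer."""
    label_tokens = sorted({label_to_token(label) for label in labels})
    return ["<s_class>", "</s_class>"] + label_tokens
```

In a typical Transformers setup, the returned tokens would then be registered with `tokenizer.add_special_tokens({"additional_special_tokens": tokens})`, followed by `model.decoder.resize_token_embeddings(len(tokenizer))` so the decoder has embeddings for the new labels.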