|
### Model Description |
|
DocModel is a document understanding model built on the RoBERTa architecture. It jointly encodes textual content and 2D spatial layout, making it well suited to tasks that involve complex document structure, such as forms, tables, and scanned documents.
|
|
|
Developed by: Oluwatobi Adefami, Madison May

Model type: Document Understanding (Information Extraction)

License: Apache-2.0

### Model Sources

Repository: https://github.com/Tobiadefami/docmodel
|
### Uses |
|
DocModel can be directly used for document processing, form understanding, and entity extraction from structured and semi-structured documents. |
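
For entity extraction, the base model is typically paired with a token-classification head. The sketch below is illustrative only: the `tobiadefami/docmodel-base-funsd` checkpoint name is hypothetical, and it assumes the model loads through the standard `transformers` token-classification API.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical fine-tuned checkpoint -- substitute a model you have
# fine-tuned for token classification (e.g. on FUNSD).
checkpoint = "tobiadefami/docmodel-base-funsd"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint)

# Predict one label per token for a snippet of document text.
inputs = tokenizer("Invoice No: 12345", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predictions = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, label_id in zip(tokens, predictions):
    print(token, model.config.id2label[label_id])
```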
|
|
|
### Out-of-Scope Use |
|
DocModel is not recommended for tasks involving purely textual data with no layout component, nor for heavily distorted document scans.
|
|
|
### Bias, Risks, and Limitations |
|
DocModel’s performance may degrade on highly noisy or poorly structured documents, such as those with extreme distortion or low-resolution scans.
|
|
|
### Recommendations |
|
Users should be mindful of the model’s limitations, particularly in handling documents with severe layout inconsistencies. |
|
|
|
### How to Get Started with the Model
|
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("tobiadefami/docmodel-base")
model = AutoModel.from_pretrained("tobiadefami/docmodel-base")

# Example usage: encode a piece of document text.
# Note: this minimal example passes text only; layout-aware tasks will
# normally also require the 2D position of each word (see the repository
# for the expected preprocessing).
inputs = tokenizer("Your document text here...", return_tensors="pt")
outputs = model(**inputs)
```
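
The base model returns one hidden state per token; if a single vector per document is needed (for example, for retrieval or clustering), a common choice is to mean-pool those states over the attention mask. The snippet below continues the example above and is a generic pooling recipe, not something prescribed by the DocModel repository.

```python
import torch

# Mean-pool the token-level hidden states into one document embedding,
# ignoring padding positions via the attention mask.
hidden = outputs.last_hidden_state                     # (batch, seq_len, hidden)
mask = inputs["attention_mask"].unsqueeze(-1).float()  # (batch, seq_len, 1)
doc_embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(doc_embedding.shape)                             # e.g. torch.Size([1, 768])
```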
|
### Evaluation |
|
|
|
#### Metrics

Eval Loss: 1.36752

F1-Score: 0.84126
|
### Results |
|
|
|
DocModel has been evaluated on the FUNSD dataset for information extraction, where it achieves the evaluation loss and F1-score reported above, demonstrating competitive performance.
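
For context, F1 on FUNSD is conventionally computed at the entity level over BIO-tagged predictions (for example with the `seqeval` package). The toy snippet below only illustrates that metric; it is not the evaluation script behind the numbers above.

```python
from seqeval.metrics import f1_score

# Toy gold vs. predicted BIO tags for two documents, using FUNSD-style
# entity types (header, question, answer).
y_true = [
    ["B-HEADER", "I-HEADER", "O", "B-QUESTION", "B-ANSWER"],
    ["B-QUESTION", "I-QUESTION", "B-ANSWER", "O"],
]
y_pred = [
    ["B-HEADER", "I-HEADER", "O", "B-QUESTION", "O"],
    ["B-QUESTION", "I-QUESTION", "B-ANSWER", "O"],
]

print(f1_score(y_true, y_pred))  # entity-level micro-averaged F1
```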