Donut model fine-tuned for US IRS tax document classification

This Donut model has been fine-tuned for US IRS tax document classification. It can classify up to 28 different types of IRS documents, targeting a common set of documents used for tax returns, including:

  1. 1040 U.S. Individual Income Tax Return
  2. 1040-NR U.S. Nonresident Alien Income Tax Return
  3. 1040-NR SCHEDULE OI Other Information
  4. 1040 SCHEDULE 1 Additional Income and Adjustments to Income
  5. 1040 SCHEDULE 2 Additional Taxes
  6. 1040 SCHEDULE 3 Additional Credits and Payments
  7. 1040 SCHEDULE 8812 Credits for Qualifying Children and Other Dependents
  8. 1040 SCHEDULE A Itemized Deductions
  9. 1040 SCHEDULE B Interest and Ordinary Dividends
  10. 1040 SCHEDULE C Profit or Loss From Business
  11. 1040 SCHEDULE D Capital Gains and Losses
  12. 1040 SCHEDULE E Supplemental Income and Loss
  13. 1040 SCHEDULE SE Self-Employment Tax
  14. Form 1125-A Cost of Goods Sold
  15. Form 8949 Sales and Other Dispositions of Capital Assets
  16. Form 8959 Additional Medicare Tax
  17. Form 8960 Net Investment Income Tax — Individuals, Estates, and Trusts
  18. Form 8995 Qualified Business Income Deduction Simplified Computation
  19. Form 8995-A SCHEDULE A Specified Service Trades or Businesses
  20. Form W-2 Wage and Tax Statement

Model Details & Description

The base model is 'naver-clova-ix/donut-base-finetuned-rvlcdip'. It was fine-tuned on a training set of more than 3,000 documents. The label2id mapping in config.json has been updated to cover every label the model can predict.
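As a rough illustration of how the label2id mapping relates to predictions, the sketch below parses a config.json-style fragment and inverts it into an id2label lookup. The three labels and their ids are made up for the example and are not copied from the model's actual config.json.

```python
import json

# Hypothetical config.json fragment; the real file lists every label the model
# supports, and its ids may differ from the ones shown here.
config_text = """
{
  "label2id": {
    "1040 U.S. Individual Income Tax Return": 0,
    "1040 SCHEDULE A Itemized Deductions": 1,
    "Form W-2 Wage and Tax Statement": 2
  }
}
"""

config = json.loads(config_text)
label2id = config["label2id"]

# Invert the mapping so a predicted class index can be turned back into a label.
id2label = {idx: label for label, idx in label2id.items()}

print(id2label[2])
```

When the model is loaded through transformers, the same lookup is available as model.config.id2label, which is what the sample inference code uses.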

For inference, use an image size of 1920 px wide by 2560 px high.
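Resizing straight to 1920 x 2560 changes the aspect ratio of any page that is not already 3:4. The helper below is a small sketch (the function name resize_scale_factors is invented for this example) that reports how much each axis is stretched, which may help judge whether a scan is a reasonable fit before inference.

```python
TARGET_WIDTH = 1920   # inference width stated in this model card
TARGET_HEIGHT = 2560  # inference height stated in this model card


def resize_scale_factors(width: int, height: int) -> tuple[float, float, float]:
    """Return (x_scale, y_scale, distortion) for a direct resize to the target size.

    distortion is x_scale / y_scale; 1.0 means the aspect ratio is preserved.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    x_scale = TARGET_WIDTH / width
    y_scale = TARGET_HEIGHT / height
    return x_scale, y_scale, x_scale / y_scale


# Example: a US Letter page scanned at 200 dpi is 1700 x 2200 pixels.
x_scale, y_scale, distortion = resize_scale_factors(1700, 2200)
print(f"x scale {x_scale:.3f}, y scale {y_scale:.3f}, distortion {distortion:.3f}")
```

A distortion value close to 1.0 suggests the direct resize will leave the page layout largely intact.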

Sample Code for Document Inference

# load dependencies
import torch
from torch import nn
from PIL import Image
from transformers import DonutSwinModel, DonutSwinPreTrainedModel, DonutProcessor

# Donut Swin encoder with a linear classification head on the pooled output
class DonutForImageClassification(DonutSwinPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.swin = DonutSwinModel(config)
        self.dropout = nn.Dropout(0.5)
        self.classifier = nn.Linear(self.swin.num_features, config.num_labels)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        outputs = self.swin(pixel_values)
        pooled_output = outputs[1]
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        return logits

sModelName = 'hsarfraz/donut-irs-tax-docs-classifier'
processor = DonutProcessor.from_pretrained(sModelName)
model = DonutForImageClassification.from_pretrained(sModelName)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)

model.eval()

# load test image
sTestImagePath = 'replace this with document image path'
# open image
img = Image.open(sTestImagePath)
# resize image to width 1920 and height 2560; the fine-tuned model was trained at this size
img_new = img.resize((1920, 2560), Image.Resampling.LANCZOS)

# perform inference
predicted_label = ''
with torch.no_grad():
    pixel_values = processor(img_new.convert("RGB"), return_tensors="pt").pixel_values
    print(pixel_values.shape)
    pixel_values = pixel_values.to(device)
    logits = model(pixel_values)
    # index of the highest-scoring class for the single image in the batch
    predicted_id = logits.argmax(dim=-1).item()
    predicted_label = model.config.id2label[predicted_id]

print('---------------------------------- ')
print('Document Image Classification: ',predicted_label)
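The sample above prints only the top label. If a confidence estimate is useful, the raw logits can be passed through a softmax and ranked. The sketch below is framework-free and uses made-up logits and a made-up three-entry id2label; the helper name top_k_labels is also invented for this example. With the model loaded, the logits would come from something like model(pixel_values)[0].tolist().

```python
import math


def top_k_labels(logits, id2label, k=3):
    """Softmax the logits and return the k most likely (label, probability) pairs."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Made-up values for illustration only.
example_logits = [0.2, 3.1, -1.0]
example_id2label = {0: "1040 SCHEDULE A", 1: "Form W-2", 2: "Form 8949"}

for label, prob in top_k_labels(example_logits, example_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

Softmax scores from a classifier are not calibrated probabilities, so treat them as a relative ranking unless the model has been calibrated separately.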

Model size: 74.4M params (Safetensors; tensor types I64 and F32)