trOCR-final

fine-tuned for VisionEncoderDecoderModel(encoder , decoder) encoder = 'facebook/deit-base-distilled-patch16-384' decoder = 'klue/roberta-base'

How to Get Started with the Model

from transformers import VisionEncoderDecoderModel,AutoTokenizer, TrOCRProcessor
import torch
from PIL import Image


device = torch.device('cuda') # change 'cuda' if you need.

image_path='(your image path)'
image = Image.open(image_path)
#model can be .jpg or .png
#hugging face download: https://huggingface.co/gg4ever/trOCR-final

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
trocr_model = "gg4ever/trOCR-final"
model = VisionEncoderDecoderModel.from_pretrained(trocr_model).to(device)
tokenizer = AutoTokenizer.from_pretrained(trocr_model)

pixel_values = (processor(image, return_tensors="pt").pixel_values).to(device)
generated_ids = model.generate(pixel_values)
generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)

Training Details

Training Data

1M words generated by TextRecognitionDataGenerator(trdg) : https://github.com/Belval/TextRecognitionDataGenerator/blob/master/trdg/run.py

1.1M words from AI-hub OCR words dataset : https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=81

Training Hyperparameters

hyperparameters values
predict_with_generate True
evaluation_strategy "steps"
per_device_train_batch_size 32
per_device_eval_batch_size 32
num_train_epochs 2
fp16 True
learning_rate 4e-5
eval_stept 10000
warmup_steps 20000
weight_decay 0.01
Downloads last month
16
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.