Nayana OCR (Alpha)
Nayana OCR is a state-of-the-art model finetuned for document-level Optical Character Recognition (OCR) across 10 Indian languages:
Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu
while maintaining exceptional OCR capabilities in English and Chinese.
This model is built upon the robust GOT OCR base and offers features like advanced multilingual OCR, enhanced document rendering, and seamless GPU utilization.
We are training an improved model on much more data. Follow us for updates.
For more information, see Cognitivelab.
Key Features
- Multilingual OCR: Supports OCR for 10 Indian languages alongside English and Chinese.
- Document-Level OCR: Designed for extracting text from complex document layouts.
- Streamlined Deployment: Optimized for GPU usage with support for safetensors.
- Customizable OCR Type: Switch between OCR modes and enable rendering.
Installation
To use Nayana OCR, ensure you have the following prerequisites installed:
- Python 3.8+
- PyTorch (with GPU support)
- Transformers library
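- PEFT library
- Accelerate library (Transformers needs it for the device_map and low_cpu_mem_usage options used below)
Install the required libraries using:
pip install torch transformers peft accelerate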
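Before loading the model, it can help to confirm that the prerequisites are importable. The following is a minimal sketch, not part of Nayana OCR itself: the helper names `missing_packages` and `python_ok` are hypothetical, and the `REQUIRED` list assumes the import names match the package names in the install command.

```python
import importlib.util
import sys

# Import names assumed to match the pip package names listed above.
REQUIRED = ["torch", "transformers", "peft"]


def missing_packages(names):
    """Return the subset of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python 3.8+:", python_ok())
    print("Missing packages:", missing_packages(REQUIRED) or "none")
```

If `torch` is installed, `torch.cuda.is_available()` additionally reports whether a GPU is visible to PyTorch.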
Usage Example
Here's a quick example of how to use Nayana OCR for extracting text from an image:
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel
import torch
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(
    'Nayana-cognitivelab/Nayana_base_OCR',
    trust_remote_code=True
)
model = AutoModel.from_pretrained(
    'Nayana-cognitivelab/Nayana_base_OCR',
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map='cuda',
    use_safetensors=True,
    pad_token_id=tokenizer.eos_token_id,
    torch_dtype=torch.float16
)
# Prepare the model for inference
model = model.eval().cuda()
# Perform OCR on an image
image_file = 'hindi.png'
result = model.chat(
    tokenizer,
    image_file,
    ocr_type='ocr',
    render=True,
    stream_flag=True
)
print(result)
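To process several images, the single-image call above can be wrapped in a loop. This is a sketch under assumptions: `ocr_images` is a hypothetical helper, and it assumes `model.chat` returns the recognized text as a string, which the example above suggests by printing `result`.

```python
def ocr_images(model, tokenizer, image_paths, **chat_kwargs):
    """Run model.chat on each image and collect the results by path.

    Assumes model.chat(tokenizer, path, ...) returns the recognized text.
    Extra keyword arguments are forwarded to model.chat unchanged.
    """
    # Start from the OCR mode used in the example; callers may override it.
    options = {"ocr_type": "ocr"}
    options.update(chat_kwargs)

    results = {}
    for path in image_paths:
        results[str(path)] = model.chat(tokenizer, str(path), **options)
    return results
```

With the `model` and `tokenizer` loaded as shown earlier, `ocr_images(model, tokenizer, ['hindi.png', 'tamil.png'])` would return a dictionary mapping each file path to its extracted text. The file name `tamil.png` is only an illustration.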
Parameters
Parameter | Description | Default |
---|---|---|
ocr_type | Specify the type of OCR to use ('ocr'). | 'ocr' |
render | Enable rendering of the extracted text on the image. | True |
stream_flag | Stream results for larger or multi-page documents. | True |
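One way to keep call sites consistent is to collect these parameters in one place and override only what changes. The sketch below is illustrative: `DEFAULT_CHAT_OPTIONS` and `build_chat_options` are hypothetical names, and the defaults are copied from the table rather than read from the model code.

```python
# Defaults as listed in the parameter table (not read from the model code).
DEFAULT_CHAT_OPTIONS = {"ocr_type": "ocr", "render": True, "stream_flag": True}


def build_chat_options(**overrides):
    """Merge overrides into the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(DEFAULT_CHAT_OPTIONS)
    if unknown:
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")
    return {**DEFAULT_CHAT_OPTIONS, **overrides}
```

A call such as `model.chat(tokenizer, image_file, **build_chat_options(render=False))` would then disable rendering while leaving the other options at their listed values. Rejecting unknown names catches typos like `redner=True` before they reach the model.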
Base Model
This model is finetuned on the GOT OCR base, leveraging its vision-language capabilities to deliver unparalleled OCR performance.
License
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.