ruBert-base for Punctuation Correction

This model is a fine-tuned version of ruBert-base that restores punctuation in Russian sentences: for each word, it predicts the punctuation mark that follows it.

Model details:

  • Fine-Tuning Data: The model was fine-tuned on a dataset of over 20,000 paragraphs from Russian literary works.

  • Supported Classes: For each word, the model predicts which of the following comes after it: ? ! . , : ... or a plain space with no punctuation (class O).

  • Input Format: For best results, provide input text with all punctuation marks removed. The model does not handle changes in letter case.
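
The class scheme above can be illustrated with a small decoding sketch. It assumes the model's output has already been reduced to one label per word, and that each label is either the punctuation mark itself or "O"; the actual label names in the model's configuration may differ, and the function name restore_punctuation is hypothetical.

```python
# Assumption: labels are the punctuation marks themselves, plus "O" for "no mark".
# The real label strings in the model config may be different.
PUNCT_CLASSES = {"?", "!", ".", ",", ":", "..."}


def restore_punctuation(words, labels):
    """Attach each predicted mark to the word it follows and join with spaces."""
    if len(words) != len(labels):
        raise ValueError("expected exactly one label per word")
    pieces = []
    for word, label in zip(words, labels):
        if label == "O":
            # Class O: the word is followed by a plain space only.
            pieces.append(word)
        elif label in PUNCT_CLASSES:
            pieces.append(word + label)
        else:
            raise ValueError(f"unknown class: {label!r}")
    return " ".join(pieces)


words = ["привет", "как", "дела"]
labels = [",", "O", "?"]
print(restore_punctuation(words, labels))  # привет, как дела?
```

Because the model does not handle letter case, this sketch leaves the words exactly as they were given.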

Usage Guidelines

To use the model effectively, follow these guidelines:

  1. Input Text: Remove all punctuation marks from the text before passing it to the model.

  2. Letter Case: The model does not recognize changes in letter case.
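
The input preparation described in these guidelines could look like the following minimal sketch. It strips only the marks the model predicts (plus the single-character ellipsis "…") and leaves letter case untouched; the function name strip_punctuation is hypothetical, and leaving other characters such as hyphens in place is an assumption, not something the model card specifies.

```python
import re

# Assumption: only the marks the model predicts are stripped (? ! . , : and
# the ellipsis). Hyphens inside words and other symbols are left as they are.
_MARKS = re.compile(r"[?!.,:…]")


def strip_punctuation(text):
    """Remove the supported punctuation marks and collapse extra whitespace."""
    without_marks = _MARKS.sub(" ", text)
    # split() with no arguments also drops leading and trailing whitespace.
    return " ".join(without_marks.split())


print(strip_punctuation("Привет, как дела? Хорошо..."))  # Привет как дела Хорошо
```

Replacing each mark with a space rather than deleting it keeps words apart when the original text has no space after a comma or period.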

Authors

  • Mark Stolyarov

Model size: 178M params (Safetensors, tensor type F32)