Overview

This is an English-to-Lithuanian translation model (Seq2Seq). For Lithuanian-to-English translation, see the separate model scoris-mt-lt-en.

Original model: Helsinki-NLP/opus-mt-tc-big-en-lt

Fine-tuned on a large merged dataset: scoris/en-lt-merged-data (5.4 million sentence pairs)

Trained for 6 epochs.

Made by the Scoris team

Evaluation:

| EN-LT | BLEU |
|---|---|
| scoris/scoris-mt-en-lt | 41.9 |
| Helsinki-NLP/opus-mt-tc-big-en-lt | 34.3 |
| Google Translate | 30.8 |
| DeepL | 32.3 |

Evaluated on the validation set of scoris/en-lt-merged-data. Google Translate and DeepL were evaluated on a random sample of 1,000 sentence pairs.
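
To make the metric concrete, here is a minimal sketch of corpus-level BLEU together with a seeded random sample of sentence pairs. It is an illustration only: this card does not say which BLEU implementation, tokenization, or sampling procedure was used, so the whitespace tokenization, the lack of smoothing, the seed, and the helper names `sample_pairs` and `corpus_bleu` are all assumptions. Scores from this sketch should not be expected to match the table above.

```python
import math
import random
from collections import Counter


def sample_pairs(pairs, k=1000, seed=0):
    """Draw up to k items at random; the seed is an assumption for repeatability."""
    return random.Random(seed).sample(pairs, min(k, len(pairs)))


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU (0-100) with one reference per hypothesis.

    Assumes whitespace tokenization, which real evaluation tools usually refine.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)
```

In practice, a maintained tool such as sacreBLEU is the safer choice for reported numbers, since tokenization and smoothing choices change the score noticeably.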

According to Google, BLEU scores can be interpreted as follows:

| BLEU Score | Interpretation |
|---|---|
| < 10 | Almost useless |
| 10 - 19 | Hard to get the gist |
| 20 - 29 | The gist is clear, but has significant grammatical errors |
| 30 - 40 | Understandable to good translations |
| 40 - 50 | High quality translations |
| 50 - 60 | Very high quality, adequate, and fluent translations |
| > 60 | Quality often better than human |
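
The bands above can be applied in code with a small lookup. This is a hypothetical helper, not part of the model or of any library. The table's ranges overlap at 40 and 50 and leave gaps between 19 and 20 and between 29 and 30, so the sketch assumes each boundary value belongs to the higher band, except 60, which stays in the "50 - 60" band because the top band is strictly "> 60".

```python
def interpret_bleu(score: float) -> str:
    """Map a BLEU score (0-100) to the interpretation bands listed above.

    Boundary handling is an assumption; the source table does not define it.
    """
    if not 0 <= score <= 100:
        raise ValueError("BLEU score must be between 0 and 100")
    bands = [
        (10, "Almost useless"),
        (20, "Hard to get the gist"),
        (30, "The gist is clear, but has significant grammatical errors"),
        (40, "Understandable to good translations"),
        (50, "High quality translations"),
    ]
    for upper, label in bands:
        if score < upper:
            return label
    if score <= 60:
        return "Very high quality, adequate, and fluent translations"
    return "Quality often better than human"


print(interpret_bleu(41.9))  # High quality translations
print(interpret_bleu(34.3))  # Understandable to good translations
```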

Usage

You can use the model in the following way:

from transformers import MarianMTModel, MarianTokenizer

# Specify the model identifier on Hugging Face Model Hub
model_name = "scoris/scoris-mt-en-lt"

# Load the model and tokenizer from Hugging Face
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

src_text = [
    "Once upon a time there were three bears, who lived together in a house of their own in a wood.",
    "One of them was a little, small wee bear; one was a middle-sized bear, and the other was a great, huge bear.",
    "One day, after they had made porridge for their breakfast, they walked out into the wood while the porridge was cooling.",
    "And while they were walking, a little girl came into the house. "
]

# Tokenize the text and generate translations
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))

# Print out the translations
for t in translated:
    print(tokenizer.decode(t, skip_special_tokens=True))

# Result:
# Kažkada buvo trys lokiai, kurie gyveno kartu savame name miške.
# Vienas iš jų buvo mažas, mažas lokys; vienas buvo vidutinio dydžio lokys, o kitas buvo didelis, didžiulis lokys.
# Vieną dieną, pagaminę košės pusryčiams, jie išėjo į mišką, kol košė vėso.
# Jiems einant, į namus atėjo maža mergaitė.
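
For more than a handful of sentences, it may help to translate in fixed-size batches rather than in one `generate` call, so that memory use stays bounded. The following is a minimal sketch under that assumption. The helpers `chunks`, `translate_in_batches`, and `make_marian_translator` are hypothetical names introduced here, and the default batch size of 8 is arbitrary rather than a recommendation from the model authors.

```python
from typing import Callable, Iterator, List


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_in_batches(
    texts: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Apply `translate_batch` to each chunk and keep the original order."""
    results: List[str] = []
    for batch in chunks(texts, batch_size):
        results.extend(translate_batch(batch))
    return results


def make_marian_translator(tokenizer, model) -> Callable[[List[str]], List[str]]:
    """Wrap an already loaded tokenizer and model as a batch function."""
    def translate_batch(batch: List[str]) -> List[str]:
        # truncation=True clips inputs that exceed the model's maximum length.
        encoded = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        output_ids = model.generate(**encoded)
        return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
    return translate_batch
```

With the `tokenizer`, `model`, and `src_text` objects from the example above, a call would look like `translate_in_batches(src_text, make_marian_translator(tokenizer, model), batch_size=2)`.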

Model size: 236M params (Safetensors, tensor type F32)