Update README.md
README.md CHANGED
@@ -10,6 +10,8 @@ datasets:
 ![Scoris logo](https://scoris.lt/logo_smaller.png)
 This is an English-Lithuanian translation model based on [Helsinki-NLP/opus-mt-tc-big-en-lt](https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-lt)
 
+For Lithuanian-English translation check another model [scoris/opus-mt-tc-big-lt-en-scoris-finetuned](https://huggingface.co/scoris/opus-mt-tc-big-lt-en-scoris-finetuned)
+
 
 Fine-tuned on large merged data set: [scoris/en-lt-merged-data](https://huggingface.co/datasets/scoris/en-lt-merged-data) (5.4 million sentence pairs)
 
@@ -23,7 +25,7 @@ Tested on scoris/en-lt-merged-data validation set. Metric: sacrebleu
 | model | testset | BLEU | Gen Len |
 |----------|---------|-------|-------|
 | scoris/opus-mt-tc-big-lt-en-scoris-finetuned | scoris/en-lt-merged-data (validation) | 41.026200 | 17.449100
-| Helsinki-NLP/opus-mt-tc-big-lt-en | scoris/en-lt-merged-data (validation) |
+| Helsinki-NLP/opus-mt-tc-big-lt-en | scoris/en-lt-merged-data (validation) | 34.2768 | 17.6664
 
 According to [Google](https://cloud.google.com/translate/automl/docs/evaluate) BLEU score interpretation is following:
 
@@ -50,10 +52,10 @@ tokenizer = MarianTokenizer.from_pretrained(model_name)
 model = MarianMTModel.from_pretrained(model_name)
 
 src_text = [
-    "Once upon a time there
-    "
-    "One day
-    "
+    "Once upon a time there were three bears, who lived together in a house of their own in a wood.",
+    "One of them was a little, small wee bear; one was a middle-sized bear, and the other was a great, huge bear.",
+    "One day, after they had made porridge for their breakfast, they walked out into the wood while the porridge was cooling.",
+    "And while they were walking, a little girl came into the house. "
 ]
 
 # Tokenize the text and generate translations
@@ -63,5 +65,9 @@ translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=T
 for t in translated:
     print(tokenizer.decode(t, skip_special_tokens=True))
 
-#
+# Result:
+# Kažkada buvo trys lokiai, kurie gyveno kartu savame name miške.
+# Vienas iš jų buvo mažas, mažas lokys; vienas buvo vidutinio dydžio lokys, o kitas buvo didelis, didžiulis lokys.
+# Vieną dieną, pagaminę košės pusryčiams, jie išėjo į mišką, kol košė vėso.
+# Jiems einant, į namus atėjo maža mergaitė.
 ```
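The diff shows the README's usage example only in fragments, so here is a minimal sketch of how the pieces might fit together once the commit is applied. It is not the README's exact code. The function name `translate` and the constant `SRC_TEXT` are illustrative, the value of `model_name` lies outside the lines shown and is left as a parameter, and the truncated `padding=T` in the last hunk header is assumed to be `padding=True`.

```python
from typing import List

# The four example sentences added to README.md in this commit.
SRC_TEXT: List[str] = [
    "Once upon a time there were three bears, who lived together in a house of their own in a wood.",
    "One of them was a little, small wee bear; one was a middle-sized bear, and the other was a great, huge bear.",
    "One day, after they had made porridge for their breakfast, they walked out into the wood while the porridge was cooling.",
    "And while they were walking, a little girl came into the house. ",
]


def translate(src_text: List[str], model_name: str) -> List[str]:
    """Translate a batch of sentences with a MarianMT checkpoint.

    `translate` and `model_name` are illustrative names; the README's own
    value of `model_name` is not visible in the diff.
    """
    # Imported lazily so this module loads even without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Tokenize the text and generate translations.
    # Assumption: the truncated `padding=T` in the hunk header is `padding=True`.
    batch = tokenizer(src_text, return_tensors="pt", padding=True)
    translated = model.generate(**batch)

    # Decode each generated sequence back into plain text.
    return [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
```

Calling `translate(SRC_TEXT, model_name)` with this model's repository id downloads the weights on first use. The output should resemble the `# Result:` comments added in the last hunk, though exact wording may differ between library versions.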