scoris/scoris-mt-en-lt

@@ -10,6 +10,8 @@ datasets:
 ![Scoris logo](https://scoris.lt/logo_smaller.png)
 This is an English-Lithuanian translation model based on [Helsinki-NLP/opus-mt-tc-big-en-lt](https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-lt)
 Fine-tuned on a large merged dataset: [scoris/en-lt-merged-data](https://huggingface.co/datasets/scoris/en-lt-merged-data) (5.4 million sentence pairs)
@@ -23,7 +25,7 @@ Tested on scoris/en-lt-merged-data validation set. Metric: sacrebleu
 | model | testset | BLEU  | Gen Len |
 |----------|---------|-------|-------|
 | scoris/opus-mt-tc-big-lt-en-scoris-finetuned | scoris/en-lt-merged-data (validation) | 41.026200 | 17.449100 |
-| Helsinki-NLP/opus-mt-tc-big-lt-en | scoris/en-lt-merged-data (validation) | TBD | TBD
 According to [Google](https://cloud.google.com/translate/automl/docs/evaluate), BLEU scores can be interpreted as follows:
@@ -50,10 +52,10 @@ tokenizer = MarianTokenizer.from_pretrained(model_name)
 model = MarianMTModel.from_pretrained(model_name)
 src_text = [
-    "Once upon a time there was a dear little girl who was loved by everyone who looked at her, but most of all by her grandmother, and there was nothing that she would not have given to the child.",
-    "Once she gave her a little cap of red velvet, which suited her so well that she would never wear anything else; so she was always called 'Little Red- Cap.'",
-    "One day her mother said to her: ‘Come, Little Red-Cap, here is a piece of cake and a bottle of wine; take them to your grandmother, she is ill and weak, and they will do her good.",
-    "Set out before it gets hot, and when you are going, walk nicely and quietly and do not run off the path, or you may fall and break the bottle, and then your grandmother will get nothing."
 ]
 # Tokenize the text and generate translations
@@ -63,5 +65,9 @@ translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=T
 for t in translated:
     print(tokenizer.decode(t, skip_special_tokens=True))
-# TBD
 ```

 ![Scoris logo](https://scoris.lt/logo_smaller.png)
 This is an English-Lithuanian translation model based on [Helsinki-NLP/opus-mt-tc-big-en-lt](https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-lt)
+For Lithuanian-English translation, see our other model: [scoris/opus-mt-tc-big-lt-en-scoris-finetuned](https://huggingface.co/scoris/opus-mt-tc-big-lt-en-scoris-finetuned)
 Fine-tuned on a large merged dataset: [scoris/en-lt-merged-data](https://huggingface.co/datasets/scoris/en-lt-merged-data) (5.4 million sentence pairs)
 | model | testset | BLEU  | Gen Len |
 |----------|---------|-------|-------|
 | scoris/opus-mt-tc-big-lt-en-scoris-finetuned | scoris/en-lt-merged-data (validation) | 41.026200 | 17.449100 |
+| Helsinki-NLP/opus-mt-tc-big-lt-en | scoris/en-lt-merged-data (validation) | 34.2768 | 17.6664 |
 According to [Google](https://cloud.google.com/translate/automl/docs/evaluate), BLEU scores can be interpreted as follows:
 model = MarianMTModel.from_pretrained(model_name)
 src_text = [
+    "Once upon a time there were three bears, who lived together in a house of their own in a wood.",
+    "One of them was a little, small wee bear; one was a middle-sized bear, and the other was a great, huge bear.",
+    "One day, after they had made porridge for their breakfast, they walked out into the wood while the porridge was cooling.",
+    "And while they were walking, a little girl came into the house."
 ]
 # Tokenize the text and generate translations
 for t in translated:
     print(tokenizer.decode(t, skip_special_tokens=True))
+# Result:
+# Kažkada buvo trys lokiai, kurie gyveno kartu savame name miške.
+# Vienas iš jų buvo mažas, mažas lokys; vienas buvo vidutinio dydžio lokys, o kitas buvo didelis, didžiulis lokys.
+# Vieną dieną, pagaminę košės pusryčiams, jie išėjo į mišką, kol košė vėso.
+# Jiems einant, į namus atėjo maža mergaitė.
 ```
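
The `Gen Len` column in the evaluation table is not defined in this card. Assuming it follows the convention used in the Hugging Face translation fine-tuning examples (the mean number of non-padding tokens per generated sequence), a minimal sketch of that calculation might look like the following. The function name `mean_gen_len`, the pad id, and the sample token ids are hypothetical and chosen only for illustration.

```python
def mean_gen_len(sequences, pad_id):
    """Average number of non-padding tokens per generated sequence."""
    if not sequences:
        raise ValueError("sequences must not be empty")
    # Count every token that is not padding, one count per sequence.
    lengths = [sum(1 for tok in seq if tok != pad_id) for seq in sequences]
    return sum(lengths) / len(lengths)


# Hypothetical padded outputs; pad_id=0 is an assumption, not this model's real pad id.
sample = [[5, 6, 7, 0, 0], [8, 9, 0, 0, 0]]
print(mean_gen_len(sample, pad_id=0))
```

With the usage snippet above, the input would presumably be `translated.tolist()` together with `tokenizer.pad_token_id`, though whether the reported numbers were produced exactly this way is not stated in the card.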