repleeka committed on
Commit
0ceff8c
·
verified ·
1 Parent(s): 79cd78d

Update README.md

  1. README.md +65 -3
README.md CHANGED
@@ -1,3 +1,65 @@
1
- ---
2
- license: cc-by-nd-4.0
3
- ---
1
+ ---
2
+ license: cc-by-nd-4.0
3
+ language:
4
+ - en
5
+ - nyi
6
+ metrics:
7
+ - bleu
8
+ base_model:
9
+ - Helsinki-NLP/opus-mt-en-hi
10
+ pipeline_tag: translation
11
+ library_name: transformers
12
+ tags:
13
+ - english
14
+ - nyishi
15
+ - nmt
16
+ - translation
17
+ - nlp
18
+ ---
19
+ # Model Card for eng-nyi-nmt
20
+
21
+ The **eng-nyi-nmt** model is a neural machine translation (NMT) model fine-tuned on the **GinLish Corpus v0.1.0** (under development), which consists of English-Nyishi sentence pairs. Nyishi is a low-resource language spoken in Arunachal Pradesh, India, with few digital resources and linguistic datasets. This model aims to support translation into Nyishi and to help preserve and promote the language in digital spaces.
22
+
23
+ To develop **eng-nyi-nmt**, the pre-trained English-to-Hindi model **Helsinki-NLP/opus-mt-en-hi** was used as the starting point, given the structural similarities between Hindi and Nyishi. Fine-tuning this model (transfer learning) made it possible to adapt it to Nyishi efficiently, even with limited parallel data.
24
+
25
+ ## Model Details
26
+
27
+ ### Model Description
28
+ - **Developed by:** Tungon Dugi and Nabam Kakum
29
+ - **Affiliation:** National Institute of Technology Arunachal Pradesh, India
30
31
+ - **Model type:** Translation
32
+ - **Language(s) (NLP):** English (en) and Nyishi (nyi)
33
+ - **Finetuned from model:** Helsinki-NLP/opus-mt-en-hi
34
+
35
+ ### Uses
36
+ #### Direct Use
37
+ This model can be used for translation and text-to-text generation between English and Nyishi.
38
+
39
+ ### How to Get Started with the Model
40
+
41
+ Use the code below to get started with the model:
42
+
43
+ ```python
44
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
45
+
46
+ tokenizer = AutoTokenizer.from_pretrained("repleeka/eng-nyi-nmt")
47
+ model = AutoModelForSeq2SeqLM.from_pretrained("repleeka/eng-nyi-nmt")
48
+ ```
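
The loading snippet above stops before any text is translated. The following is a minimal sketch of how a batch of English sentences might be passed through the model, assuming it behaves like other Marian-style seq2seq checkpoints in `transformers`. The helper names `translate` and `load_and_translate` and the `max_new_tokens` value are illustrative assumptions, not part of the model card.

```python
from typing import List


def translate(texts: List[str], tokenizer, model, max_new_tokens: int = 128) -> List[str]:
    """Translate a batch of English sentences with a seq2seq model (hypothetical helper)."""
    # Tokenize the sentences as one padded batch of PyTorch tensors.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # Generate target-side token IDs; max_new_tokens is an assumed, illustrative cap.
    output_ids = model.generate(**batch, max_new_tokens=max_new_tokens)
    # Decode token IDs back to text, dropping padding and end-of-sequence tokens.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


def load_and_translate(texts: List[str], model_id: str = "repleeka/eng-nyi-nmt") -> List[str]:
    """Download the checkpoint from the Hub and translate `texts` (hypothetical helper)."""
    # Imported here so `translate` can be reused without importing transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return translate(texts, tokenizer, model)
```

Calling `load_and_translate(["How are you?"])` would download the weights on first use and return a list with one translated string. Output quality should be judged against the BLEU score reported in the evaluation section.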
49
+
50
+ ## Training Details
51
+ ### Training Data
52
+ The model was trained on the **EnNyiCopr** dataset, which comprises aligned English-Nyishi sentence pairs. The dataset was curated to support machine translation for low-resource languages, with a focus on preserving and promoting the Nyishi language in digital spaces.
53
+
54
+ ### Evaluation
55
+ The model was evaluated on translation quality, measured by BLEU score, and on evaluation runtime.
56
+
57
+ | Metric | Value |
58
+ |------------------------|------------------------|
59
+ | **BLEU Score** | 0.1468 |
60
+ | **Evaluation Runtime** | 1237.5341 seconds |
61
+
62
+ The BLEU score indicates a foundational level of translation quality for English-to-Nyishi, given the limited data resources. Although further refinement is needed, this result shows encouraging progress toward accurate translations.
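
For readers unfamiliar with the metric, the sketch below shows how a corpus-level BLEU score on a 0 to 1 scale can be computed from clipped n-gram precisions and a brevity penalty. It is a simplified illustration that assumes whitespace tokenization, one reference per sentence, and no smoothing. It is not necessarily the implementation used to produce the score reported above, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Simplified corpus BLEU in [0, 1]: single reference, whitespace tokens, no smoothing."""
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Counter intersection clips each n-gram count at its reference count.
            matched[n - 1] += sum((h_counts & r_counts).values())
            total[n - 1] += sum(h_counts.values())
    # Without smoothing, any order with zero matches makes the score zero.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # The brevity penalty lowers the score when the output is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

An exact match scores 1.0 and an output with no overlapping words scores 0.0. Established tools apply their own tokenization and smoothing, so their numbers can differ from this sketch.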
63
+
64
+ ### Summary
65
+ The **eng-nyi-nmt** model is in the early stages of development and offers initial translation capabilities between English and Nyishi. Expanding the dataset and the training resources is crucial for improving generalization and translation accuracy in practical applications, and continued refinement is needed to serve this extremely low-resource language.