Mdkaif2782 committed on
Commit 4bdb873 · verified · 1 parent: f47968f

Update README.md

  1. README.md +76 -1
README.md CHANGED
@@ -3,6 +3,7 @@ datasets:
3
  - SKNahin/bengali-transliteration-data
4
  language:
5
  - bn
 
6
  base_model:
7
  - facebook/mbart-large-50
8
  tags:
@@ -10,4 +11,78 @@ tags:
10
  - bangla
11
  - translator
12
  - avro
13
- ---
3
  - SKNahin/bengali-transliteration-data
4
  language:
5
  - bn
6
+ - en
7
  base_model:
8
  - facebook/mbart-large-50
9
  tags:
 
11
  - bangla
12
  - translator
13
  - avro
14
+ ---
15
+
16
+ # Hugging Face: Banglish to Bangla Translation
17
+
18
+ This repository demonstrates how to use a Hugging Face model to translate Banglish (Romanized Bangla) text into Bangla using the MBart50 tokenizer and model. The model, `Mdkaif2782/banglish-to-bangla`, is fine-tuned from `facebook/mbart-large-50` for this task.
19
+
20
+ ## Setup in Google Colab
21
+ Follow these steps to use the model in Google Colab:
22
+
23
+ ### 1. Install Dependencies
24
+ Make sure the `transformers` and `torch` libraries are installed. Run the following command in your Colab notebook:
25
+
26
+ ```python
27
+ !pip install transformers torch
28
+ ```
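To confirm the install step worked before loading the model, a quick check along these lines may help. It is only a sketch: `missing_packages` is a hypothetical helper name, not part of `transformers` or of this repository, and it only tests whether a package can be found, not which version is installed.

```python
import importlib.util


def missing_packages(names):
    """Return the package names that cannot be found in this environment."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means both libraries from the install step are available.
print(missing_packages(["transformers", "torch"]))
```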
29
+
30
+ ### 2. Load and Use the Model
31
+ Copy the code below into a cell in your Colab notebook to start translating Banglish to Bangla:
32
+
33
+ ```python
34
+ from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
35
+ import torch
36
+
37
+ # Load the pre-trained model and tokenizer directly from Hugging Face
38
+ model_name = "Mdkaif2782/banglish-to-bangla"
39
+ tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
40
+ model = MBartForConditionalGeneration.from_pretrained(model_name)
41
+
42
+ def translate_banglish_to_bangla(model, tokenizer, banglish_input):
43
+     inputs = tokenizer(banglish_input, return_tensors="pt", padding=True, truncation=True, max_length=128)
44
+
45
+     if torch.cuda.is_available():
46
+         inputs = {key: value.cuda() for key, value in inputs.items()}
47
+         model = model.cuda()
48
+
49
+     translated_tokens = model.generate(**inputs, decoder_start_token_id=tokenizer.lang_code_to_id["bn_IN"])
50
+     translated_text = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
51
+
52
+     return translated_text
53
+
54
+ # Take custom input
55
+ print("Enter your Banglish text (type 'exit' to quit):")
56
+ while True:
57
+     banglish_text = input("Banglish: ")
58
+     if banglish_text.lower() == "exit":
59
+         break
60
+
61
+     # Translate Banglish to Bangla
62
+     translated_text = translate_banglish_to_bangla(model, tokenizer, banglish_text)
63
+     print(f"Translated Bangla: {translated_text}\n")
64
+ ```
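The function above translates one sentence per call. For several sentences at once, a batched variant could look like the sketch below. `translate_batch`, `batch_size`, and `max_length` are illustrative names rather than anything defined by the README, and the sketch assumes the tokenizer exposes `lang_code_to_id` exactly as the code above uses it. It also leaves the model on whatever device it already occupies instead of moving it.

```python
def translate_batch(model, tokenizer, texts, batch_size=8, max_length=128):
    """Translate a list of Banglish strings, batch_size sentences at a time."""
    # Same start token that the single-sentence function passes to generate().
    start_id = tokenizer.lang_code_to_id["bn_IN"]
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=max_length)
        # Move the encoded batch to the device the model currently lives on.
        inputs = inputs.to(model.device)
        output_ids = model.generate(**inputs, decoder_start_token_id=start_id)
        results.extend(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
    return results
```

With `model` and `tokenizer` loaded as in the block above, `translate_batch(model, tokenizer, ["amar valo lagche onek"])` would return a list with one translated string per input. To run on a GPU, call `model.cuda()` once beforehand.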
65
+
66
+ ### 3. Run the Notebook
67
+ 1. Paste the above code into a cell.
68
+ 2. Run the cell.
69
+ 3. Enter your Banglish text in the input prompt to get the translated Bangla text. Type `exit` to quit.
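The prompt loop relies on `input()`, which is awkward to script or test. One way to reuse the same exit logic without a prompt is sketched below. `run_session` is a hypothetical helper, and unlike the original loop it also strips surrounding whitespace before comparing against `exit`.

```python
def run_session(lines, translate):
    """Translate each line in order, stopping at the first 'exit' line."""
    outputs = []
    for line in lines:
        # Mirror the interactive loop: 'exit' (any letter case) ends the session.
        if line.strip().lower() == "exit":
            break
        outputs.append(translate(line))
    return outputs
```

For example, `run_session(["amar valo lagche onek", "exit"], lambda t: translate_banglish_to_bangla(model, tokenizer, t))` would translate the first line and then stop.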
70
+
71
+ ## Example Usage
72
+
73
+ Input:
74
+ ```
75
+ Banglish: amar valo lagche onek
76
+ ```
77
+
78
+ Output:
79
+ ```
80
+ Translated Bangla: আমার ভালো লাগছে অনেক
81
+ ```
82
+
83
+ ## Notes
84
+ - For faster processing, use a GPU runtime in Google Colab: go to `Runtime > Change runtime type` and select `GPU`.
85
+ - The model `Mdkaif2782/banglish-to-bangla` can be fine-tuned further if required.
86
+
87
+ ## License
88
+ This project uses the Hugging Face `transformers` library. Refer to the [Hugging Face documentation](https://huggingface.co/docs/transformers/) for more details.