Mdkaif2782
committed on
Update README.md
README.md
CHANGED
@@ -3,6 +3,7 @@ datasets:
 - SKNahin/bengali-transliteration-data
 language:
 - bn
+- en
 base_model:
 - facebook/mbart-large-50
 tags:
@@ -10,4 +11,78 @@ tags:
 - bangla
 - translator
 - avro
----
+---

# Hugging Face: Banglish to Bangla Translation

This repository demonstrates how to use a Hugging Face model to translate Banglish (Romanized Bangla) text into Bangla script using the MBart50 tokenizer and model. The model, `Mdkaif2782/banglish-to-bangla`, is fine-tuned from `facebook/mbart-large-50` for this task.

## Setup in Google Colab
Follow these steps to use the model in Google Colab:

### 1. Install Dependencies
Install the `transformers` and `torch` libraries by running the following command in a Colab cell:

```python
!pip install transformers torch
```
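
In a notebook, the leading `!` runs the command in a shell rather than in Python, so it can be worth confirming afterwards that the kernel sees the packages. A minimal sketch using only the standard library; `installed_versions` is an illustrative helper name, not part of `transformers` or `torch`:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("transformers", "torch")):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not visible to this Python environment
            found[name] = None
    return found


print(installed_versions())
```

A `None` value for either package suggests the install cell should be re-run before loading the model.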

### 2. Load and Use the Model
Copy the code below into a cell in your Colab notebook to start translating Banglish to Bangla:

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
import torch

# Load the fine-tuned model and tokenizer directly from Hugging Face
model_name = "Mdkaif2782/banglish-to-bangla"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

def translate_banglish_to_bangla(model, tokenizer, banglish_input):
    # Tokenize the Banglish text into PyTorch tensors
    inputs = tokenizer(banglish_input, return_tensors="pt", padding=True, truncation=True, max_length=128)

    # Move the inputs and the model to the GPU when one is available
    if torch.cuda.is_available():
        inputs = {key: value.cuda() for key, value in inputs.items()}
        model = model.cuda()

    # Start decoding from the Bengali language code and decode the first result
    translated_tokens = model.generate(**inputs, decoder_start_token_id=tokenizer.lang_code_to_id["bn_IN"])
    translated_text = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]

    return translated_text

# Take custom input
print("Enter your Banglish text (type 'exit' to quit):")
while True:
    banglish_text = input("Banglish: ")
    if banglish_text.lower() == "exit":
        break

    # Translate Banglish to Bangla
    translated_text = translate_banglish_to_bangla(model, tokenizer, banglish_text)
    print(f"Translated Bangla: {translated_text}\n")
```

### 3. Run the Notebook
1. Paste the above code into a cell.
2. Run the cell.
3. Enter your Banglish text at the input prompt to get the translated Bangla text. Type `exit` to quit.
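
The interactive loop handles one sentence at a time. For a list of sentences, one option is to feed them through in fixed-size batches. The sketch below is an assumption-laden illustration, not part of the model's API: `translate_in_batches` and `batch_size` are hypothetical names, and `translate_batch` stands for any callable that takes a list of strings and returns a list of translations (for example, a variant of `translate_banglish_to_bangla` that returns every item from `batch_decode` instead of only the first). A stand-in translator is used so the example runs without downloading the model.

```python
from typing import Callable, Iterable, List


def translate_in_batches(
    texts: Iterable[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> List[str]:
    """Translate texts in fixed-size batches, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    items = list(texts)
    results: List[str] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        results.extend(translate_batch(batch))
    return results


# Stand-in translator (hypothetical): upper-cases text instead of calling the model.
# Replace it with a function that runs the real tokenizer and model on a list.
def fake_translate(batch: List[str]) -> List[str]:
    return [text.upper() for text in batch]


print(translate_in_batches(["ami", "tumi", "se"], fake_translate, batch_size=2))
```

Keeping the batching logic separate from the model call makes it easy to tune `batch_size` to the memory available in the Colab runtime.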

## Example Usage

Input:
```
Banglish: amar valo lagche onek
```

Output:
```
Translated Bangla: আমার ভালো লাগছে অনেক
```
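
A quick way to sanity-check a result like the one above is to measure how much of it is written in Bengali script. This is only a rough heuristic, sketched under the assumption that a good output consists mostly of characters from the Bengali Unicode block (U+0980 to U+09FF); `bengali_ratio` is an illustrative name, not something the model or `transformers` provides.

```python
def bengali_ratio(text: str) -> float:
    """Return the fraction of non-space characters in the Bengali Unicode block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    bengali = sum(1 for ch in chars if "\u0980" <= ch <= "\u09FF")
    return bengali / len(chars)


# Romanized input should score low; Bangla output should score high
print(bengali_ratio("amar valo lagche onek"))
print(bengali_ratio("আমার ভালো লাগছে অনেক"))
```

A low ratio on the output side may indicate that the model echoed the Romanized input instead of converting it.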

## Notes
- For faster processing, use a GPU runtime in Google Colab: go to `Runtime > Change runtime type` and select `GPU`.
- The model `Mdkaif2782/banglish-to-bangla` can be fine-tuned further if required.
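
To confirm that the runtime change took effect, the device PyTorch will use can be printed before loading the model. A minimal sketch, assuming PyTorch may or may not be installed in the environment; `pick_device` is an illustrative helper name:

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so fall back to reporting the CPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Running on: {pick_device()}")
```

If this reports `cpu` after selecting a GPU runtime, restarting the Colab session usually makes the new runtime type take effect.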

## License
This project uses the Hugging Face `transformers` library. Refer to the [Hugging Face documentation](https://huggingface.co/docs/transformers/) for more details.