Mdkaif2782/banglish-to-bangla

Text2Text Generation

Mdkaif2782 committed 19 days ago

Commit 4bdb873 · verified · 1 parent: f47968f

Update README.md

Files changed (1)

README.md +76 -1

README.md CHANGED

@@ -3,6 +3,7 @@ datasets:
 - SKNahin/bengali-transliteration-data
 language:
 - bn
+- en
 base_model:
 - facebook/mbart-large-50
 tags:
@@ -10,4 +11,78 @@ tags:
 - bangla
 - translator
 - avro
----
+---
+# Hugging Face: Banglish to Bangla Translation
+This repository demonstrates how to use a Hugging Face model to translate Banglish (Romanized Bangla) text into Bangla script using the MBart50 tokenizer and model. The model, `Mdkaif2782/banglish-to-bangla`, is fine-tuned from the pre-trained `facebook/mbart-large-50` for this task.
+## Setup in Google Colab
+Follow these steps to use the model in Google Colab:
+### 1. Install Dependencies
+Make sure the `transformers` and `torch` libraries are installed by running the following command in your Colab notebook:
+```python
+!pip install transformers torch
+```
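To confirm the install succeeded before loading the model, the standard library's `importlib.metadata` can report which versions are present. This is a minimal sketch; `installed_version` is a hypothetical helper name, not part of `transformers` or `torch`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages installed by the pip command above
for package in ("transformers", "torch"):
    print(f"{package}: {installed_version(package) or 'not installed'}")
```

If either package reports `not installed`, re-run the `pip install` cell before continuing.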
+### 2. Load and Use the Model
+Copy the code below into a cell in your Colab notebook to start translating Banglish to Bangla:
+```python
+from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
+import torch
+# Load the fine-tuned model and its tokenizer directly from the Hugging Face Hub
+model_name = "Mdkaif2782/banglish-to-bangla"
+tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
+model = MBartForConditionalGeneration.from_pretrained(model_name)
+# Move the model to the GPU once, if one is available; otherwise it stays on the CPU
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model = model.to(device)
+def translate_banglish_to_bangla(model, tokenizer, banglish_input):
+    # Tokenize the input, truncating anything longer than 128 tokens
+    inputs = tokenizer(banglish_input, return_tensors="pt", padding=True, truncation=True, max_length=128)
+    # Put the input tensors on the same device as the model
+    inputs = inputs.to(model.device)
+    # Start decoding with the Bengali language code (bn_IN)
+    translated_tokens = model.generate(**inputs, decoder_start_token_id=tokenizer.lang_code_to_id["bn_IN"])
+    # Decode the first (and only) output sequence, dropping special tokens
+    translated_text = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
+    return translated_text
+# Read Banglish text interactively until the user types 'exit'
+print("Enter your Banglish text (type 'exit' to quit):")
+while True:
+    banglish_text = input("Banglish: ")
+    if banglish_text.strip().lower() == "exit":
+        break
+    # Translate Banglish to Bangla
+    translated_text = translate_banglish_to_bangla(model, tokenizer, banglish_text)
+    print(f"Translated Bangla: {translated_text}\n")
+```
+### 3. Run the Notebook
+1. Paste the above code into a cell.
+2. Run the cell.
+3. Type your Banglish text at the `Banglish:` prompt to get the Bangla translation. Type `exit` to quit.
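If you would rather not type inputs one at a time (for example, when running the notebook top to bottom), the interactive loop's logic can be driven from a list instead of `input()`. The sketch below uses a hypothetical helper, `run_translation_loop`, that accepts any `translate` callable, so it can be dry-run with a stub and then pointed at the model with `lambda text: translate_banglish_to_bangla(model, tokenizer, text)`.

```python
from typing import Callable, Iterable, List


def run_translation_loop(translate: Callable[[str], str], lines: Iterable[str]) -> List[str]:
    """Translate each line in order, stopping at the first 'exit' (case-insensitive)."""
    results: List[str] = []
    for line in lines:
        # Same stop condition as the interactive loop: 'exit' ends the session
        if line.strip().lower() == "exit":
            break
        results.append(translate(line))
    return results


if __name__ == "__main__":
    # Dry run with a hypothetical stand-in translator; in the notebook, pass
    # lambda text: translate_banglish_to_bangla(model, tokenizer, text) instead.
    def fake_translate(text: str) -> str:
        return f"<bn>{text}</bn>"

    for result in run_translation_loop(fake_translate, ["amar valo lagche onek", "exit", "not translated"]):
        print(f"Translated Bangla: {result}")
```

Keeping the translator as a parameter separates the loop from model loading, so the stopping logic can be checked without downloading the model.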
+## Example Usage
+Input:
+```
+Banglish: amar valo lagche onek
+```
+Output:
+```
+Translated Bangla: আমার ভালো লাগছে অনেক
+```
+## Notes
+- For faster processing, switch Colab to a GPU runtime: go to `Runtime > Change runtime type` and select a GPU as the hardware accelerator. Without a GPU, the code still runs on the CPU, only more slowly.
+- The model `Mdkaif2782/banglish-to-bangla` can be fine-tuned further on your own data if needed.
+## License
+This project uses the Hugging Face `transformers` library. Refer to the [Hugging Face documentation](https://huggingface.co/docs/transformers/) for more details.