---
language: en
tags:
- Trinidad and Tobago English Parser
- text2text-generation
- Caribe
license: cc-by-nc-sa-4.0
datasets:
- Custom dataset
- Creolised JFLEG
---
# Trinidad English Creole Parser
This model was trained to parse and correct text written in Trinidad English Creole.

___
# Model
This model utilises the pre-trained T5-base model. It was fine-tuned on a combination of a custom dataset and a creolised version of the [JFLEG](https://arxiv.org/abs/1702.04066) dataset. The JFLEG dataset was creolised using the file encoding feature of the Caribe library. For more on Caribbean Creole, check out the [Caribe](https://pypi.org/project/Caribe/) library.
___
# Usage with Transformers
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("KES/T5-TTParser")
model = AutoModelForSeq2SeqLM.from_pretrained("KES/T5-TTParser")

# Input sentence to be corrected
txt = "Ah have live with mi paremnts en London"
# Prepend the "grammar:" task prefix to the input text
inputs = tokenizer("grammar:" + txt, truncation=True, return_tensors="pt")
# Generate the correction with beam search
output = model.generate(inputs["input_ids"], num_beams=4, max_length=512, early_stopping=True)
correction = tokenizer.batch_decode(output, skip_special_tokens=True)
print("".join(correction))  # Correction: Ah live with meh parents in London.
```
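
To correct several sentences at once, the single-sentence example can be extended into a small batch helper. The following is a minimal sketch, not part of the original model card: the names `build_prompts` and `correct_batch` are hypothetical, and it assumes that the same `grammar:` prefix and generation settings work for batched input. Padding and an attention mask are added because sentences in a batch usually differ in length.

```python
from typing import Iterable, List

PREFIX = "grammar:"  # task prefix taken from the single-sentence example


def build_prompts(sentences: Iterable[str]) -> List[str]:
    """Prepend the task prefix to each non-empty sentence (hypothetical helper)."""
    return [PREFIX + s.strip() for s in sentences if s.strip()]


def correct_batch(
    sentences: Iterable[str],
    model_name: str = "KES/T5-TTParser",
    num_beams: int = 4,
    max_length: int = 512,
) -> List[str]:
    """Run several sentences through the model in one generate() call."""
    # Imported lazily so build_prompts can be used without transformers installed.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    prompts = build_prompts(sentences)
    if not prompts:
        return []

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # padding=True lets sentences of different lengths share one tensor;
    # the attention mask tells the model which positions are padding.
    inputs = tokenizer(prompts, padding=True, truncation=True, return_tensors="pt")
    outputs = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=num_beams,
        max_length=max_length,
        early_stopping=True,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling `correct_batch(["Ah have live with mi paremnts en London"])` should return a list with one corrected string per input sentence. Loading the model inside the function keeps the sketch short; for repeated calls it would be better to load the tokenizer and model once and pass them in.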