---
language:
- es
metrics:
- bleu
base_model:
- vgaraujov/bart-base-spanish
pipeline_tag: text2text-generation
library_name: transformers
tags:
- gec
- spanish
- seq2seq
- bart
- cows-l2h
---
This model was fine-tuned from `vgaraujov/bart-base-spanish` on 80% of the COWS-L2H dataset for grammatical error correction (GEC) of Spanish text. The corpus was split into sentences before training, so the model is fine-tuned for SENTENCE CORRECTION and will likely perform poorly on an entire paragraph. To correct a paragraph, split it into sentences and run the model on each sentence.
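
The sentence-by-sentence workflow for paragraphs can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's pipeline: `split_sentences`, `correct_paragraph` and `make_model_corrector` are hypothetical helpers, and the splitter is a naive regex that breaks only after `.`, `!` or `?` followed by whitespace. Abbreviations such as "Sr." will fool it, so a dedicated segmenter (for example a spaCy Spanish pipeline or NLTK's Punkt tokenizer) is likely more robust for real text.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): split on whitespace that follows '.', '!' or '?'.

    Spanish opening marks such as '¿' and '¡' stay attached to their sentence.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def correct_paragraph(text: str, correct_batch: Callable[[List[str]], List[str]]) -> str:
    """Split a paragraph, correct each sentence in one batch, and rejoin with spaces."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    corrected = correct_batch(sentences)
    if len(corrected) != len(sentences):
        raise ValueError("corrector must return exactly one output per input sentence")
    return " ".join(corrected)


def make_model_corrector(tokenizer, model, max_length: int = 128):
    """Hypothetical wrapper turning a loaded tokenizer/model pair into a batch corrector."""
    def correct_batch(sentences: List[str]) -> List[str]:
        enc = tokenizer(sentences, max_length=max_length, padding=True,
                        truncation=True, return_tensors="pt")
        outputs = model.generate(input_ids=enc["input_ids"],
                                 attention_mask=enc["attention_mask"],
                                 max_new_tokens=max_length)
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return correct_batch
```

With the tokenizer and model loaded as in the example usage, `correct_paragraph(text, make_model_corrector(tokenizer, model))` returns the corrected sentences rejoined with single spaces, so the paragraph's original spacing and line breaks are not preserved. Keeping the corrector as a plain callable makes the splitting logic testable without downloading the model.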

BLEU: 0.846 on COWS-L2H

Example usage:
```python
from transformers import AutoTokenizer, BartForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("SkitCon/gec-spanish-BARTO-COWS-L2H")
model = BartForConditionalGeneration.from_pretrained("SkitCon/gec-spanish-BARTO-COWS-L2H")

# The model corrects one sentence at a time, so pass a list of sentences.
input_sentences = ["Yo va al tienda.", "Espero que tú ganas."]

# Pad to the longest sentence in the batch. Keep the 2-D (batch, seq_len) shape
# that generate() expects; .squeeze() would break a single-sentence batch.
tokenized_text = tokenizer(input_sentences, max_length=128, padding=True, truncation=True, return_tensors="pt")
input_ids = tokenized_text["input_ids"]
attention_mask = tokenized_text["attention_mask"]

# max_new_tokens caps the length of each generated (corrected) sentence.
outputs = model.generate(input_ids=input_ids, attention_mask=attention_mask, max_new_tokens=128)
for sentence in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(sentence)
```