Romanian paraphrase
Fine-tune t5-small model for paraphrase. Since there is no Romanian dataset for paraphrasing, I had to create my own dataset. The dataset contains ~60k examples.
How to use
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("BlackKakapo/t5-small-paraphrase-ro")
model = AutoModelForSeq2SeqLM.from_pretrained("BlackKakapo/t5-small-paraphrase-ro")
Or
from transformers import T5ForConditionalGeneration, T5TokenizerFast
model = T5ForConditionalGeneration.from_pretrained("BlackKakapo/t5-small-paraphrase-ro")
tokenizer = T5TokenizerFast.from_pretrained("BlackKakapo/t5-small-paraphrase-ro")
Generate
text = "Am impresia că fac multe greșeli."
encoding = tokenizer.encode_plus(text, pad_to_max_length=True, return_tensors="pt")
input_ids, attention_masks = encoding["input_ids"].to(device), encoding["attention_mask"].to(device)
beam_outputs = model.generate(
input_ids=input_ids,
attention_mask=attention_masks,
do_sample=True,
max_length=256,
top_k=10,
top_p=0.9,
early_stopping=False,
num_return_sequences=5
)
for beam_output in beam_outputs:
text_para = tokenizer.decode(beam_output, skip_special_tokens=True,clean_up_tokenization_spaces=True)
if text.lower() != text_para.lower() or text not in final_outputs:
final_outputs.append(text_para)
break
print(final_outputs)
Output
['Cred că fac multe greșeli.']
- Downloads last month
- 18
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.