I want to fine-tune this model:
model = BertForTokenClassification.from_pretrained('monilouise/ner_pt_br')
with this dataset:
raw_datasets = load_dataset('lener_br')
The raw_datasets loaded this way are already tokenized and encoded, and I don't know how the tokenization was done. Now I want to pad the inputs, but I don't know how to use DataCollatorWithPadding in this case.
I noticed that this dataset is similar to the wnut dataset from the docs, but I still can't figure out what I should do.
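For context, here is a minimal pure-Python sketch of what padding a token-classification batch involves (this is roughly what transformers' DataCollatorForTokenClassification does, as opposed to DataCollatorWithPadding, which does not pad the labels). The example data, pad token id 0, and label pad id -100 are assumptions for illustration, not taken from the lener_br dataset itself:

```python
def pad_batch(features, pad_token_id=0, label_pad_id=-100):
    """Pad a list of token-classification examples to the same length.

    input_ids are padded with pad_token_id (0 for BERT vocabularies),
    labels with -100 so the loss function ignores padded positions,
    and attention_mask marks real tokens with 1 and padding with 0.
    """
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        batch["attention_mask"].append([1] * len(f["input_ids"]) + [0] * n_pad)
        batch["labels"].append(f["labels"] + [label_pad_id] * n_pad)
    return batch


# Hypothetical pre-tokenized examples of different lengths:
features = [
    {"input_ids": [101, 7592, 102], "labels": [0, 1, 0]},
    {"input_ids": [101, 2088, 999, 102], "labels": [0, 2, 0, 0]},
]
padded = pad_batch(features)
```

If this matches what the dataset needs, passing DataCollatorForTokenClassification(tokenizer) as the data_collator to the Trainer would do the same thing batch by batch.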