hfl/chinese-roberta-wwm-ext fine-tuned on the COLDataset. Usage example:

import torch
from transformers.models.bert import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('thu-coai/roberta-base-cold')
model = BertForSequenceClassification.from_pretrained('thu-coai/roberta-base-cold')
model.eval()

texts = ['你就是个傻逼!','黑人很多都好吃懒做,偷奸耍滑!','男女平等,黑人也很优秀。']

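# Tokenize the batch, padding shorter texts to the longest one
model_input = tokenizer(texts, return_tensors="pt", padding=True)
# Inference only, so skip gradient tracking
with torch.no_grad():
    model_output = model(**model_input, return_dict=False)
# model_output[0] holds the logits; take the highest-scoring class per text
prediction = torch.argmax(model_output[0].cpu(), dim=-1).tolist()
print(prediction) # --> [1, 1, 0] (0 for Non-Offensive, 1 for Offensive)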
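
The example above prints only class ids. To report a readable label with a confidence score, the logits can be passed through a softmax. The following is a minimal sketch in plain Python: the helper name `describe_logits` and the sample logits are hypothetical and not part of the model's API, and the label names come from the comment in the usage example.

```python
import math

# Label mapping taken from the comment in the usage example above.
ID2LABEL = {0: "Non-Offensive", 1: "Offensive"}

def describe_logits(logits):
    """Convert rows of raw logits into (label, probability) pairs."""
    results = []
    for row in logits:
        # Numerically stable softmax: subtract the row maximum before exponentiating.
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Index of the most probable class (first index wins on ties).
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((ID2LABEL[best], probs[best]))
    return results

# Hypothetical logits for illustration only, not real model output.
example_logits = [[-1.2, 2.3], [0.4, 1.1], [3.0, -2.0]]
for label, prob in describe_logits(example_logits):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as in the usage example, the input would be `model_output[0].detach().tolist()` instead of the hypothetical values.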

This fine-tuned model achieves 82.75% accuracy and 82.39% macro-F1 on the test set.
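
For readers unfamiliar with the two metrics, here is a minimal sketch of how accuracy and macro-F1 are commonly computed for a binary classifier. It follows the standard definitions and is not taken from the authors' evaluation script; the function name `accuracy_and_macro_f1` and the toy labels are hypothetical.

```python
def accuracy_and_macro_f1(gold, pred, labels=(0, 1)):
    """Return (accuracy, macro-F1) for parallel lists of gold and predicted ids."""
    pairs = list(zip(gold, pred))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)

    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)

    # Macro-F1 is the unweighted mean of the per-class F1 scores.
    macro_f1 = sum(f1_scores) / len(f1_scores)
    return accuracy, macro_f1

# Toy labels for illustration only, not drawn from the COLDataset.
gold = [1, 1, 0, 0]
pred = [1, 0, 0, 0]
acc, f1 = accuracy_and_macro_f1(gold, pred)
print(f"accuracy={acc:.4f} macro-F1={f1:.4f}")
```

Because macro-F1 weights both classes equally, it is less sensitive to class imbalance than accuracy alone.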

Please kindly cite the original paper if you use this model.

@inproceedings{deng2022cold,
  title={COLD: A Benchmark for Chinese Offensive Language Detection},
  author={Deng, Jiawen and Zhou, Jingyan and Sun, Hao and Zheng, Chujie and Mi, Fei and Meng, Helen and Huang, Minlie},
  booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing},
  year={2022}
}