yizhao-risk-en-scorer

Introduction

This is a BERT model fine-tuned on a high-quality English financial dataset. It generates a security risk score that helps identify and remove risky data from financial datasets, reducing the proportion of illegal or undesirable content. For the complete data cleaning process, please refer to YiZhao.
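
As a rough illustration of how such a score might be used for cleaning, the sketch below drops records whose score exceeds a cutoff. It assumes that a higher score means higher risk and uses a hypothetical threshold of 0.5; neither is stated in this card. The `filter_by_risk` helper and the sample scores are made up for illustration.

```python
from typing import Dict, List

# Hypothetical cutoff: this card does not specify one.
RISK_THRESHOLD = 0.5


def filter_by_risk(records: List[Dict], threshold: float = RISK_THRESHOLD) -> List[Dict]:
    """Keep records whose risk_score is at or below the threshold.

    Assumes higher scores indicate riskier text.
    """
    return [r for r in records if r["risk_score"] <= threshold]


# Made-up scores for illustration only; real scores come from the model.
records = [
    {"text": "Quarterly revenue rose 4%.", "risk_score": 0.08},
    {"text": "Example of flagged content.", "risk_score": 0.91},
]
kept = filter_by_risk(records)
print([r["text"] for r in kept])
```

In practice the cutoff would need to be chosen by inspecting scored samples from the dataset being cleaned.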

Quickstart

The following snippet shows how to generate a security risk score with this model.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

text = "You are a smart robot"
# Path or Hub ID of the model checkpoint, as given in the original example
risk_model_name = "risk-model-en-v0.1"

risk_tokenizer = AutoTokenizer.from_pretrained(risk_model_name)
risk_model = AutoModelForSequenceClassification.from_pretrained(risk_model_name)

# Tokenize the text; inputs longer than the model's limit are truncated
risk_inputs = risk_tokenizer(text, return_tensors="pt", padding="longest", truncation=True)

# Run the model without tracking gradients
with torch.no_grad():
    risk_outputs = risk_model(**risk_inputs)

# The single output logit for the input is used as the risk score
risk_score = risk_outputs.logits.squeeze(-1).float().item()

result = {
    "text": text,
    "risk_score": risk_score
}

print(result)
# {'text': 'You are a smart robot', 'risk_score': 0.11226219683885574}
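
For more than a handful of texts, scoring in batches is usually more practical than one call per text. The following is a minimal sketch, not part of the original example: the helper names (`batched`, `make_score_fn`, `score_texts`) and the batch size of 32 are hypothetical, and it assumes the model emits one logit per input, as the single-text example above suggests.

```python
from typing import Callable, Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def make_score_fn(model_name: str) -> Callable[[List[str]], List[float]]:
    """Load the model once and return a function that scores a batch of texts."""
    # Imported lazily so the other helpers work without torch installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    def score(batch: List[str]) -> List[float]:
        inputs = tokenizer(batch, return_tensors="pt", padding="longest", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        # Assumes one logit per text, shape (batch, 1) -> list of floats.
        return logits.squeeze(-1).float().tolist()

    return score


def score_texts(
    texts: List[str],
    score_fn: Callable[[List[str]], List[float]],
    batch_size: int = 32,  # hypothetical default; tune to available memory
) -> List[float]:
    """Score all texts batch by batch, preserving input order."""
    scores: List[float] = []
    for batch in batched(texts, batch_size):
        scores.extend(score_fn(batch))
    return scores


# Usage (loads the model, so it is left commented out here):
# score_fn = make_score_fn("risk-model-en-v0.1")
# scores = score_texts(["You are a smart robot", "Another text"], score_fn)
```

Passing the scoring function in as an argument keeps model loading separate from the batching loop, so the loop can be checked with a stub before the real model is loaded.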

Model size: 22.7M params (Safetensors, tensor type F32)