Overview
This model aims to improve the precision and accuracy of personal information detection by using a smaller label set than its base model. With fewer labels to distinguish, it aims to label personal information more precisely across multiple languages.
Features
Improved Precision: By using a smaller label set than the base model, the model aims to make labeling more precise and the identification of sensitive information more reliable.
Model Versions:
Maximum Accuracy Focus: This version aims to achieve the highest possible accuracy in the detection process, making it suitable for applications where minimizing errors is crucial.
Maximum Precision Focus: This variant is designed to maximize the precision of the detection, ideal for scenarios where false positives are particularly undesirable.
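The card does not list which labels were merged or dropped when the label set was reduced. As a hedged illustration of the idea behind the Improved Precision feature, the sketch below collapses fine-grained BIO tags into a smaller set of coarse entity types and maps any unlisted type to "O". The `COARSE` mapping and the `reduce_labels` helper are hypothetical; they are not the model's actual labels or training code.

```python
from typing import Dict, List

# Hypothetical fine-grained -> coarse mapping; NOT the model's real label set.
COARSE: Dict[str, str] = {
    "FIRSTNAME": "NAME",
    "LASTNAME": "NAME",
    "EMAIL": "EMAIL",
    "PHONENUMBER": "PHONE",
}


def reduce_labels(tags: List[str], mapping: Dict[str, str], outside: str = "O") -> List[str]:
    """Map BIO tags such as "B-FIRSTNAME" onto a smaller label set.

    The B-/I- prefix is kept, the entity type is replaced via `mapping`,
    and entity types missing from `mapping` become `outside`.
    """
    reduced: List[str] = []
    for tag in tags:
        if tag == outside:
            reduced.append(outside)
            continue
        prefix, _, ent_type = tag.partition("-")
        coarse = mapping.get(ent_type)
        reduced.append(f"{prefix}-{coarse}" if coarse is not None else outside)
    return reduced


if __name__ == "__main__":
    tags = ["O", "B-FIRSTNAME", "I-FIRSTNAME", "B-EMAIL", "B-IBAN"]
    print(reduce_labels(tags, COARSE))  # prints ['O', 'B-NAME', 'I-NAME', 'B-EMAIL', 'O']
```

With fewer, broader classes the classifier has fewer near-identical types to confuse, which is one plausible reason a reduced label set can raise precision. In this sketch, a `B-FIRSTNAME` immediately followed by `B-LASTNAME` still yields two separate `NAME` spans.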
Installation
To run this model, you will need to install the dependencies:
pip install torch transformers safetensors
Usage
Load and run the model using PyTorch and transformers:
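import json

from safetensors.torch import load_file
from transformers import AutoConfig, AutoModelForTokenClassification, BertTokenizerFast


class LabelMapper:
    """Minimal stand-in: the original LabelMapper class is not included in this card."""

    def __init__(self):
        self.label_to_id = {}
        self.id_to_label = {}
        self.num_labels = 0


# Load the config ("folder_to_model" is a placeholder for the directory containing config.json)
config = AutoConfig.from_pretrained("folder_to_model")
# Initialize the model architecture from the config (weights are random at this point)
model = AutoModelForTokenClassification.from_config(config)
# Load the safetensors weights (load_file expects the path to the .safetensors file itself)
state_dict = load_file("folder_to_tensors")
# Copy the trained weights into the model and switch to inference mode
model.load_state_dict(state_dict)
model.eval()
# Load the tokenizer
tokenizer = BertTokenizerFast.from_pretrained("google-bert/bert-base-multilingual-cased")
# Load the label mapper if needed
with open("pii_model/label_mapper.json", "r") as f:
    label_mapper_data = json.load(f)
label_mapper = LabelMapper()
label_mapper.label_to_id = label_mapper_data["label_to_id"]
# JSON object keys are always strings, so convert the ids back to int
label_mapper.id_to_label = {int(k): v for k, v in label_mapper_data["id_to_label"].items()}
label_mapper.num_labels = label_mapper_data["num_labels"]
# Process outputs for analysis...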
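The loading snippet stops at processing the outputs. Below is a minimal sketch of one possible next step, assuming the labels follow a BIO scheme (`B-`/`I-` prefixes plus `O`) and that predicted ids have already been aligned to whole words (for example by keeping each word's first sub-token). Neither assumption is confirmed by this card, and `id_to_label_from_json` and `decode_bio` are hypothetical helpers; the JSON layout only mirrors the `id_to_label` key read from `label_mapper.json` in the loading code.

```python
import json
from typing import Dict, List, Optional, Tuple


def id_to_label_from_json(raw: str) -> Dict[int, str]:
    """Parse an id_to_label table; JSON object keys are strings, so cast them back to int."""
    data = json.loads(raw)
    return {int(k): v for k, v in data["id_to_label"].items()}


def decode_bio(
    words: List[str], pred_ids: List[int], id_to_label: Dict[int, str]
) -> List[Tuple[str, str]]:
    """Group consecutive B-/I- tags into (entity_type, text) spans."""
    spans: List[Tuple[str, List[str]]] = []
    current: Optional[str] = None  # entity type of the currently open span
    for word, pred in zip(words, pred_ids):
        label = id_to_label[pred]
        if label == "O":
            current = None
            continue
        prefix, _, ent_type = label.partition("-")
        if prefix == "I" and current == ent_type:
            spans[-1][1].append(word)  # continue the open span
        else:
            # A B- tag, or an I- tag that does not continue the open span, starts a new span
            spans.append((ent_type, [word]))
            current = ent_type
    return [(ent_type, " ".join(ws)) for ent_type, ws in spans]


if __name__ == "__main__":
    # Hypothetical label table and predictions, for illustration only
    table = id_to_label_from_json(
        '{"id_to_label": {"0": "O", "1": "B-NAME", "2": "I-NAME", "3": "B-EMAIL"}}'
    )
    words = ["Contact", "Jane", "Doe", "at", "jane@example.com"]
    print(decode_bio(words, [0, 1, 2, 0, 3], table))
    # prints [('NAME', 'Jane Doe'), ('EMAIL', 'jane@example.com')]
```

Joining words with spaces only approximates the original text; a fast tokenizer called with `return_offsets_mapping=True` returns character offsets that can recover exact substrings instead.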
Evaluation
- Accuracy Model: Focused on minimizing overall errors; optimized for the highest accuracy.
- Precision Model: Designed to minimize false positives; suited to precision-driven applications.
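The card does not report scores or say how accuracy and precision were measured. To make the difference between the two variants concrete, the sketch below computes token-level accuracy (share of tokens labeled correctly) and precision (share of tokens flagged as personal information whose label is correct), treating "O" as the negative class. The token-level definition and the `token_metrics` helper are assumptions for illustration; entity-level scoring, as done by libraries such as seqeval, would give different numbers.

```python
from typing import List, Tuple


def token_metrics(gold: List[str], pred: List[str], outside: str = "O") -> Tuple[float, float]:
    """Return (accuracy, precision) over aligned token labels.

    Precision only looks at tokens the model flagged as PII (pred != outside),
    so missed PII lowers accuracy but not precision.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    correct = sum(g == p for g, p in zip(gold, pred))
    accuracy = correct / len(gold) if gold else 0.0
    flagged = [(g, p) for g, p in zip(gold, pred) if p != outside]
    true_pos = sum(g == p for g, p in flagged)
    precision = true_pos / len(flagged) if flagged else 0.0
    return accuracy, precision


if __name__ == "__main__":
    gold = ["O", "B-NAME", "I-NAME", "O", "B-EMAIL"]
    pred = ["B-NAME", "B-NAME", "I-NAME", "O", "O"]
    print(token_metrics(gold, pred))  # prints (0.6, 0.6666666666666666)
```

In this example the false positive on the first token lowers precision, while the missed email lowers only accuracy; a precision-focused variant accepts more misses of the second kind in order to avoid the first.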
Disclaimer
The publisher of this repository is not affiliated with Ai4Privacy or Ai Suisse SA.
Honorary Mention
This repository was created during the hackathon organized by NeuralWave.
Model tree for hyacinthum/Piidgeon-ai4privacy
Base model: microsoft/mdeberta-v3-base