BERT Fine-Tuned for Named Entity Recognition (NER)
This repository contains a BERT model fine-tuned for Named Entity Recognition (NER) with the Hugging Face transformers library. It recognizes named entities such as people, locations, organizations, and miscellaneous names in text.
Model Details
- Model Architecture: BERT-base (cased)
- Fine-Tuning Task: Named Entity Recognition (NER)
- Dataset Used: The model was fine-tuned on the CoNLL-2003 NER dataset, which labels four entity types: persons, organizations, locations, and miscellaneous names.
- Intended Use: The model is suitable for NER tasks in various applications, including information extraction, question answering, and chatbots.
Usage
You can use this model with the Hugging Face transformers library to quickly get started with NER tasks. Below is an example of how to load and use this model for inference.
Installation
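First, make sure you have the required packages. The transformers library also needs a deep learning backend to run inference; PyTorch is assumed here:
pip install transformers torch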
Loading the Model
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("heenamir/bert-finetuned-ner")
model = AutoModelForTokenClassification.from_pretrained("heenamir/bert-finetuned-ner")
# Initialize the NER pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
# Example text
text = "John Doe is a software engineer at OpenAI in San Francisco."
# Perform NER
entities = nlp(text)
print(entities)
Example Output
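The pipeline returns a list of token-level predictions in the following format. The scores are illustrative, and the exact tokens and index values depend on how the tokenizer splits the text. Without an aggregation strategy, a multi-word entity such as "San Francisco" is reported as separate B- and I- entries:
[
{"entity": "B-PER", "score": 0.99, "index": 1, "word": "John", "start": 0, "end": 4},
{"entity": "I-PER", "score": 0.98, "index": 2, "word": "Doe", "start": 5, "end": 8},
{"entity": "B-ORG", "score": 0.95, "index": 8, "word": "OpenAI", "start": 35, "end": 41},
{"entity": "B-LOC", "score": 0.97, "index": 10, "word": "San", "start": 45, "end": 48},
{"entity": "I-LOC", "score": 0.97, "index": 11, "word": "Francisco", "start": 49, "end": 58}
]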
Entity Labels
The model is fine-tuned to detect the following entity types:
- PER: Person
- ORG: Organization
- LOC: Location
- MISC: Miscellaneous
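In the pipeline output, each of these types carries a B- (beginning) or I- (inside) prefix, so one entity can span several token-level entries. The following is a minimal sketch of merging those entries back into whole entities. The helper name merge_bio_tokens, the sample predictions, and the choice to keep the lowest token score per entity are assumptions for illustration, not part of this model or of the transformers API.

```python
from typing import Dict, List


def merge_bio_tokens(tokens: List[Dict], text: str) -> List[Dict]:
    """Merge token-level B-/I- predictions into whole-entity spans (hypothetical helper)."""
    spans: List[Dict] = []
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        if not label:  # skip tags without a type, e.g. "O"
            continue
        last = spans[-1] if spans else None
        # An I- tag extends the previous span only when the type matches
        # and the token directly follows the previous one.
        continues = (
            prefix == "I"
            and last is not None
            and last["label"] == label
            and tok["index"] == last["last_index"] + 1
        )
        if continues:
            last["end"] = tok["end"]
            last["last_index"] = tok["index"]
            # Assumption: score an entity by its least confident token.
            last["score"] = min(last["score"], tok["score"])
        else:
            spans.append({
                "label": label,
                "start": tok["start"],
                "end": tok["end"],
                "last_index": tok["index"],
                "score": tok["score"],
            })
    return [
        {
            "label": s["label"],
            "text": text[s["start"]:s["end"]],
            "start": s["start"],
            "end": s["end"],
            "score": s["score"],
        }
        for s in spans
    ]


text = "John Doe is a software engineer at OpenAI in San Francisco."
# Illustrative token-level predictions; the scores are made up for the example.
tokens = [
    {"entity": "B-PER", "score": 0.99, "index": 1, "word": "John", "start": 0, "end": 4},
    {"entity": "I-PER", "score": 0.98, "index": 2, "word": "Doe", "start": 5, "end": 8},
    {"entity": "B-ORG", "score": 0.95, "index": 8, "word": "OpenAI", "start": 35, "end": 41},
    {"entity": "B-LOC", "score": 0.97, "index": 10, "word": "San", "start": 45, "end": 48},
    {"entity": "I-LOC", "score": 0.97, "index": 11, "word": "Francisco", "start": 49, "end": 58},
]

merged = merge_bio_tokens(tokens, text)
for entity in merged:
    print(entity["label"], entity["text"], entity["score"])
```

The transformers pipeline also accepts an aggregation_strategy argument (for example "simple") that performs a similar grouping internally, which is usually the simpler option when you do not need custom merging rules.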
Scoring
The model outputs a score for each detected entity, representing its confidence level. You can use these scores to filter out low-confidence predictions if needed.
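As a minimal sketch of such filtering, the snippet below keeps only predictions at or above a threshold. The function name filter_by_score, the 0.9 cutoff, and the sample predictions are assumptions chosen for illustration; a suitable threshold depends on your data and on how you trade precision against recall.

```python
from typing import Dict, List


def filter_by_score(entities: List[Dict], min_score: float = 0.9) -> List[Dict]:
    """Keep only predictions whose confidence is at least min_score."""
    return [e for e in entities if e["score"] >= min_score]


# Illustrative predictions in the pipeline's output format; scores are made up.
predictions = [
    {"entity": "B-PER", "score": 0.99, "word": "John"},
    {"entity": "B-ORG", "score": 0.95, "word": "OpenAI"},
    {"entity": "B-MISC", "score": 0.42, "word": "software"},
]

confident = filter_by_score(predictions, min_score=0.9)
print([e["word"] for e in confident])
```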
Model Performance
The model's performance can vary depending on the complexity and context of the input text. It tends to perform best on well-formed, news-style text similar to the CoNLL-2003 training data and may struggle with informal or highly technical language.
Evaluation Metrics
The model was evaluated on the CoNLL-2003 test set with the following metrics:
- Precision: 93.04%
- Recall: 94.98%
- F1 Score: 94%
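The F1 score is the harmonic mean of precision and recall, so the three figures above can be checked against each other. The short sketch below does that; the function name f1_score is an illustrative choice, and the inputs are the percentages reported above.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above, in percent.
f1 = f1_score(93.04, 94.98)
print(f"F1: {f1:.2f}%")
```

The result is consistent with the reported F1 score of 94%.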
Limitations and Considerations
- The model may not perform well on texts outside of the domains it was trained on.
- Like all NER models, it may occasionally misclassify entities or fail to recognize them, especially in cases of polysemy or ambiguity.
- It is also limited to English text, as it was fine-tuned on an English dataset.
Credits
- Fine-tuning and Model: Heena Mirchandani & Krish Murjani
- Dataset: CoNLL-2003 NER dataset
License
This model is available for use under the Apache License 2.0. See the LICENSE file for more details.
For more details on BERT and Named Entity Recognition, refer to the Hugging Face documentation.
Base model: google-bert/bert-base-cased