metadata
tags:
- spacy
- token-classification
language:
- en
model-index:
- name: en_chemner
results:
- task:
name: NER
type: token-classification
metrics:
- name: NER Precision
type: precision
value: 0.9906542056
- name: NER Recall
type: recall
value: 0.9636363636
- name: NER F Score
type: f_score
value: 0.9769585253
widget:
- text: >-
Cinnamaldehyde is a fragrant compound found in cinnamon. Icosanoic acid
is a saturated fatty acid with a 20-carbon chain. Triptane is commonly
used as an anti-knock additive in aviation fuels. Benzophenone is a widely
used building block in organic chemistry, being the parent diarylketone.
Geraniol is a monoterpenoid and an alcohol. It is the primary component of
citronella oil and a primary component of rose oil and palmarosa oil.
license: apache-2.0
en_chemner: A spaCy Model for Chemical NER
Model Description
The en_chemner model is a specialized Named Entity Recognition (NER) model for the field of chemistry. Built with the spaCy framework, it identifies chemical entities in English-language text and classifies them by compound class.
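A minimal usage sketch, assuming the packaged pipeline is already installed so that `spacy.load("en_chemner")` can resolve it. The helper names `collect_entities` and `tag_chemicals` are illustrative and are not part of the model package.

```python
from typing import List, Tuple


def collect_entities(doc) -> List[Tuple[str, str]]:
    """Return (text, label) pairs for every entity span on a spaCy Doc."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def tag_chemicals(text: str, model_name: str = "en_chemner") -> List[Tuple[str, str]]:
    """Run the pipeline on `text`.

    Assumption: the model package is installed under `model_name`.
    """
    import spacy  # imported lazily so collect_entities works without spaCy

    nlp = spacy.load(model_name)
    return collect_entities(nlp(text))
```

Calling `tag_chemicals("Benzophenone is a widely used building block in organic chemistry.")` would return a list of `(text, label)` pairs. The exact predictions depend on the model, so none are shown here.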
Key Features
- High Precision and Recall: The model reports a precision of 99.07%, a recall of 96.36%, and an F-score of 97.70%, so it produces few false positives and misses few entities.
- Rich Label Scheme: The model assigns one of seven labels (ALCOHOL, ALDEHYDE, ALKANE, ALKENE, ALKYNE, C_ACID, KETONE), which makes it useful for tasks that depend on compound class.
- Optimized for spaCy: Compatible with spaCy >=3.6.1,<3.7.0, so it can be loaded into existing spaCy pipelines and applications.
- Extensive Vector Library: Ships with 514,157 unique vectors of 300 dimensions each.
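To give a sense of what the vector table means in practice, the sketch below estimates its raw in-memory size. It assumes 4-byte float32 values, which the model card does not state, and the function name is illustrative.

```python
def vector_table_bytes(n_vectors: int, n_dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of a dense vector table.

    Assumption: values are stored as float32 (4 bytes each).
    """
    return n_vectors * n_dims * bytes_per_value


size = vector_table_bytes(514_157, 300)
print(f"{size / 2**20:.1f} MiB")
```

Under that assumption the table comes to roughly 588 MiB. The installed package size may differ because of serialization overhead and the other pipeline components.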
Use Cases
The en_chemner model is ideal for:
- Chemical Literature Analysis: Automatically extracting chemical entities from research papers, patents, and textbooks.
- Data Annotation: Assisting in the annotation of chemical databases or creating datasets for further machine learning tasks.
- Educational Purposes: Helping students in chemistry-related fields to identify and understand various chemical compounds and their classifications.
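For the data annotation use case, predicted entities can be exported as character-offset records. This is a sketch only: the record layout (`text`, `spans`, `start`, `end`, `label`) is a hypothetical format chosen for illustration, not one prescribed by the model.

```python
import json
from typing import Dict, Iterable, List


def to_annotation_record(doc) -> Dict[str, object]:
    """Convert a spaCy Doc (or any object with .text and .ents) to a dict."""
    spans: List[Dict[str, object]] = [
        {"start": ent.start_char, "end": ent.end_char, "label": ent.label_}
        for ent in doc.ents
    ]
    return {"text": doc.text, "spans": spans}


def to_jsonl(docs: Iterable) -> str:
    """Serialize one annotation record per line (JSON Lines)."""
    return "\n".join(json.dumps(to_annotation_record(d)) for d in docs)
```

Character offsets are used instead of token indices so that the records stay valid even if a downstream tool tokenizes the text differently.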
| Feature | Description |
| --- | --- |
| Name | en_chemner |
| Version | 1.0.0 |
| spaCy | >=3.6.1,<3.7.0 |
| Default Pipeline | tok2vec, ner |
| Components | tok2vec, ner |
| Vectors | 514157 keys, 514157 unique vectors (300 dimensions) |
| Sources | n/a |
| License | n/a |
| Author | n/a |
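Because the model is pinned to a narrow spaCy range, it can help to check the installed version before loading it. The sketch below uses only the standard library and handles plain `X.Y.Z` release strings; the function names are illustrative.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' release string into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def is_supported_spacy(version: str) -> bool:
    """True when `version` satisfies the model's range >=3.6.1,<3.7.0."""
    return (3, 6, 1) <= parse_version(version) < (3, 7, 0)
```

In practice this would be called as `is_supported_spacy(spacy.__version__)`. Pre-release strings such as `3.7.0.dev0` are not handled by this parser and would need a full version-specifier library.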
Label Scheme
View label scheme (7 labels for 1 component)
| Component | Labels |
| --- | --- |
| ner | ALCOHOL, ALDEHYDE, ALKANE, ALKENE, ALKYNE, C_ACID, KETONE |
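The label scheme above can be used to summarize predictions by compound class. The following sketch counts entities per label and rejects any label outside the seven listed. `CHEMNER_LABELS` and `count_labels` are illustrative names, and the input is assumed to be `(text, label)` pairs taken from `doc.ents`.

```python
from collections import Counter
from typing import Iterable, Tuple

# The seven labels listed in the label scheme for the ner component.
CHEMNER_LABELS = frozenset(
    {"ALCOHOL", "ALDEHYDE", "ALKANE", "ALKENE", "ALKYNE", "C_ACID", "KETONE"}
)


def count_labels(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count entities per label, raising on labels outside the scheme."""
    counts: Counter = Counter()
    for _text, label in pairs:
        if label not in CHEMNER_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return counts
```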
Accuracy
| Type | Score |
| --- | --- |
| ENTS_F | 97.70 |
| ENTS_P | 99.07 |
| ENTS_R | 96.36 |
| TOK2VEC_LOSS | 151.95 |
| NER_LOSS | 259.22 |
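As a consistency check on the scores above, the F-score can be recomputed from the reported precision and recall. The sketch assumes ENTS_F is the balanced F1 score (the harmonic mean of precision and recall), which agrees with the reported values.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the model metadata.
f = f_score(0.9906542056, 0.9636363636)
print(f"{100 * f:.2f}")
```

The result matches the reported ENTS_F of 97.70 to two decimal places.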