metadata
tags:
  - spacy
  - token-classification
language:
  - en
model-index:
  - name: en_chemner
    results:
      - task:
          name: NER
          type: token-classification
        metrics:
          - name: NER Precision
            type: precision
            value: 0.9906542056
          - name: NER Recall
            type: recall
            value: 0.9636363636
          - name: NER F Score
            type: f_score
            value: 0.9769585253
widget:
  - text: >-
      Cinnamaldehyde is a fragrant compound found in cinnamon. Icosanoic acid
      is a saturated fatty acid with a 20-carbon chain. Triptane is commonly
      used as an anti-knock additive in aviation fuels. Benzophenone is a widely
      used building block in organic chemistry, being the parent diarylketone.
      Geraniol is a monoterpenoid and an alcohol. It is the primary component of
      citronella oil and a primary component of rose oil and palmarosa oil.
license: apache-2.0

en_chemner: A spaCy Model for Chemical NER

Model Description

The en_chemner model is a specialized Named Entity Recognition (NER) tool designed for the field of chemistry. Built using the spaCy framework, it identifies and classifies chemical entities within English-language texts.

Key Features

  • High Precision and Recall: The model reports a precision of 99.07% and a recall of 96.36% (F-score 97.70%), meaning it produces few false positives and misses few entities.
  • Rich Label Scheme: The model distinguishes seven classes of organic compounds (ALCOHOL, ALDEHYDE, ALKANE, ALKENE, ALKYNE, C_ACID, and KETONE), which makes it useful across a range of chemical text analysis tasks.
  • Optimized for spaCy: The model is compatible with spaCy >=3.6.1,<3.7.0 and can be incorporated into existing spaCy pipelines and applications.
  • Extensive Vector Library: The model ships with 514,157 unique vectors of 300 dimensions each, providing word representations for recognizing and classifying chemical entities.
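As a rough illustration of the spaCy integration described above, the sketch below loads the pipeline and collects the recognized entities. It assumes the `en_chemner` package has already been installed into an environment with a compatible spaCy version. The helper names `extract_entities` and `load_and_extract` are hypothetical and not part of the model.

```python
def extract_entities(nlp, text):
    """Run a spaCy-style pipeline and return (entity text, label) pairs."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def load_and_extract(text, model_name="en_chemner"):
    """Load the installed pipeline by name, then extract entities from text.

    Assumes spaCy (>=3.6.1,<3.7.0) and the en_chemner package are installed.
    """
    import spacy  # imported lazily so extract_entities has no hard dependency

    nlp = spacy.load(model_name)
    return extract_entities(nlp, text)


# Example call (requires the model to be installed):
# for text, label in load_and_extract("Geraniol is a monoterpenoid and an alcohol."):
#     print(text, label)
```

Keeping `extract_entities` separate from the loading step means the same function works with any pipeline object that returns a document with an `ents` attribute.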

Use Cases

The en_chemner model is ideal for:

  • Chemical Literature Analysis: Automatically extracting chemical entities from research papers, patents, and textbooks.
  • Data Annotation: Assisting in the annotation of chemical databases or creating datasets for further machine learning tasks.
  • Educational Purposes: Helping students in chemistry-related fields to identify and understand various chemical compounds and their classifications.
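For the data-annotation use case, predictions can be exported as character-offset spans for human review. The sketch below shows one possible shape for that, assuming a `doc` object that exposes `text` and `ents` the way a spaCy `Doc` does. The function names and the record layout are illustrative choices, not a format defined by the model.

```python
import json


def to_annotation_record(doc):
    """Convert a processed document into a JSON-serializable record.

    Each entity becomes [start_char, end_char, label], using the
    character offsets that spaCy spans expose.
    """
    return {
        "text": doc.text,
        "entities": [[ent.start_char, ent.end_char, ent.label_] for ent in doc.ents],
    }


def to_jsonl(docs):
    """Serialize several documents, one JSON record per line."""
    return "\n".join(json.dumps(to_annotation_record(doc)) for doc in docs)
```

Records in this form can be loaded into an annotation tool or corrected by hand before being used as training data.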

| Feature | Description |
| --- | --- |
| Name | en_chemner |
| Version | 1.0.0 |
| spaCy | >=3.6.1,<3.7.0 |
| Default Pipeline | tok2vec, ner |
| Components | tok2vec, ner |
| Vectors | 514157 keys, 514157 unique vectors (300 dimensions) |
| Sources | n/a |
| License | n/a |
| Author | n/a |
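Because the pipeline is pinned to spaCy `>=3.6.1,<3.7.0`, it can help to check the installed version before loading it. The following is a minimal sketch of that range check using plain tuple comparison. `is_compatible_spacy` is a hypothetical helper, and it only handles simple `X.Y.Z` version strings (no pre-release suffixes).

```python
def parse_version(version):
    """Turn a simple 'X.Y.Z' string into a tuple of ints, e.g. (3, 6, 1)."""
    return tuple(int(part) for part in version.split("."))


def is_compatible_spacy(version, minimum="3.6.1", upper_bound="3.7.0"):
    """Return True if minimum <= version < upper_bound."""
    return parse_version(minimum) <= parse_version(version) < parse_version(upper_bound)


# With spaCy installed, the running version is available as spacy.__version__:
# import spacy
# print(is_compatible_spacy(spacy.__version__))
```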

Label Scheme

The pipeline defines 7 labels for 1 component:

| Component | Labels |
| --- | --- |
| ner | ALCOHOL, ALDEHYDE, ALKANE, ALKENE, ALKYNE, C_ACID, KETONE |
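Downstream code often needs the entities grouped by label. A small sketch of that step follows. The `(text, label)` pairs are hand-written for illustration, based on the example sentences in this card, and are not actual model output. `group_by_label` is a hypothetical helper.

```python
from collections import defaultdict

# The seven labels listed in the label scheme.
LABELS = ("ALCOHOL", "ALDEHYDE", "ALKANE", "ALKENE", "ALKYNE", "C_ACID", "KETONE")


def group_by_label(pairs):
    """Group (entity text, label) pairs into {label: [texts]}, skipping unknown labels."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if label in LABELS:
            grouped[label].append(text)
    return dict(grouped)


# Hand-written pairs for illustration only (not model output).
example_pairs = [
    ("Geraniol", "ALCOHOL"),
    ("Benzophenone", "KETONE"),
    ("Triptane", "ALKANE"),
]
print(group_by_label(example_pairs))
```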

Accuracy

| Type | Score |
| --- | --- |
| ENTS_F | 97.70 |
| ENTS_P | 99.07 |
| ENTS_R | 96.36 |
| TOK2VEC_LOSS | 151.95 |
| NER_LOSS | 259.22 |
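The F-score reported above is the harmonic mean of precision and recall. As a quick consistency check, the sketch below recomputes it from the precision and recall values listed in the model metadata. The function name is illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


precision = 0.9906542056  # NER Precision from the model metadata
recall = 0.9636363636     # NER Recall from the model metadata

print(f"{f_score(precision, recall) * 100:.2f}")  # prints 97.70
```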