BiodivBERT

Model description

  • BiodivBERT is a domain-specific, BERT-based cased model for the biodiversity literature.
  • It uses the tokenizer from the BERT base cased model.
  • BiodivBERT is pre-trained on abstracts and full text from biodiversity literature.
  • BiodivBERT is fine-tuned on two downstream tasks in the biodiversity domain: Named Entity Recognition and Relation Extraction.
  • Please visit our GitHub Repo for more details.

How to use

  • You can use BiodivBERT via the Hugging Face transformers library as follows:
  1. Masked Language Model
>>> from transformers import AutoTokenizer, AutoModelForMaskedLM

>>> tokenizer = AutoTokenizer.from_pretrained("NoYo25/BiodivBERT")

>>> model = AutoModelForMaskedLM.from_pretrained("NoYo25/BiodivBERT")
  2. Token Classification - Named Entity Recognition
>>> from transformers import AutoTokenizer, AutoModelForTokenClassification

>>> tokenizer = AutoTokenizer.from_pretrained("NoYo25/BiodivBERT")

>>> model = AutoModelForTokenClassification.from_pretrained("NoYo25/BiodivBERT")
  3. Sequence Classification - Relation Extraction
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("NoYo25/BiodivBERT")

>>> model = AutoModelForSequenceClassification.from_pretrained("NoYo25/BiodivBERT")
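
The snippets above only load the tokenizer and model. As a minimal sketch of what a first call might look like for the masked language model, the example below uses the transformers `pipeline` API with the `fill-mask` task. The helper names `mask_word` and `predict_masked` and the example sentence are hypothetical and not part of BiodivBERT or transformers; `[MASK]` is assumed to be the mask token, as in the BERT base cased vocabulary.

```python
MODEL_ID = "NoYo25/BiodivBERT"  # checkpoint name taken from the snippets above


def mask_word(sentence, word, mask_token="[MASK]"):
    """Replace the first occurrence of `word` with the mask token.

    Hypothetical helper for illustration; "[MASK]" is assumed to be the
    mask token, as in the BERT base cased vocabulary.
    """
    if word not in sentence:
        raise ValueError(f"{word!r} does not occur in the sentence")
    return sentence.replace(word, mask_token, 1)


def predict_masked(sentence, top_k=5):
    """Return (token, score) pairs for the masked position.

    Downloads the checkpoint on first use, so it needs network access.
    """
    # Imported lazily so mask_word() works without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    results = fill_mask(sentence, top_k=top_k)
    # With a single mask, each result is a dict with "token_str" and "score".
    return [(r["token_str"], r["score"]) for r in results]


masked = mask_word("Bees pollinate many flowering plants.", "pollinate")
print(masked)  # Bees [MASK] many flowering plants.
```

Calling `predict_masked(masked)` would then return the model's top candidates for the masked position. For the token and sequence classification variants, note that transformers initializes a new classification head with random weights if the checkpoint does not include one, in which case the model needs fine-tuning before its predictions are meaningful.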

Training data

  • BiodivBERT is pre-trained on abstracts and full text from biodiversity domain-related publications.
  • We used the Elsevier and Springer APIs to collect this data.
  • The corpus covers publications from 1990 to 2020.

Evaluation results

BiodivBERT outperformed BERT_base_cased, biobert_v1.1, and a BiLSTM baseline on the downstream tasks.
