AcroBERT supports end-to-end acronym linking (see the demo). Given a sentence, our framework first recognizes acronyms using MadDog and then disambiguates them using AcroBERT:

from inference.acrobert import acronym_linker

# Input sentence containing acronyms; the maximum length is 400 sub-tokens.
sentence = "This new genome assembly and the annotation are tagged as a RefSeq genome by NCBI."

# mode is one of 'acrobert' or 'pop'.
# 'acrobert' is more accurate; 'pop' is faster but less accurate.
results = acronym_linker(sentence, mode='acrobert')
print(results)

## expected output: [('NCBI', 'National Center for Biotechnology Information')]
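
To link several sentences and keep the results keyed by sentence, a small wrapper can help. The sketch below is illustrative only: `link_many` and `stub_linker` are hypothetical names that are not part of the GLADIS repository. The wrapper assumes the linker is called as `linker(sentence, mode=...)` and returns a list of `(acronym, long_form)` tuples, as in the example above. The stub only stands in for `acronym_linker` so that the snippet runs without the model files.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed shape of the linker: linker(sentence, mode=...) -> [(acronym, long_form), ...]
Linker = Callable[..., List[Tuple[str, str]]]

VALID_MODES = ("acrobert", "pop")


def link_many(
    sentences: Iterable[str], linker: Linker, mode: str = "acrobert"
) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: run the linker on each sentence.

    Returns a mapping from sentence to {acronym: long_form}.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return {s: dict(linker(s, mode=mode)) for s in sentences}


def stub_linker(sentence: str, mode: str = "acrobert") -> List[Tuple[str, str]]:
    """Stand-in for acronym_linker (an assumption, for illustration only)."""
    known = {"NCBI": "National Center for Biotechnology Information"}
    return [(acro, long_form) for acro, long_form in known.items() if acro in sentence]


sentences = [
    "This new genome assembly and the annotation are tagged as a RefSeq genome by NCBI.",
    "No acronyms appear in this sentence.",
]
linked = link_many(sentences, stub_linker, mode="acrobert")
for sentence, pairs in linked.items():
    print(sentence, "->", pairs)
```

With the repository set up, passing `acronym_linker` in place of `stub_linker` should work the same way, provided it keeps the call signature shown in the example above.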

GitHub: https://github.com/tigerchen52/GLADIS

Model: https://zenodo.org/record/7568937#.Y9vtrXaZMuU

In addition to AcroBERT, we constructed GLADIS, a new benchmark for accelerating research on acronym disambiguation. It contains the following data:

| Resource | Source | Description |
|---|---|---|
| Acronym Dictionary | Pile (MIT license), Wikidata, UMLS | 1.6 million acronyms and 6.4 million long forms |
| Three Datasets | WikilinksNED Unseen, SciAD (CC BY-NC-SA 4.0), MedMentions (CC0 1.0) | Three AD datasets that cover the general, scientific, and biomedical domains |
| A Pre-training Corpus | Pile (MIT license) | 160 million sentences with acronyms |

usage

  1. git clone https://github.com/tigerchen52/GLADIS.git
  2. download the acronym dictionary and AcroBERT, and put them into this path: input/
  3. use the function inference.acrobert.acronym_linker() to do end-to-end acronym linking.
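
Before calling the linker, it can be useful to confirm that step 2 was completed. The following is a minimal sketch, assuming only that the downloaded files live under `input/` relative to the repository root. `input_dir_ready` is a hypothetical helper, not part of the repository, and it does not check specific file names because those are not listed here.

```python
from pathlib import Path


def input_dir_ready(repo_root: str = ".", subdir: str = "input") -> bool:
    """Hypothetical check: True if <repo_root>/<subdir> exists and is not empty."""
    target = Path(repo_root) / subdir
    # any() stops at the first directory entry, so this stays cheap for large folders.
    return target.is_dir() and any(target.iterdir())


if not input_dir_ready("."):
    print("input/ is missing or empty: download the acronym dictionary and AcroBERT first.")
```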

citation

@inproceedings{chen2023gladis,
  title={GLADIS: A General and Large Acronym Disambiguation Benchmark},
  author={Chen, Lihu and Varoquaux, Ga{\"e}l and Suchanek, Fabian M},
  booktitle={EACL 2023-The 17th Conference of the European Chapter of the Association for Computational Linguistics},
  year={2023}
}