Model Card for AD-BERT
This model card describes the AD-BERT model, which was initialized from Bio+ClinicalBERT and further pretrained on AD-related progress notes from NMEDW.
Model Details
Pretraining Data
The AD-BERT model was trained on AD-related progress notes from NMEDW (private).
Model Pretraining
Note Preprocessing
We preprocessed each progress note derived from the EHRs as follows:
(1) Deidentification: Clinical notes contain legally protected health information (PHI), such as patient names, addresses, and phone numbers, which should not be released to the public or used for most research applications. We used the Philter package to remove PHI from the clinical notes.
(2) Cleaning: We removed non-ASCII characters from the notes and replaced multiple contiguous white spaces with one blank space.
(3) Splitting: Each note was split into sections at the newline character ('\n'). Each section was fed to the model independently to obtain a section representation, and the embedding vector for a patient was generated by max pooling over all of that patient's section representations.
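The cleaning, splitting, and pooling steps can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names clean_note, split_sections, and max_pool are hypothetical, and the sketch assumes that newlines are preserved during cleaning (so they can still mark section boundaries) and that empty sections are discarded. Deidentification with Philter is not shown, and the toy vectors stand in for real section representations produced by the model.

```python
import re
from typing import List


def clean_note(text: str) -> str:
    """Drop non-ASCII characters and collapse runs of spaces/tabs into one space."""
    ascii_only = text.encode("ascii", errors="ignore").decode("ascii")
    # Assumption: newlines are kept, since they delimit sections in the next step.
    return re.sub(r"[ \t]+", " ", ascii_only)


def split_sections(note: str) -> List[str]:
    """Split a cleaned note on newlines; empty sections are dropped (assumption)."""
    return [line.strip() for line in note.split("\n") if line.strip()]


def max_pool(section_vectors: List[List[float]]) -> List[float]:
    """Element-wise maximum over equally sized section vectors."""
    if not section_vectors:
        raise ValueError("need at least one section vector")
    return [max(column) for column in zip(*section_vectors)]


# Toy usage: a made-up note and made-up 2-dimensional section vectors.
note = "Pt seen   today.\nMMSE 24/30 \u2013 stable.\n\nFollow up in 6 months."
sections = split_sections(clean_note(note))
# sections == ["Pt seen today.", "MMSE 24/30 stable.", "Follow up in 6 months."]
patient_vector = max_pool([[0.1, 0.9], [0.4, 0.2]])
# patient_vector == [0.4, 0.9]
```

In practice each section would be encoded by the model first, and max_pool would be applied to those encoder outputs rather than to hand-written vectors.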
Pretraining Procedures
The model was trained using code from the Hugging Face transformers repository on a Tesla V100 32 GB GPU. Model parameters were initialized with Bio+Clinical BERT.
Pretraining code
INPUT=<user-defined name>
export TRAIN_FILE=<path to the training corpus>
export INITIAL_PATH=<path to the initialized model>  # here, Bio+Clinical BERT
CUDA_VISIBLE_DEVICES=1 python3 ./lm_finetuning/simple_lm_finetuning.py \
--train_corpus $TRAIN_FILE \
--bert_model $INITIAL_PATH \
--do_lower_case \
--output_dir ./AD_bert_simple_lm_${INPUT}/ \
--on_memory \
--do_train &
How to use the model
Load the model via the transformers library:
from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained("<path to model>")
model = BertModel.from_pretrained("<path to model>")
More Information
Refer to the original paper, "AD-BERT: Using Pre-trained contextualized embeddings to Predict the Progression from Mild Cognitive Impairment to Alzheimer's Disease", for additional details and performance on MCI-to-AD prediction tasks.
Questions?
Email [email protected] with any questions.