herg_karim-Morgan / README.md
metadata
language:
  - en
metrics:
  - accuracy
  - AUC ROC
  - precision
  - recall
tags:
  - biology
  - chemistry
  - therapeutic science
  - drug design
  - drug development
  - therapeutics
library_name: tdc
license: bsd-2-clause

Dataset description

An integrated human Ether-a-go-go-related gene (hERG) dataset was obtained from DeepHIT, the BindingDB database, the ChEMBL bioactivity database, and other literature. It consists of molecular structures, given as SMILES strings, labeled as hERG blockers (< 10 µM) or non-blockers (≥ 10 µM).

Task description

Binary classification. Given a drug SMILES string, predict whether the drug is a hERG blocker (label 1, < 10 µM) or a non-blocker (label 0, ≥ 10 µM).
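The labeling rule is a single threshold at 10 µM. The sketch below assumes activity values are already expressed in µM; the names `herg_label` and `THRESHOLD_UM` are hypothetical and not part of TDC.

```python
# Activity cutoff used for labeling, in micromolar (µM).
THRESHOLD_UM = 10.0


def herg_label(activity_um: float) -> int:
    """Map a measured activity value (µM) to the binary task label.

    Returns 1 (hERG blocker) when the value is below 10 µM,
    and 0 (non-blocker) when it is 10 µM or higher.
    """
    if activity_um < 0:
        raise ValueError("activity must be non-negative")
    return 1 if activity_um < THRESHOLD_UM else 0


print(herg_label(3.2))   # 1
print(herg_label(10.0))  # 0
```

Note that a value of exactly 10 µM falls on the non-blocker side.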

Dataset statistics

Total: 13445; Train_val: 12620; Test: 825

Pre-requisites

Install the following packages:

pip install PyTDC
pip install DeepPurpose
pip install git+https://github.com/bp-kelley/descriptastorus
pip install dgl torch torchvision

You can also refer to the Colab notebook here.

Dataset split

Random split: 70% training, 10% validation, and 20% testing.
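To illustrate what a seeded 70/10/20 random split does, here is a minimal pure-Python sketch. It is not TDC's implementation (TDC provides its own splitting through `data.get_split()`, whose internals may differ); `random_split` is a hypothetical helper that shuffles indices with a fixed seed and slices them into three disjoint partitions.

```python
import random


def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle items with a fixed seed and cut them into train/valid/test."""
    if len(frac) != 3 or abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("frac must contain three fractions summing to 1")
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)  # deterministic for a given seed
    n_train = int(frac[0] * len(items))
    n_valid = int(frac[1] * len(items))
    train = [items[i] for i in idx[:n_train]]
    valid = [items[i] for i in idx[n_train:n_train + n_valid]]
    test = [items[i] for i in idx[n_train + n_valid:]]  # remainder goes to test
    return train, valid, test


train, valid, test = random_split(list(range(13445)))
print(len(train), len(valid), len(test))  # 9411 1344 2690
```

Assigning the rounding remainder to the test partition guarantees that every item lands in exactly one split.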

To load the dataset with TDC, run:

from tdc.single_pred import Tox
data = Tox(name = 'herg_karim')

Model description

Each molecule is encoded as a Morgan chemical fingerprint, and an MLP decoder maps the fingerprint to a prediction. The model's hyperparameters were tuned over 100 runs using the Ax platform.
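Conceptually, a Morgan fingerprint is a fixed-length bit vector, and the MLP decoder turns that vector into a blocker probability. The toy forward pass below is a sketch under stated assumptions: one hidden ReLU layer, a sigmoid output, random weights, and an 8-bit fingerprint. The sizes, weights, and the names `init_mlp` and `mlp_predict` are hypothetical and do not reflect the tuned architecture. A real fingerprint would be computed from the SMILES string by a cheminformatics toolkit such as RDKit.

```python
import math
import random


def init_mlp(n_bits, n_hidden, seed=0):
    """Create random weights for a one-hidden-layer MLP (toy sizes)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_bits)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def mlp_predict(fingerprint, w1, b1, w2, b2):
    """Map a 0/1 fingerprint vector to a blocker probability."""
    # Hidden layer: ReLU(W1 · x + b1)
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, fingerprint)) + b)
        for row, b in zip(w1, b1)
    ]
    # Output layer: sigmoid(w2 · h + b2) yields a value in (0, 1)
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


fp = [1, 0, 0, 1, 1, 0, 1, 0]  # hypothetical 8-bit fingerprint
params = init_mlp(n_bits=len(fp), n_hidden=4)
print(round(mlp_predict(fp, *params), 3))
```

Hyperparameter tuning (for example with Ax) would search over choices such as the hidden-layer sizes and learning rate. That search is not shown here.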

To load the pre-trained model and run a prediction:

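from tdc import tdc_hf_interface

# connect to this Hugging Face repo
tdc_hf = tdc_hf_interface("hERG_Karim-Morgan")
# load the DeepPurpose model from this repo, using ./data as the local path
dp_model = tdc_hf.load_deeppurpose('./data')
# predict for acetaminophen (paracetamol), given as a SMILES string
tdc_hf.predict_deeppurpose(dp_model, ['CC(=O)NC1=CC=C(O)C=C1'])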
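The metadata lists accuracy, AUC ROC, precision, and recall. The sketch below shows how those metrics can be computed from model scores. It assumes that each prediction is a blocker score in [0, 1], which should be confirmed by checking the value `predict_deeppurpose` actually returns. The 0.5 cutoff, the `binary_metrics` name, and the sample labels and scores are hypothetical. ROC AUC is computed as the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half.

```python
def binary_metrics(y_true, scores, cutoff=0.5):
    """Accuracy, precision, recall (at a cutoff) and ROC AUC (rank-based)."""
    y_pred = [1 if s >= cutoff else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    # Fraction of positive/negative pairs ranked correctly (ties count 0.5)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "roc_auc": wins / (len(pos) * len(neg)),
    }


print(binary_metrics([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
# {'accuracy': 0.5, 'precision': 0.5, 'recall': 0.5, 'roc_auc': 0.75}
```

ROC AUC is computed from the raw scores rather than from the thresholded labels, so it does not depend on the chosen cutoff.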

References