---
license: gpl-2.0
---

# Model Card for FupBERT

A descriptor-free approach to predicting the fraction unbound in human plasma (fup).

## Model Details

### Model Description

Chemical-specific parameters are either measured _in vitro_ or estimated using quantitative 
structure–activity relationship (QSAR) models. Existing QSAR work typically relies on computing a set of chemical 
descriptors or fingerprints, selecting a subset of them, and training a machine learning model on that subset. In this work, 
we used a state-of-the-art natural language processing model, Bidirectional Encoder Representations from Transformers 
(BERT), which allowed us to bypass descriptor calculation entirely. In this approach, simplified molecular-input 
line-entry system (SMILES) strings were embedded in a high-dimensional space using a two-stage training procedure. 
The model was first pre-trained on a masked SMILES token task and then fine-tuned on a QSAR prediction task. 
During pre-training, the model learned meaningful high-dimensional embeddings from the relationships between the 
chemical tokens in SMILES strings drawn from the "in-stock" portion of the ZINC 15 dataset, a large dataset of 
commercially available chemicals. Fine-tuning then adjusted the pre-trained embeddings to predict a specific 
QSAR endpoint of interest. A key strength of this approach is that a single pre-trained model can be reused for 
many different fine-tuning tasks, reducing the computational cost of developing separate models for different 
endpoints. We used this framework to develop a predictive model for the fraction unbound in human plasma (fup). 
The approach is flexible, requires minimal domain expertise, and can be generalized to other parameters of interest 
for the rapid and accurate estimation of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties.
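
To make the masked SMILES token pre-training step more concrete, the following is a minimal Python sketch of how SMILES strings might be split into chemical tokens and partially masked before being fed to a BERT-style encoder. It is not FupBERT's code: the tokenization regex (a pattern commonly used for chemical language models), the `[MASK]` token, the 15% masking rate borrowed from the original BERT recipe, and the helper names `tokenize_smiles` and `mask_tokens` are all assumptions for illustration.

```python
import random
import re
from typing import List, Optional, Tuple

# ASSUMPTION: a SMILES tokenization regex commonly used for chemical language
# models (bracket atoms, Br/Cl, organic-subset and aromatic atoms, bonds,
# branches, stereo marks, ring-closure labels such as %10). FupBERT's actual
# tokenizer and vocabulary may differ.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)
MASK_TOKEN = "[MASK]"  # hypothetical special token


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into chemical tokens (assumed scheme)."""
    tokens = SMILES_REGEX.findall(smiles)
    # The tokens must reassemble into the input; otherwise the regex
    # skipped characters it does not recognize.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def mask_tokens(
    tokens: List[str], mask_prob: float = 0.15, rng: Optional[random.Random] = None
) -> Tuple[List[str], List[Optional[str]]]:
    """Replace a random subset of tokens with MASK_TOKEN.

    Returns (inputs, labels): labels hold the original token at masked
    positions and None elsewhere, so only masked positions are scored.
    Simplified: the original BERT recipe also swaps some selected tokens for
    random tokens or leaves them unchanged; here every selected token is masked.
    """
    rng = rng if rng is not None else random.Random()
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK_TOKEN)
            labels.append(tok)   # target the model must recover
        else:
            inputs.append(tok)
            labels.append(None)  # not scored in the pre-training loss
    return inputs, labels


if __name__ == "__main__":
    tokens = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
    masked, targets = mask_tokens(tokens, rng=random.Random(0))
    print(tokens)
    print(masked)
```

In this sketch the encoder would see `masked` during pre-training and be scored only where `targets` is not `None`; fine-tuning would then replace this objective with a loss on measured fup values.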



- **Developed by:** Michael Riedl, Sayak Mukherjee, and Mitch Gauthier
- **Model type:** BERT

### Model Sources


- **Paper:** Riedl, Michael, Sayak Mukherjee, and Mitch Gauthier. "Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma." Molecular Pharmaceutics (2023).
- **Demo:** https://huggingface.co/spaces/battelle/FupBERT_Space

## Citation


**BibTeX:**
```bibtex
@article{riedl2023descriptor,
  title={Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma},
  author={Riedl, Michael and Mukherjee, Sayak and Gauthier, Mitch},
  journal={Molecular Pharmaceutics},
  year={2023},
  publisher={ACS Publications}
}
```

## Model Card Contact

[email protected]