Commit 101ed72 by AmelieSchreiber (1 parent: d9babe5)
Update README.md

README.md CHANGED
@@ -21,6 +21,7 @@ tags:
 ---
 # ESM-2 for Binding Site Prediction
 
+**This model is overfit (see below)**
 This model *may be* close to SOTA compared to [these SOTA structural models](https://www.biorxiv.org/content/10.1101/2023.08.11.553028v1).
 One of the primary goals in training this model is to prove the viability of using simple, single sequence only protein language models
 for binary token classification tasks like predicting binding and active sites of protein sequences based on sequence alone. This project
@@ -43,6 +44,23 @@ dataset [found here](https://huggingface.co/datasets/AmelieSchreiber/binding_sit
 this model has a high recall, meaning it is likely to detect binding sites, but it has a precision score that is somewhat lower than the SOTA
 structural models mentioned above, meaning the model may return some false positives as well.
 
+## Overfitting Issues
+
+```python
+({'accuracy': 0.9908574638195745,
+  'precision': 0.7748830511095647,
+  'recall': 0.9862043939282111,
+  'f1': 0.8678649909611492,
+  'auc': 0.9886039823329382,
+  'mcc': 0.8699396085712834},
+ {'accuracy': 0.9486280975482552,
+  'precision': 0.40980984516603186,
+  'recall': 0.827004864790918,
+  'f1': 0.5480444772577421,
+  'auc': 0.890196425388581,
+  'mcc': 0.560633448203768})
+```
+
 
 ## Running Inference
 
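The two metric dictionaries this commit adds under `## Overfitting Issues` are not labeled. The following is a minimal sketch for reading them, assuming the first dictionary holds training-split metrics and the second holds held-out test metrics (the commit does not say so explicitly). The names `train_metrics`, `test_metrics`, `f1_from`, and `metric_gaps` are hypothetical helpers introduced here for illustration; they are not part of the model repository.

```python
# Metric values copied verbatim from the commit.
# ASSUMPTION: first dict = train split, second dict = test split.
train_metrics = {
    "accuracy": 0.9908574638195745,
    "precision": 0.7748830511095647,
    "recall": 0.9862043939282111,
    "f1": 0.8678649909611492,
    "auc": 0.9886039823329382,
    "mcc": 0.8699396085712834,
}
test_metrics = {
    "accuracy": 0.9486280975482552,
    "precision": 0.40980984516603186,
    "recall": 0.827004864790918,
    "f1": 0.5480444772577421,
    "auc": 0.890196425388581,
    "mcc": 0.560633448203768,
}


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def metric_gaps(train, test):
    """Per-metric drop from train to test (positive means worse on test)."""
    return {name: train[name] - test[name] for name in train}


gaps = metric_gaps(train_metrics, test_metrics)

# Print metrics ordered from largest to smallest train/test gap.
for name, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
    print(
        f"{name:>9}: train={train_metrics[name]:.4f} "
        f"test={test_metrics[name]:.4f} gap={gap:+.4f}"
    )
```

Under that assumption, precision shows the largest drop (from roughly 0.77 to 0.41), while accuracy barely moves. That pattern fits the README's own description of high recall with more false positives, and it is why accuracy alone would not reveal the overfitting here.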