Update README.md
README.md
CHANGED
@@ -5,6 +5,8 @@ pipeline_tag: text-classification
@@ -22,12 +24,31 @@ SYNTERACT achieved unprecedented performance over vast phylogeny with 92-96% acc
tags:
- protein language model
widget:
- text: "M S H S V K I Y D T C I G C T Q C V R A C P T D V L E M I P W G G C K A K Q I A S A P R T E D C V G C K R C E S A C P T D F L S V R V Y L W H E T T R S M G L A Y [SEP] M I N L P S L F V P L V G L L F P A V A M A S L F L H V E K R L L F S T K K I N"
  example_title: "Non-interacting proteins"
- text: "M S I N I C R D N H D P F Y R Y K M P P I Q A K V E G R G N G I K T A V L N V A D I S H A L N R P A P Y I V K Y F G F E L G A Q T S I S V D K D R Y L V N G V H E P A K L Q D V L D G F I N K F V L C G S C K N P E T E I I I T K D N D L V R D C K A C G K R T P M D L R H K L S S F I L K N P P D S V S G S K K K K K A A T A S A N V R G G G L S I S D I A Q G K S Q N A P S D G T G S S T P Q H H D E D E D E L S R Q I K A A A S T L E D I E V K D D E W A V D M S E E A I R A R A K E L E V N S E L T Q L D E Y G E W I L E Q A G E D K E N L P S D V E L Y K K A A E L D V L N D P K I G C V L A Q C L F D E D I V N E I A E H N A F F T K I L V T P E Y E K N F M G G I E R F L G L E H K D L I P L L P K I L V Q L Y N N D I I S E E E I M R F G T K S S K K F V P K E V S K K V R R A A K P F I T W L E T A E S D D D E E D D E [SEP] M S I E N L K S F D P F A D T G D D E T A T S N Y I H I R I Q Q R N G R K T L T T V Q G V P E E Y D L K R I L K V L K K D F A C N G N I V K D P E M G E I I Q L Q G D Q R A K V C E F M I S Q L G L Q K K N I K I H G F"
  example_title: "Interacting proteins"
---
## How to use

```python
# Imports
import re
import torch
import torch.nn.functional as F
from transformers import BertForSequenceClassification, BertTokenizer

model = BertForSequenceClassification.from_pretrained('lhallee/SYNTERACT') # load model
tokenizer = BertTokenizer.from_pretrained('lhallee/SYNTERACT') # load tokenizer
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') # gather device
model.to(device) # move to device
model.eval() # put in eval mode

# Example pair of protein sequences
sequence_a = 'MEKSCSIGNGREQYGWGHGEQCGTQFLECVYRNASMYSVLGDLITYVVFLGATCYAILFGFRLLLSCVRIVLKVVIALFVIRLLLALGSVDITSVSYSG' # Uniprot A1Z8T3
sequence_b = 'MRLTLLALIGVLCLACAYALDDSENNDQVVGLLDVADQGANHANDGAREARQLGGWGGGWGGRGGWGGRGGWGGRGGWGGRGGWGGGWGGRGGWGGRGGGWYGR' # Uniprot A1Z8H0
sequence_a = ' '.join(list(re.sub(r'[UZOB]', 'X', sequence_a))) # need spaces between amino acids
sequence_b = ' '.join(list(re.sub(r'[UZOB]', 'X', sequence_b))) # replace rare amino acids with X
example = sequence_a + ' [SEP] ' + sequence_b # add SEP token

example = tokenizer(example, return_tensors='pt', padding=False).to(device) # tokenize example
with torch.no_grad():
    logits = model(**example).logits.cpu().detach() # get logits from model

probability = F.softmax(logits, dim=-1) # use softmax to get "confidence" in the prediction
prediction = probability.argmax(dim=-1) # 0 for no interaction, 1 for interaction
```
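
The preprocessing in the snippet above (mapping the rare amino acids U, Z, O, and B to X, spacing out the residues, and joining the pair with `[SEP]`) could be wrapped in a small helper. This is only a sketch: `format_pair` is a hypothetical name, not something shipped with SYNTERACT, and it simply reproduces the string format used in the widget examples.

```python
import re


def format_pair(sequence_a: str, sequence_b: str) -> str:
    """Build the space-separated 'A [SEP] B' input string (hypothetical helper)."""

    def space_out(sequence: str) -> str:
        # Map rare amino acids (U, Z, O, B) to X, then put a space between residues.
        return ' '.join(re.sub(r'[UZOB]', 'X', sequence))

    return space_out(sequence_a) + ' [SEP] ' + space_out(sequence_b)


print(format_pair('MSUK', 'MBZ'))  # M S X K [SEP] M X X
```

The returned string can be passed to the tokenizer in the same way as `example` in the snippet above.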

## Intended use and limitations
We define a protein-protein interaction as physical contact that mediates chemical or conformational change, especially with non-generic function. However, due to SYNTERACT's propensity to predict false positives, we believe that it identifies plausible conformational changes caused by interactions without relevance to function. Therefore, predictions by SYNTERACT should always be taken with a grain of salt and used as a means of hypothesis generation or secondary validation.
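
Given the tendency toward false positives, one cautious way to use the output is to keep only high-probability pairs as candidates for follow-up instead of taking the argmax at face value. The sketch below assumes the two-class ordering from the usage snippet (index 0 for no interaction, index 1 for interaction). The function names and the 0.9 cutoff are illustrative assumptions, not values recommended by the SYNTERACT authors, and whether a stricter cutoff helps would need checking against labeled data.

```python
import math


def interaction_probability(logits):
    """Softmax probability of class 1 (interaction) from two-class logits."""
    no_interaction, interaction = logits
    # Subtract the max before exponentiating for numerical stability.
    peak = max(no_interaction, interaction)
    exp_no = math.exp(no_interaction - peak)
    exp_yes = math.exp(interaction - peak)
    return exp_yes / (exp_no + exp_yes)


def is_candidate(logits, threshold=0.9):
    """Flag a pair for follow-up only above a hypothetical, user-chosen cutoff."""
    return interaction_probability(logits) >= threshold


# The argmax of [0.0, 0.5] is class 1, but its probability is only about 0.62.
print(is_candidate([0.0, 0.5]))   # False
print(is_candidate([-2.0, 3.0]))  # True
```

With the model output from the usage snippet, the plain Python list could be obtained with `logits[0].tolist()`.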