AmelieSchreiber committed
Commit 7bb7235 · 1 Parent(s): 578f706
Update README.md
README.md CHANGED
````diff
@@ -15,8 +15,7 @@ tags:
 
 This is a Parameter Efficient Fine Tuning (PEFT) Low Rank Adaptation (LoRA) of
 the [esm2_t6_8M_UR50D](https://huggingface.co/facebook/esm2_t6_8M_UR50D) model for the (binary) token classification task of
-predicting RNA binding sites of proteins.
-[found here](https://github.com/Amelie-Schreiber/esm2_LoRA_binding_sites/tree/main). You can also find a version of this model
+predicting RNA binding sites of proteins. You can also find a version of this model
 that was fine-tuned without LoRA [here](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_UR50D_rna_binding_site_predictor).
 
 ## Training procedure
@@ -25,7 +24,10 @@ This is a Low Rank Adaptation (LoRA) of `esm2_t6_8M_UR50D`,
 trained on `166` protein sequences in the [RNA binding sites dataset](https://huggingface.co/datasets/AmelieSchreiber/data_of_protein-rna_binding_sites)
 using a `80/20` train/test split. This model was trained with class weighting due to the imbalanced nature
 of the RNA binding site dataset (fewer binding sites than non-binding sites). You can train your own version
-using [this notebook](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_weighted_lora_rna_binding/blob/main/LoRA_binding_sites_no_sweeps_v2.ipynb)!
+using [this notebook](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_weighted_lora_rna_binding/blob/main/LoRA_binding_sites_no_sweeps_v2.ipynb)!
+You just need the RNA `binding_sites.xml` file [found here](https://huggingface.co/datasets/AmelieSchreiber/data_of_protein-rna_binding_sites).
+A similar model can also be trained using the Github with a training script and conda env YAML, which can be
+[found here](https://github.com/Amelie-Schreiber/esm2_LoRA_binding_sites/tree/main). This version uses wandb sweeps for hyperparameter search.
 
 ```
 {'eval_loss': 0.49476009607315063,
````
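The README text in this diff says the model was trained with class weighting because binding-site residues are rarer than non-binding ones. The diff does not show how the weights were chosen, so the following is only a minimal sketch of one common scheme (inverse-frequency weights with a weight-normalised cross-entropy), written in plain Python. The function names and the toy labels are hypothetical and are not taken from the notebook or the training script.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count).

    Rare classes get larger weights. This is an assumed scheme, not
    necessarily the one used to train the model in this repository.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalised by the sum of weights."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Hypothetical per-residue labels: 1 = binding site, 0 = non-binding site.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
weights = inverse_frequency_weights(labels)

# Hypothetical predictions: an uninformative 50/50 guess for every residue.
probs = [[0.5, 0.5] for _ in labels]
loss = weighted_cross_entropy(probs, labels, weights)
```

With these toy labels, each binding-site residue carries four times the weight of a non-binding residue, so errors on the minority class contribute more to the loss.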