AmelieSchreiber committed on
Commit 7bb7235 · 1 Parent(s): 578f706

Update README.md

Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -15,8 +15,7 @@ tags:
 
  This is a Parameter Efficient Fine Tuning (PEFT) Low Rank Adaptation (LoRA) of
  the [esm2_t6_8M_UR50D](https://huggingface.co/facebook/esm2_t6_8M_UR50D) model for the (binary) token classification task of
- predicting RNA binding sites of proteins. The Github with the training script and conda env YAML can be
- [found here](https://github.com/Amelie-Schreiber/esm2_LoRA_binding_sites/tree/main). You can also find a version of this model
+ predicting RNA binding sites of proteins. You can also find a version of this model
  that was fine-tuned without LoRA [here](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_UR50D_rna_binding_site_predictor).
 
  ## Training procedure
@@ -25,7 +24,10 @@ This is a Low Rank Adaptation (LoRA) of `esm2_t6_8M_UR50D`,
  trained on `166` protein sequences in the [RNA binding sites dataset](https://huggingface.co/datasets/AmelieSchreiber/data_of_protein-rna_binding_sites)
  using a `80/20` train/test split. This model was trained with class weighting due to the imbalanced nature
  of the RNA binding site dataset (fewer binding sites than non-binding sites). You can train your own version
- using [this notebook](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_weighted_lora_rna_binding/blob/main/LoRA_binding_sites_no_sweeps_v2.ipynb)!
+ using [this notebook](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_weighted_lora_rna_binding/blob/main/LoRA_binding_sites_no_sweeps_v2.ipynb)!
+ You just need the RNA `binding_sites.xml` file [found here](https://huggingface.co/datasets/AmelieSchreiber/data_of_protein-rna_binding_sites).
+ A similar model can also be trained using the Github with a training script and conda env YAML, which can be
+ [found here](https://github.com/Amelie-Schreiber/esm2_LoRA_binding_sites/tree/main). This version uses wandb sweeps for hyperparameter search.
 
  ```
  {'eval_loss': 0.49476009607315063,