AmelieSchreiber / esm2_t6_8M_weighted_lora_rna_binding

protein language model

AmelieSchreiber committed on Aug 9, 2023

Commit 7bb7235 · 1 Parent(s): 578f706

Update README.md

Files changed (1)

README.md +5 -3

@@ -15,8 +15,7 @@ tags:
 This is a Parameter Efficient Fine Tuning (PEFT) Low Rank Adaptation (LoRA) of
 the [esm2_t6_8M_UR50D](https://huggingface.co/facebook/esm2_t6_8M_UR50D) model for the (binary) token classification task of
-predicting RNA binding sites of proteins. The Github with the training script and conda env YAML can be
-[found here](https://github.com/Amelie-Schreiber/esm2_LoRA_binding_sites/tree/main). You can also find a version of this model
 that was fine-tuned without LoRA [here](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_UR50D_rna_binding_site_predictor).
 ## Training procedure
@@ -25,7 +24,10 @@ This is a Low Rank Adaptation (LoRA) of `esm2_t6_8M_UR50D`,
 trained on `166` protein sequences in the [RNA binding sites dataset](https://huggingface.co/datasets/AmelieSchreiber/data_of_protein-rna_binding_sites)
 using a `80/20` train/test split. This model was trained with class weighting due to the imbalanced nature
 of the RNA binding site dataset (fewer binding sites than non-binding sites). You can train your own version
-using [this notebook](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_weighted_lora_rna_binding/blob/main/LoRA_binding_sites_no_sweeps_v2.ipynb)!
 ```
 {'eval_loss': 0.49476009607315063,

 This is a Parameter Efficient Fine Tuning (PEFT) Low Rank Adaptation (LoRA) of
 the [esm2_t6_8M_UR50D](https://huggingface.co/facebook/esm2_t6_8M_UR50D) model for the (binary) token classification task of
+predicting RNA binding sites of proteins. You can also find a version of this model
 that was fine-tuned without LoRA [here](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_UR50D_rna_binding_site_predictor).
 ## Training procedure
 trained on `166` protein sequences in the [RNA binding sites dataset](https://huggingface.co/datasets/AmelieSchreiber/data_of_protein-rna_binding_sites)
 using a `80/20` train/test split. This model was trained with class weighting due to the imbalanced nature
 of the RNA binding site dataset (fewer binding sites than non-binding sites). You can train your own version
+using [this notebook](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_weighted_lora_rna_binding/blob/main/LoRA_binding_sites_no_sweeps_v2.ipynb)!
+You just need the RNA `binding_sites.xml` file [found here](https://huggingface.co/datasets/AmelieSchreiber/data_of_protein-rna_binding_sites).
+A similar model can also be trained using the Github with a training script and conda env YAML, which can be
+[found here](https://github.com/Amelie-Schreiber/esm2_LoRA_binding_sites/tree/main). This version uses wandb sweeps for hyperparameter search.
 ```
 {'eval_loss': 0.49476009607315063,