---
license: mit
widget:
- text: "MERVAVVGVPMDLGANRRGVDMGPSALRYARLLEQLEDLGYTVEDLGDVPVSLARASRRRGRGLAYLEEIRAAALVLKERLAALPEGVFPIVLGGDHSLSMGSVAGAARGRRVGVVWVDAHADFNTPETSPSGNVHGMPLAVLSGLGHPRLTEVFRAVDPKDVVLVGVRSLDPGEKRLLKEAGVRVY"
---

## Label Semantics

* Label 0: Non-crystallizable (Negative)
* Label 1: Crystallizable (Positive)

## Dataset

1. [DeepCrystal Train](https://huggingface.co/jaykmr/ESMCrystal_t12_35M_v2/blob/main/Datasets/train.csv)
2. [DeepCrystal Test](https://huggingface.co/jaykmr/ESMCrystal_t12_35M_v2/blob/main/Datasets/test.csv)
3. [BCrystal Test](https://huggingface.co/jaykmr/ESMCrystal_t12_35M_v2/tree/main/Datasets/BCrystal_Balanced_Test_set)
4. [SP Test](https://huggingface.co/jaykmr/ESMCrystal_t12_35M_v2/tree/main/Datasets/SP_Final_set)
5. [TR Test](https://huggingface.co/jaykmr/ESMCrystal_t12_35M_v2/tree/main/Datasets/TR_Final_set)

## Model

#### ESMCrystal_t12_35M_v2

ESMCrystal_t12_35M_v2 is a state-of-the-art protein crystallization prediction model fine-tuned from [esm2_t12_35M_UR50D](https://huggingface.co/facebook/esm2_t12_35M_UR50D), which has 12 layers and 35M parameters; the checkpoint is [approx. 136MB](https://huggingface.co/jaykmr/ESMCrystal_t12_35M_v2/blob/main/pytorch_model.bin). It uses transfer learning to predict whether an input protein sequence will crystallize.

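The following is a minimal inference sketch rather than the authors' own script. It assumes the repository ships a config and tokenizer that load through the standard `transformers` `AutoTokenizer` and `AutoModelForSequenceClassification` classes, and that output index 0 and 1 follow the label semantics given in this card. The names `LABELS`, `softmax`, `decode`, and `predict` are hypothetical helpers introduced only for illustration.

```python
import math

MODEL_ID = "jaykmr/ESMCrystal_t12_35M_v2"

# Label mapping taken from the "Label Semantics" section of this card.
LABELS = {0: "non-crystallizable", 1: "crystallizable"}

# The widget example sequence from the card's front matter.
WIDGET_SEQUENCE = (
    "MERVAVVGVPMDLGANRRGVDMGPSALRYARLLEQLEDLGYTVEDLGDVP"
    "VSLARASRRRGRGLAYLEEIRAAALVLKERLAALPEGVFPIVLGGDHSLS"
    "MGSVAGAARGRRVGVVWVDAHADFNTPETSPSGNVHGMPLAVLSGLGHPR"
    "LTEVFRAVDPKDVVLVGVRSLDPGEKRLLKEAGVRVY"
)


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


def predict(sequence):
    """Classify one protein sequence. Assumes the hub repo is loadable as below."""
    # Imported lazily so the helpers above work without torch/transformers.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode(logits)


# Example call (downloads the weights on first use):
#   label, probability = predict(WIDGET_SEQUENCE)
```

Keeping `decode` separate from `predict` makes the label mapping easy to check without downloading the checkpoint.
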
## Accuracy

| Dataset          | Accuracy           |
|------------------|--------------------|
| DeepCrystal Test | 0.8161222339304531 |
| BCrystal Test    | 0.8052602126468943 |
| SP Test          | 0.7637130801687764 |
| TR Test          | 0.8389328063241107 |

## Comparison Table

Counts and rates treat crystallizable (label 1) as the positive class, so TP + FN equals the number of positives and FP + TN equals the number of negatives.

| Dataset          | Count | Positives | Negatives | TP  | FN  | FP | TN  | Precision  | Recall     | F1         | Accuracy   | ROC AUC | Matthews Correlation Coefficient | Sensitivity | Specificity |
|------------------|-------|-----------|-----------|-----|-----|----|-----|------------|------------|------------|------------|---------|----------------------------------|-------------|-------------|
| DeepCrystal Test | 1898  | 898       | 1000      | 579 | 319 | 30 | 970 | 0.95073892 | 0.64476615 | 0.76841407 | 0.81612223 | 0.9403  | 0.657526117                      | 0.64476615  | 0.97        |
| BCrystal Test    | 1787  | 891       | 896       | 573 | 318 | 30 | 866 | 0.95024876 | 0.64309764 | 0.76706827 | 0.80526021 | 0.9396  | 0.644635696                      | 0.64309764  | 0.96651786  |
| SP Test          | 237   | 148       | 89        | 97  | 51  | 5  | 84  | 0.95098039 | 0.65540541 | 0.776      | 0.76371308 | 0.9293  | 0.586069704                      | 0.65540541  | 0.94382022  |
| TR Test          | 1012  | 374       | 638       | 225 | 149 | 14 | 624 | 0.94142259 | 0.60160428 | 0.73409462 | 0.83893281 | 0.9562  | 0.658766192                      | 0.60160428  | 0.97805643  |

## Graphs

### ROC-AUC Curve

* DeepCrystal Test

![Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t12_35M_v2/blob/main/Graphs/ROC-final-test.png?raw=true)

* BCrystal Test

![BCrystal Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t12_35M_v2/blob/main/Graphs/ROC-final-BCtest.png?raw=true)

* SP Test

![SP Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t12_35M_v2/blob/main/Graphs/ROC-final-SPtest.png?raw=true)

* TR Test

![TR Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t12_35M_v2/blob/main/Graphs/ROC-final-TRtest.png?raw=true)

### PR-AUC Curve

* DeepCrystal Test

![Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t12_35M_v2/blob/main/Graphs/PR-final-test.png?raw=true)

* BCrystal Test

![BCrystal Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t12_35M_v2/blob/main/Graphs/PR-final-BCtest.png?raw=true)

* SP Test

![SP Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t12_35M_v2/blob/main/Graphs/PR-final-SPtest.png?raw=true)

* TR Test

![TR Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t12_35M_v2/blob/main/Graphs/PR-final-TRtest.png?raw=true)

## Final scores

* on DeepCrystal test:

|                    | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.75      | 0.97   | 0.85     | 1000    |
| crystallizable     | 0.95      | 0.64   | 0.77     | 898     |
| accuracy           |           |        | 0.82     | 1898    |
| macro avg          | 0.85      | 0.81   | 0.81     | 1898    |
| weighted avg       | 0.85      | 0.82   | 0.81     | 1898    |

* on BCrystal test:

|                    | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.73      | 0.97   | 0.83     | 896     |
| crystallizable     | 0.95      | 0.64   | 0.77     | 891     |
| accuracy           |           |        | 0.81     | 1787    |
| macro avg          | 0.84      | 0.80   | 0.80     | 1787    |
| weighted avg       | 0.84      | 0.81   | 0.80     | 1787    |

* on SP test:

|                    | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.62      | 0.94   | 0.75     | 89      |
| crystallizable     | 0.95      | 0.66   | 0.78     | 148     |
| accuracy           |           |        | 0.76     | 237     |
| macro avg          | 0.79      | 0.80   | 0.76     | 237     |
| weighted avg       | 0.83      | 0.76   | 0.77     | 237     |

* on TR test:

|                    | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.81      | 0.98   | 0.88     | 638     |
| crystallizable     | 0.94      | 0.60   | 0.73     | 374     |
| accuracy           |           |        | 0.84     | 1012    |
| macro avg          | 0.87      | 0.79   | 0.81     | 1012    |
| weighted avg       | 0.86      | 0.84   | 0.83     | 1012    |

## Confusion matrix

Rows are actual classes and columns are predicted classes, with crystallizable first; each row sums to the corresponding support count in the final scores.

* on DeepCrystal test:

```
| 579 | 319 |
| 30  | 970 |
```

* on BCrystal test:

```
| 573 | 318 |
| 30  | 866 |
```

* on SP test:

```
| 97 | 51 |
| 5  | 84 |
```

* on TR test:

```
| 225 | 149 |
| 14  | 624 |
```

## Metrics

ROC AUC score:

* on DeepCrystal test: 0.9403474387527841
* on BCrystal test: 0.9395705567580568
* on SP test: 0.9293197692074097
* on TR test: 0.9561924798417515

Matthews correlation coefficient:

* on DeepCrystal test: 0.6575261170551334
* on BCrystal test: 0.6446356961702661
* on SP test: 0.586069703866632
* on TR test: 0.6587661924247377

Specificity (true negative rate):

* on DeepCrystal test: 0.97
* on BCrystal test: 0.9665178571428571
* on SP test: 0.9438202247191011
* on TR test: 0.9780564263322884

Sensitivity (true positive rate):

* on DeepCrystal test: 0.6447661469933185
* on BCrystal test: 0.6430976430976431
* on SP test: 0.6554054054054054
* on TR test: 0.6016042780748663

Researchers:

* [Jayanth Kumar](https://jaykmr.com)
* [Kavya Jaykumar](https://www.linkedin.com/in/kavya-jayakumar-6390271b5/)

Credits:

* [Meta ESM-2](https://github.com/facebookresearch/esm)
* [Huggingface](https://huggingface.co/jaykmr)
* [Paperspace Compute Cloud](https://www.paperspace.com/)
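
## Recomputing the metrics

As a cross-check on the figures reported in this card, the sketch below recomputes the threshold-based metrics from the confusion matrices. It assumes each matrix lists actual classes in rows and predicted classes in columns with crystallizable first, an orientation inferred from the support counts in the final scores rather than stated by the authors. The function name `binary_metrics` and the `CONFUSION` dictionary are hypothetical, introduced only for this illustration. ROC AUC is not included because it needs per-sequence scores, which the card does not provide.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Compute standard binary-classification metrics from confusion counts."""
    precision = tp / (tp + fp)      # share of predicted positives that are correct
    recall = tp / (tp + fn)         # sensitivity / true positive rate
    specificity = tn / (tn + fp)    # true negative rate
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Counts copied from the confusion matrices in this card, read as
# (TP, FN, FP, TN) under the assumed row/column orientation.
CONFUSION = {
    "DeepCrystal test": (579, 319, 30, 970),
    "BCrystal test": (573, 318, 30, 866),
    "SP test": (97, 51, 5, 84),
    "TR test": (225, 149, 14, 624),
}

for name, counts in CONFUSION.items():
    scores = binary_metrics(*counts)
    summary = ", ".join(f"{key}={value:.4f}" for key, value in scores.items())
    print(f"{name}: {summary}")
```

Under this reading, the crystallizable class has high precision and lower recall, matching the per-class rows of the final scores. The function does not guard against empty rows or columns, which do not occur in these four test sets.
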