Inquiry on ProtGPT2 Model Performance and Fine-Tuning Evaluation

#38
by littleworth - opened

Hi Noelia,

I'd like to thank you for your work on ProtGPT2 and its application in protein design.
It's a notable contribution to the field.

I am currently engaged in a project where I am utilizing ProtGPT2, and I have a couple of queries:

  1. Could you share the accuracy and loss metrics during both the training and evaluation phases of ProtGPT2? This information is vital for me to benchmark my fine-tuning results against the baseline performance of the model.

  2. Following the guidelines provided on the Hugging Face model page, I have fine-tuned the ProtGPT2 model for a specific task. Could you recommend ways to evaluate the fine-tuned model, in particular to check for 'catastrophic forgetting', i.e. whether it still retains its prior knowledge?

Thank you for your time and consideration. I look forward to your response.

Sincerely,
LW

Hi LW,

Sorry I did not reply sooner, I was on leave.

  1. The loss was quite high for training and eval, probably due to the large vocabulary size (52k tokens). See below the eval and train loss for the last epoch.
    {
      "epoch": 49.88,
      "learning_rate": 2.4673951357067326e-06,
      "loss": 6.5147,
      "step": 424500
    },
    {
      "epoch": 49.94,
      "learning_rate": 1.292445071084479e-06,
      "loss": 6.5139,
      "step": 425000
    },
    {
      "epoch": 49.94,
      "eval_loss": 6.520303726196289,
      "eval_runtime": 231.0929,
      "eval_samples_per_second": 5275.233,
      "eval_steps_per_second": 5.154,
      "step": 425000
    },
    {
      "epoch": 49.99,
      "learning_rate": 1.1749500646222535e-07,
      "loss": 6.513,
      "step": 425500
    }
  2. I haven't evaluated this myself, but I'd probably generate a sample of around 1000 sequences and keep the best third by perplexity (lowest values). Then I'd run ESMFold on those and check their pLDDTs; I'd expect values over 70. Beyond this, if you fine-tuned on a specific family, I'd check that the generated sequences indeed look like members of that family. For example, if you fine-tuned on TIM-barrels, I'd check that those sequences are indeed TIM-barrels. In my experience, the model tends to generate other families as well after fine-tuning, as opposed to ZymCTRL, which sticks to the fine-tuned family. Hope this helps!
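
For anyone who wants to try the perplexity filter described above, here is a minimal sketch, not the author's own script. It assumes perplexity is taken as exp of the mean cross-entropy loss that a causal LM returns when `labels` equal `input_ids`, that the model is loaded from the Hub id `nferruz/ProtGPT2` (swap in your fine-tuned checkpoint path), and that raw sequences are passed to the tokenizer without extra formatting. The function names are hypothetical, and the ESMFold/pLDDT step is left out.

```python
import math


def perplexity_from_loss(mean_ce_loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_ce_loss)


def top_fraction_by_perplexity(scored, fraction=1 / 3):
    """Keep the lowest-perplexity fraction of (sequence, perplexity) pairs."""
    ranked = sorted(scored, key=lambda pair: pair[1])  # lower is better
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


def score_sequences(sequences, model_name="nferruz/ProtGPT2"):
    """Score each sequence with a causal LM and return (sequence, perplexity).

    Assumption: model_name points at ProtGPT2 or a fine-tuned checkpoint.
    Imports are local so the helpers above work without torch installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    scored = []
    with torch.no_grad():
        for seq in sequences:
            input_ids = tokenizer(seq, return_tensors="pt").input_ids
            # With labels == input_ids the model returns the mean token loss.
            loss = model(input_ids, labels=input_ids).loss
            scored.append((seq, perplexity_from_loss(loss.item())))
    return scored
```

Usage would look like `best = top_fraction_by_perplexity(score_sequences(generated))`, with `best` then passed on to ESMFold. The same conversion applied to the logged `eval_loss` of about 6.52 gives a perplexity of roughly 679.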
