LibrAI/longformer-harmful-ro · Instruction-response formatting missing

Hi! The ablation study in the original paper proposing this classifier (https://arxiv.org/pdf/2308.13387) describes the performance gains from adding the instruction to the classifier input (Section E, "Ablation Study Results"), but it doesn't describe the format used to join the instruction and the response. What format was used to obtain the gains reported in that ablation study?

Moreover, the classifier seems to be trained on instruction-response pairs (Section 6.1: "Specifically, we fine-tune a PLM classifier over human annotations for each instruction–response pair, and use its predictions as the evaluation score."). Is each pair formatted the same way as in the ablation study? If not, what format is used?

Thanks in advance!