I have a list of sentences: X = ["Today is Tuesday", "I went to the store", "This is a computer",....]
and for each sentence the label is a vector of 5 floats: y = [[1,4,3,1,7], [5,1,2,8,9], [0,1,6,5,2],....]
I want to fine-tune BERT (or another suitable pre-trained LM) with the proper head to predict the labels.
But I couldn't find any example of something similar.
Can someone please provide a code sample showing how I can do it?
I'm really stuck.
@nbroad Sure! Sorry for being unclear.
The label vector is a representation vector of other, higher-dimensional data, so the order does matter.
Actually, the head I thought would be most reasonable to add is sklearn.linear_model.Ridge, which is suitable as it supports multi-output y.
What I am missing is how to add it on top of BERT and perform the fine-tuning.
You might be able to do something like the code below. The labels need to be in the dataset with shape (n, 5, num_labels), where n is the total number of examples and num_labels is the number of classes each position in the output vector can take.
It is just doing multiclass classification 5 separate times – one for each position in the vector.
You should be able to use this model in a Trainer.
import torch
import torch.nn as nn
from transformers import AutoModel, PreTrainedModel


class CustomModel(PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.backbone = AutoModel.from_config(config)
        # one classification head per position in the label vector;
        # nn.ModuleList is needed so the heads are registered as parameters
        self.outputs = nn.ModuleList(
            [nn.Linear(config.hidden_size, config.num_labels) for _ in range(5)]
        )

    def forward(
        self,
        input_ids,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        labels=None,
    ):
        outputs = self.backbone(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
        )
        # pool the [CLS] token representation: (batch_size, hidden_size)
        pooled = outputs.last_hidden_state[:, 0]
        logits = [self.outputs[i](pooled) for i in range(5)]

        # if labels are passed, we are training
        loss = None
        if labels is not None:
            # labels: (batch_size, 5, num_labels) one-hot floats for BCEWithLogitsLoss
            loss_fn = nn.BCEWithLogitsLoss()
            losses = [loss_fn(logits[i], labels[:, i]) for i in range(5)]
            loss = sum(losses) / len(losses)

        return {
            "loss": loss,
            "logits": logits,
        }
# You'll also have to do this when creating the model.
# `config` comes from AutoConfig.from_pretrained(model_path),
# and model_path is something like "bert-base-cased".
def get_pretrained(config, model_path):
    model = CustomModel(config)
    if model_path.endswith("pytorch_model.bin"):
        # resuming from an already fine-tuned checkpoint
        model.load_state_dict(torch.load(model_path))
    else:
        # initialize the backbone from pretrained weights; the heads stay randomly initialized
        model.backbone = AutoModel.from_pretrained(model_path)
    return model
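For concreteness, the label preparation and Trainer setup might look roughly like the sketch below. This is untested; `num_labels = 10`, the `datasets` usage, and the `TrainingArguments` values are just assumptions to show the shapes.

```python
import torch
from datasets import Dataset
from transformers import AutoConfig, AutoTokenizer, Trainer, TrainingArguments

model_name = "bert-base-cased"
X = ["Today is Tuesday", "I went to the store", "This is a computer"]
y = [[1, 4, 3, 1, 7], [5, 1, 2, 8, 9], [0, 1, 6, 5, 2]]

num_labels = 10  # assumption: each position is a class id in [0, num_labels)

config = AutoConfig.from_pretrained(model_name)
config.num_labels = num_labels

tokenizer = AutoTokenizer.from_pretrained(model_name)
enc = tokenizer(X, truncation=True, padding=True)

# one-hot encode each position -> shape (n, 5, num_labels), floats for BCEWithLogitsLoss
labels = torch.nn.functional.one_hot(torch.tensor(y), num_classes=num_labels).float()

ds = Dataset.from_dict({**enc, "labels": labels.tolist()})

model = get_pretrained(config, model_name)
args = TrainingArguments(output_dir="out", num_train_epochs=3)
trainer = Trainer(model=model, args=args, train_dataset=ds)
trainer.train()
```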
My goal is not only to get predictions but also to fine-tune BERT; how does that happen here?
Why are the labels 3D and not just (n, 5)? 5 is the number of labels.
I would actually rather use Ridge regression if possible; is there a way to do so? Also, I prefer not to do it 5 separate times (one for each position), because A) my actual label is closer to 300 ints per vector, and B) I want to minimize the Ridge loss all at once.
OK, you could have one linear layer (nn.Linear(config.hidden_size, 5)) and just use MSELoss. I think that would work. It won't output ints, but you can round them.
import torch
import torch.nn as nn
from transformers import AutoModel, PreTrainedModel


class CustomModel(PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.backbone = AutoModel.from_config(config)
        # single regression head predicting the whole label vector at once
        self.output = nn.Linear(config.hidden_size, config.num_labels)

    def forward(
        self,
        input_ids,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        labels=None,
    ):
        outputs = self.backbone(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
        )
        # pool the [CLS] token representation: (batch_size, hidden_size)
        pooled = outputs.last_hidden_state[:, 0]
        logits = self.output(pooled)  # (batch_size, num_labels)

        # if labels are passed, we are training
        loss = None
        if labels is not None:
            # labels: (batch_size, num_labels) floats
            loss_fn = nn.MSELoss()
            loss = loss_fn(logits, labels)

        return {
            "loss": loss,
            "logits": logits,
        }
Have your labels be shape (n, 5) where n is the number of samples.
from transformers import AutoConfig

model_name = "roberta-base"
cfg = AutoConfig.from_pretrained(model_name)
cfg.update({"num_labels": 5})

model = get_pretrained(cfg, model_name)
# put the model and data into a Trainer
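Continuing from the snippet above, training and rounding the predictions could look something like this rough, untested sketch; the tokenization and `TrainingArguments` values are placeholders.

```python
import torch
from datasets import Dataset
from transformers import AutoTokenizer, Trainer, TrainingArguments

X = ["Today is Tuesday", "I went to the store", "This is a computer"]
y = [[1, 4, 3, 1, 7], [5, 1, 2, 8, 9], [0, 1, 6, 5, 2]]

tokenizer = AutoTokenizer.from_pretrained(model_name)
enc = tokenizer(X, truncation=True, padding=True)

# labels: shape (n, 5), floats for MSELoss
ds = Dataset.from_dict({**enc, "labels": [[float(v) for v in vec] for vec in y]})

args = TrainingArguments(output_dir="out", num_train_epochs=3)
trainer = Trainer(model=model, args=args, train_dataset=ds)
trainer.train()

# predictions are floats; round them to get back to ints
preds = trainer.predict(ds).predictions
rounded = torch.tensor(preds).round().long()
```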
@nbroad, I want to fine-tune BERT on a regression task and then use its embedding as a feature for prediction. What do I need to modify? I need this embedding so that I can apply XGBoost as well.
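To illustrate what I mean, something like the hypothetical sketch below, continuing from the code above; the [CLS] pooling and the XGBoost setup are my assumptions, not tested.

```python
import numpy as np
import torch
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor

# after fine-tuning, reuse the backbone's pooled [CLS] embedding as features
model.eval()
with torch.no_grad():
    enc = tokenizer(X, truncation=True, padding=True, return_tensors="pt")
    hidden = model.backbone(**enc).last_hidden_state
    features = hidden[:, 0].cpu().numpy()  # (n, hidden_size)

# fit one XGBoost regressor per output dimension on the embeddings
reg = MultiOutputRegressor(xgb.XGBRegressor())
reg.fit(features, np.array(y, dtype=float))
```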