I am using a GPT-2 model that outputs logits (before softmax) of shape (batch_size, num_input_ids, vocab_size),
and I need to compare them with labels of shape (batch_size, num_input_ids)
to calculate BCELoss. How do I calculate it?
import torch

logits = output.logits  # shape (32, 56, 592)
logits = torch.nn.Softmax(dim=-1)(logits)  # softmax over the vocabulary dimension
# labels has shape (32, 56)
torch.nn.BCELoss()(logits, labels)  # raises an error: shapes differ
This fails because the dimensions do not match. How do I contract logits to the shape of labels, or expand labels to the shape of logits?
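
For reference, here is a minimal sketch of the two directions asked about, run on random stand-in tensors. It assumes that labels holds integer token ids in the range [0, vocab_size); that is an assumption, since the question does not say what labels contains. The names one_hot, bce and ce are illustrative only. Note that "contracting" here flattens the batch and sequence dimensions and switches to CrossEntropyLoss, which takes class indices directly; that is a different loss from BCELoss, not a reshaped version of it.

```python
import torch
import torch.nn.functional as F

batch_size, seq_len, vocab_size = 32, 56, 592

# Stand-ins for output.logits and labels (random data, used only to check shapes).
logits = torch.randn(batch_size, seq_len, vocab_size)
labels = torch.randint(0, vocab_size, (batch_size, seq_len))  # assumed: integer token ids

# Option 1: expand labels to the logits shape with one-hot encoding.
# BCEWithLogitsLoss applies a sigmoid internally, so it takes the raw logits
# and no softmax is applied beforehand.
one_hot = F.one_hot(labels, num_classes=vocab_size).float()  # (32, 56, 592)
bce = torch.nn.BCEWithLogitsLoss()(logits, one_hot)

# Option 2: keep labels as class indices and flatten batch and sequence dims.
# CrossEntropyLoss expects (N, C) raw logits and (N,) integer targets, and
# applies log-softmax internally.
ce = torch.nn.CrossEntropyLoss()(
    logits.reshape(-1, vocab_size),  # (32 * 56, 592)
    labels.reshape(-1),              # (32 * 56,)
)

print(one_hot.shape, bce.item(), ce.item())
```

If each position has exactly one correct token, Option 2 is probably the more natural fit, because softmax treats the vocabulary entries as mutually exclusive classes, whereas BCE scores every vocabulary entry as an independent yes/no decision.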