Hello, I was wondering whether labeling my dataset would lead to better results when fine-tuning a causal model. I have seen several code examples where they pass labels and others where they don't.
I went into the source code for the `GPTNeoForCausalLM` forward function:
```python
if labels is not None:
    # Compute loss in fp32 to match with mesh-tf version
    # https://github.com/EleutherAI/gpt-neo/blob/89ce74164da2fb16179106f54e2269b5da8db333/models/gpt2/gpt2.py#L179
    lm_logits = lm_logits.to(torch.float32)

    # Shift so that tokens < n predict n
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    # Flatten the tokens
    loss_fct = CrossEntropyLoss()
    loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

    lm_logits = lm_logits.to(hidden_states.dtype)
    loss = loss.to(hidden_states.dtype)
```
So if I want to use labels, can I simply copy the `input_ids` as the labels? Or do I need to worry about the BOS token and such? Thank you!
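From my reading of the shift in the forward function above, copying `input_ids` as the labels should give next-token pairs automatically, since the model drops the last logit and the first label internally. Here is a minimal plain-Python sketch of my understanding (the token ids are made up for illustration):

```python
# Sketch of what the forward pass does when labels is a copy of input_ids:
# logits and labels are shifted so the prediction at position i is scored
# against the token at position i + 1 ("tokens < n predict n").

def next_token_pairs(input_ids):
    """Return the (context_token, target_token) pairs produced by the shift."""
    labels = list(input_ids)       # labels are just a copy of input_ids
    shift_inputs = input_ids[:-1]  # analogous to lm_logits[..., :-1, :]
    shift_labels = labels[1:]      # analogous to labels[..., 1:]
    return list(zip(shift_inputs, shift_labels))

# Hypothetical token ids, e.g. for "<bos> the cat sat"
ids = [0, 464, 3797, 3332]
print(next_token_pairs(ids))  # [(0, 464), (464, 3797), (3797, 3332)]
```

If this is right, the BOS token only ever appears as context (first pair) and is never itself a target, so a plain copy would be fine.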