Target is out of bounds

Hello everyone,
I am working with this tutorial: Fine-tuning with custom datasets — transformers 3.2.0 documentation
and am trying to reproduce the steps for the token classification model.
However, at the training step I get a "Target is out of bounds" error (code and error follow below).
I have already done some research, and the problem seems to be related to the labels.
However, I am not sure where to fix it.

CODE in which the error appears:

from transformers import DistilBertForTokenClassification, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
)

model = DistilBertForTokenClassification.from_pretrained("distilbert-base-uncased")

trainer = Trainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=val_dataset             # evaluation dataset
)

trainer.train()

ERROR:

***** Running training *****
  Num examples = 2715
  Num Epochs = 3
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 510
  Number of trainable parameters = 66364418

---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

<ipython-input-13-d1e0a77d7826> in <module>
     21 )
     22 
---> 23 trainer.train()

8 frames

/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   3024     if size_average is not None or reduce is not None:
   3025         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3026     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
   3027 
   3028 

IndexError: Target 5 is out of bounds.
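
One hedged reading of the log above: cross_entropy expects every target to be a class index in [0, C), where C is the number of logits per token, so "Target 5 is out of bounds" means the classification head has 5 or fewer outputs. The reported parameter count is consistent with a 2-output head, which is the default when num_labels is not passed to from_pretrained. The sketch below does that arithmetic. The backbone size (66,362,880 parameters) and hidden size (768) for distilbert-base-uncased are assumptions not stated in this thread, and infer_num_labels is a hypothetical helper.

```python
# Assumed values for distilbert-base-uncased (not taken from this thread):
BACKBONE_PARAMS = 66_362_880  # parameters of the encoder without a task head
HIDDEN_SIZE = 768             # width of each token representation


def infer_num_labels(total_trainable_params: int) -> int:
    """Infer the output size of a linear token-classification head.

    The head is assumed to be a single Linear(HIDDEN_SIZE, num_labels) layer,
    which contributes HIDDEN_SIZE * num_labels weights plus num_labels biases.
    """
    head_params = total_trainable_params - BACKBONE_PARAMS
    num_labels, remainder = divmod(head_params, HIDDEN_SIZE + 1)
    if head_params <= 0 or remainder != 0:
        raise ValueError("parameter count does not match the assumed architecture")
    return num_labels


# "Number of trainable parameters = 66364418" from the training log
print(infer_num_labels(66_364_418))
```

Under these assumptions the head has 2 outputs, so any label id of 2 or higher would trigger the error.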

CODE in which I assume I have to fix the error:

import torch

class WNUTDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

train_encodings.pop("offset_mapping") # we don't want to pass this to the model
val_encodings.pop("offset_mapping")
train_dataset = WNUTDataset(train_encodings, train_labels)
val_dataset = WNUTDataset(val_encodings, val_labels)

Thank you for any help!
Let me know if you need more information.
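
A way to check the labels before training, sketched under assumptions: the hypothetical helpers below scan per-token label lists shaped like train_labels, skip the -100 value that cross_entropy ignores by default, and report how many classes the model head needs. The sample data is made up for illustration.

```python
from typing import Iterable, List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.functional.cross_entropy


def required_num_labels(label_rows: Iterable[List[int]]) -> int:
    """Return the smallest num_labels that keeps every label id in bounds."""
    valid = [lab for row in label_rows for lab in row if lab != IGNORE_INDEX]
    if not valid:
        raise ValueError("no labels to check")
    if min(valid) < 0:
        raise ValueError(f"negative label id {min(valid)} is not a valid class")
    return max(valid) + 1


def out_of_bounds(label_rows: Iterable[List[int]], num_labels: int) -> List[int]:
    """Collect the distinct label ids that a head with num_labels outputs cannot score."""
    return sorted({lab for row in label_rows for lab in row
                   if lab != IGNORE_INDEX and not 0 <= lab < num_labels})


# Made-up stand-in for train_labels: one list of tag ids per sentence.
example_labels = [[-100, 0, 5, 0, -100], [-100, 12, 3, -100, -100]]

print(required_num_labels(example_labels))  # 13
print(out_of_bounds(example_labels, 2))     # [3, 5, 12]
```

If the required number is larger than the head size, passing it as num_labels when loading the model with from_pretrained is the usual fix. The value 13 here comes only from the made-up data.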

Have you found a solution to this? And are you using CPU or GPU? I am facing a similar problem and can’t seem to find a solution. @jonathan1989

No, as far as I remember I switched to another tutorial.
I am considering giving the problem to GPT to see if it finds a solution, though.

If it finds one, do tell us about it. I have been searching the internet for a long time, but I have found no answer that fits my case.

Hey guys, has anyone found a solution to this? I am fine-tuning GPT2ForSequenceClassification on my own dataset for a sequence classification task, and I am facing the same error. Here’s the error:

IndexError                                Traceback (most recent call last)
<ipython-input-18-070df9681f69> in <cell line: 2>()
      1 # Fine-tune the model on your dataset
----> 2 trainer.train()

8 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   3027     if size_average is not None or reduce is not None:
   3028         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 3029     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
   3030 
   3031 

IndexError: Target 2 is out of bounds.

Also, my labels start from 0, so that shouldn’t be the issue. Help me out here!
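
Labels starting at 0 is necessary but not sufficient: the largest label must also be smaller than the number of outputs of the classification head. Assuming the model was loaded without num_labels (so the head defaults to 2 outputs), "Target 2 is out of bounds" would point to a dataset with at least three classes. Below is a minimal sketch of building a contiguous label mapping and the matching num_labels; build_label_mapping and the sample labels are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def build_label_mapping(
    raw_labels: Sequence[Hashable],
) -> Tuple[Dict[Hashable, int], List[int]]:
    """Map arbitrary raw labels to contiguous ids 0..K-1 and encode the dataset."""
    # Sort by string form so the mapping is deterministic across runs.
    label2id = {label: i for i, label in enumerate(sorted(set(raw_labels), key=str))}
    encoded = [label2id[label] for label in raw_labels]
    return label2id, encoded


# Made-up labels for illustration.
raw = ["neg", "pos", "neutral", "pos", "neg"]
label2id, encoded = build_label_mapping(raw)
num_labels = len(label2id)

# Every encoded label now satisfies 0 <= label < num_labels.
assert all(0 <= y < num_labels for y in encoded)
print(label2id, encoded, num_labels)
```

The resulting num_labels would then be passed as the num_labels argument of from_pretrained so that the head has one output per class.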


Has there ever been a solution to this?
