How to debug NaN logits during training

We did it, boys. The accuracy is no longer NaN. I thank thee all.

The last thing I did was revert this code back to its tutorial example:

from torchvision.transforms import ColorJitter
from transformers import SegformerImageProcessor

# The processor handles resizing, rescaling, and ImageNet mean/std normalization
processor = SegformerImageProcessor()
# Light color augmentation for the training split only
jitter = ColorJitter(brightness=0.25, contrast=0.25, saturation=0.25, hue=0.1)

def train_transforms(example_batch):
    images = [jitter(x) for x in example_batch['pixel_values']]
    labels = [x for x in example_batch['label']]
    inputs = processor(images, labels)
    return inputs


def val_transforms(example_batch):
    images = [x for x in example_batch['pixel_values']]
    labels = [x for x in example_batch['label']]
    inputs = processor(images, labels)
    return inputs


# Set transforms
train_ds.set_transform(train_transforms)
test_ds.set_transform(val_transforms)

@Alanturner2 @John6666 Thank you both. We should keep the torch.nn.functional.normalize fix here for posterity. I think John's normalization solution is a bit simple, but it works.
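I don't have John's exact snippet handy, so here's only a minimal sketch of what a torch.nn.functional.normalize-based fix could look like (the l2_normalize helper is mine, not from the thread): each image is flattened and L2-normalized so its values stay bounded before reaching the model.

import torch
import torch.nn.functional as F

# Sketch only, not John's actual code: L2-normalize each flattened image
# so the pixel magnitudes stay bounded before they reach the model.
def l2_normalize(img):
    t = torch.as_tensor(img, dtype=torch.float32)
    return F.normalize(t.flatten(), dim=0).reshape(t.shape)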

Btw, what I meant by the “tutorial version” of that code is that John and I had previously modified it to convert the images to B/W. Do you think that conversion affects the normalized values? Maybe I should check after training, because I really don't know which part of the other code produces the NaN after normalization.
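For context, a hypothetical reconstruction of that B/W step (Grayscale is torchvision's; our actual code may have differed) is below. Keeping num_output_channels=3 matters here: the processor normalizes with 3-channel ImageNet mean/std, so a 1-channel grayscale image would no longer line up with those stats.

from torchvision.transforms import Grayscale

# Hypothetical version of our earlier B/W step; keep 3 output channels so
# the processor's 3-channel mean/std normalization still applies.
to_gray = Grayscale(num_output_channels=3)

def train_transforms_bw(example_batch):
    images = [to_gray(jitter(x)) for x in example_batch['pixel_values']]
    labels = [x for x in example_batch['label']]
    return processor(images, labels)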

For future readers, the answer is to always normalize your dataset.
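A quick sanity check you can run (just a sketch, reusing the train_ds from above): pull one processed sample and confirm the normalized pixel values are finite and in a sane range before training.

import torch

sample = train_ds[0]
pv = torch.as_tensor(sample['pixel_values'])
print(pv.min().item(), pv.max().item())  # roughly [-3, 3] after ImageNet normalization
print(torch.isnan(pv).any().item())      # should be False
print(torch.isinf(pv).any().item())      # should be False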

Oh, one more thing. Is there any reading on judging these scores, i.e. when a score becomes abnormally bad or good? mIoU and other evaluation metrics rarely have any free readings, except maybe this one
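In the meantime, running the metric on a toy pair can help build intuition for what the number means. A small sketch with the Hugging Face evaluate library's mean_iou metric (the toy arrays are made up): IoU is intersection over union per class, and mIoU averages it over classes, so 1.0 is perfect and values near 0 mean almost no overlap.

import evaluate
import numpy as np

# Toy example: 2x2 segmentation maps with 2 classes; arrays are made up.
mean_iou = evaluate.load("mean_iou")
preds = [np.array([[0, 1], [1, 1]])]
refs = [np.array([[0, 1], [0, 1]])]
results = mean_iou.compute(predictions=preds, references=refs,
                           num_labels=2, ignore_index=255)
print(results["mean_iou"])  # per-class IoU averaged over classes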
