Trained a model with a 0.0566 loss and an empty mIoU

Dear community members and @John6666 ,

As you can see in my profile, and continuing from this question, I have successfully trained a two-label model from segformer-b0-finetuned-ade-512-512, a model with hundreds of labels. The output looks fairly satisfying visually. I can't post it right now (I will add it in an update or reply, because the laptop in question is frozen, busy with a larger-batch training run), but the final summary stats seem a bit incomplete. Does this often happen when training with a small batch size and few epochs, or is there a mistake in my program?

The stats are as follows:

edit 2: The final output is surprisingly… very similar across 10 random images. Is this what they call overfitting?

I mean… not that similar, and it shouldn't be that rounded either. I haven't tested it side by side with the source image or colored it like most segmentation outputs. This run was done with:

    epochs = 0.1
    lr = 0.00006
    batch_size = 8

That's all the detail I can give for now; feel free to ask for more information. Thanks in advance for the community's help.
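For reference, here's roughly the setup (a simplified sketch, not my exact script; the label names and output_dir are placeholders):

    from transformers import SegformerForSemanticSegmentation, TrainingArguments

    # Swap the 150-label ADE20k head for a 2-label one; the classifier
    # size mismatch is expected, hence ignore_mismatched_sizes=True.
    id2label = {0: "background", 1: "object"}  # placeholder label names
    model = SegformerForSemanticSegmentation.from_pretrained(
        "nvidia/segformer-b0-finetuned-ade-512-512",
        num_labels=2,
        id2label=id2label,
        label2id={v: k for k, v in id2label.items()},
        ignore_mismatched_sizes=True,
    )

    args = TrainingArguments(
        output_dir="segformer-2label",  # placeholder
        learning_rate=0.00006,
        per_device_train_batch_size=8,
        num_train_epochs=0.1,           # fractions of an epoch are allowed
    )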


I think you probably don't have enough epochs. I've heard it's fine to train for up to around 20 epochs. Overfitting can occur if you train on the same data too much, but right now I think you're simply short on epochs. It would be a good idea to take the model you've already trained and train it again using the same procedure.
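If it helps, a minimal sketch of what I mean (the checkpoint path and the train_ds/eval_ds variables are placeholders for your own checkpoint and datasets):

    from transformers import SegformerForSemanticSegmentation, Trainer, TrainingArguments

    # Start from your own saved checkpoint instead of the original ADE20k
    # weights, then just give the Trainer more epochs.
    model = SegformerForSemanticSegmentation.from_pretrained("segformer-2label/checkpoint-500")  # placeholder path

    args = TrainingArguments(
        output_dir="segformer-2label-cont",
        learning_rate=0.00006,
        per_device_train_batch_size=8,
        num_train_epochs=20,
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_ds,  # your datasets
        eval_dataset=eval_ds,
        # pass compute_metrics=... here if you want mIoU reported during eval
    )
    trainer.train()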

I was going to use a large epoch count, but even doing 1 is quite constraining, and Colab doesn't seem to be a proper solution. Is there a holy triangle of epochs, batch size, and learning rate that keeps result quality without exceeding what my computer can handle?


If the model is small, I think you can train it in a free HF CPU Space, or in a Colab environment without a GPU. It will take time, though…
Your model is a little unusual, so it's harder to apply, but AutoTrainAdvanced and the like actually work even in a CPU Space. It uses HF Trainer internally.
Also, among other companies' services, Lightning.ai lets you use a small GPU for free.
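As for the epoch/batch-size trade-off: there's no magic triangle, but one standard trick when memory is the bottleneck is gradient accumulation, which keeps the effective batch size while lowering the per-device batch (a rough sketch, not tied to your script):

    from transformers import TrainingArguments

    # Effective batch = per_device_train_batch_size * gradient_accumulation_steps.
    # 2 * 4 = 8, so optimization behaves like batch_size=8 with far less memory.
    args = TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=0.00006,
        num_train_epochs=20,
    )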

For some reason Lightning.ai doesn't offer service in my area. Are there actually many services like this besides Google Colab? I ran my model with a large epoch count and it's still running 24 hours later… going with CPU doesn't look good, even though it works.

Perhaps I'll try AutoTrainAdvanced, even though it doesn't have a segmentation preset…


I think GitHub also offers a free virtual machine service, though without a GPU. On HF you can create as many weak, GPU-less VMs as you like for free, so it takes time but costs no money.
By the way, the program below is a sample I made for a different project; it adds a simple GUI to HF Trainer.
It basically does manually what AutoTrainAdvanced does automatically, but that makes it more convenient when there are parts you want to control yourself. In your case, it might be faster to modify this.
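Conceptually it's just a thin Gradio wrapper around a Trainer run; the pattern looks something like this (a stripped-down sketch, not the actual script; run_training stands in for the real Trainer setup):

    import gradio as gr

    def run_training(model_id: str, epochs: float) -> str:
        # Stand-in for the real work: build TrainingArguments/Trainer here,
        # call trainer.train(), and return the final metrics as text.
        return f"would train {model_id} for {epochs} epochs"

    with gr.Blocks() as demo:
        model_id = gr.Textbox(label="Model ID", value="nvidia/segformer-b0-finetuned-ade-512-512")
        epochs = gr.Number(label="Epochs", value=1)
        log = gr.Textbox(label="Log")
        gr.Button("Train").click(run_training, inputs=[model_id, epochs], outputs=log)

    demo.launch()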

You are truly a godsend; let me try that right away.


“Error occured: ‘Image’ object has no attribute ‘names’” — is this from your code, or…

Ohh, is it from my dataset format?
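For what it's worth, this is how I'd check (a quick sketch; the dataset repo id is a placeholder):

    from datasets import load_dataset

    ds = load_dataset("my-user/my-segmentation-dataset", split="train")  # placeholder
    print(ds.features)
    # A ClassLabel feature has a .names attribute; an Image feature does not,
    # which is exactly the AttributeError above if the script expects ClassLabel.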


The key names and structure of the dataset are hard-coded, so think of it as an ordinary, model-dependent script with a GUI added so it works in an HF Space, rather than a real app. A concept model, so to speak? :sweat_smile:

I don't understand. Should I convert the images to another format? Where should I look in the hard-coded parts so I can adapt my dataset to it?


I think the collate function is what's causing the problem, so it would be better to rewrite it or just stop using it:

            # data_collator=collate_fn,
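If you'd rather rewrite it than drop it, a typical collator for this kind of model just stacks the processed tensors (a rough sketch; the key names are assumptions, adjust them to your preprocessing):

    import torch

    def collate_fn(batch):
        # Each item is expected to already hold processed tensors;
        # change the keys to whatever your preprocessing produces.
        return {
            "pixel_values": torch.stack([item["pixel_values"] for item in batch]),
            "labels": torch.stack([item["labels"] for item in batch]),
        }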

For better or worse, the GUI “does nothing”.
You can effectively think of it as a CLI; that's what it was made for. (A dummy GUI.)