Dear huggingface community members and @John6666 @Alanturner2
I have trained my model and run inference on some random images. When I try a random image with a camera position similar to my dataset, it detects the river well. But on something like a satellite view, it fails quite badly.
I plotted the training stats and here are the graphs.
The loss is still decreasing, but so is the accuracy, even though it stays pretty much flat. Is this overfitting?
I mean, I want a model that is very specific to this dataset’s point of view, so being able to detect the river from that angle is good enough, unless the camera suddenly flips or tips by something like 30 degrees. But I also know overfitting is bad, and adding more random data would just turn it back into a general model, only now for rivers at various angles.
Is this good enough or is there something else I could do to make it better? Thanks in advance for the input
Hi. As long as the repo is public, there is effectively no limit to the size of the repository on Hugging Face. So why not save the current stable model and try forking it and training it?
If it doesn’t work, throw it away. If it looks promising, train it some more.
You can do the forking in this space.
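For reference, a rough sketch of copying (forking) a model repo with the huggingface_hub library, if you’d rather do it from Python instead of the Space. The repo ids and folder name below are placeholders, and you need to be logged in with a write token:

```python
# Rough sketch: copy an existing model repo into a new one for experiments.
# Repo ids and the local folder name are placeholders.
from huggingface_hub import snapshot_download, create_repo, upload_folder

local_dir = snapshot_download(
    repo_id="your-username/river-segmentation",  # current stable model
    local_dir="river_model_backup",
)
create_repo("your-username/river-segmentation-experiment", exist_ok=True)
upload_folder(
    repo_id="your-username/river-segmentation-experiment",
    folder_path=local_dir,
)
```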
So is this one truly bad? How do you read these stats, basically?
No, I don’t know much about model training to begin with, so I don’t know how to accurately read the graph, but if the model is operating as it was trained to, then there’s nothing wrong with it.
However, at first glance, it looks like it’s already been trained enough.
Something I forgot to mention above: this time, we’ve been doing things from the start to improve learning efficiency, such as adding jitter. Apparently, you can get by with 20 or 30 epochs in general…
To avoid overfitting with training beyond that, you’d have to add more data sets and train with them.
I wonder what would happen if you tried different angles or satellite photos… It might be worth experimenting with if you have the data. It would be possible with some models.
Edit:
If I’m not reading the graph wrong, I think about 1500 images should be enough…
Agreed. The reason I’m asking is:
- I’m wondering if adding rotated images to the training dataset will help, as in simulating the CCTV leaning to the wrong angle.
- If it doesn’t, then the next step is adding more varied river data, but isn’t that a bit much, since the main idea is to deploy this on a CCTV site? Or does every model have to be usable from satellite view down to bird’s-eye view?
Btw, do you mean 1500 more images, so the total would now be about 10k?
1. That might be effective. If the mask is also deformed in the same way, there is no need to recreate the dataset (see the sketch at the end of this post).
2. True. Maybe overkill.
Btw, do you mean 1500 more images, so the total would now be about 10k?
Looking at the gradient of the graph, there doesn’t seem to be much change after it passes 1500-2000, does there?
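As a rough illustration of point 1, a minimal sketch of rotating the image and mask together, assuming PIL images and torchvision; the angle range and file names are just examples:

```python
# Minimal sketch: apply the same random rotation to an image and its mask
# so the labels stay aligned. Angle range and file names are placeholders.
import random
from PIL import Image
import torchvision.transforms.functional as TF
from torchvision.transforms import InterpolationMode

def rotate_pair(image, mask, max_angle=30.0):
    angle = random.uniform(-max_angle, max_angle)
    # Bilinear for the image, nearest for the mask so class ids are not blended.
    image = TF.rotate(image, angle, interpolation=InterpolationMode.BILINEAR)
    mask = TF.rotate(mask, angle, interpolation=InterpolationMode.NEAREST)
    return image, mask

# img, msk = rotate_pair(Image.open("frame_0001.jpg"), Image.open("frame_0001_mask.png"))
```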
You mean the accuracy and loss after 2000 epochs? You’re right though, there’s nothing more I can do with this model…
Now I’m curious what the cityscape model graphs look like. Do models trained on bigger data also have a breaking point like this?
Btw, what about the ups and downs in the loss, would you call them extreme ups and downs?
I really need some reading material about this, or at least a way to back up my “data opinion” quantitatively… Like, I didn’t know what to conclude before I asked here.
You mean the accuracy and loss after 2000 epochs? You’re right though, there’s nothing more I can do with this model…
Yea. Well, there may still be some training benefit from including images from different angles, but I’d recommend taking a backup of your model before doing this, as you don’t want it to have the opposite effect.
Do models trained on bigger data also have a breaking point like this?
I think that’s the case, and it depends on the size of the neural network. For example, if there are only one or ten neurons, you can imagine the limitations.
Empirically, from experience using models, the larger the model, the more room there is left for training, while the smaller the model, the easier it is to retrain everything, so in that respect it’s more convenient.
If I do to this small model what I did to the big model, will it hit some point of badness at some point? Like, it’s already near overfitting, so it wouldn’t make sense to fine-tune it again, would it? Does fine-tuning always go down, not up? If that makes any sense; sorry for the strange wording.
If I do to this small model what I did to the big model, will it hit some point of badness at some point?
Perhaps. The smaller the model, the faster it reaches its limits. Of course, this depends not only on size but also on architecture, but that’s the tendency in terms of size.
Like, it’s already near overfitting, so it wouldn’t make sense to fine-tune it again, would it?
I agree.
Does fine-tuning always go down, not up?
Since training is done on river data and evaluation is also done on river data, I don’t think it will get worse on river data, even if it doesn’t get better. However, if overfitting progresses, there is a risk that it will become poor at handling river data from other perspectives and at other objects. If you don’t need it to handle those, there won’t be a problem.
I see. What kind of “handling” could I do at that point? Like regression?
Yea. Let’s say we want to train this model to recognize images of rivers from different angles, and we have already overfit the model to rivers from one particular angle.
If we have a backup of the model from before it was overfitted, it is probably more efficient to train for the new angles starting from that backup rather than from the already-overfitted model.
If you don’t have a backup and you want to fix a model that has been over-trained… you either regress it or retrain it more strongly; in other words, you overwrite it with more powerful training. The cheapest thing to do is simply to take backups frequently.
In this case, it seems this angle is fine, so there is no problem. You just need to be careful if you want to create a more versatile model.
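A minimal sketch of what taking such a backup could look like with transformers; the model class and directory names are assumptions, adjust them to whatever checkpoint format you are actually using:

```python
# Minimal sketch: keep a frozen copy of the "good" checkpoint before fine-tuning further.
# Model class and directory names are placeholders.
from transformers import AutoModelForSemanticSegmentation, AutoImageProcessor

model = AutoModelForSemanticSegmentation.from_pretrained("river_model_current")
processor = AutoImageProcessor.from_pretrained("river_model_current")

model.save_pretrained("river_model_backup")
processor.save_pretrained("river_model_backup")

# Fine-tune a fresh copy on the new-angle data; if it degrades,
# reload from "river_model_backup" instead of trying to undo the training.
```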
Alright, got it. Either do the maths on it, or go unga-bunga and hope the network turns out well by dropping in more data, though that’s less likely to work now that the model is already overfitted like ours is.
Btw, one more question: does plotting per epoch affect anything compared to, say, seconds of running time? The problem is that I can’t find the run time after closing the program, and I could only copy the table with epochs as the unit of time. Is there an equation for converting epochs into absolute time?
Does plotting per epoch affect anything compared to, say, seconds of running time? The problem is that I can’t find the run time after closing the program, and I could only copy the table with epochs as the unit of time. Is there an equation for converting epochs into absolute time?
You can use the Python time library to manually measure absolute time like a stopwatch, but the actual training time will vary depending on the PC’s performance, so it probably won’t be of much practical use.
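For example, a stopwatch-style measurement could look like this; the training call is a placeholder:

```python
# Rough wall-clock measurement around a training run; the result is machine-dependent.
import time

start = time.perf_counter()
# trainer.train()  # placeholder for whatever training call is being timed
elapsed = time.perf_counter() - start
print(f"Training took {elapsed:.1f} s")
```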
If you use steps instead of epochs as the unit of time, you can analyze it in more detail. You can specify this as an argument in the Trainer.
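A minimal sketch of what that could look like in TrainingArguments; the output directory and logging interval are arbitrary examples:

```python
# Minimal sketch: log metrics per optimizer step instead of per epoch.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="river-segmentation-run",  # placeholder
    num_train_epochs=10,
    per_device_train_batch_size=8,
    logging_strategy="steps",
    logging_steps=50,                     # log every 50 steps (arbitrary)
)
```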
What is the difference between steps and epochs? Oh wait, I just looked at the label again and it actually says steps instead of epochs, sorry.
I just realized I only ran 10 epochs but got around 8000 steps out of it. From our conversation, it looks like taking an extra 12 hours to get 20 epochs wouldn’t make any difference. Lucky I reduced it to 10 for a faster outcome.
A training step is one gradient update. In one step, batch_size examples are processed.
An epoch consists of one full pass through the training data, which is usually many steps. As an example, if you have 2,000 images and use a batch size of 10, an epoch consists of:
2,000 images / a batch size of 10 = 200 steps.
If you choose your training images randomly (and independently) at each step, you normally don’t call it an epoch.
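The same arithmetic as a tiny sketch:

```python
# Steps per epoch from dataset size and batch size (the numbers from the example above).
import math

num_images = 2000
batch_size = 10
steps_per_epoch = math.ceil(num_images / batch_size)
print(steps_per_epoch)  # 200
```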
Since the evaluation function uses up a lot of processing resources, you can call it once per epoch if you don’t want to call it frequently, or, if you’re a little short on resources, choose not to call it at all (~_strategy=“no”).
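In TrainingArguments that knob looks roughly like this; note the argument is called evaluation_strategy in older transformers versions and eval_strategy in newer ones:

```python
# Minimal sketch: control how often evaluation runs.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="river-segmentation-run",  # placeholder
    eval_strategy="epoch",                # "steps" for frequent eval, "no" to skip it
)
```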
I see. Sad that I used an inconvenient number like 8 as the batch size. My first thought was that 8 is one byte of data, so it could affect the training somehow, but it looks like it doesn’t.
So 8748 / 8 = 1093.5 steps per epoch. So… if there are about 8740 steps, then that’s about 8 epochs?? Why is it not a full 10 epochs? Is there any reason for that?
Btw, if you look at it, the accuracy is also getting worse compared to the first run. From the sources I’ve read, the accuracy should start from zero instead of 0.98… is something up with my dataset?
This is also the first time I’ve used a GPU. I wonder whether the many cores of a GPU introduce any discrepancy in the variables due to parallel processing… I think I also need to read about GPUs.
is something up with my dataset?
I have a terrible hypothesis…
Is it that the river and non-river are being interpreted the wrong way around? The 1 and 0 are reversed.
No… zero is for “not-water” in the id2label, unless my program accidentally switched them up for some reason I didn’t catch… hmm, let me check.
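A quick sanity check could look roughly like this, assuming a transformers checkpoint and PNG masks whose pixel values are the class ids; the paths are placeholders:

```python
# Quick check: does id2label match what's actually in the masks?
import numpy as np
from PIL import Image
from transformers import AutoModelForSemanticSegmentation

model = AutoModelForSemanticSegmentation.from_pretrained("river_model_current")  # placeholder path
print(model.config.id2label)                 # e.g. {0: "not-water", 1: "water"}

mask = np.array(Image.open("frame_0001_mask.png"))  # placeholder ground-truth mask
print(np.unique(mask, return_counts=True))   # which id actually covers the river pixels?
```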
At the very least, the model’s performance is improving, so the evaluation function probably needs more work.