I recently saw a post recommending that Hugging Face dataset owners chunk up their data into archive files no larger than 5GB.
But what about the other side of things?
If you have a dataset with millions of files… are there recommendations about how best to present them on Hugging Face?
Is individual-file upload discouraged?
If not, is there a specific recommended directory stacking format?