I’d like to host a dataset on Hugging Face.
Is there a size limit? My dataset is 106GB.
I tried to add it, but I can’t even add the files to the git repo (I’m getting `fatal: Out of memory, realloc failed` after `git add`), even though I’m using git-lfs.
Unfortunately, git-lfs has a size limitation of 2GB (only a few GB larger if you’re using GitHub Enterprise). As an alternative, you can store your dataset in another location (e.g. cloud storage) and reference that location in your data loading script.
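For illustration, a minimal loading script along these lines might work; the URL, file format, and field names below are placeholders for wherever and however your data is actually stored:

```python
import json

import datasets

# Placeholder URL; point this at wherever the data actually lives (e.g. an S3/GCS bucket).
_DATA_URL = "https://example.com/my_dataset/train.jsonl"


class MyDataset(datasets.GeneratorBasedBuilder):
    """Loads records from an externally hosted JSON-lines file."""

    def _info(self):
        return datasets.DatasetInfo(
            features=datasets.Features({"text": datasets.Value("string")})
        )

    def _split_generators(self, dl_manager):
        # download_and_extract fetches the remote file into the local cache
        # and returns the local path.
        path = dl_manager.download_and_extract(_DATA_URL)
        return [
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN, gen_kwargs={"filepath": path}
            )
        ]

    def _generate_examples(self, filepath):
        # Yield (key, example) pairs, one per line of the JSON-lines file.
        with open(filepath, encoding="utf-8") as f:
            for idx, line in enumerate(f):
                yield idx, {"text": json.loads(line)["text"]}
```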
As far as I know, we do have datasets of several terabytes on the Hub. As Paige suggested, you can store your dataset in an alternate location, but I believe it is also possible to upload files above 5GB after running `huggingface-cli lfs-enable-largefiles .` in the repo.
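Roughly, the workflow would be something like the following (the repo name is a placeholder, and the repo can also be created on the website):

```bash
# Log in and create a dataset repo
huggingface-cli login
huggingface-cli repo create my-106gb-dataset --type dataset

# Clone it, enable large files, then add/commit/push as usual
git clone https://huggingface.co/datasets/<username>/my-106gb-dataset
cd my-106gb-dataset
huggingface-cli lfs-enable-largefiles .
git add .
git commit -m "Add dataset files"
git push
```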
@julien-c I’m also confused. So I can upload as many files as I want, as long as each one is under 20GB (or under 5GB), and the dataset as a whole can basically be as large as I want? That seems amazing…