Spaces: Running on A10G
Split/shard support
Note to self: Merge this too: https://huggingface.co/spaces/ggml-org/gguf-my-repo/discussions/64/files
Gladly! Update: support for i-quants is also virtually ready, though it would benefit from a few more additions/fallbacks to handle some gotchas before submitting: https://huggingface.co/spaces/SixOpen/gguf-my-repo-sp_imat/tree/main I have access to Zero, but it doesn't seem to support the Docker SDK, so I can't offload any layers to the GPU. It works regardless, though it takes a few hours on the free CPU tier (which wouldn't be an issue on the original space).
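For context on the i-quant flow, here is a minimal sketch of how a Space might chain the two llama.cpp steps: compute an importance matrix from calibration text, then quantize with it. It only builds argument lists. The binary names `./imatrix` and `./quantize`, the file name `imatrix.dat`, the helper `imatrix_pipeline`, and the set of quant types treated as requiring an imatrix are assumptions for illustration (newer llama.cpp builds rename the tools, e.g. `llama-quantize`). This is not the code in the linked Space.

```python
# Quant types assumed (not verified for every llama.cpp version) to be
# refused by the quantize tool unless an importance matrix is supplied.
IMATRIX_REQUIRED = {"IQ1_S", "IQ1_M", "IQ2_XXS", "IQ2_XS", "IQ2_S"}


def imatrix_pipeline(model, calib, out, qtype, use_imatrix=True):
    """Return the commands (as argument lists) to run in order.

    model: input GGUF (e.g. f16), calib: calibration text file,
    out: quantized output GGUF, qtype: quantization type name.
    """
    qtype = qtype.upper()
    if not use_imatrix and qtype in IMATRIX_REQUIRED:
        # Guard: fail early instead of letting quantize error out mid-job.
        raise ValueError(f"{qtype} requires an importance matrix")
    cmds = []
    quantize = ["./quantize"]
    if use_imatrix:
        # Step 1: compute the importance matrix from the calibration text.
        cmds.append(["./imatrix", "-m", model, "-f", calib, "-o", "imatrix.dat"])
        # Step 2 passes that matrix to quantize.
        quantize += ["--imatrix", "imatrix.dat"]
    cmds.append(quantize + [model, out, qtype])
    return cmds
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order. Failing early on an unsupported combination is one possible "gotcha" guard; a real fallback policy would depend on what the Space wants to offer.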
Result: a model just split using the space mirroring this PR.
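Since a split model is uploaded as several files, the Space needs to know which shard names to expect. Below is a minimal sketch, assuming gguf-split's `PREFIX-00001-of-0000N.gguf` naming. `shard_names` and `missing_shards` are hypothetical helpers, not part of llama.cpp or the PR.

```python
def shard_names(prefix, n_shards):
    """Shard filenames, assuming the PREFIX-0000i-of-0000N.gguf pattern."""
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    # Both the shard index and the total are zero-padded to 5 digits.
    return [f"{prefix}-{i:05d}-of-{n_shards:05d}.gguf" for i in range(1, n_shards + 1)]


def missing_shards(prefix, n_shards, present):
    """Expected shard names not found in `present` (e.g. a directory listing)."""
    have = set(present)
    return [name for name in shard_names(prefix, n_shards) if name not in have]
```

For example, `shard_names("model-Q4_K_M", 3)` yields `model-Q4_K_M-00001-of-00003.gguf` through `model-Q4_K_M-00003-of-00003.gguf`. Because the names are deterministic, checking `missing_shards` before uploading is a cheap way to catch an incomplete split.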
Awesome :) of course, will do so!
Sure thing! I might do it this weekend, though there's an obstacle with the Dockerfile and getting the GPU to work: I haven't yet found a fix that follows best practices (other than going through Dev Mode), and I haven't been able to reproduce the problem locally either. Will look into it in a bit! :)
Interesting, I think in your `start.sh` you should have `LLAMA_CUDA=1 make -j quantize gguf-split imatrix` so that it compiles with CUDA support.
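If the build is driven from Python rather than a shell script, the suggestion above amounts to setting one environment variable for the `make` call. Here is a minimal sketch: `build_cmd` is a hypothetical helper, and `LLAMA_CUDA=1` is the Makefile flag named in this thread. Later llama.cpp versions may use a different flag name or build system, so check the version you build.

```python
import os


def build_cmd(targets, cuda=True):
    """Environment and argv for building llama.cpp Makefile targets."""
    env = dict(os.environ)
    if cuda:
        # Same effect as prefixing the shell command with LLAMA_CUDA=1.
        env["LLAMA_CUDA"] = "1"
    else:
        # Avoid inheriting the flag from the parent environment.
        env.pop("LLAMA_CUDA", None)
    return env, ["make", "-j", *targets]
```

Usage, assuming the current directory contains a `llama.cpp` checkout: `env, args = build_cmd(["quantize", "gguf-split", "imatrix"])`, then `subprocess.run(args, env=env, cwd="llama.cpp", check=True)`.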
Feel free to email me at vaibhav [at] huggingface [dot] co or message on twitter if you want to chat more about this, happy to help debug issues with you.
Sounds good, let's do that then :) I looked at the latest commits here, by the way; cool stuff! I was indeed using that CUDA flag, but good news: it turns out the space itself was the issue, and after moving to a new one everything works well. Another odd thing, similar to this, happened in another iteration of the space: in spite of a factory rebuild, once the auth expires it results in an endless redirect loop.