Any chance of an int4 or quantised version?

#3
by smcleod - opened

I'd try to create one myself, but the bf/fp16 weights are too big for me to process 🤣
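For anyone wanting to experiment anyway, here's a minimal sketch of loading a checkpoint in 4-bit on the fly with transformers + bitsandbytes (the repo id is a placeholder assumption, and you still need enough disk and RAM to stream the bf16 shards through the quantizer):

```python
# Minimal sketch: on-the-fly 4-bit (NF4) loading with bitsandbytes.
# The model id below is a placeholder; swap in the actual repo.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit at load time
    bnb_4bit_quant_type="nf4",              # NormalFloat4 usually beats plain int4
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-V3",              # placeholder repo id
    quantization_config=bnb_config,
    device_map="auto",                      # shard across available GPUs/CPU
    trust_remote_code=True,                 # DeepSeek models ship custom modeling code
)
```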

@smcleod maybe you have the ability to quantize https://huggingface.co/inarikami/DeepSeek-V3-int4-TensorRT ?

You have to be crazy/desperate to want any quant lower than int4. For a model to be comparable at all to the non-quantized version, even int4 is lower than I'd personally go...

@olborer that's not how param size vs quant type works. The larger the param count, the fewer problems you have with lower quantisations; essentially any quant of a larger-parameter model that's at least Q2_K_M will beat the smaller-parameter version.
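To put rough numbers on that (a back-of-the-envelope sketch assuming DeepSeek-V3's ~671B total parameters; the bits-per-weight figures are illustrative, since real GGUF quants carry extra block-scale overhead):

```python
# Approximate weight footprint at different quant levels.
def approx_size_gb(params_billion: float, bits_per_weight: float) -> float:
    # params * bits / 8 bits-per-byte; the 1e9 factors cancel out
    return params_billion * bits_per_weight / 8

for name, bpw in [("bf16", 16.0), ("int4 / Q4_K_M", 4.5), ("Q2_K", 2.6)]:
    print(f"671B @ {name}: ~{approx_size_gb(671, bpw):,.0f} GB")

# bf16 ≈ 1342 GB, int4 ≈ 377 GB, Q2_K ≈ 218 GB: low quants of the
# big model are the only way it fits on realistic hardware.
```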
