Thank you Astronomer!

#4
by CleanCoder123 - opened

Thank you for making this! Perplexity also looks better than my own quant, which used 128 samples. Did you end up calibrating with longer wikitext chunks or with more samples?

Any time! I used wikitext with AutoGPTQ, truncating each sample down to 4k length. I used 500 random samples, so a lot more than 128. I think 128 is the number the original GPTQ paper used for calibration, but in general, to some extent, the bigger and more diverse the calibration set the better (especially if the dataset matches your purpose or domain-specific task).
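For anyone who wants to reproduce that kind of calibration set, here is a minimal sketch of the sampling step. It assumes "4k" means 4096 tokens, and `build_calibration_set`, `toy_tokenize`, and the dictionary layout are hypothetical names chosen for illustration, not the exact script used for this quant. In practice you would pass the model's real tokenizer instead of the toy one.

```python
import random

def build_calibration_set(texts, tokenize, num_samples=500, max_tokens=4096, seed=0):
    """Pick random documents and truncate each one to at most max_tokens tokens.

    texts:    list of raw strings (for example wikitext articles)
    tokenize: any callable mapping a string to a list of token ids (assumed interface)
    """
    rng = random.Random(seed)                 # fixed seed so the sample is reproducible
    count = min(num_samples, len(texts))      # never ask for more documents than exist
    examples = []
    for text in rng.sample(texts, count):
        ids = tokenize(text)[:max_tokens]     # truncate long documents
        if ids:                               # skip documents that tokenize to nothing
            examples.append({"input_ids": ids, "attention_mask": [1] * len(ids)})
    return examples

def toy_tokenize(text):
    """Stand-in tokenizer for the demo: one fake id per whitespace-separated word."""
    return [len(word) for word in text.split()]

# Demo with synthetic documents of varying length.
toy_texts = ["word " * (i + 1) for i in range(1000)]
calibration = build_calibration_set(toy_texts, toy_tokenize, num_samples=500, max_tokens=64)
print(len(calibration), max(len(example["input_ids"]) for example in calibration))
```

Sampling at random (rather than taking the first N documents) is what gives the diversity mentioned above.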

This is because GPTQ is closely related to pruning techniques: in simple terms, instead of "removing" weights, it lowers their precision in a way that has minimal impact on a specific task (the calibration data), shrinking the model while maintaining sufficient accuracy. At a lower level, it progressively quantizes network parameters, compares the quantized output with the original, and minimizes the error by updating the parameters that have not been quantized yet. But some of the "changed" parameters could be useful for another task, especially since this is a general instruct model, and I don't know what the community is using it for. One person may use it as a general chatbot, and another for coding or math. Thus, for this model it is better to calibrate with something like wikitext, since it is fairly general and diverse.
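To make the "quantize some parameters, then update the others" idea concrete, here is a toy sketch with only two weights. It is not AutoGPTQ's implementation (which works on whole layers with a more efficient formulation); all function names, the inputs, and the grid step are assumptions made up for the illustration. The matrix H is built from the calibration inputs, which is exactly where the calibration dataset enters the picture.

```python
def round_to_grid(value, step):
    """Round to the nearest multiple of step (a uniform quantization grid)."""
    return round(value / step) * step

def hessian(inputs):
    """2x2 matrix H = sum of x x^T over the calibration inputs."""
    H = [[0.0, 0.0], [0.0, 0.0]]
    for x0, x1 in inputs:
        H[0][0] += x0 * x0
        H[0][1] += x0 * x1
        H[1][0] += x0 * x1
        H[1][1] += x1 * x1
    return H

def output_error(w, q, H):
    """Squared output error d^T H d on the calibration inputs, with d = q - w."""
    d0, d1 = q[0] - w[0], q[1] - w[1]
    return H[0][0] * d0 * d0 + 2 * H[0][1] * d0 * d1 + H[1][1] * d1 * d1

def quantize_naive(w, step):
    """Round every weight independently."""
    return [round_to_grid(w[0], step), round_to_grid(w[1], step)]

def quantize_with_compensation(w, H, step):
    """Quantize w[0], shift w[1] to absorb the resulting error, then quantize w[1].

    Assumes H[1][1] is nonzero, i.e. the second input is not always zero.
    """
    q0 = round_to_grid(w[0], step)
    err0 = w[0] - q0
    # The shift of w[1] that minimizes d^T H d once the first weight is fixed at q0.
    w1_adjusted = w[1] + err0 * H[0][1] / H[1][1]
    return [q0, round_to_grid(w1_adjusted, step)]

# Hypothetical calibration inputs whose two features are strongly correlated.
calibration_inputs = [(1.0, 0.9), (0.8, 1.0), (1.2, 1.1), (0.9, 0.8)]
weights = [0.37, 0.62]
step = 0.25

H = hessian(calibration_inputs)
naive = quantize_naive(weights, step)
compensated = quantize_with_compensation(weights, H, step)
print("naive      ", naive, output_error(weights, naive, H))
print("compensated", compensated, output_error(weights, compensated, H))
```

Note that the compensated result depends on H: a different calibration set gives a different H and potentially different quantized weights, which is why the choice of calibration data matters.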

In your own quant, it is important to use a dataset most relevant to whatever you are using the model for. And as I mentioned above, bigger is only better "to some extent", since GPTQ's calibration dataset, I believe, can also be subject to "overfitting" or "underfitting" effects.
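One rough way to look for that kind of effect, sketched here as an assumption rather than an established procedure, is to compare perplexity on text that resembles the calibration set with perplexity on held-out text from your actual use case. The helper names below are hypothetical, and the per-token log-probabilities would come from running the quantized model on each text.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def perplexity_gap(calibration_like_log_probs, held_out_log_probs):
    """Held-out perplexity minus calibration-like perplexity.

    A gap that grows after quantization, compared with the same gap for the
    original model, may hint that the quant fits the calibration data too closely.
    """
    return perplexity(held_out_log_probs) - perplexity(calibration_like_log_probs)

# Made-up log-probabilities, only to show the call pattern.
calibration_like = [math.log(0.5)] * 4
held_out = [math.log(0.25)] * 4
print(perplexity(calibration_like), perplexity(held_out),
      perplexity_gap(calibration_like, held_out))
```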
