cicdatopea committed
Commit 13d8722 · verified · 1 Parent(s): 763a9f3

Update README.md

Files changed (1): README.md (+7 -2)
README.md CHANGED

@@ -1,3 +1,8 @@
+---
+license: apache-2.0
+datasets:
+- NeelNanda/pile-10k
+---
 ## Model Details
 
 This awq model is an int4 model with group_size 128 and symmetric quantization of [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) generated by [intel/auto-round](https://github.com/intel/auto-round). We excluded 3 layers from quantization due to the overflow issue on some int4 backends.
@@ -208,7 +213,7 @@ auto-round \
   --disable_eval \
   --model_dtype "fp16" \
   --fp_layers "model.layers.5.mlp.down_proj,model.layers.5.mlp.up_proj,model.layers.5.mlp.gate_proj" \
-  --format 'auto_round' \
+  --format 'auto_awq' \
   --output_dir "./tmp_autoround"
 ```
@@ -234,4 +239,4 @@ The license on this model does not constitute legal advice. We are not responsib
 
 @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }
 
-[arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)
+[arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)
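For context, a minimal sketch of loading the resulting AWQ int4 checkpoint with 🤗 Transformers. The repo id below is a placeholder (substitute this model's actual id), and it assumes `autoawq` is installed alongside `transformers` so the int4 AWQ kernels are available.

```python
# Minimal loading sketch (assumes `pip install transformers autoawq` and a CUDA GPU).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OPEA/QwQ-32B-Preview-int4-sym-awq"  # hypothetical id; use this repo's actual id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Transformers reads the AWQ quantization config from config.json and
# dispatches to the AutoAWQ kernels, so the weights stay int4 on the GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in the word \"strawberry\"?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```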