cicdatopea committed
Commit 13d8722 · verified · 1 Parent(s): 763a9f3

Update README.md

Files changed (1): README.md (+7 -2)
README.md CHANGED

@@ -1,3 +1,8 @@
+---
+license: apache-2.0
+datasets:
+- NeelNanda/pile-10k
+---
 ## Model Details
 
 This awq model is an int4 model with group_size 128 and symmetric quantization of [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) generated by [intel/auto-round](https://github.com/intel/auto-round). We excluded 3 layers from quantization due to the overflow issue on some int4 backends.
@@ -208,7 +213,7 @@ auto-round \
   --disable_eval \
   --model_dtype "fp16" \
   --fp_layers "model.layers.5.mlp.down_proj,model.layers.5.mlp.up_proj,model.layers.5.mlp.gate_proj" \
-  --format 'auto_round' \
+  --format 'auto_awq' \
   --output_dir "./tmp_autoround"
 ```
@@ -234,4 +239,4 @@ The license on this model does not constitute legal advice. We are not responsib
 
 @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }
 
-[arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)
+[arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)
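For context, a minimal sketch of loading the resulting AWQ int4 checkpoint with 🤗 Transformers. The repo id below is a placeholder (substitute this model's actual id), and it assumes `autoawq` is installed alongside `transformers` so the int4 AWQ kernels are available.

```python
# Minimal loading sketch (assumes `pip install transformers autoawq` and a CUDA GPU).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OPEA/QwQ-32B-Preview-int4-sym-awq"  # hypothetical id; use this repo's actual id

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Transformers reads the AWQ quantization config from config.json and
# dispatches to the AutoAWQ kernels, so the weights stay int4 on the GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "How many r's are in the word \"strawberry\"?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```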