cicdatopea committed
Update README.md

README.md CHANGED
@@ -1,3 +1,8 @@
+---
+license: apache-2.0
+datasets:
+- NeelNanda/pile-10k
+---
 ## Model Details
 
 This AWQ model is an int4 model with group_size 128 and symmetric quantization of [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview), generated by [intel/auto-round](https://github.com/intel/auto-round). We excluded 3 layers from quantization due to an overflow issue on some int4 backends.
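For orientation, here is a minimal sketch of how an int4 AWQ checkpoint like this one is typically loaded; the model id is a placeholder, and the assumption that a standard transformers setup with an AWQ-capable backend (e.g. autoawq) installed is enough is mine, not something this commit states.

```python
# Minimal sketch: loading and running an int4 AWQ checkpoint with
# transformers. Assumes an AWQ-capable backend (e.g. autoawq) is
# installed; the model id is a placeholder, not a confirmed repo name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/this-quantized-model"  # placeholder repo id
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard across available GPUs
    torch_dtype="auto",  # keep the dtype stored in the checkpoint
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "How many r's are in the word \"strawberry\"?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```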
@@ -208,7 +213,7 @@ auto-round \
     --disable_eval \
     --model_dtype "fp16" \
     --fp_layers "model.layers.5.mlp.down_proj,model.layers.5.mlp.up_proj,model.layers.5.mlp.gate_proj" \
-    --format '
+    --format 'auto_awq' \
     --output_dir "./tmp_autoround"
 ```
 
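As a rough Python counterpart to the CLI recipe in the hunk above, the sketch below uses auto-round's AutoRound class; bits, group_size, sym, and the three excluded layers come from this card, while routing the exclusions through a layer_config dict is my assumption and worth checking against the auto-round documentation.

```python
# Sketch of the same recipe via auto-round's Python API. The values
# mirror the CLI flags above; using `layer_config` to keep the three
# overflow-prone layers unquantized is an assumption, not confirmed
# by this commit.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/QwQ-32B-Preview"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="float16")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Layers excluded from quantization (--fp_layers in the CLI recipe).
layer_config = {
    "model.layers.5.mlp.down_proj": {"bits": 16},
    "model.layers.5.mlp.up_proj": {"bits": 16},
    "model.layers.5.mlp.gate_proj": {"bits": 16},
}

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,          # int4 weights
    group_size=128,  # per-group quantization granularity
    sym=True,        # symmetric quantization
    layer_config=layer_config,
)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_awq")
```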
@@ -234,4 +239,4 @@ The license on this model does not constitute legal advice. We are not responsib
 
 @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }
 
-[arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)
+[arxiv](https://arxiv.org/abs/2309.05516) [github](https://github.com/intel/auto-round)