|
--- |
|
license: gemma |
|
--- |
|
|
|
[AWQ](https://arxiv.org/abs/2306.00978)-quantized package (W4G128, i.e. 4-bit weights with a quantization group size of 128) of [`google/gemma-2-9b`](https://huggingface.co/google/gemma-2-9b). |
|
Support for Gemma2 in the AutoAWQ codebase is proposed in [pull request #562](https://github.com/casper-hansen/AutoAWQ/pull/562). |
|
To use the model with AutoAWQ, follow the usual AutoAWQ examples while installing the library from the source branch of [#562](https://github.com/casper-hansen/AutoAWQ/pull/562). |
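
For reference, the sketch below shows the standard AutoAWQ quantization flow that would produce a W4G128 checkpoint like this one. The exact calibration data and settings used for this package are not stated, so the config values (zero-point quantization, `GEMM` kernel version) and the output path are assumptions based on common AutoAWQ defaults.

```py
# Minimal sketch of W4G128 AWQ quantization with AutoAWQ (built from PR #562).
# The exact settings used for this checkpoint are not published; the config
# below reflects common AutoAWQ defaults for a 4-bit, group-size-128 export.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_model = "google/gemma-2-9b"
quant_path = "gemma-2-9b-awq-w4g128"  # hypothetical output directory

quant_config = {
    "zero_point": True,   # asymmetric (zero-point) quantization
    "q_group_size": 128,  # G128
    "w_bit": 4,           # W4
    "version": "GEMM",    # assumed kernel backend
}

model = AutoAWQForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Runs AWQ calibration (AutoAWQ uses a default calibration set unless one is passed).
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```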
|
|
|
**Evaluation**<br> |
|
WikiText-2 PPL: 7.08<br> |
|
C4 PPL: 11.05 |
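
The exact evaluation setup (context length, split, chunking) is not detailed here. A typical sliding-window perplexity measurement on WikiText-2 looks roughly like the sketch below; the chosen window size and split are assumptions, so results may differ slightly from the figures above.

```py
# Rough sketch of a WikiText-2 perplexity measurement; context length and
# chunking are assumptions, not the exact setup used for the numbers above.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "radi-cho/gemma-2-9b-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0")

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

seq_len = 2048  # assumed evaluation context length
nlls = []
for start in range(0, enc.input_ids.size(1) - seq_len, seq_len):
    ids = enc.input_ids[:, start:start + seq_len].to(model.device)
    with torch.no_grad():
        # Labels equal to inputs give the mean next-token negative log-likelihood.
        nlls.append(model(ids, labels=ids).loss)

print(f"PPL: {torch.exp(torch.stack(nlls).mean()).item():.2f}")
```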
|
|
|
**Loading** |
|
|
|
```py
model_path = "radi-cho/gemma-2-9b-AWQ"

# With transformers
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0")

# With transformers (fused)
from transformers import AutoModelForCausalLM, AwqConfig
quantization_config = AwqConfig(bits=4, fuse_max_seq_len=512, do_fuse=True)
model = AutoModelForCausalLM.from_pretrained(model_path, quantization_config=quantization_config).to(0)

# With AutoAWQ
from awq import AutoAWQForCausalLM
model = AutoAWQForCausalLM.from_quantized(model_path)
```
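
**Inference**

After loading with any of the options above, generation works the same as with the base model. The prompt below is just an illustrative example.

```py
# Quick generation check after loading (prompt is only an illustrative example).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "radi-cho/gemma-2-9b-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0")

inputs = tokenizer("The AWQ method quantizes weights by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```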