|
--- |
|
license: gemma |
|
--- |
|
|
|
[AWQ](https://arxiv.org/abs/2306.00978)-quantized package (W4G128, i.e. 4-bit weights with a quantization group size of 128) of [`google/gemma-2-9b`](https://huggingface.co/google/gemma-2-9b). |
|
Support for Gemma2 in the AutoAWQ codebase is proposed in [pull request #562](https://github.com/casper-hansen/AutoAWQ/pull/562). |
|
To use the model with AutoAWQ, follow the usual AutoAWQ examples while installing the library from the source branch of [#562](https://github.com/casper-hansen/AutoAWQ/pull/562). |
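
For reference, the sketch below shows the standard AutoAWQ quantization flow that would produce a W4G128 checkpoint like this one. The exact calibration data and settings used for this package are not stated, so the config values (zero-point quantization, `GEMM` kernel version) and the output path are assumptions based on common AutoAWQ defaults.

```py
# Minimal sketch of W4G128 AWQ quantization with AutoAWQ (built from PR #562).
# The exact settings used for this checkpoint are not published; the config
# below reflects common AutoAWQ defaults for a 4-bit, group-size-128 export.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_model = "google/gemma-2-9b"
quant_path = "gemma-2-9b-awq-w4g128"  # hypothetical output directory

quant_config = {
    "zero_point": True,   # asymmetric (zero-point) quantization
    "q_group_size": 128,  # G128
    "w_bit": 4,           # W4
    "version": "GEMM",    # assumed kernel backend
}

model = AutoAWQForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Runs AWQ calibration (AutoAWQ uses a default calibration set unless one is passed).
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```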
|
|
|
**Evaluation**<br> |
|
WikiText-2 PPL: 7.08<br> |
|
C4 PPL: 11.05 |
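
The exact evaluation setup (context length, split, chunking) is not detailed here. A typical sliding-window perplexity measurement on WikiText-2 looks roughly like the sketch below; the chosen window size and split are assumptions, so results may differ slightly from the figures above.

```py
# Rough sketch of a WikiText-2 perplexity measurement; context length and
# chunking are assumptions, not the exact setup used for the numbers above.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "radi-cho/gemma-2-9b-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0")

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

seq_len = 2048  # assumed evaluation context length
nlls = []
for start in range(0, enc.input_ids.size(1) - seq_len, seq_len):
    ids = enc.input_ids[:, start:start + seq_len].to(model.device)
    with torch.no_grad():
        # Labels equal to inputs give the mean next-token negative log-likelihood.
        nlls.append(model(ids, labels=ids).loss)

print(f"PPL: {torch.exp(torch.stack(nlls).mean()).item():.2f}")
```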
|
|
|
**Loading** |
|
|
|
```py
model_path = "radi-cho/gemma-2-9b-AWQ"

# With transformers
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0")

# With transformers (fused)
from transformers import AutoModelForCausalLM, AwqConfig
quantization_config = AwqConfig(bits=4, fuse_max_seq_len=512, do_fuse=True)
model = AutoModelForCausalLM.from_pretrained(model_path, quantization_config=quantization_config).to(0)

# With AutoAWQ
from awq import AutoAWQForCausalLM
model = AutoAWQForCausalLM.from_quantized(model_path)
```
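
**Inference**

After loading with any of the options above, generation works the same as with the base model. The prompt below is just an illustrative example.

```py
# Quick generation check after loading (prompt is only an illustrative example).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "radi-cho/gemma-2-9b-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cuda:0")

inputs = tokenizer("The AWQ method quantizes weights by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```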