---
pipeline_tag: text-generation
inference: true
widget:
  - text: 'Hello!'
    example_title: Hello world
    group: Python
library_name: transformers
---
# yujiepan/falcon-40b-awq-w4g128
This model is a 4-bit AWQ quantization of [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b), produced with AutoAWQ using these settings: 4-bit weights, `group_size=128`, `zero_point=True`.
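To make the settings above concrete, here is a minimal pure-Python sketch of group-wise 4-bit quantization with a zero point. It illustrates only the storage scheme (one scale and one zero point per group of 128 weights) using plain round-to-nearest; AWQ additionally rescales weights based on activation statistics before quantizing, which is not shown. The function names and the random test row are hypothetical and are not part of AutoAWQ.

```python
import random

W_BIT = 4          # bits per weight, matching this repo's setting
GROUP_SIZE = 128   # number of weights sharing one scale and zero point


def quantize_group(weights, w_bit=W_BIT):
    """Asymmetric (zero-point) round-to-nearest quantization of one group."""
    q_max = 2 ** w_bit - 1                       # 15 for 4-bit
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / q_max or 1.0       # fall back to 1.0 for a constant group
    zero_point = max(0, min(q_max, round(-w_min / scale)))
    # Map each weight to an integer code in [0, q_max].
    q = [max(0, min(q_max, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize_group(q, scale, zero_point):
    """Recover approximate float weights from integer codes."""
    return [(v - zero_point) * scale for v in q]


def quantize_row(row, group_size=GROUP_SIZE):
    """Split a weight row into groups and quantize each independently."""
    return [quantize_group(row[i:i + group_size])
            for i in range(0, len(row), group_size)]


# Hypothetical weight row: 512 values, so 4 groups of 128.
random.seed(0)
row = [random.gauss(0.0, 0.02) for _ in range(512)]

groups = quantize_row(row)
restored = [w for q, s, z in groups for w in dequantize_group(q, s, z)]
max_err = max(abs(a - b) for a, b in zip(row, restored))
print(len(groups), "groups, max abs reconstruction error:", max_err)
```

Each group stores 128 four-bit codes plus one scale and one zero point, so the per-group metadata is amortized over many weights. Smaller groups would typically track the weight distribution more closely at the cost of more metadata.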
## Accuracy
| task | tiiuae/falcon-40b (fp16) | this repo |
|----------------------------|-------------------|-----------|
| wikitext ppl by lm_harness | 8.410 | 8.497 |
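As a quick reading aid, the perplexity gap in the table can be expressed as a relative change. The snippet below only does arithmetic on the two numbers reported above; the variable names are illustrative.

```python
fp16_ppl = 8.410  # tiiuae/falcon-40b (fp16), from the table above
awq_ppl = 8.497   # this repo, from the table above

abs_increase = awq_ppl - fp16_ppl
rel_increase_pct = 100 * abs_increase / fp16_ppl

print(f"absolute: +{abs_increase:.3f} ppl, relative: +{rel_increase_pct:.2f}%")
# prints: absolute: +0.087 ppl, relative: +1.03%
```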
## Usage
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "yujiepan/falcon-40b-awq-w4g128"

# Load model
model = AutoAWQForCausalLM.from_quantized(
    model_name_or_path, fuse_layers=False, trust_remote_code=False
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

prompt = "Tell me about AI"
tokens = tokenizer(
    prompt,
    return_tensors='pt'
).input_ids.cuda()

# Generate
generation_output = model.generate(
    tokens,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=10,
)
print("Output: ", tokenizer.decode(generation_output[0]))
```