---
pipeline_tag: text-generation
inference: true
widget:
- text: 'Hello!'
  example_title: Hello world
  group: Python
library_name: transformers
---

# yujiepan/falcon-40b-awq-w4g128

This model is [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b) quantized with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ): 4-bit weights, group_size=128, zero_point=True.
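
For reference, below is a minimal sketch of how a checkpoint with these settings can be produced with AutoAWQ. The calibration data, AutoAWQ version, and kernel `version` used for this repo are assumptions, not recorded facts.

```python
# Hypothetical reproduction sketch: only w_bit/q_group_size/zero_point come
# from the model name; everything else is an AutoAWQ default or assumption.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "tiiuae/falcon-40b"
quant_config = {"w_bit": 4, "q_group_size": 128, "zero_point": True, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# AutoAWQ runs activation-aware calibration on its default dataset,
# then packs the weights into 4-bit groups of 128.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("falcon-40b-awq-w4g128")
tokenizer.save_pretrained("falcon-40b-awq-w4g128")
```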
## Accuracy

| Task                                        | tiiuae/falcon-40b (fp16) | This repo (AWQ w4g128) |
|---------------------------------------------|--------------------------|------------------------|
| WikiText perplexity (lm-evaluation-harness) | 8.410                    | 8.497                  |

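A hedged sketch of how such a number could be reproduced, assuming a recent lm-evaluation-harness (v0.4+) whose Python API exposes `simple_evaluate` and `HFLM`; the exact harness version and settings behind the table are not recorded here:

```python
# Assumption: transformers (with autoawq installed) can load this AWQ
# checkpoint, so lm-evaluation-harness's Hugging Face backend works on it.
from lm_eval import simple_evaluate
from lm_eval.models.huggingface import HFLM

lm = HFLM(pretrained="yujiepan/falcon-40b-awq-w4g128")
results = simple_evaluate(model=lm, tasks=["wikitext"])
print(results["results"]["wikitext"])  # reports word_perplexity among others
```
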
## Usage

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "yujiepan/falcon-40b-awq-w4g128"

# Load the 4-bit AWQ checkpoint. fuse_layers=False keeps the original
# (unfused) module layout.
model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=False, trust_remote_code=False)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

prompt = "Tell me about AI"
tokens = tokenizer(
    prompt,
    return_tensors='pt'
).input_ids.cuda()

# Sample a short continuation from the prompt.
generation_output = model.generate(
    tokens,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=10,
)

print("Output: ", tokenizer.decode(generation_output[0]))
```
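
Recent versions of `transformers` (4.35+, with `autoawq` installed) can also load AWQ checkpoints directly; this assumes the repo's `config.json` carries the AWQ `quantization_config` written by `save_quantized`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("yujiepan/falcon-40b-awq-w4g128", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("yujiepan/falcon-40b-awq-w4g128")
```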