KirillR committed · Commit b082e5c · verified · 1 Parent(s): 7db0180

Update README.md

Files changed (1): README.md (+108 -3)
README.md CHANGED

---
license: apache-2.0
base_model:
- Qwen/QwQ-32B-Preview
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# QwQ-32B-Preview AWQ 4-Bit Quantized Version

## Introduction

This repository provides the **AWQ 4-bit quantized** version of the **QwQ-32B-Preview** model, originally developed by the Qwen Team. The quantized model significantly reduces memory usage and computational requirements, making it suitable for deployment on hardware with limited resources.

**Note**: This quantized model requires approximately **20 GB of VRAM** to run effectively.
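
To confirm the GPU has enough headroom before loading, here is a minimal pre-flight check (a sketch that assumes PyTorch with a visible CUDA device):

```python
# Report free GPU memory before loading the model.
# Assumes PyTorch is installed and a CUDA device is available.
import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()  # both values are in bytes
    print(f"Free VRAM: {free / 1e9:.1f} GB of {total / 1e9:.1f} GB")
else:
    print("No CUDA device detected.")
```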

**QwQ-32B-Preview** is an experimental research model aimed at advancing AI reasoning capabilities, particularly in mathematics and coding tasks. While it shows promising analytical abilities, it has several important limitations:

- **Language Mixing and Code Switching**: The model may unexpectedly switch between languages or mix them, affecting the clarity of responses.
- **Recursive Reasoning Loops**: The model may enter circular reasoning patterns, producing lengthy responses without a conclusive answer.
- **Safety and Ethical Considerations**: Enhanced safety measures are needed to ensure reliable and secure performance. Users should exercise caution when deploying the model.
- **Performance Limitations**: While excelling at math and coding, the model may underperform in areas such as common-sense reasoning and nuanced language understanding.

---

## Requirements

Make sure you are using a recent version of Hugging Face Transformers (**4.37.0** or later), which includes support for the Qwen2 model family. With an earlier version you may see the following error:

```plaintext
KeyError: 'qwen2'
```
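
A quick way to verify the environment (a minimal sketch; the `packaging` helper is assumed to be installed, and loading AWQ checkpoints through Transformers generally also requires the `autoawq` package):

```python
# Check the installed Transformers version before loading the model.
# Assumes the `packaging` library is available; note that AWQ checkpoints
# loaded via Transformers generally also need the `autoawq` package.
from packaging import version
import transformers

assert version.parse(transformers.__version__) >= version.parse("4.37.0"), (
    f"Transformers {transformers.__version__} is too old; "
    "upgrade with: pip install -U transformers"
)
```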

---

## Quickstart

Here's how to load the tokenizer and model, and generate content using the quantized model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "KirillR/QwQ-32B-Preview-AWQ"

# Load the quantized model; device_map="auto" places it across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How many 'r's are in 'strawberry'?"
messages = [
    {"role": "system", "content": "You are a helpful assistant developed by Alibaba. Please think step-by-step."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template, then tokenize it.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)
# Strip the prompt tokens so only the newly generated continuation remains.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
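
For interactive use, streaming tokens as they are generated makes the model's long reasoning traces easier to follow. Here is a minimal sketch using the `TextStreamer` utility from Transformers, reusing `model`, `tokenizer`, and `model_inputs` from above (the sampling parameters are illustrative, not official recommendations):

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

_ = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,  # illustrative values, not tuned recommendations
    top_p=0.8,
    streamer=streamer,
)
```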

---

## Original Model

For more details about the original QwQ-32B-Preview model, please refer to the following resource:

https://huggingface.co/Qwen/QwQ-32B-Preview

---

## Citation

If you find the original model helpful, please consider citing the original authors:

```bibtex
@misc{qwq-32b-preview,
    title = {QwQ: Reflect Deeply on the Boundaries of the Unknown},
    url = {https://qwenlm.github.io/blog/qwq-32b-preview/},
    author = {Qwen Team},
    month = {November},
    year = {2024}
}

@article{qwen2,
    title = {Qwen2 Technical Report},
    author = {An Yang and Baosong Yang and others},
    journal = {arXiv preprint arXiv:2407.10671},
    year = {2024}
}
```