abhinavkulkarni committed on
Commit 466e5c2 · 1 Parent(s): a3fd730

Update README.md
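
Besides removing a duplicated copy of the README body, this commit makes one functional change: it adds `&&` before `python setup.py install` in the install snippet. A minimal sketch of why that matters, using a hypothetical `count_args` helper that is not part of the README or of llm-awq: a trailing backslash joins the next line onto the same command, so without `&&` the words `python setup.py install` become extra arguments to the preceding command instead of running as their own step.

```bash
# Hypothetical helper (illustration only): prints how many arguments it received.
count_args() { echo "$#"; }

# Before the fix: the backslash continues the line, so this is ONE command
# and `python setup.py install` are just three more arguments to it.
broken=$(count_args awq/kernels \
  python setup.py install)

# After the fix: `&&` ends the first command and starts a second one,
# which runs only if the first command succeeded.
fixed=$(count_args awq/kernels \
  && echo "second command ran")

echo "arguments seen before the fix: $broken"
echo "output after the fix:"
echo "$fixed"
```

In the README's snippet the first command is `cd awq/kernels`, so the unfixed form would hand the install command to `cd` as arguments rather than executing it.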

Files changed (1)
  1. README.md +1 -132
README.md CHANGED
@@ -11,20 +11,6 @@ inference: false

  This model is a 4-bit 128 group size AWQ quantized model. For more information about AWQ quantization, please click [here](https://github.com/mit-han-lab/llm-awq).

- ## Model Date
- ---
- license: cc-by-sa-3.0
- tags:
- - MosaicML
- - AWQ
- inference: false
- ---
-
- # MPT-7B-Chat (4-bit 128g AWQ Quantized)
- [MPT-7B-Chat](https://huggingface.co/mosaicml/mpt-7b-chat) is a chatbot-like model for dialogue generation.
-
- This model is a 4-bit 128 group size AWQ quantized model. For more information about AWQ quantization, please click [here](https://github.com/mit-han-lab/llm-awq).
-
  ## Model Date

  July 5, 2023
@@ -47,7 +33,7 @@ git clone https://github.com/mit-han-lab/llm-awq \
  && git checkout 71d8e68df78de6c0c817b029a568c064bf22132d \
  && pip install -e . \
  && cd awq/kernels \
- python setup.py install
+ && python setup.py install
  ```

  ```python
@@ -120,123 +106,6 @@ This evaluation was done using [LM-Eval](https://github.com/EleutherAI/lm-evalua
  | | |bits_per_byte | 0.7138| | |


- ## Acknowledgements
-
- The MPT model was originally finetuned by Sam Havens and the MosaicML NLP team. Please cite this model using the following format:
-
- ```
- @online{MosaicML2023Introducing,
-   author = {MosaicML NLP Team},
-   title = {Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs},
-   year = {2023},
-   url = {www.mosaicml.com/blog/mpt-7b},
-   note = {Accessed: 2023-03-28}, % change this date
-   urldate = {2023-03-28} % change this date
- }
- ```
-
- The model was quantized with AWQ technique. If you find AWQ useful or relevant to your research, please kindly cite the paper:
-
- ```
- @article{lin2023awq,
-   title={AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration},
-   author={Lin, Ji and Tang, Jiaming and Tang, Haotian and Yang, Shang and Dang, Xingyu and Han, Song},
-   journal={arXiv},
-   year={2023}
- }
- ```
-
- July 5, 2023
-
- ## Model License
-
- Please refer to original MPT model license ([link](https://huggingface.co/mosaicml/mpt-7b-chat)).
-
- Please refer to the AWQ quantization license ([link](https://github.com/llm-awq/blob/main/LICENSE)).
-
- ## CUDA Version
-
- This model was successfully tested on CUDA driver v12.1 and toolkit v11.7 with Python v3.10.11.
-
- ## How to Use
-
- ```bash
- git clone https://github.com/mit-han-lab/llm-awq \
- && cd llm-awq \
- && git checkout 71d8e68df78de6c0c817b029a568c064bf22132d \
- && pip install -e .
- ```
-
- ```python
- import torch
- from awq.quantize.quantizer import real_quantize_model_weight
- from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer
- from accelerate import init_empty_weights, load_checkpoint_and_dispatch
- from huggingface_hub import hf_hub_download
-
- model_name = "mosaicml/mpt-7b-chat"
-
- # Config
- config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
-
- # Tokenizer
- tokenizer = AutoTokenizer.from_pretrained(config.tokenizer_name)
-
- # Model
- w_bit = 4
- q_config = {
-     "zero_point": True,
-     "q_group_size": 128,
- }
-
- load_quant = hf_hub_download('abhinavkulkarni/mpt-7b-chat-w4-g128-awq', 'pytorch_model.bin')
-
- with init_empty_weights():
-     model = AutoModelForCausalLM.from_pretrained(model_name, config=config,
-                                                  torch_dtype=torch.float16, trust_remote_code=True)
-
- real_quantize_model_weight(model, w_bit=w_bit, q_config=q_config, init_only=True)
-
- model = load_checkpoint_and_dispatch(model, load_quant, device_map="balanced")
-
- # Inference
- prompt = f'''What is the difference between nuclear fusion and fission?
- ###Response:'''
-
- input_ids = tokenizer(prompt, return_tensors='pt').input_ids.cuda()
- output = model.generate(
-     inputs=input_ids,
-     temperature=0.7,
-     max_new_tokens=512,
-     top_p=0.15,
-     top_k=0,
-     repetition_penalty=1.1,
-     eos_token_id=tokenizer.eos_token_id
- )
- print(tokenizer.decode(output[0]))
- ```
-
- ## Evaluation
-
- This evaluation was done using [LM-Eval](https://github.com/EleutherAI/lm-evaluation-harness).
-
- [MPT-7B-Chat](https://huggingface.co/mosaicml/mpt-7b-chat)
-
- | Task |Version| Metric | Value | |Stderr|
- |--------|------:|---------------|------:|---|------|
- |wikitext| 1|word_perplexity|13.5936| | |
- | | |byte_perplexity| 1.6291| | |
- | | |bits_per_byte | 0.7040| | |
-
- [MPT-7B-Chat (4-bit 128-group AWQ)](https://huggingface.co/abhinavkulkarni/mpt-7b-chat-w4-g128-awq)
-
- | Task |Version| Metric | Value | |Stderr|
- |--------|------:|---------------|------:|---|------|
- |wikitext| 1|word_perplexity|14.0922| | |
- | | |byte_perplexity| 1.6401| | |
- | | |bits_per_byte | 0.7138| | |
-
-
  ## Acknowledgements

  The MPT model was originally finetuned by Sam Havens and the MosaicML NLP team. Please cite this model using the following format: