OPEA /

Safetensors · olmo2 · 4-bit precision · intel/auto-round

Commit ff31d3a (verified) by cicdatopea · 1 parent: 39856e8

Update README.md

Files changed (1): README.md +173 -3

README.md changed (`@@ -1,3 +1,173 @@`): the previous three-line front matter (`---`, `license: apache-2.0`, `---`) is replaced by the model card below.

---
license: apache-2.0
datasets:
- NeelNanda/pile-10k
---

## Model Card Details

This model is an INT4 quantization (group_size 128, symmetric) of [allenai/OLMo-2-1124-13B-Instruct](https://huggingface.co/allenai/OLMo-2-1124-13B-Instruct), generated by [intel/auto-round](https://github.com/intel/auto-round). To use the AutoGPTQ format instead, load the model with revision `90c15db`.
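
To make "INT4, group_size 128, symmetric" concrete, here is a minimal pure-Python sketch of plain round-to-nearest (RTN) symmetric group-wise quantization. It is an illustration under assumptions, not auto-round's algorithm: AutoRound additionally tunes the rounding with signed gradient descent (see the paper in the Cite section), and the scale convention used here (largest magnitude in a group maps to 7) is one common choice that may differ from what auto-round stores. The function names `quantize_group_sym`, `quantize_row_sym`, and `dequantize` are hypothetical.

```python
import random


def quantize_group_sym(group, bits=4):
    """Symmetric RTN quantization of one group; the zero point is fixed at 0."""
    qmax = 2 ** (bits - 1) - 1   # 7 for INT4
    qmin = -(2 ** (bits - 1))    # -8 for INT4
    max_abs = max(abs(w) for w in group)
    # Assumed convention: the largest magnitude in the group maps to qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Plain round-to-nearest, not AutoRound's tuned rounding.
    q = [min(max(round(w / scale), qmin), qmax) for w in group]
    return q, scale


def quantize_row_sym(row, group_size=128, bits=4):
    """Split a flat weight row into groups; each group gets its own scale."""
    return [
        quantize_group_sym(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


def dequantize(groups):
    """Reconstruct approximate float weights from (int codes, scale) pairs."""
    return [v * scale for q, scale in groups for v in q]


if __name__ == "__main__":
    random.seed(0)
    row = [random.gauss(0.0, 0.02) for _ in range(256)]  # toy weight row
    groups = quantize_row_sym(row, group_size=128)
    approx = dequantize(groups)
    max_err = max(abs(a - b) for a, b in zip(row, approx))
    print(f"{len(groups)} groups, max abs error {max_err:.6f}")
```

Because every group of 128 weights has its own scale, an outlier only coarsens the grid for its own group, and each reconstructed weight is off by at most half of its group's scale.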

## Inference on CPU/HPU/CUDA

pip3 install "transformers>=4.47" auto-round

HPU: a Docker image with the Gaudi software stack is recommended. See the [Gaudi Guide](https://docs.habana.ai/en/latest/Installation_Guide/Bare_Metal_Fresh_OS.html#launch-docker-image-that-was-built) for environment setup details.

```python
from auto_round import AutoHfQuantizer  ## required to load the auto-round format
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_model_dir = "OPEA/OLMo-2-1124-13B-Instruct-int4-sym-inc"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)

model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype="auto",
    device_map="auto",
    ##revision="90c15db",  ## AutoGPTQ format
)

##import habana_frameworks.torch.core as htcore  ## uncomment for HPU
##import habana_frameworks.torch.hpu as hthpu  ## uncomment for HPU
##model = model.to(torch.bfloat16).to("hpu")  ## uncomment for HPU

prompt = "There is a girl who likes adventure,"
messages = [
    {"role": "system", "content": "You are OLMo 2, a helpful and harmless AI Assistant built by the Allen Institute for AI."},
    {"role": "user", "content": prompt},
]

## Render the chat template into a prompt string that ends with the assistant turn
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,  ## passes both input_ids and attention_mask
    max_new_tokens=200,
    do_sample=False,  ## greedy decoding; change this to match the official generation settings
)
## Drop the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

##prompt = "There is a girl who likes adventure,"
##INT4
"""
That sounds exciting! Adventure can come in many forms. For someone who enjoys the thrill of exploration, here are a few adventure-filled ideas:

1. **Travel to New Places**: Encourage her to explore new cities, countries, or even different cultures. Traveling can be an adventure in itself, offering new experiences and perspectives.

2. **Outdoor Activities**: Engage in outdoor adventures such as hiking, camping, rock climbing, or even kayaking. These activities can provide a sense of freedom and connection with nature.

3. **Learn a New Skill**: Learning something new, like scuba diving, horseback riding, or even skydiving, can be an adventure in itself. It's not just about the activity but also about the journey of learning and mastering something new.

4. **Volunteer Work**: Consider volunteering for adventure-related activities, such as wildlife conservation, archaeological digs, or even helping with outdoor events. This can be both
"""

##BF16
"""
That sounds exciting! Adventure can come in many forms. It could be exploring new places, trying new activities, or even diving into books and movies about thrilling quests and journeys. What kind of adventure is she interested in?
"""

##prompt = "Which one is larger, 9.11 or 9.8"
##INT4
"""9.11 is larger than 9.8.
"""

##BF16
"""9.11 is larger than 9.8."""

##prompt = "How many r in strawberry."
##INT4
'''There are two 'r's in the word "strawberry."'''
##BF16
'''There are two 'r's in the word "strawberry."'''


##prompt = "Once upon a time,"
##INT4
"""There was a curious user who wanted to continue a story. How should the story unfold?"""

##BF16
"""there was a curious user who wanted to explore the vast world of knowledge and storytelling. And I, OLMo 2, was here to assist and guide them on their journey. What would you like to explore today?
"""
```
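
The list comprehension near the end of the inference script works because `generate` returns each prompt followed by its continuation, so slicing off the first `len(input_ids)` tokens of each row leaves only the new text. A toy illustration with made-up token IDs, using plain lists in place of tensors (`strip_prompt` is a hypothetical helper name):

```python
def strip_prompt(batch_input_ids, batch_output_ids):
    """Keep only the tokens generated after each prompt."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(batch_input_ids, batch_output_ids)
    ]


# Made-up token IDs: each output row starts with its prompt.
prompts = [[101, 7, 8]]
outputs = [[101, 7, 8, 42, 43, 2]]
print(strip_prompt(prompts, outputs))  # [[42, 43, 2]]
```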

## Evaluate the model

pip3 install lm-eval==0.4.5

```bash
auto-round --eval --model_name "OPEA/OLMo-2-1124-13B-Instruct-int4-sym-inc" --eval_bs 16 --tasks leaderboard_mmlu_pro,leaderboard_ifeval,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,gsm8k
```

| Metric                         | BF16                           | INT4                           |
| ------------------------------ | ------------------------------ | ------------------------------ |
| avg (mean of the 12 tasks)     | 0.6557                         | 0.6525                         |
| leaderboard_mmlu_pro (5-shot)  | 0.3314                         | 0.3264                         |
| leaderboard_ifeval             | 0.6879 = (0.7398 + 0.6359) / 2 | 0.6832 = (0.7362 + 0.6303) / 2 |
| lambada_openai                 | 0.7479                         | 0.7559                         |
| hellaswag                      | 0.6853                         | 0.6808                         |
| winogrande                     | 0.7758                         | 0.7806                         |
| piqa                           | 0.8248                         | 0.8177                         |
| truthfulqa_mc1                 | 0.4296                         | 0.4247                         |
| openbookqa                     | 0.4260                         | 0.4220                         |
| boolq                          | 0.7850                         | 0.7532                         |
| arc_easy                       | 0.8304                         | 0.8295                         |
| arc_challenge                  | 0.5742                         | 0.5776                         |
| gsm8k (5-shot, strict match)   | 0.7703                         | 0.7779                         |
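
The avg row matches the unweighted mean of the 12 task rows, and the ifeval entry is the mean of its two sub-scores. A quick arithmetic check using only numbers copied from the table (the dictionary layout is just for illustration):

```python
# Scores copied from the table above, in row order: (BF16, INT4).
scores = {
    "leaderboard_mmlu_pro": (0.3314, 0.3264),
    "leaderboard_ifeval": (0.6879, 0.6832),
    "lambada_openai": (0.7479, 0.7559),
    "hellaswag": (0.6853, 0.6808),
    "winogrande": (0.7758, 0.7806),
    "piqa": (0.8248, 0.8177),
    "truthfulqa_mc1": (0.4296, 0.4247),
    "openbookqa": (0.4260, 0.4220),
    "boolq": (0.7850, 0.7532),
    "arc_easy": (0.8304, 0.8295),
    "arc_challenge": (0.5742, 0.5776),
    "gsm8k": (0.7703, 0.7779),
}

# Unweighted mean over the 12 tasks
bf16_avg = sum(b for b, _ in scores.values()) / len(scores)
int4_avg = sum(i for _, i in scores.values()) / len(scores)
print(f"BF16 avg {bf16_avg:.4f}, INT4 avg {int4_avg:.4f}")  # BF16 avg 0.6557, INT4 avg 0.6525
print(f"INT4 keeps {int4_avg / bf16_avg:.1%} of the BF16 average")  # INT4 keeps 99.5% of the BF16 average

# leaderboard_ifeval is the mean of its two sub-scores
ifeval_bf16 = (0.7398 + 0.6359) / 2
ifeval_int4 = (0.7362 + 0.6303) / 2
print(f"ifeval BF16 {ifeval_bf16:.5f}, INT4 {ifeval_int4:.5f}")
```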

## Reproduce the model

Here is a sample command to generate the model:

```bash
auto-round \
  --model allenai/OLMo-2-1124-13B-Instruct \
  --device 0 \
  --nsamples 512 \
  --model_dtype "fp16" \
  --iter 1000 \
  --disable_eval \
  --format 'auto_gptq,auto_round' \
  --output_dir "./tmp_autoround"
```

## Ethical Considerations and Limitations

The model can produce factually incorrect output and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the fine-tuning datasets, this model may generate lewd, biased, or otherwise offensive outputs.

Therefore, developers should perform safety testing before deploying any application of the model.

## Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

- [Intel Neural Compressor](https://github.com/intel/neural-compressor)

## Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

## Cite

@article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }

[arXiv](https://arxiv.org/abs/2309.05516) [GitHub](https://github.com/intel/auto-round)