unsubscribe committed on
Commit
9d42fce
·
verified ·
1 Parent(s): d82dbd1

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,537 @@
1
+ ---
2
+ license: apache-2.0
3
+ pipeline_tag: text-generation
4
+ ---
5
+ # InternLM
6
+
7
+
8
+
9
+ <div align="center">
10
+ <img src="https://github.com/InternLM/InternLM/assets/22529082/b9788105-8892-4398-8b47-b513a292378e" width="200"/>
11
+
12
+ <div>&nbsp;</div>
13
+ <div align="center">
14
+ <b><font size="5">InternLM</font></b>
15
+ <sup>
16
+ <a href="https://internlm.intern-ai.org.cn/">
17
+ <i><font size="4">HOT</font></i>
18
+ </a>
19
+ </sup>
20
+ <div>&nbsp;</div>
21
+ </div>
22
+
23
+
24
+ [![evaluation](https://github.com/InternLM/InternLM/assets/22529082/f80a2a58-5ddf-471a-8da4-32ab65c8fd3b)](https://github.com/internLM/OpenCompass/)
25
+
26
+ [💻Github Repo](https://github.com/InternLM/InternLM) • [🤗Demo](https://huggingface.co/spaces/internlm/internlm3-8b-instruct) • [🤔Reporting Issues](https://github.com/InternLM/InternLM/issues/new) • [📜Technical Report](https://arxiv.org/abs/2403.17297)
27
+
28
+ </div>
29
+
30
+ <p align="center">
31
+ 👋 join us on <a href="https://discord.gg/xa29JuW87d" target="_blank">Discord</a> and <a href="https://github.com/InternLM/InternLM/assets/25839884/a6aad896-7232-4220-ac84-9e070c2633ce" target="_blank">WeChat</a>
32
+ </p>
33
+
34
+
35
+
36
+ ## Introduction
37
+
38
+ This is an FP8 (`float8_e4m3fn`) SmoothQuant quantization of [internlm/internlm3-8b-instruct](https://huggingface.co/internlm/internlm3-8b-instruct), intended for NVIDIA GPUs with FP8 support such as the Ada Lovelace and Hopper architectures. Refer to the [LMDeploy documentation](https://lmdeploy.readthedocs.io/en/latest/quantization/w8a8.html#smoothquant) for more information.
39
+
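As a rough illustration of the smoothing idea behind SmoothQuant, the sketch below moves activation outliers into the weights with a per-channel scale before any low-precision cast. It is a NumPy toy under stated assumptions, not LMDeploy's implementation: the function name `smooth`, the migration strength `alpha=0.5`, and the random data are all hypothetical.

```python
import numpy as np

def smooth(x, w, alpha=0.5):
    """Rescale activations x (tokens, in) and weights w (in, out) per input channel."""
    act_max = np.abs(x).max(axis=0)   # per-channel activation range
    w_max = np.abs(w).max(axis=1)     # per-channel weight range
    s = act_max ** alpha / w_max ** (1 - alpha)
    # Dividing x and multiplying w by the same scale leaves x @ w unchanged.
    return x / s, w * s[:, None], s

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
x[:, 2] *= 50.0                       # simulate one outlier channel
w = rng.normal(size=(4, 3))
x_s, w_s, s = smooth(x, w)
print(np.allclose(x @ w, x_s @ w_s))
```

The product is preserved while the activation range shrinks, which is what makes the subsequent quantization of both tensors less lossy.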
40
+ InternLM3 has open-sourced an 8-billion parameter instruction model, InternLM3-8B-Instruct, designed for general-purpose usage and advanced reasoning. This model has the following characteristics:
41
+
42
+ - **Enhanced performance at reduced cost**:
43
+ State-of-the-art performance on reasoning and knowledge-intensive tasks, surpassing models like Llama3.1-8B and Qwen2.5-7B. Remarkably, InternLM3 is trained on only 4 trillion high-quality tokens, saving more than 75% of the training cost compared to other LLMs of similar scale.
44
+ - **Deep thinking capability**:
45
+ InternLM3 supports both the deep thinking mode for solving complicated reasoning tasks via the long chain-of-thought and the normal response mode for fluent user interactions.
46
+
47
+
48
+ ## Usage
49
+
50
+ [LMDeploy](https://github.com/InternLM/lmdeploy) is a toolkit for compressing, deploying, and serving LLMs, developed by the MMRazor and MMDeploy teams.
51
+
52
+ ```bash
53
+ pip install lmdeploy
54
+ ```
55
+
56
+ You can run batch inference locally with the following Python code:
57
+
58
+ ```python
59
+ import lmdeploy
60
+ model_id = "internlm/internlm3-8b-instruct-smoothquant-fp8"
61
+ pipe = lmdeploy.pipeline(model_id)
62
+ response = pipe("Please tell me five scenic spots in Shanghai")
63
+ print(response)
64
+ ```
65
+
66
+ Or you can launch an OpenAI-compatible server with the following command:
67
+
68
+ ```bash
69
+ lmdeploy serve api_server internlm/internlm3-8b-instruct-smoothquant-fp8 --model-name internlm3-8b-instruct --server-port 23333
70
+ ```
71
+
72
+ Then you can send a chat request to the server:
73
+
74
+ ```bash
75
+ curl http://localhost:23333/v1/chat/completions \
76
+ -H "Content-Type: application/json" \
77
+ -d '{
78
+ "model": "internlm3-8b-instruct",
79
+ "messages": [
80
+ {"role": "user", "content": "Please tell me five scenic spots in Shanghai"}
81
+ ]
82
+ }'
83
+ ```
84
+
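The same chat request can be built from Python with only the standard library. This is a minimal sketch that assumes the server was started as in the `lmdeploy serve api_server` command (port 23333, model name `internlm3-8b-instruct`); the helper name `build_chat_request` is hypothetical, and the commented-out part assumes the reply follows the usual OpenAI chat-completions layout.

```python
import json
import urllib.request

def build_chat_request(prompt, model="internlm3-8b-instruct",
                       base_url="http://localhost:23333"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Please tell me five scenic spots in Shanghai")
print(req.full_url)
# To send it once the server is running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```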
85
+ Find more details in the [LMDeploy documentation](https://lmdeploy.readthedocs.io/en/latest/).
86
+
87
+
88
+ ## Open Source License
89
+
90
+ Code and model weights are licensed under Apache-2.0.
91
+
92
+ ## Citation
93
+
94
+ ```
95
+ @misc{cai2024internlm2,
96
+ title={InternLM2 Technical Report},
97
+ author={Zheng Cai and Maosong Cao and Haojiong Chen and Kai Chen and Keyu Chen and Xin Chen and Xun Chen and Zehui Chen and Zhi Chen and Pei Chu and Xiaoyi Dong and Haodong Duan and Qi Fan and Zhaoye Fei and Yang Gao and Jiaye Ge and Chenya Gu and Yuzhe Gu and Tao Gui and Aijia Guo and Qipeng Guo and Conghui He and Yingfan Hu and Ting Huang and Tao Jiang and Penglong Jiao and Zhenjiang Jin and Zhikai Lei and Jiaxing Li and Jingwen Li and Linyang Li and Shuaibin Li and Wei Li and Yining Li and Hongwei Liu and Jiangning Liu and Jiawei Hong and Kaiwen Liu and Kuikun Liu and Xiaoran Liu and Chengqi Lv and Haijun Lv and Kai Lv and Li Ma and Runyuan Ma and Zerun Ma and Wenchang Ning and Linke Ouyang and Jiantao Qiu and Yuan Qu and Fukai Shang and Yunfan Shao and Demin Song and Zifan Song and Zhihao Sui and Peng Sun and Yu Sun and Huanze Tang and Bin Wang and Guoteng Wang and Jiaqi Wang and Jiayu Wang and Rui Wang and Yudong Wang and Ziyi Wang and Xingjian Wei and Qizhen Weng and Fan Wu and Yingtong Xiong and Chao Xu and Ruiliang Xu and Hang Yan and Yirong Yan and Xiaogui Yang and Haochen Ye and Huaiyuan Ying and Jia Yu and Jing Yu and Yuhang Zang and Chuyu Zhang and Li Zhang and Pan Zhang and Peng Zhang and Ruijie Zhang and Shuo Zhang and Songyang Zhang and Wenjian Zhang and Wenwei Zhang and Xingcheng Zhang and Xinyue Zhang and Hui Zhao and Qian Zhao and Xiaomeng Zhao and Fengzhe Zhou and Zaida Zhou and Jingming Zhuo and Yicheng Zou and Xipeng Qiu and Yu Qiao and Dahua Lin},
98
+ year={2024},
99
+ eprint={2403.17297},
100
+ archivePrefix={arXiv},
101
+ primaryClass={cs.CL}
102
+ }
103
+ ```
104
+
105
+
106
+
107
+ ## Introduction
108
+
109
+ ### InternLM3-8B-Instruct
110
+
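+ InternLM3, the third generation of the InternLM (书生·浦语) large language model, open-sources an 8-billion-parameter instruction model, InternLM3-8B-Instruct, designed for general-purpose use and advanced reasoning. The model has the following characteristics:
+
+ - **Enhanced performance at reduced cost**:
+ State-of-the-art performance among models of similar scale on reasoning and knowledge-intensive tasks, surpassing Llama3.1-8B and Qwen2.5-7B. Notably, InternLM3 was trained on only 4 trillion tokens, saving more than 75% of the training cost compared with models of similar scale.
+ - **Deep thinking capability**:
+ InternLM3 supports a deep thinking mode that solves complicated reasoning tasks via long chain-of-thought, as well as a normal response mode for fluent user interactions.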
117
+
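+ #### Performance Evaluation
+
+ We evaluated InternLM with the open-source evaluation tool [OpenCompass](https://github.com/internLM/OpenCompass/) across five capability dimensions: disciplinary competence, language, knowledge, reasoning, and comprehension. Some of the results are shown in the table below; visit the [OpenCompass leaderboard](https://rank.opencompass.org.cn) for more results.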
121
+
+ |              | Benchmark \ Model               | InternLM3-8B-Instruct | Qwen2.5-7B-Instruct | Llama3.1-8B-Instruct | GPT-4o-mini (closed source) |
+ | ------------ | ------------------------------- | --------------------- | ------------------- | -------------------- | --------------------------- |
+ | General      | CMMLU(0-shot)                   | **83.1**              | 75.8                | 53.9                 | 66.0                        |
+ |              | MMLU(0-shot)                    | 76.6                  | **76.8**            | 71.8                 | 82.7                        |
+ |              | MMLU-Pro(0-shot)                | **57.6**              | 56.2                | 48.1                 | 64.1                        |
+ | Reasoning    | GPQA-Diamond(0-shot)            | **37.4**              | 33.3                | 24.2                 | 42.9                        |
+ |              | DROP(0-shot)                    | **83.1**              | 80.4                | 81.6                 | 85.2                        |
+ |              | HellaSwag(10-shot)              | **91.2**              | 85.3                | 76.7                 | 89.5                        |
+ |              | KOR-Bench(0-shot)               | **56.4**              | 44.6                | 47.7                 | 58.2                        |
+ | Math         | MATH-500(0-shot)                | **83.0**\*            | 72.4                | 48.4                 | 74.0                        |
+ |              | AIME2024(0-shot)                | **20.0**\*            | 16.7                | 6.7                  | 13.3                        |
+ | Coding       | LiveCodeBench(2407-2409 Pass@1) | **17.8**              | 16.8                | 12.9                 | 21.8                        |
+ |              | HumanEval(Pass@1)               | 82.3                  | **85.4**            | 72.0                 | 86.6                        |
+ | Instruction  | IFEval(Prompt-Strict)           | **79.3**              | 71.7                | 75.2                 | 79.7                        |
+ | LongContext  | RULER(4-128K Average)           | 87.9                  | 81.4                | **88.5**             | 90.7                        |
+ | Chat         | AlpacaEval 2.0(LC WinRate)      | **51.1**              | 30.3                | 25.0                 | 50.7                        |
+ |              | WildBench(Raw Score)            | **33.1**              | 23.3                | 1.5                  | 40.3                        |
+ |              | MT-Bench-101(Score 1-10)        | **8.59**              | 8.49                | 8.37                 | 8.87                        |
140
+
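+ - Values in bold are the highest among the compared open-source models.
+ - The results above were obtained with [OpenCompass](https://github.com/internLM/OpenCompass/) (values marked with `*` were evaluated in deep thinking mode); see the configuration files provided in [OpenCompass](https://github.com/internLM/OpenCompass/) for test details.
+ - Evaluation results may differ across [OpenCompass](https://github.com/internLM/OpenCompass/) versions; refer to the results from the latest version of [OpenCompass](https://github.com/internLM/OpenCompass/).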
144
+
145
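+ **Limitations:** Although we have made efforts to ensure the safety of the model during training and to encourage it to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs because of its size and probabilistic generation paradigm. For example, responses may contain bias, discrimination, or other harmful content. Please do not propagate such content. This project is not responsible for any consequences resulting from the dissemination of harmful information.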
146
+
+ #### Requirements
+
+ ```
+ transformers >= 4.48
+ ```
152
+
153
+
154
+
155
+
+ #### Conversation Mode
+
+ ##### Transformers inference
+
+ Use the following code to load the InternLM3 8B Instruct model:
161
+
162
+ ```python
163
+ import torch
164
+ from transformers import AutoTokenizer, AutoModelForCausalLM
165
+
166
+ model_dir = "internlm/internlm3-8b-instruct"
167
+ tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
168
+ # `torch_dtype=torch.bfloat16` loads the model in bfloat16; without it the weights are loaded as float32, which might cause an OOM error.
169
+ model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
170
+ # (Optional) If on low resource devices, you can load model in 4-bit or 8-bit to further save GPU memory via bitsandbytes.
171
+ # InternLM3 8B in 4bit will cost nearly 8GB GPU memory.
172
+ # pip install -U bitsandbytes
173
+ # 8-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_8bit=True)
174
+ # 4-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_4bit=True)
175
+ model = model.eval()
176
+
177
+ system_prompt = """You are an AI assistant whose name is InternLM (书生·浦语).
178
+ - InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
179
+ - InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文."""
180
+ messages = [
181
+ {"role": "system", "content": system_prompt},
182
+ {"role": "user", "content": "Please tell me five scenic spots in Shanghai"},
183
+ ]
184
+ tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
185
+
186
+ generated_ids = model.generate(tokenized_chat, max_new_tokens=1024, temperature=1, repetition_penalty=1.005, top_k=40, top_p=0.8)
187
+
188
+ generated_ids = [
189
+ output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
190
+ ]
191
+ prompt = tokenizer.batch_decode(tokenized_chat)[0]
192
+ print(prompt)
193
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
194
+ print(response)
195
+ ```
196
+
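The list comprehension in the snippet above relies on `generate` returning each prompt followed by the newly generated tokens, so slicing off `len(input_ids)` leaves only the reply. A toy sketch with plain Python lists (the token IDs and the helper name `strip_prompt` are made up for illustration):

```python
def strip_prompt(input_batch, output_batch):
    """Drop the leading prompt tokens from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_batch, output_batch)]

prompt_ids = [[1, 15, 27, 9]]                 # made-up prompt token IDs
generated = [[1, 15, 27, 9, 301, 302, 2]]     # prompt followed by new tokens
reply_ids = strip_prompt(prompt_ids, generated)
print(reply_ids)
```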
+ ##### LMDeploy inference
+
+ LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
+
+ ```bash
+ pip install lmdeploy
+ ```
+
+ You can run batch inference locally with the following Python code:
206
+
207
+ ```python
208
+ import lmdeploy
209
+ model_dir = "internlm/internlm3-8b-instruct"
210
+ pipe = lmdeploy.pipeline(model_dir)
211
+ response = pipe(["Please tell me five scenic spots in Shanghai"])
212
+ print(response)
213
+
214
+ ```
215
+
216
+ Or you can launch an OpenAI-compatible server with the following command:
217
+
218
+ ```bash
219
+ lmdeploy serve api_server internlm/internlm3-8b-instruct --model-name internlm3-8b-instruct --server-port 23333
220
+ ```
221
+
222
+ Then you can send a chat request to the server:
223
+
224
+ ```bash
225
+ curl http://localhost:23333/v1/chat/completions \
226
+ -H "Content-Type: application/json" \
227
+ -d '{
228
+ "model": "internlm3-8b-instruct",
229
+ "messages": [
230
+ {"role": "user", "content": "Give me a brief introduction to deep learning."}
231
+ ]
232
+ }'
233
+ ```
234
+
235
+ Find more details in the [LMDeploy documentation](https://lmdeploy.readthedocs.io/en/latest/).
236
+
237
+
238
+
+ ##### Ollama inference
+
+ Preparation
+
+ ```bash
+ # install Ollama
+ curl -fsSL https://ollama.com/install.sh | sh
+ # fetch the model
+ ollama pull internlm/internlm3-8b-instruct
+ # install the Python library
+ pip install ollama
+ ```
+
+ Inference code
253
+
254
+ ```python
255
+ import ollama
256
+
257
+ system_prompt = """You are an AI assistant whose name is InternLM (书生·浦语).
258
+ - InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
259
+ - InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文."""
260
+
261
+ messages = [
262
+ {
263
+ "role": "system",
264
+ "content": system_prompt,
265
+ },
266
+ {
267
+ "role": "user",
268
+ "content": "Please tell me five scenic spots in Shanghai"
269
+ },
270
+ ]
271
+
272
+ stream = ollama.chat(
273
+ model='internlm/internlm3-8b-instruct',
274
+ messages=messages,
275
+ stream=True,
276
+ )
277
+
278
+ for chunk in stream:
279
+     print(chunk['message']['content'], end='', flush=True)
280
+ ```
281
+
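When the full reply is needed as one string rather than printed piece by piece, the streamed chunks can be joined. The sketch below uses a hand-written stand-in for the stream so it runs without an Ollama server; it assumes each chunk exposes `chunk['message']['content']` as in the loop above, and `collect_stream` is a hypothetical helper name.

```python
def collect_stream(stream):
    """Join the text pieces of a streamed chat response into one string."""
    parts = []
    for chunk in stream:
        parts.append(chunk["message"]["content"])
    return "".join(parts)

# Stand-in for the iterator returned by a streaming chat call.
fake_stream = [
    {"message": {"role": "assistant", "content": "The Bund, "}},
    {"message": {"role": "assistant", "content": "Yu Garden"}},
]
full_text = collect_stream(fake_stream)
print(full_text)
```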
282
+
283
+ ####
284
+
+ ##### vLLM inference
+
+ Refer to the [installation guide](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to install the latest vLLM code:
+
+ ```bash
+ pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
+ ```
+
+ Inference code
294
+
295
+ ```python
296
+ from vllm import LLM, SamplingParams
297
+
298
+ llm = LLM(model="internlm/internlm3-8b-instruct")
299
+ sampling_params = SamplingParams(temperature=1, repetition_penalty=1.005, top_k=40, top_p=0.8)
300
+
301
+ system_prompt = """You are an AI assistant whose name is InternLM (书生·浦语).
302
+ - InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
303
+ - InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文."""
304
+
305
+ prompts = [
306
+ {
307
+ "role": "system",
308
+ "content": system_prompt,
309
+ },
310
+ {
311
+ "role": "user",
312
+ "content": "Please tell me five scenic spots in Shanghai"
313
+ },
314
+ ]
315
+ outputs = llm.chat(prompts,
316
+ sampling_params=sampling_params,
317
+ use_tqdm=False)
318
+ print(outputs)
319
+ ```
320
+
+ #### Deep Thinking Mode
+
+ ##### Deep thinking demo
324
+
325
+ <img src="https://github.com/InternLM/InternLM/blob/017ba7446d20ecc3b9ab8e7b66cc034500868ab4/assets/solve_puzzle.png?raw=true" width="400"/>
326
+
327
+
328
+
329
+
330
+
331
+ ##### Deep thinking system prompt
332
+
333
+ ```python
334
+ thinking_system_prompt = """You are an expert mathematician with extensive experience in mathematical competitions. You approach problems through systematic thinking and rigorous reasoning. When solving problems, follow these thought processes:
335
+ ## Deep Understanding
336
+ Take time to fully comprehend the problem before attempting a solution. Consider:
337
+ - What is the real question being asked?
338
+ - What are the given conditions and what do they tell us?
339
+ - Are there any special restrictions or assumptions?
340
+ - Which information is crucial and which is supplementary?
341
+ ## Multi-angle Analysis
342
+ Before solving, conduct thorough analysis:
343
+ - What mathematical concepts and properties are involved?
344
+ - Can you recall similar classic problems or solution methods?
345
+ - Would diagrams or tables help visualize the problem?
346
+ - Are there special cases that need separate consideration?
347
+ ## Systematic Thinking
348
+ Plan your solution path:
349
+ - Propose multiple possible approaches
350
+ - Analyze the feasibility and merits of each method
351
+ - Choose the most appropriate method and explain why
352
+ - Break complex problems into smaller, manageable steps
353
+ ## Rigorous Proof
354
+ During the solution process:
355
+ - Provide solid justification for each step
356
+ - Include detailed proofs for key conclusions
357
+ - Pay attention to logical connections
358
+ - Be vigilant about potential oversights
359
+ ## Repeated Verification
360
+ After completing your solution:
361
+ - Verify your results satisfy all conditions
362
+ - Check for overlooked special cases
363
+ - Consider if the solution can be optimized or simplified
364
+ - Review your reasoning process
365
+ Remember:
366
+ 1. Take time to think thoroughly rather than rushing to an answer
367
+ 2. Rigorously prove each key conclusion
368
+ 3. Keep an open mind and try different approaches
369
+ 4. Summarize valuable problem-solving methods
370
+ 5. Maintain healthy skepticism and verify multiple times
371
+ Your response should reflect deep mathematical understanding and precise logical thinking, making your solution path and reasoning clear to others.
372
+ When you're ready, present your complete solution with:
373
+ - Clear problem understanding
374
+ - Detailed solution process
375
+ - Key insights
376
+ - Thorough verification
377
+ Focus on clear, logical progression of ideas and thorough explanation of your mathematical reasoning. Provide answers in the same language as the user asking the question, repeat the final answer using a '\\boxed{}' without any units, you have [[8192]] tokens to complete the answer.
378
+ """
379
+ ```
380
+
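Because the system prompt asks the model to repeat its final answer inside `\boxed{}`, a small parser can pull that answer out of a long response. The helper below is a hypothetical post-processing sketch, not part of InternLM or any of the inference libraries; it takes the last `\boxed{...}` and balances nested braces by hand.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth, out = 1, []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
        i += 1
    return None  # unbalanced braces

sample = "So the range is \\boxed{(1, +\\infty)} and the slope is \\boxed{\\frac{1}{2}}."
print(extract_boxed(sample))
```

A regular expression would be shorter, but it would stop at the first closing brace and truncate answers such as `\frac{1}{2}`.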
381
+ ##### Transformers inference
382
+
383
+
384
+ ```python
385
+ import torch
386
+ from transformers import AutoTokenizer, AutoModelForCausalLM
387
+
388
+ model_dir = "internlm/internlm3-8b-instruct"
389
+ tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
390
+ # `torch_dtype=torch.bfloat16` loads the model in bfloat16; without it the weights are loaded as float32, which might cause an OOM error.
391
+ model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
392
+ # (Optional) If on low resource devices, you can load model in 4-bit or 8-bit to further save GPU memory via bitsandbytes.
393
+ # InternLM3 8B in 4bit will cost nearly 8GB GPU memory.
394
+ # pip install -U bitsandbytes
395
+ # 8-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_8bit=True)
396
+ # 4-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_4bit=True)
397
+ model = model.eval()
398
+
399
+ messages = [
400
+ {"role": "system", "content": thinking_system_prompt},
401
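+ {"role": "user", "content": "Given the function \\(f(x)=\\mathrm{e}^{x}-ax - a^{3}\\).\n(1) When \\(a = 1\\), find the equation of the tangent line to the curve \\(y = f(x)\\) at the point \\((1,f(1))\\).\n(2) If \\(f(x)\\) has a local minimum and the minimum value is less than \\(0\\), determine the range of values for \\(a\\)."},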
402
+ ]
403
+ tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
404
+
405
+ generated_ids = model.generate(tokenized_chat, max_new_tokens=8192)
406
+
407
+ generated_ids = [
408
+ output_ids[len(input_ids):] for input_ids, output_ids in zip(tokenized_chat, generated_ids)
409
+ ]
410
+ prompt = tokenizer.batch_decode(tokenized_chat)[0]
411
+ print(prompt)
412
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
413
+ print(response)
414
+ ```
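For checking the model's response to the sample problem, a hand derivation (ours, not taken from the model card) gives the tangent line y = (e - 1)x - 1 for part (1) and a > 1 for part (2): f'(x) = e^x - a vanishes at x = ln a only when a > 0, and the minimum f(ln a) = a(1 - ln a - a^2) is negative exactly when a^2 + ln a > 1, which holds for a > 1. The sketch below spot-checks both claims numerically; the function names are illustrative.

```python
import math

def f(x, a):
    return math.exp(x) - a * x - a ** 3

def min_value(a):
    """Local minimum of f for a > 0, attained at x = ln(a)."""
    return f(math.log(a), a)

# Part (1): with a = 1 the tangent at x = 1 has slope f'(1) = e - 1.
slope = math.e - 1
intercept = f(1, 1) - slope * 1
print(slope, intercept)

# Part (2): sample the minimum on both sides of a = 1.
print([round(min_value(a), 4) for a in (0.5, 1.0, 1.5, 2.0)])
```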
415
+ ##### LMDeploy inference
416
+
417
+ LMDeploy is a toolkit for compressing, deploying, and serving LLMs, developed by the MMRazor and MMDeploy teams.
418
+
419
+ ```bash
420
+ pip install lmdeploy
421
+ ```
422
+
423
+ You can run batch inference locally with the following Python code:
424
+
425
+ ```python
426
+ from lmdeploy import pipeline, GenerationConfig, ChatTemplateConfig
427
+ model_dir = "internlm/internlm3-8b-instruct"
428
+ chat_template_config = ChatTemplateConfig(model_name='internlm3')
429
+ pipe = pipeline(model_dir, chat_template_config=chat_template_config)
430
+
431
+ messages = [
432
+ {"role": "system", "content": thinking_system_prompt},
433
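+ {"role": "user", "content": "Given the function \\(f(x)=\\mathrm{e}^{x}-ax - a^{3}\\).\n(1) When \\(a = 1\\), find the equation of the tangent line to the curve \\(y = f(x)\\) at the point \\((1,f(1))\\).\n(2) If \\(f(x)\\) has a local minimum and the minimum value is less than \\(0\\), determine the range of values for \\(a\\)."},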
434
+ ]
435
+
436
+ response = pipe(messages, gen_config=GenerationConfig(max_new_tokens=2048))
437
+ print(response)
438
+ ```
439
+
+ ##### Ollama inference
+
+ Preparation
+
+ ```bash
+ # install Ollama
+ curl -fsSL https://ollama.com/install.sh | sh
+ # fetch the model
+ ollama pull internlm/internlm3-8b-instruct
+ # install the Python library
+ pip install ollama
+ ```
+
+ Inference code
454
+
455
+ ```python
456
+ import ollama
457
+
458
+ messages = [
459
+ {
460
+ "role": "system",
461
+ "content": thinking_system_prompt,
462
+ },
463
+ {
464
+ "role": "user",
465
+ "content": "Given the function \\(f(x)=\\mathrm{e}^{x}-ax - a^{3}\\).\n(1) When \\(a = 1\\), find the equation of the tangent line to the curve \\(y = f(x)\\) at the point \\((1,f(1))\\).\n(2) If \\(f(x)\\) has a local minimum and the minimum value is less than \\(0\\), determine the range of values for \\(a\\)."
466
+ },
467
+ ]
468
+
469
+ stream = ollama.chat(
470
+ model='internlm/internlm3-8b-instruct',
471
+ messages=messages,
472
+ stream=True,
473
+ )
474
+
475
+ for chunk in stream:
476
+     print(chunk['message']['content'], end='', flush=True)
477
+ ```
478
+
479
+
480
+ ####
481
+
+ ##### vLLM inference
+
+ Refer to the [installation guide](https://docs.vllm.ai/en/latest/getting_started/installation/index.html) to install the latest vLLM code:
+
+ ```bash
+ pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
+ ```
+
+ Inference code
491
+
492
+ ```python
493
+ from vllm import LLM, SamplingParams
494
+
495
+ llm = LLM(model="internlm/internlm3-8b-instruct")
496
+ sampling_params = SamplingParams(temperature=1, repetition_penalty=1.005, top_k=40, top_p=0.8, max_tokens=8192)
497
+
498
+ prompts = [
499
+ {
500
+ "role": "system",
501
+ "content": thinking_system_prompt,
502
+ },
503
+ {
504
+ "role": "user",
505
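+ "content": "Given the function \\(f(x)=\\mathrm{e}^{x}-ax - a^{3}\\).\n(1) When \\(a = 1\\), find the equation of the tangent line to the curve \\(y = f(x)\\) at the point \\((1,f(1))\\).\n(2) If \\(f(x)\\) has a local minimum and the minimum value is less than \\(0\\), determine the range of values for \\(a\\)."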
506
+ },
507
+ ]
508
+ outputs = llm.chat(prompts,
509
+ sampling_params=sampling_params,
510
+ use_tqdm=False)
511
+ print(outputs)
512
+ ```
513
+
514
+
515
+
516
+
517
+
518
+
519
+
520
+
521
+
+ ## Open Source License
+
+ The code and model weights in this repository are released under the Apache-2.0 license.
+
+ ## Citation
527
+
528
+ ```
529
+ @misc{cai2024internlm2,
530
+ title={InternLM2 Technical Report},
531
+ author={Zheng Cai and Maosong Cao and Haojiong Chen and Kai Chen and Keyu Chen and Xin Chen and Xun Chen and Zehui Chen and Zhi Chen and Pei Chu and Xiaoyi Dong and Haodong Duan and Qi Fan and Zhaoye Fei and Yang Gao and Jiaye Ge and Chenya Gu and Yuzhe Gu and Tao Gui and Aijia Guo and Qipeng Guo and Conghui He and Yingfan Hu and Ting Huang and Tao Jiang and Penglong Jiao and Zhenjiang Jin and Zhikai Lei and Jiaxing Li and Jingwen Li and Linyang Li and Shuaibin Li and Wei Li and Yining Li and Hongwei Liu and Jiangning Liu and Jiawei Hong and Kaiwen Liu and Kuikun Liu and Xiaoran Liu and Chengqi Lv and Haijun Lv and Kai Lv and Li Ma and Runyuan Ma and Zerun Ma and Wenchang Ning and Linke Ouyang and Jiantao Qiu and Yuan Qu and Fukai Shang and Yunfan Shao and Demin Song and Zifan Song and Zhihao Sui and Peng Sun and Yu Sun and Huanze Tang and Bin Wang and Guoteng Wang and Jiaqi Wang and Jiayu Wang and Rui Wang and Yudong Wang and Ziyi Wang and Xingjian Wei and Qizhen Weng and Fan Wu and Yingtong Xiong and Chao Xu and Ruiliang Xu and Hang Yan and Yirong Yan and Xiaogui Yang and Haochen Ye and Huaiyuan Ying and Jia Yu and Jing Yu and Yuhang Zang and Chuyu Zhang and Li Zhang and Pan Zhang and Peng Zhang and Ruijie Zhang and Shuo Zhang and Songyang Zhang and Wenjian Zhang and Wenwei Zhang and Xingcheng Zhang and Xinyue Zhang and Hui Zhao and Qian Zhao and Xiaomeng Zhao and Fengzhe Zhou and Zaida Zhou and Jingming Zhuo and Yicheng Zou and Xipeng Qiu and Yu Qiao and Dahua Lin},
532
+ year={2024},
533
+ eprint={2403.17297},
534
+ archivePrefix={arXiv},
535
+ primaryClass={cs.CL}
536
+ }
537
+ ```
config.json ADDED
@@ -0,0 +1,42 @@
1
+ {
2
+ "_name_or_path": "/mnt/141/develop_internlm3_open_source_hf_0110v1/20250109095225_hf-080_open_source_hf",
3
+ "architectures": [
4
+ "InternLM3ForCausalLM"
5
+ ],
6
+ "attention_dropout": 0.0,
7
+ "auto_map": {
8
+ "AutoConfig": "configuration_internlm3.InternLM3Config",
9
+ "AutoModel": "modeling_internlm3.InternLM3Model",
10
+ "AutoModelForCausalLM": "modeling_internlm3.InternLM3ForCausalLM"
11
+ },
12
+ "bias": false,
13
+ "bos_token_id": 1,
14
+ "eos_token_id": 2,
15
+ "head_dim": 128,
16
+ "hidden_act": "silu",
17
+ "hidden_size": 4096,
18
+ "initializer_range": 0.02,
19
+ "intermediate_size": 10240,
20
+ "max_position_embeddings": 32768,
21
+ "model_type": "internlm3",
22
+ "num_attention_heads": 32,
23
+ "num_hidden_layers": 48,
24
+ "num_key_value_heads": 2,
25
+ "pad_token_id": 2,
26
+ "qkv_bias": false,
27
+ "quantization_config": {
28
+ "quant_dtype": "float8_e4m3fn",
29
+ "quant_method": "smooth_quant"
30
+ },
31
+ "rms_norm_eps": 1e-05,
32
+ "rope_scaling": {
33
+ "factor": 6.0,
34
+ "rope_type": "dynamic"
35
+ },
36
+ "rope_theta": 50000000,
37
+ "tie_word_embeddings": false,
38
+ "torch_dtype": "bfloat16",
39
+ "transformers_version": "4.47.1",
40
+ "use_cache": false,
41
+ "vocab_size": 128512
42
+ }
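To make the attention settings in this config concrete, the sketch below derives the grouped-query attention group size and a rough key/value cache footprint per token from the values above. It assumes a 16-bit cache (`bytes_per_value=2`), which is an illustrative choice rather than something the config states; the helper name is hypothetical.

```python
# Values copied from config.json above.
config = {
    "head_dim": 128,
    "num_attention_heads": 32,
    "num_hidden_layers": 48,
    "num_key_value_heads": 2,
}

# Query heads that share each key/value head (grouped-query attention).
group_size = config["num_attention_heads"] // config["num_key_value_heads"]

def kv_cache_bytes_per_token(cfg, bytes_per_value=2):
    """Keys plus values for one token, summed over all layers."""
    return (2 * cfg["num_hidden_layers"] * cfg["num_key_value_heads"]
            * cfg["head_dim"] * bytes_per_value)

print(group_size, kv_cache_bytes_per_token(config))
```

With only 2 key/value heads, each one serves 16 query heads, which keeps the cache small relative to a multi-head layout with 32 key/value heads.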
configuration_internlm3.py ADDED
@@ -0,0 +1,197 @@
1
+ # coding=utf-8
2
+ # Copyright (c) The InternLM team and The HuggingFace Inc. team. All rights reserved.
3
+ #
4
+ # This code is based on transformers/src/transformers/models/llama/configuration_llama.py
5
+ #
6
+ # Licensed under the Apache License, Version 2.0 (the "License");
7
+ # you may not use this file except in compliance with the License.
8
+ # You may obtain a copy of the License at
9
+ #
10
+ # http://www.apache.org/licenses/LICENSE-2.0
11
+ #
12
+ # Unless required by applicable law or agreed to in writing, software
13
+ # distributed under the License is distributed on an "AS IS" BASIS,
14
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15
+ # See the License for the specific language governing permissions and
16
+ # limitations under the License.
17
+ """ InternLM3 model configuration"""
18
+
19
+ from transformers.configuration_utils import PretrainedConfig
20
+ from transformers.modeling_rope_utils import rope_config_validation
21
+ from transformers.utils import logging
22
+
23
+
24
+ logger = logging.get_logger(__name__)
25
+
26
+
27
+ class InternLM3Config(PretrainedConfig):
28
+ r"""
+ This is the configuration class to store the configuration of a [`InternLM3Model`]. It is used to instantiate
+ an InternLM3 model according to the specified arguments, defining the model architecture. Instantiating a
31
+ configuration with the defaults will yield a similar configuration to that of the InternLM2-7B.
32
+
33
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
34
+ documentation from [`PretrainedConfig`] for more information.
35
+
36
+
37
+ Args:
38
+ vocab_size (`int`, *optional*, defaults to 151936):
39
+ Vocabulary size of the InternLM3 model. Defines the number of different tokens that can be represented by the
40
+ `inputs_ids` passed when calling [`InternLM3Model`]
41
+ hidden_size (`int`, *optional*, defaults to 4096):
42
+ Dimension of the hidden representations.
43
+ intermediate_size (`int`, *optional*, defaults to 22016):
44
+ Dimension of the MLP representations.
45
+ num_hidden_layers (`int`, *optional*, defaults to 32):
+ Number of hidden layers in the Transformer decoder.
+ num_attention_heads (`int`, *optional*, defaults to 32):
+ Number of attention heads for each attention layer in the Transformer decoder.
49
+ num_key_value_heads (`int`, *optional*, defaults to 32):
50
+ This is the number of key_value heads that should be used to implement Grouped Query Attention. If
51
+ `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
52
+ `num_key_value_heads=1` the model will use Multi Query Attention (MQA) otherwise GQA is used. When
53
+ converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
54
+ by mean-pooling all the original heads within that group. For more details, check out [this
55
+ paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to `32`.
56
+ hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
57
+ The non-linear activation function (function or string) in the decoder.
58
+ max_position_embeddings (`int`, *optional*, defaults to 32768):
59
+ The maximum sequence length that this model might ever be used with.
60
+ initializer_range (`float`, *optional*, defaults to 0.02):
61
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
62
+ rms_norm_eps (`float`, *optional*, defaults to 1e-06):
63
+ The epsilon used by the rms normalization layers.
64
+ use_cache (`bool`, *optional*, defaults to `True`):
65
+ Whether or not the model should return the last key/values attentions (not used by all models). Only
66
+ relevant if `config.is_decoder=True`.
67
+ tie_word_embeddings (`bool`, *optional*, defaults to `False`):
68
+ Whether the model's input and output word embeddings should be tied.
69
+ rope_theta (`float`, *optional*, defaults to 10000.0):
70
+ The base period of the RoPE embeddings.
71
+ rope_scaling (`Dict`, *optional*):
72
+ Dictionary containing the scaling configuration for the RoPE embeddings. NOTE: if you apply new rope type
73
+ and you expect the model to work on longer `max_position_embeddings`, we recommend you to update this value
74
+ accordingly.
75
+ Expected contents:
76
+ `rope_type` (`str`):
77
+ The sub-variant of RoPE to use. Can be one of ['default', 'linear', 'dynamic', 'yarn', 'longrope',
78
+ 'llama3'], with 'default' being the original RoPE implementation.
79
+ `factor` (`float`, *optional*):
80
+ Used with all rope types except 'default'. The scaling factor to apply to the RoPE embeddings. In
81
+ most scaling types, a `factor` of x will enable the model to handle sequences of length x *
82
+ original maximum pre-trained length.
83
+ `original_max_position_embeddings` (`int`, *optional*):
84
+ Used with 'dynamic', 'longrope' and 'llama3'. The original max position embeddings used during
85
+ pretraining.
86
+ `attention_factor` (`float`, *optional*):
87
+ Used with 'yarn' and 'longrope'. The scaling factor to be applied on the attention
88
+ computation. If unspecified, it defaults to value recommended by the implementation, using the
89
+ `factor` field to infer the suggested value.
90
+ `beta_fast` (`float`, *optional*):
91
+ Only used with 'yarn'. Parameter to set the boundary for extrapolation (only) in the linear
92
+ ramp function. If unspecified, it defaults to 32.
93
+ `beta_slow` (`float`, *optional*):
94
+ Only used with 'yarn'. Parameter to set the boundary for interpolation (only) in the linear
95
+ ramp function. If unspecified, it defaults to 1.
96
+ `short_factor` (`List[float]`, *optional*):
97
+ Only used with 'longrope'. The scaling factor to be applied to short contexts (<
98
+ `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
99
+ size divided by the number of attention heads divided by 2.
100
+ `long_factor` (`List[float]`, *optional*):
101
+ Only used with 'longrope'. The scaling factor to be applied to long contexts (>
102
+ `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
103
+ size divided by the number of attention heads divided by 2.
104
+ `low_freq_factor` (`float`, *optional*):
105
+ Only used with 'llama3'. Scaling factor applied to low frequency components of the RoPE.
106
+ `high_freq_factor` (`float`, *optional*):
107
+ Only used with 'llama3'. Scaling factor applied to high frequency components of the RoPE.
108
+ qkv_bias (`bool`, *optional*, defaults to `False`):
109
+ Whether to use a bias in the query, key and value projection layers during self-attention.
110
+ attention_dropout (`float`, *optional*, defaults to 0.0):
111
+ The dropout ratio for the attention probabilities.
112
+ bias (`bool`, *optional*, defaults to `False`):
113
+ Whether to use a bias in o_proj, up_proj, down_proj and gate_proj layers.
114
+ head_dim (`int`, *optional*):
115
+ The attention head dimension. If `None`, it defaults to `hidden_size // num_attention_heads`.
116
+
117
+ ```python
118
+ >>> from transformers import InternLM3Model, InternLM3Config
119
+
120
+ >>> # Initializing an InternLM3 style configuration
121
+ >>> configuration = InternLM3Config()
122
+
123
+ >>> # Initializing a model from the InternLM3-8B style configuration
124
+ >>> model = InternLM3Model(configuration)
125
+
126
+ >>> # Accessing the model configuration
127
+ >>> configuration = model.config
128
+ ```"""
129
+
130
+ model_type = "internlm3"
131
+ keys_to_ignore_at_inference = ["past_key_values"]
132
+
133
+ # Default tensor parallel plan for base model `InternLM3`
134
+ base_model_tp_plan = {
135
+ "layers.*.self_attn.q_proj": "colwise",
136
+ "layers.*.self_attn.k_proj": "colwise",
137
+ "layers.*.self_attn.v_proj": "colwise",
138
+ "layers.*.self_attn.o_proj": "rowwise",
139
+ "layers.*.mlp.gate_proj": "colwise",
140
+ "layers.*.mlp.up_proj": "colwise",
141
+ "layers.*.mlp.down_proj": "rowwise",
142
+ }
143
+
144
+ def __init__(
145
+ self,
146
+ vocab_size=128512,
147
+ hidden_size=4096,
148
+ intermediate_size=11008,
149
+ num_hidden_layers=32,
150
+ num_attention_heads=32,
151
+ num_key_value_heads=32,
152
+ hidden_act="silu",
153
+ max_position_embeddings=32768,
154
+ initializer_range=0.02,
155
+ rms_norm_eps=1e-6,
156
+ use_cache=True,
157
+ tie_word_embeddings=False,
158
+ rope_theta=10000.0,
159
+ rope_scaling=None,
160
+ qkv_bias=False,
161
+ attention_dropout=0.0,
162
+ bias=False,
163
+ head_dim=None,
164
+ **kwargs,
165
+ ):
166
+ self.vocab_size = vocab_size
167
+ self.max_position_embeddings = max_position_embeddings
168
+ self.hidden_size = hidden_size
169
+ self.intermediate_size = intermediate_size
170
+ self.num_hidden_layers = num_hidden_layers
171
+ self.num_attention_heads = num_attention_heads
172
+
173
+ # for backward compatibility
174
+ if num_key_value_heads is None:
175
+ num_key_value_heads = num_attention_heads
176
+
177
+ self.num_key_value_heads = num_key_value_heads
178
+ self.hidden_act = hidden_act
179
+ self.initializer_range = initializer_range
180
+ self.rms_norm_eps = rms_norm_eps
181
+ self.use_cache = use_cache
182
+ self.rope_theta = rope_theta
183
+ self.rope_scaling = rope_scaling
184
+ self.qkv_bias = qkv_bias
185
+ self.attention_dropout = attention_dropout
186
+ self.bias = bias
187
+ self.head_dim = head_dim if head_dim is not None else self.hidden_size // self.num_attention_heads
188
+ # Validate the correctness of rotary position embeddings parameters
189
+ # BC: if there is a 'type' field, move it to 'rope_type'.
190
+ if self.rope_scaling is not None and "type" in self.rope_scaling:
191
+ self.rope_scaling["rope_type"] = self.rope_scaling["type"]
192
+ rope_config_validation(self)
193
+
194
+ super().__init__(
195
+ tie_word_embeddings=tie_word_embeddings,
196
+ **kwargs,
197
+ )
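The two fallbacks in the constructor above (the `head_dim` default and the backward-compatible `type` to `rope_type` copy) can be sketched without `transformers` installed. This is only an illustration under stated assumptions: `resolve_head_dim` and `normalize_rope_scaling` are hypothetical helper names that do not appear in the commit, and the real `rope_config_validation` performs further checks that are omitted here.

```python
from typing import Optional


def resolve_head_dim(
    hidden_size: int, num_attention_heads: int, head_dim: Optional[int] = None
) -> int:
    """Return head_dim, falling back to hidden_size // num_attention_heads."""
    if head_dim is not None:
        return head_dim
    return hidden_size // num_attention_heads


def normalize_rope_scaling(rope_scaling: Optional[dict]) -> Optional[dict]:
    """Copy a legacy 'type' key to 'rope_type', as the BC branch above does."""
    if rope_scaling is None:
        return None
    # Hypothetical choice: work on a copy; the config class mutates the dict in place.
    normalized = dict(rope_scaling)
    if "type" in normalized:
        normalized["rope_type"] = normalized["type"]
    return normalized


if __name__ == "__main__":
    # Defaults from the constructor signature: hidden_size=4096, num_attention_heads=32.
    print(resolve_head_dim(4096, 32))
    print(normalize_rope_scaling({"type": "dynamic", "factor": 2.0}))
```

With the default `hidden_size=4096` and `num_attention_heads=32`, the fallback gives a head dimension of 128.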
generation_config.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "bos_token_id": 1,
3
+ "eos_token_id": [
4
+ 2,
5
+ 128131
6
+ ],
7
+ "pad_token_id": 2,
8
+ "transformers_version": "4.47.1"
9
+ }
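Because `eos_token_id` is a list here, generation is expected to stop when either id (2 or 128131) is produced. The sketch below illustrates that semantics on a plain list of token ids; `truncate_at_eos` is a hypothetical helper written for this note, not the stopping logic `transformers` actually runs.

```python
from typing import Iterable, List, Set

# Values copied from generation_config.json above.
EOS_TOKEN_IDS: Set[int] = {2, 128131}


def truncate_at_eos(
    token_ids: Iterable[int], eos_ids: Set[int] = EOS_TOKEN_IDS
) -> List[int]:
    """Return the tokens before the first EOS id; the EOS token itself is dropped."""
    kept: List[int] = []
    for token_id in token_ids:
        if token_id in eos_ids:
            break
        kept.append(token_id)
    return kept


if __name__ == "__main__":
    print(truncate_at_eos([5, 6, 128131, 7]))
```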
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f8c477ec6b3fb8dbd93bf4c9996846a0040650a0340cf8f8441e37cd80fa6e0a
3
+ size 4967625864
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9ae8d5f7858f7d54d35794a32a87a57168d6b68a5184d590d5d945dc4911fa56
3
+ size 4896259688
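The two `.safetensors` entries above are Git LFS pointer files rather than the weights themselves: each holds a spec version, a SHA-256 object id, and the byte size of the real file. A minimal sketch of reading those fields follows; `parse_lfs_pointer` is a hypothetical helper, and the pointer text is copied from the two diffs above.

```python
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


SHARD_1 = """version https://git-lfs.github.com/spec/v1
oid sha256:f8c477ec6b3fb8dbd93bf4c9996846a0040650a0340cf8f8441e37cd80fa6e0a
size 4967625864
"""

SHARD_2 = """version https://git-lfs.github.com/spec/v1
oid sha256:9ae8d5f7858f7d54d35794a32a87a57168d6b68a5184d590d5d945dc4911fa56
size 4896259688
"""

# Sum of the on-disk sizes of both shards, in bytes.
total_bytes = sum(int(parse_lfs_pointer(p)["size"]) for p in (SHARD_1, SHARD_2))

if __name__ == "__main__":
    print(total_bytes)
```

The two shard sizes sum to 9863885552 bytes, slightly more than the `total_size` of 9863798784 recorded in `model.safetensors.index.json`; the gap is presumably per-file header data, since `total_size` appears to count tensor bytes only.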
model.safetensors.index.json ADDED
@@ -0,0 +1,778 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 9863798784
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00002-of-00002.safetensors",
7
+ "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
9
+ "model.layers.0.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
10
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
11
+ "model.layers.0.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
12
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
13
+ "model.layers.0.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
14
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
15
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
16
+ "model.layers.0.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
17
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
18
+ "model.layers.0.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
19
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
20
+ "model.layers.0.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
21
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
22
+ "model.layers.0.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
23
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
24
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
25
+ "model.layers.1.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
26
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
27
+ "model.layers.1.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
28
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
29
+ "model.layers.1.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
30
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
31
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
32
+ "model.layers.1.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
33
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
34
+ "model.layers.1.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
35
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
36
+ "model.layers.1.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
37
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
38
+ "model.layers.1.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
39
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
40
+ "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
41
+ "model.layers.10.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
42
+ "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
43
+ "model.layers.10.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
44
+ "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
45
+ "model.layers.10.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
46
+ "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
47
+ "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
48
+ "model.layers.10.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
49
+ "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
50
+ "model.layers.10.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
51
+ "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
52
+ "model.layers.10.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
53
+ "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
54
+ "model.layers.10.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
55
+ "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
56
+ "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
57
+ "model.layers.11.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
58
+ "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
59
+ "model.layers.11.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
60
+ "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
61
+ "model.layers.11.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
62
+ "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
63
+ "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
64
+ "model.layers.11.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
65
+ "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
66
+ "model.layers.11.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
67
+ "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
68
+ "model.layers.11.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
69
+ "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
70
+ "model.layers.11.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
71
+ "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
72
+ "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
73
+ "model.layers.12.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
74
+ "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
75
+ "model.layers.12.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
76
+ "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
77
+ "model.layers.12.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
78
+ "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
79
+ "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
80
+ "model.layers.12.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
81
+ "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
82
+ "model.layers.12.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
83
+ "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
84
+ "model.layers.12.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
85
+ "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
86
+ "model.layers.12.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
87
+ "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
88
+ "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
89
+ "model.layers.13.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
90
+ "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
91
+ "model.layers.13.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
92
+ "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
93
+ "model.layers.13.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
94
+ "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
95
+ "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
96
+ "model.layers.13.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
97
+ "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
98
+ "model.layers.13.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
99
+ "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
100
+ "model.layers.13.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
101
+ "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
102
+ "model.layers.13.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
103
+ "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
104
+ "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
105
+ "model.layers.14.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
106
+ "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
107
+ "model.layers.14.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
108
+ "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
109
+ "model.layers.14.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
110
+ "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
111
+ "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
112
+ "model.layers.14.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
113
+ "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
114
+ "model.layers.14.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
115
+ "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
116
+ "model.layers.14.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
117
+ "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
118
+ "model.layers.14.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
119
+ "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
120
+ "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
121
+ "model.layers.15.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
122
+ "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
123
+ "model.layers.15.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
124
+ "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
125
+ "model.layers.15.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
126
+ "model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
127
+ "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
128
+ "model.layers.15.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
129
+ "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
130
+ "model.layers.15.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
131
+ "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
132
+ "model.layers.15.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
133
+ "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
134
+ "model.layers.15.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
135
+ "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
136
+ "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
137
+ "model.layers.16.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
138
+ "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
139
+ "model.layers.16.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
140
+ "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
141
+ "model.layers.16.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
142
+ "model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
143
+ "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
144
+ "model.layers.16.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
145
+ "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
146
+ "model.layers.16.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
147
+ "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
148
+ "model.layers.16.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
149
+ "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
150
+ "model.layers.16.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
151
+ "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
152
+ "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
153
+ "model.layers.17.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
154
+ "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
155
+ "model.layers.17.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
156
+ "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
157
+ "model.layers.17.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
158
+ "model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
159
+ "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
160
+ "model.layers.17.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
161
+ "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
162
+ "model.layers.17.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
163
+ "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
164
+ "model.layers.17.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
165
+ "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
166
+ "model.layers.17.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
167
+ "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
168
+ "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
169
+ "model.layers.18.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
170
+ "model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
171
+ "model.layers.18.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
172
+ "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
173
+ "model.layers.18.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
174
+ "model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
175
+ "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
176
+ "model.layers.18.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
177
+ "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
178
+ "model.layers.18.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
179
+ "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
180
+ "model.layers.18.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
181
+ "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
182
+ "model.layers.18.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
183
+ "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
184
+ "model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
185
+ "model.layers.19.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
186
+ "model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
187
+ "model.layers.19.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
188
+ "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
189
+ "model.layers.19.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
190
+ "model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
191
+ "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
192
+ "model.layers.19.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
193
+ "model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
194
+ "model.layers.19.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
195
+ "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
196
+ "model.layers.19.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
197
+ "model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
198
+ "model.layers.19.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
199
+ "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
200
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
201
+ "model.layers.2.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
202
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
203
+ "model.layers.2.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
204
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
205
+ "model.layers.2.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
206
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
207
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
208
+ "model.layers.2.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
209
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
210
+ "model.layers.2.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
211
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
212
+ "model.layers.2.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
213
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
214
+ "model.layers.2.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
215
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
216
+ "model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
217
+ "model.layers.20.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
218
+ "model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
219
+ "model.layers.20.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
220
+ "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
221
+ "model.layers.20.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
222
+ "model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
223
+ "model.layers.20.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
224
+ "model.layers.20.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
225
+ "model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
226
+ "model.layers.20.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
227
+ "model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
228
+ "model.layers.20.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
229
+ "model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
230
+ "model.layers.20.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
231
+ "model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
232
+ "model.layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
233
+ "model.layers.21.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
234
+ "model.layers.21.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
235
+ "model.layers.21.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
236
+ "model.layers.21.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
237
+ "model.layers.21.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
238
+ "model.layers.21.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
239
+ "model.layers.21.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
240
+ "model.layers.21.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
241
+ "model.layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
242
+ "model.layers.21.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
243
+ "model.layers.21.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
244
+ "model.layers.21.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
245
+ "model.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
246
+ "model.layers.21.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
247
+ "model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
248
+ "model.layers.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
249
+ "model.layers.22.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
250
+ "model.layers.22.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
251
+ "model.layers.22.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
252
+ "model.layers.22.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
253
+ "model.layers.22.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
254
+ "model.layers.22.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
255
+ "model.layers.22.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
256
+ "model.layers.22.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
257
+ "model.layers.22.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
258
+ "model.layers.22.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
259
+ "model.layers.22.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
260
+ "model.layers.22.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
261
+ "model.layers.22.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
262
+ "model.layers.22.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
263
+ "model.layers.22.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
264
+ "model.layers.23.input_layernorm.weight": "model-00001-of-00002.safetensors",
265
+ "model.layers.23.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
266
+ "model.layers.23.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
267
+ "model.layers.23.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
268
+ "model.layers.23.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
269
+ "model.layers.23.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
270
+ "model.layers.23.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
271
+ "model.layers.23.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
272
+ "model.layers.23.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
273
+ "model.layers.23.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
274
+ "model.layers.23.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
275
+ "model.layers.23.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
276
+ "model.layers.23.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
277
+ "model.layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
278
+ "model.layers.23.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
279
+ "model.layers.23.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
280
+ "model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
281
+ "model.layers.24.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
282
+ "model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
283
+ "model.layers.24.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
284
+ "model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
285
+ "model.layers.24.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
286
+ "model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
287
+ "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
288
+ "model.layers.24.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
289
+ "model.layers.24.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
290
+ "model.layers.24.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
291
+ "model.layers.24.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
292
+ "model.layers.24.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
293
+ "model.layers.24.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
294
+ "model.layers.24.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
295
+ "model.layers.24.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
296
+ "model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
297
+ "model.layers.25.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
298
+ "model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
299
+ "model.layers.25.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
300
+ "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
301
+ "model.layers.25.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
302
+ "model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
303
+ "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
304
+ "model.layers.25.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
305
+ "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
306
+ "model.layers.25.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
307
+ "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
308
+ "model.layers.25.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
309
+ "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
310
+ "model.layers.25.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
311
+ "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.28.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.28.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.29.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.29.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.30.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.30.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.31.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.31.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.32.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.32.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.32.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.32.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.32.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.32.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.32.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.32.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.33.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.33.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.33.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.33.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.33.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.33.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.33.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.33.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.34.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.34.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.34.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.34.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.34.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.34.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.34.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.34.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.35.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.35.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.35.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.35.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.35.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.35.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.35.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.35.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.36.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.36.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.36.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.36.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.36.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.36.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.36.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.36.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.36.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.36.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.36.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.36.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.36.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.36.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.36.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.36.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.37.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.37.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.37.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.37.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.37.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.37.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.37.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.37.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.37.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.37.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.37.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.37.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.37.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.37.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.37.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.37.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.38.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.38.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.38.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.38.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.38.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.38.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.38.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.38.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.38.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.38.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.38.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.38.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.38.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.38.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.38.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.38.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.39.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.39.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.39.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.39.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.39.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.39.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.39.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.39.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.39.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.39.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.39.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.39.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.39.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.39.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.39.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.39.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.40.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.40.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.40.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.40.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.40.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.40.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.40.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.40.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.40.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.40.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.40.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.40.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.40.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.40.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.40.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.40.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.41.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.41.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.41.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.41.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.41.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.41.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.41.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.41.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.41.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.41.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.41.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.41.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.41.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.41.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.41.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.41.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.42.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.42.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.42.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.42.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.42.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.42.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.42.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.42.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.42.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.42.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.42.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.42.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.42.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.42.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.42.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.42.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.43.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.43.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.43.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.43.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.43.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.43.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.43.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.43.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.43.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.43.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.43.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.43.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.43.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.43.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.43.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.43.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.44.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.44.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.44.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.44.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.44.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.44.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.44.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.44.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.44.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.44.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.44.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.44.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.44.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.44.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.44.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.44.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.45.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.45.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.45.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.45.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.45.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.45.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.45.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.45.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.45.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.45.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.45.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.45.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.45.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.45.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.45.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.45.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.46.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.46.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.46.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.46.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.46.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.46.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.46.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.46.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.46.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.46.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.46.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.46.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.46.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.46.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.46.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.46.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.47.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.47.mlp.down_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.47.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.47.mlp.gate_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.47.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.47.mlp.up_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.47.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.47.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.47.self_attn.k_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.47.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.47.self_attn.o_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.47.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.47.self_attn.q_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.47.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.47.self_attn.v_proj.scale": "model-00002-of-00002.safetensors",
+ "model.layers.47.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
712
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
713
+ "model.layers.6.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
714
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
715
+ "model.layers.6.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
716
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
717
+ "model.layers.6.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
718
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
719
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
720
+ "model.layers.6.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
721
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
722
+ "model.layers.6.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
723
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
724
+ "model.layers.6.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
725
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
726
+ "model.layers.6.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
727
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
728
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
729
+ "model.layers.7.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
730
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
731
+ "model.layers.7.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
732
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
733
+ "model.layers.7.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
734
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
735
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
736
+ "model.layers.7.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
737
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
738
+ "model.layers.7.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
739
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
740
+ "model.layers.7.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
741
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
742
+ "model.layers.7.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
743
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
744
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
745
+ "model.layers.8.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
746
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
747
+ "model.layers.8.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
748
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
749
+ "model.layers.8.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
750
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
751
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
752
+ "model.layers.8.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
753
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
754
+ "model.layers.8.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
755
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
756
+ "model.layers.8.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
757
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
758
+ "model.layers.8.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
759
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
760
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
761
+ "model.layers.9.mlp.down_proj.scale": "model-00001-of-00002.safetensors",
762
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
763
+ "model.layers.9.mlp.gate_proj.scale": "model-00001-of-00002.safetensors",
764
+ "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
765
+ "model.layers.9.mlp.up_proj.scale": "model-00001-of-00002.safetensors",
766
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
767
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
768
+ "model.layers.9.self_attn.k_proj.scale": "model-00001-of-00002.safetensors",
769
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
770
+ "model.layers.9.self_attn.o_proj.scale": "model-00001-of-00002.safetensors",
771
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
772
+ "model.layers.9.self_attn.q_proj.scale": "model-00001-of-00002.safetensors",
773
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
774
+ "model.layers.9.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
775
+ "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
776
+ "model.norm.weight": "model-00002-of-00002.safetensors"
777
+ }
778
+ }
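The index above maps each tensor name to the shard file that stores it. A minimal sketch of grouping tensor names by shard follows. It assumes the entries sit under a top-level `weight_map` key, as in the usual Hugging Face sharded-checkpoint index layout (that key is not visible in this excerpt), and it uses a small hypothetical excerpt rather than the full file. `tensors_by_shard` is an illustrative helper, not part of this repository.

```python
import json
from collections import defaultdict

# Hypothetical excerpt shaped like the index file above; the "weight_map"
# key is an assumption based on the usual sharded-checkpoint index layout.
INDEX_JSON = """
{
  "weight_map": {
    "model.layers.9.self_attn.v_proj.scale": "model-00001-of-00002.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.47.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.norm.weight": "model-00002-of-00002.safetensors"
  }
}
"""


def tensors_by_shard(index_text):
    """Group tensor names by the shard file that holds them."""
    weight_map = json.loads(index_text)["weight_map"]
    shards = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    # Sort names so the result does not depend on key order in the JSON.
    return {shard: sorted(names) for shard, names in shards.items()}


shards = tensors_by_shard(INDEX_JSON)
for shard_file, names in sorted(shards.items()):
    print(shard_file, len(names))
```

A grouping like this makes it easy to check which shard must be opened for a given layer; in the listing above, layers 5 through 9 live in the first shard, while layers 46 and 47 and `model.norm.weight` live in the second.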
special_tokens_map.json ADDED
@@ -0,0 +1,54 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|action_start|>",
6
+ "<|action_end|>",
7
+ "<|interpreter|>",
8
+ "<|plugin|>",
9
+ "<restate>",
10
+ "</restate>",
11
+ "<planning>",
12
+ "</planning>",
13
+ "<recollect>",
14
+ "</recollect>",
15
+ "<execution>",
16
+ "</execution>",
17
+ "<review>",
18
+ "</review>",
19
+ "<summarize>",
20
+ "</summarize>",
21
+ "<retry>",
22
+ "</retry>",
23
+ "<conclude>",
24
+ "</conclude>"
25
+ ],
26
+ "bos_token": {
27
+ "content": "<s>",
28
+ "lstrip": false,
29
+ "normalized": false,
30
+ "rstrip": false,
31
+ "single_word": false
32
+ },
33
+ "eos_token": {
34
+ "content": "</s>",
35
+ "lstrip": false,
36
+ "normalized": false,
37
+ "rstrip": false,
38
+ "single_word": false
39
+ },
40
+ "pad_token": {
41
+ "content": "</s>",
42
+ "lstrip": false,
43
+ "normalized": false,
44
+ "rstrip": false,
45
+ "single_word": false
46
+ },
47
+ "unk_token": {
48
+ "content": "<unk>",
49
+ "lstrip": false,
50
+ "normalized": false,
51
+ "rstrip": false,
52
+ "single_word": false
53
+ }
54
+ }
tokenization_internlm3.py ADDED
@@ -0,0 +1,294 @@
1
+ import os
2
+ from shutil import copyfile
3
+ from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple
4
+
5
+ import sentencepiece as spm
6
+ from transformers.tokenization_utils import AddedToken, PreTrainedTokenizer
7
+ from transformers.utils import logging
8
+
9
+ if TYPE_CHECKING:
10
+ from transformers.tokenization_utils_base import TextInput
11
+
12
+ logger = logging.get_logger(__name__)
13
+
14
+ VOCAB_FILES_NAMES = {"vocab_file": "tokenizer.model"}
15
+
16
+ SPIECE_UNDERLINE = "▁"
17
+
18
+
19
+ class InternLM3Tokenizer(PreTrainedTokenizer):
20
+ """
21
+ Construct an InternLM3 tokenizer. Based on byte-level Byte-Pair-Encoding. The default padding token is unset as there is
22
+ no padding token in the original model.
23
+
24
+ Args:
25
+ vocab_file (`str`):
26
+ Path to the vocabulary file.
27
+ unk_token (`str` or `tokenizers.AddedToken`, *optional*, defaults to `"<unk>"`):
28
+ The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this
29
+ token instead.
30
+ bos_token (`str` or `tokenizers.AddedToken`, *optional*, defaults to `"<s>"`):
31
+ The beginning of sequence token that was used during pretraining. Can be used as a sequence classifier token.
32
+ eos_token (`str` or `tokenizers.AddedToken`, *optional*, defaults to `"</s>"`):
33
+ The end of sequence token.
34
+ pad_token (`str` or `tokenizers.AddedToken`, *optional*):
35
+ A special token used to make arrays of tokens the same size for batching purpose. Will then be ignored by
36
+ attention mechanisms or loss computation.
37
+ sp_model_kwargs (`Dict[str, Any]`, `Optional`, *optional*):
38
+ Will be passed to the `SentencePieceProcessor.__init__()` method. The [Python wrapper for
39
+ SentencePiece](https://github.com/google/sentencepiece/tree/master/python) can be used, among other things,
40
+ to set:
41
+
42
+ - `enable_sampling`: Enable subword regularization.
43
+ - `nbest_size`: Sampling parameters for unigram. Invalid for BPE-Dropout.
44
+
45
+ - `nbest_size = {0,1}`: No sampling is performed.
46
+ - `nbest_size > 1`: samples from the nbest_size results.
47
+ - `nbest_size < 0`: assumes that nbest_size is infinite and samples from all hypotheses (lattice)
47
+ using the forward-filtering-and-backward-sampling algorithm.
49
+
50
+ - `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for
51
+ BPE-dropout.
52
+
53
+ add_bos_token (`bool`, *optional*, defaults to `True`):
54
+ Whether or not to add a `bos_token` at the start of sequences.
55
+ add_eos_token (`bool`, *optional*, defaults to `False`):
56
+ Whether or not to add an `eos_token` at the end of sequences.
57
+ clean_up_tokenization_spaces (`bool`, *optional*, defaults to `False`):
58
+ Whether or not to clean up spaces after decoding. Cleanup consists of removing potential artifacts like
59
+ extra spaces.
60
+ use_default_system_prompt (`bool`, *optional*, defaults to `False`):
61
+ Whether or not the default system prompt for InternLM3 should be used.
62
+ spaces_between_special_tokens (`bool`, *optional*, defaults to `False`):
63
+ Whether or not to add spaces between special tokens.
64
+ spaces_for_interleaved_special_tokens (`bool`, *optional*, defaults to `False`):
65
+ Whether or not to add spaces between special tokens that are interleaved with normal tokens.
66
+ add_prefix_space (`bool`, *optional*, defaults to `True`):
67
+ Whether or not to add an initial space to the input. This allows the leading word to be treated just like any
67
+ other word. This should be set with `from_slow=True` to make sure it's taken into account.
69
+ """
70
+
71
+ vocab_files_names = VOCAB_FILES_NAMES
72
+ model_input_names = ["input_ids", "attention_mask"]
73
+
74
+ def __init__(
75
+ self,
76
+ vocab_file,
77
+ unk_token="<unk>",
78
+ bos_token="<s>",
79
+ eos_token="</s>",
80
+ pad_token=None,
81
+ sp_model_kwargs: Optional[Dict[str, Any]] = None,
82
+ add_bos_token=True,
83
+ add_eos_token=False,
84
+ clean_up_tokenization_spaces=False,
85
+ use_default_system_prompt=False,
86
+ spaces_between_special_tokens=False,
87
+ spaces_for_interleaved_special_tokens=False,
88
+ add_prefix_space=True,
89
+ **kwargs,
90
+ ):
91
+ self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs
92
+ bos_token = AddedToken(bos_token, normalized=False, special=True) if isinstance(bos_token, str) else bos_token
93
+ eos_token = AddedToken(eos_token, normalized=False, special=True) if isinstance(eos_token, str) else eos_token
94
+ unk_token = AddedToken(unk_token, normalized=False, special=True) if isinstance(unk_token, str) else unk_token
95
+ pad_token = AddedToken(pad_token, normalized=False, special=True) if isinstance(pad_token, str) else pad_token
96
+
97
+ self.vocab_file = vocab_file
98
+ self.add_bos_token = add_bos_token
99
+ self.add_eos_token = add_eos_token
100
+ self.use_default_system_prompt = use_default_system_prompt
101
+ self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
102
+ self.sp_model.Load(vocab_file)
103
+ self.add_prefix_space = add_prefix_space
104
+ self.spaces_for_interleaved_special_tokens = spaces_for_interleaved_special_tokens
105
+
106
+ vocab_size = self.sp_model.get_piece_size()
107
+ self.decoder = {i: self.sp_model.id_to_piece(i) for i in range(vocab_size)}
108
+
109
+ super().__init__(
110
+ bos_token=bos_token,
111
+ eos_token=eos_token,
112
+ unk_token=unk_token,
113
+ pad_token=pad_token,
114
+ add_bos_token=add_bos_token,
115
+ add_eos_token=add_eos_token,
116
+ sp_model_kwargs=sp_model_kwargs,
117
+ clean_up_tokenization_spaces=clean_up_tokenization_spaces,
118
+ use_default_system_prompt=use_default_system_prompt,
119
+ spaces_between_special_tokens=spaces_between_special_tokens,
120
+ add_prefix_space=add_prefix_space,
121
+ **kwargs,
122
+ )
123
+
124
+ def __getstate__(self):
125
+ state = self.__dict__.copy()
126
+ state["sp_model"] = None
127
+ state["sp_model_proto"] = self.sp_model.serialized_model_proto()
128
+ return state
129
+
130
+ def __setstate__(self, d):
131
+ self.__dict__.update(d)
132
+ self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
133
+ self.sp_model.LoadFromSerializedProto(self.sp_model_proto)
134
+
135
+ @property
136
+ def vocab_size(self):
137
+ """Returns vocab size"""
138
+ return self.sp_model.get_piece_size()
139
+
140
+ def get_vocab(self):
141
+ """Returns vocab as a dict"""
142
+ vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
143
+ vocab.update(self.added_tokens_encoder)
144
+ return vocab
145
+
146
+ def tokenize(self, text: "TextInput", **kwargs) -> List[str]:
147
+ """
148
+ Args:
149
+ text: TextInput
150
+ Simply calls PreTrainedTokenizer's method
151
+ """
152
+ return super().tokenize(text, **kwargs)
153
+
154
+ def _tokenize(self, text, **kwargs):
155
+ """
156
+ Args:
157
+ text: TextInput
158
+ Returns the list of SentencePiece pieces for `text`. This method does not add a prefix space itself.
159
+ """
160
+ return self.sp_model.encode(text, out_type=str)
161
+
162
+ def _convert_token_to_id(self, token):
163
+ """Converts a token (str) to an id using the vocab."""
164
+ return self.sp_model.piece_to_id(token)
165
+
166
+ def _convert_id_to_token(self, index):
167
+ """Converts an index (integer) to a token (str) using the vocab."""
168
+ return self.decoder.get(index, "")
169
+
170
+ def convert_tokens_to_string(self, tokens):
171
+ """Converts a sequence of tokens (strings) into a single string."""
172
+ # since we manually add the prefix space, we have to remove it when decoding
173
+ if tokens[0].startswith(SPIECE_UNDERLINE) and self.add_prefix_space:
174
+ tokens[0] = tokens[0][1:]
175
+
176
+ current_sub_tokens = []
177
+ out_string = ""
178
+ prev_is_special = False
179
+ for i, token in enumerate(tokens):
180
+ # make sure that special tokens are not decoded using sentencepiece model
181
+ if token in self.all_special_tokens:
182
+ if not prev_is_special and i != 0 and self.spaces_for_interleaved_special_tokens:
183
+ out_string += " "
184
+ out_string += self.sp_model.decode(current_sub_tokens) + token
185
+ prev_is_special = True
186
+ current_sub_tokens = []
187
+ else:
188
+ if (
189
+ prev_is_special
190
+ and i == 1
191
+ and self.add_prefix_space
192
+ and not token.startswith(SPIECE_UNDERLINE)
193
+ and self.spaces_for_interleaved_special_tokens
194
+ ):
195
+ out_string += " "
196
+ current_sub_tokens.append(token)
197
+ prev_is_special = False
198
+ out_string += self.sp_model.decode(current_sub_tokens)
199
+ return out_string
200
+
201
+ def save_vocabulary(self, save_directory, filename_prefix: Optional[str] = None) -> Tuple[str]:
202
+ """
203
+ Save the vocabulary and special tokens file to a directory.
204
+
205
+ Args:
206
+ save_directory (`str`):
207
+ The directory in which to save the vocabulary.
208
+
209
+ Returns:
210
+ `Tuple(str)`: Paths to the files saved.
211
+ """
212
+ if not os.path.isdir(save_directory):
213
+ logger.error(f"Vocabulary path ({save_directory}) should be a directory")
214
+ return
215
+ out_vocab_file = os.path.join(save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"])
216
+
217
+ if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file) and os.path.isfile(self.vocab_file):
218
+ copyfile(self.vocab_file, out_vocab_file)
219
+ elif not os.path.isfile(self.vocab_file):
220
+ with open(out_vocab_file, "wb") as fi:
221
+ content_spiece_model = self.sp_model.serialized_model_proto()
222
+ fi.write(content_spiece_model)
223
+
224
+ return (out_vocab_file,)
225
+
226
+ def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
227
+ bos_token_id = [self.bos_token_id] if self.add_bos_token else []
228
+ eos_token_id = [self.eos_token_id] if self.add_eos_token else []
229
+
230
+ output = bos_token_id + token_ids_0 + eos_token_id
231
+
232
+ if token_ids_1 is not None:
233
+ output = output + bos_token_id + token_ids_1 + eos_token_id
234
+
235
+ return output
236
+
237
+ def get_special_tokens_mask(
238
+ self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
239
+ ) -> List[int]:
240
+ """
241
+ Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
242
+ special tokens using the tokenizer `prepare_for_model` method.
243
+
244
+ Args:
245
+ token_ids_0 (`List[int]`):
246
+ List of IDs.
247
+ token_ids_1 (`List[int]`, *optional*):
248
+ Optional second list of IDs for sequence pairs.
249
+ already_has_special_tokens (`bool`, *optional*, defaults to `False`):
250
+ Whether or not the token list is already formatted with special tokens for the model.
251
+
252
+ Returns:
253
+ `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
254
+ """
255
+ if already_has_special_tokens:
256
+ return super().get_special_tokens_mask(token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True)
257
+
258
+ bos_token_id = [1] if self.add_bos_token else []
259
+ eos_token_id = [1] if self.add_eos_token else []
260
+
261
+ if token_ids_1 is None:
262
+ return bos_token_id + ([0] * len(token_ids_0)) + eos_token_id
263
+ return bos_token_id + ([0] * len(token_ids_0)) + eos_token_id + bos_token_id + ([0] * len(token_ids_1)) + eos_token_id
264
+
265
+ def create_token_type_ids_from_sequences(self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) -> List[int]:
266
+ """
267
+ Creates a mask from the two sequences passed to be used in a sequence-pair classification task. An ALBERT
268
+ sequence pair mask has the following format:
269
+
270
+ ```
271
+ 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
272
+ | first sequence | second sequence |
273
+ ```
274
+
275
+ If `token_ids_1` is `None`, only the first portion of the mask (0s) is returned.
276
+
277
+ Args:
278
+ token_ids_0 (`List[int]`):
279
+ List of ids.
280
+ token_ids_1 (`List[int]`, *optional*):
281
+ Optional second list of IDs for sequence pairs.
282
+
283
+ Returns:
284
+ `List[int]`: List of [token type IDs](../glossary#token-type-ids) according to the given sequence(s).
285
+ """
286
+ bos_token_id = [self.bos_token_id] if self.add_bos_token else []
287
+ eos_token_id = [self.eos_token_id] if self.add_eos_token else []
288
+
289
+ output = [0] * len(bos_token_id + token_ids_0 + eos_token_id)
290
+
291
+ if token_ids_1 is not None:
292
+ output += [1] * len(bos_token_id + token_ids_1 + eos_token_id)
293
+
294
+ return output
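The three methods at the end of this file (`build_inputs_with_special_tokens`, `get_special_tokens_mask`, `create_token_type_ids_from_sequences`) share one layout rule: each sequence is optionally wrapped in BOS and EOS. A standalone sketch of that rule is shown below so it can be checked without loading the SentencePiece model. The function names are hypothetical stand-ins, and the ids 1 and 2 for `<s>` and `</s>` are taken from `added_tokens_decoder` in `tokenizer_config.json`. The `already_has_special_tokens` branch is left out.

```python
from typing import List, Optional

# Ids of "<s>" and "</s>" as listed in added_tokens_decoder (tokenizer_config.json).
BOS_ID, EOS_ID = 1, 2


def build_inputs(ids_0: List[int], ids_1: Optional[List[int]] = None,
                 add_bos: bool = True, add_eos: bool = False) -> List[int]:
    """Wrap each sequence in optional BOS/EOS ids, then concatenate."""
    bos = [BOS_ID] if add_bos else []
    eos = [EOS_ID] if add_eos else []
    output = bos + ids_0 + eos
    if ids_1 is not None:
        output = output + bos + ids_1 + eos
    return output


def special_tokens_mask(ids_0: List[int], ids_1: Optional[List[int]] = None,
                        add_bos: bool = True, add_eos: bool = False) -> List[int]:
    """1 marks a BOS/EOS position, 0 marks an ordinary sequence token."""
    bos = [1] if add_bos else []
    eos = [1] if add_eos else []
    mask = bos + [0] * len(ids_0) + eos
    if ids_1 is not None:
        mask = mask + bos + [0] * len(ids_1) + eos
    return mask


def token_type_ids(ids_0: List[int], ids_1: Optional[List[int]] = None,
                   add_bos: bool = True, add_eos: bool = False) -> List[int]:
    """0 for every position of the first sequence, 1 for the second."""
    extra = int(add_bos) + int(add_eos)  # special tokens added per sequence
    types = [0] * (len(ids_0) + extra)
    if ids_1 is not None:
        types += [1] * (len(ids_1) + extra)
    return types


print(build_inputs([10, 11]))               # [1, 10, 11]
print(special_tokens_mask([10, 11]))        # [1, 0, 0]
print(token_type_ids([10], [20, 21]))       # [0, 0, 1, 1, 1]
```

With the defaults from this commit (`add_bos_token=True`, `add_eos_token=False`), only a leading `<s>` is added, so all three outputs are one element longer than the raw id list per sequence.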
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bcacff3229854f5103ee7a85473a30ca9a8b3a68f3aae9b7479574b23ac2256b
3
+ size 2475075
tokenizer_config.json ADDED
@@ -0,0 +1,249 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "add_prefix_space": true,
5
+ "added_tokens_decoder": {
6
+ "0": {
7
+ "content": "<unk>",
8
+ "lstrip": false,
9
+ "normalized": false,
10
+ "rstrip": false,
11
+ "single_word": false,
12
+ "special": true
13
+ },
14
+ "1": {
15
+ "content": "<s>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false,
20
+ "special": true
21
+ },
22
+ "2": {
23
+ "content": "</s>",
24
+ "lstrip": false,
25
+ "normalized": false,
26
+ "rstrip": false,
27
+ "single_word": false,
28
+ "special": true
29
+ },
30
+ "128111": {
31
+ "content": "<restate>",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false,
36
+ "special": true
37
+ },
38
+ "128112": {
39
+ "content": "</restate>",
40
+ "lstrip": false,
41
+ "normalized": false,
42
+ "rstrip": false,
43
+ "single_word": false,
44
+ "special": true
45
+ },
46
+ "128113": {
47
+ "content": "<planning>",
48
+ "lstrip": false,
49
+ "normalized": false,
50
+ "rstrip": false,
51
+ "single_word": false,
52
+ "special": true
53
+ },
54
+ "128114": {
55
+ "content": "</planning>",
56
+ "lstrip": false,
57
+ "normalized": false,
58
+ "rstrip": false,
59
+ "single_word": false,
60
+ "special": true
61
+ },
62
+ "128115": {
63
+ "content": "<recollect>",
64
+ "lstrip": false,
65
+ "normalized": false,
66
+ "rstrip": false,
67
+ "single_word": false,
68
+ "special": true
69
+ },
70
+ "128116": {
71
+ "content": "</recollect>",
72
+ "lstrip": false,
73
+ "normalized": false,
74
+ "rstrip": false,
75
+ "single_word": false,
76
+ "special": true
77
+ },
78
+ "128117": {
79
+ "content": "<execution>",
80
+ "lstrip": false,
81
+ "normalized": false,
82
+ "rstrip": false,
83
+ "single_word": false,
84
+ "special": true
85
+ },
86
+ "128118": {
87
+ "content": "</execution>",
88
+ "lstrip": false,
89
+ "normalized": false,
90
+ "rstrip": false,
91
+ "single_word": false,
92
+ "special": true
93
+ },
94
+ "128119": {
95
+ "content": "<review>",
96
+ "lstrip": false,
97
+ "normalized": false,
98
+ "rstrip": false,
99
+ "single_word": false,
100
+ "special": true
101
+ },
102
+ "128120": {
103
+ "content": "</review>",
104
+ "lstrip": false,
105
+ "normalized": false,
106
+ "rstrip": false,
107
+ "single_word": false,
108
+ "special": true
109
+ },
110
+ "128121": {
111
+ "content": "<summarize>",
112
+ "lstrip": false,
113
+ "normalized": false,
114
+ "rstrip": false,
115
+ "single_word": false,
116
+ "special": true
117
+ },
118
+ "128122": {
119
+ "content": "</summarize>",
120
+ "lstrip": false,
121
+ "normalized": false,
122
+ "rstrip": false,
123
+ "single_word": false,
124
+ "special": true
125
+ },
126
+ "128123": {
127
+ "content": "<retry>",
128
+ "lstrip": false,
129
+ "normalized": false,
130
+ "rstrip": false,
131
+ "single_word": false,
132
+ "special": true
133
+ },
134
+ "128124": {
135
+ "content": "</retry>",
136
+ "lstrip": false,
137
+ "normalized": false,
138
+ "rstrip": false,
139
+ "single_word": false,
140
+ "special": true
141
+ },
142
+ "128125": {
143
+ "content": "<conclude>",
144
+ "lstrip": false,
145
+ "normalized": false,
146
+ "rstrip": false,
147
+ "single_word": false,
148
+ "special": true
149
+ },
150
+ "128126": {
151
+ "content": "</conclude>",
152
+ "lstrip": false,
153
+ "normalized": false,
154
+ "rstrip": false,
155
+ "single_word": false,
156
+ "special": true
157
+ },
158
+ "128127": {
159
+ "content": "<|plugin|>",
160
+ "lstrip": false,
161
+ "normalized": false,
162
+ "rstrip": false,
163
+ "single_word": false,
164
+ "special": true
165
+ },
166
+ "128128": {
167
+ "content": "<|interpreter|>",
168
+ "lstrip": false,
169
+ "normalized": false,
170
+ "rstrip": false,
171
+ "single_word": false,
172
+ "special": true
173
+ },
174
+ "128129": {
175
+ "content": "<|action_end|>",
176
+ "lstrip": false,
177
+ "normalized": false,
178
+ "rstrip": false,
179
+ "single_word": false,
180
+ "special": true
181
+ },
182
+ "128130": {
183
+ "content": "<|action_start|>",
184
+ "lstrip": false,
185
+ "normalized": false,
186
+ "rstrip": false,
187
+ "single_word": false,
188
+ "special": true
189
+ },
190
+ "128131": {
191
+ "content": "<|im_end|>",
192
+ "lstrip": false,
193
+ "normalized": false,
194
+ "rstrip": false,
195
+ "single_word": false,
196
+ "special": true
197
+ },
198
+ "128132": {
199
+ "content": "<|im_start|>",
200
+ "lstrip": false,
201
+ "normalized": false,
202
+ "rstrip": false,
203
+ "single_word": false,
204
+ "special": true
205
+ }
206
+ },
207
+ "additional_special_tokens": [
208
+ "<|im_start|>",
209
+ "<|im_end|>",
210
+ "<|action_start|>",
211
+ "<|action_end|>",
212
+ "<|interpreter|>",
213
+ "<|plugin|>",
214
+ "<restate>",
215
+ "</restate>",
216
+ "<planning>",
217
+ "</planning>",
218
+ "<recollect>",
219
+ "</recollect>",
220
+ "<execution>",
221
+ "</execution>",
222
+ "<review>",
223
+ "</review>",
224
+ "<summarize>",
225
+ "</summarize>",
226
+ "<retry>",
227
+ "</retry>",
228
+ "<conclude>",
229
+ "</conclude>"
230
+ ],
231
+ "auto_map": {
232
+ "AutoTokenizer": [
233
+ "tokenization_internlm3.InternLM3Tokenizer",
234
+ null
235
+ ]
236
+ },
237
+ "bos_token": "<s>",
238
+ "chat_template": "{{ bos_token }}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
239
+ "clean_up_tokenization_spaces": false,
240
+ "eos_token": "</s>",
241
+ "extra_special_tokens": {},
242
+ "model_max_length": 1000000000000000019884624838656,
243
+ "pad_token": "</s>",
244
+ "sp_model_kwargs": {},
245
+ "spaces_between_special_tokens": false,
246
+ "tokenizer_class": "InternLM3Tokenizer",
247
+ "unk_token": "<unk>",
248
+ "use_default_system_prompt": false
249
+ }
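The `chat_template` entry above is a Jinja template that emits the BOS token, then one `<|im_start|>role\ncontent<|im_end|>\n` block per message, and optionally an open assistant header. The sketch below reproduces that string layout in plain Python so the expected prompt can be inspected without Jinja or the tokenizer. `render_chat` is a hypothetical helper written for illustration; in practice the template would normally be applied through the tokenizer.

```python
from typing import Dict, List


def render_chat(messages: List[Dict[str, str]],
                add_generation_prompt: bool = False,
                bos_token: str = "<s>") -> str:
    """Mirror the chat_template string layout from tokenizer_config.json."""
    parts = [bos_token]
    for message in messages:
        parts.append(
            "<|im_start|>" + message["role"] + "\n"
            + message["content"] + "<|im_end|>" + "\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so generation continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chat(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
    ],
    add_generation_prompt=True,
)
print(prompt)
```

Note that the template already emits `bos_token` itself, and the config also sets `add_bos_token` to true, so tokenizing the rendered string with special tokens enabled may yield a second leading `<s>`. Whether that happens depends on how the caller invokes the tokenizer.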