Jeronymous committed on
Commit 969b0e4 · 2 Parent(s): 05df019 13f5b08

Merge branch 'main' of https://huggingface.co/OpenLLM-France/Lucie-7B

README.md CHANGED
@@ -33,7 +33,7 @@ https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/tem

 * [Model Description](#model-description)
 <!-- * [Uses](#uses) -->
- * [Example code in python](#example-code-in-python)
 * [Load the model](#load-the-model)
 * [Sentence completion](#sentence-completion)
 * [Load a checkpoint](#load-a-checkpoint)
@@ -42,11 +42,13 @@ https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/tem
 * [Training Procedure](#training-procedure)
 * [Neural Network Architecture](#neural-network-architecture)
 * [Training Hyperparameters](#training-hyperparameters)
- 1. [Main pre-training](#1-main-pre-training)
 2. [Context Extension](#2-context-extension)
 3. [Annealing](#3-annealing)
- * [Training logs and learning curves](#training-logs-and-learning-curves)
 <!-- * [Evaluation](#evaluation) -->
 * [Acknowledgements](#acknowledgements)
 * [Contact](#contact)
 
@@ -64,7 +66,7 @@ Italian (3.8%),
 and parallel data from those languages (2.5%),
 as well as several programming languages (14.7%).

- ## Example code in python

 ### Load the model
 
@@ -82,7 +84,7 @@ model = transformers.AutoModelForCausalLM.from_pretrained(model_name,
 ```
 ### Sentence completion

- Wrap the model in a text generation pipeline, and prepare some generation parameters:
 ```
 pipeline = transformers.pipeline("text-generation", model=model, tokenizer=tokenizer)
 
@@ -132,8 +134,8 @@ model = transformers.AutoModelForCausalLM.from_pretrained(model_name,
 )
 ```
 where `revision` can be one of:
- * "[`step0005000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0005000)", "[`step0010000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0010000)", "[`step0015000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0015000)", "[`step0020000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0020000)": each 5000 steps for the first pre-training steps (with a context length of 4096).
- * "[`step0025000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0025000)", "[`step0050000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0050000)", "[`step0075000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0075000)", "[`step0100000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0100000)", ..., "[`step0750000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0750000)": each 25000 steps from 25k to 750k steps.
  * "[`step0753851`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0753851)": last pre-training step before context extension and annealing.
  * "[`extension_step0000250`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000250)", "[`extension_step0000500`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000500)", "[`extension_step0000750`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000750)", "[`extension_step0001000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001000)", "[`extension_step0001220`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001220)": several checkpoints during context extension (with a context length of 32000).
 
@@ -149,7 +151,7 @@ The initial composition of the training data is as follows:

 ![Initial Data Composition](figures/fig_dataset_composition.png)

- Some of the data was upsampled to balance the training data distribution, and the final composition is as follows:

 ![Training Data Composition](figures/fig_dataset_composition_training.png)
 
@@ -157,7 +159,7 @@ Some of the data was upsampled to balance the training data distribution, and th

 Lucie-7B is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).

- It was pre-trained on 512 H100 80GB GPUs for about 550\,000 GPU hours on [Jean Zay supercomputer](http://www.idris.fr/eng/jean-zay/jean-zay-presentation-eng.html).

 The training code is available at [https://github.com/OpenLLM-France/Lucie-Training](https://github.com/OpenLLM-France/Lucie-Training).
 It is based on [this fork of Megatron-DeepSpeed](https://github.com/OpenLLM-France/Megatron-DeepSpeed).
@@ -180,21 +182,21 @@ with the following hyperparameters:
 | Activation | `silu` |
 | RMS norm epsilon | 1e-5 |

- The parameter "theta" of Rotary Positional Embedding (RoPE) varied during the training process
- and is indicated in the tables with training hyperparameters below.

 #### Training Hyperparameters

 The training consisted of three main phases:
 1. Main pre-training on 3.1T tokens, with a context length of 4096,
 2. Context extension on 5B tokens, with a context length of 32000,
- 3. Annealing, with a selected subset of the training data with especially high quality.

 The details of each phase are given below.

- ##### 1. Main pre-training

- Training hyperparameters in torch/Megatron-DeepSpeed were the following:
 | **Hyperparameter** | **Value** |
 |------------------------|------------|
 | Total \# samples| 762 144 586 (3.1T tokens) |
@@ -236,9 +238,7 @@ Training hyperparameters are the same as above, with the following changes:

 TODO

- ### Training logs and learning curves
-
- 🚧 work in progress 🚧

 Training logs can be found in Tensorboard format in:
 * [`metadata/training_logs/`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/main/metadata/training_logs)
@@ -246,6 +246,24 @@ Training logs can be found in Tensorboard format in:
  in a zip file. Each file in the zip corresponds to a job of at most 20H of training (parallelized over 512 GPUs).
  <br> └── [`2_extension/`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/main/metadata/training_logs/2_extension) folder containing the training log for the context extension phase, which was done in a single job of around 13H of training (parallelized over 128 GPUs).
  ## Acknowledgements
 
  This work was performed using HPC resources from GENCI–IDRIS (Grant 2024-GC011015444).
@@ -257,10 +275,22 @@ Julie Hunter (LINAGORA),
 Jean-Pierre Lorré (LINAGORA),
 Jérôme Louradour (LINAGORA),
 Michel-Marie Maudet (LINAGORA),
- Olivier Gouvert (LINAGORA),
- Pierre-Carl Langlais (OpSci),
 Yaya Sy (LORIA).

  ## Contact
 
 
 * [Model Description](#model-description)
 <!-- * [Uses](#uses) -->
+ * [Example Code in Python](#example-code-in-python)
 * [Load the model](#load-the-model)
 * [Sentence completion](#sentence-completion)
 * [Load a checkpoint](#load-a-checkpoint)
 
 * [Training Procedure](#training-procedure)
 * [Neural Network Architecture](#neural-network-architecture)
 * [Training Hyperparameters](#training-hyperparameters)
+ 1. [Main Pre-training](#1-main-pre-training)
 2. [Context Extension](#2-context-extension)
 3. [Annealing](#3-annealing)
+ * [Training Logs and Learning Curves](#training-logs-and-learning-curves)
 <!-- * [Evaluation](#evaluation) -->
+ * [Disclaimer](#disclaimer)
+ * [Citation](#citation)
 * [Acknowledgements](#acknowledgements)
 * [Contact](#contact)
 
 
 and parallel data from those languages (2.5%),
 as well as several programming languages (14.7%).

+ ## Example Code in Python

 ### Load the model

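The loading code itself falls outside this diff hunk. For orientation, here is a minimal sketch of what loading Lucie-7B with `transformers` typically looks like; the `device_map` and `torch_dtype` arguments are illustrative assumptions, not necessarily the exact options used in the README:

```python
import torch
import transformers

model_name = "OpenLLM-France/Lucie-7B"

# Assumption: standard Hugging Face loading; the README may pass different options.
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",           # spread the ~7B parameters over available devices
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
)
```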
 
 ```
 ### Sentence completion

+ Wrap the model in a text generation pipeline, and specify some generation parameters:
 ```
 pipeline = transformers.pipeline("text-generation", model=model, tokenizer=tokenizer)

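The fenced example above is truncated by the diff. A minimal, self-contained sketch of how such a pipeline is typically used; the prompt and sampling settings are illustrative assumptions, not values taken from the README:

```python
import transformers

# Reuses the `model` and `tokenizer` loaded in the previous section.
pipeline = transformers.pipeline("text-generation", model=model, tokenizer=tokenizer)

# Illustrative sampling settings (assumptions, not the README's exact values).
generation_kwargs = dict(
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

prompt = "Quelle est la capitale de la France ?"  # hypothetical prompt
outputs = pipeline(prompt, **generation_kwargs)
print(outputs[0]["generated_text"])
```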
 
 )
 ```
 where `revision` can be one of:
+ * "[`step0005000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0005000)", "[`step0010000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0010000)", "[`step0015000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0015000)", "[`step0020000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0020000)": every 5000 steps for the first pre-training steps (with a context length of 4096).
+ * "[`step0025000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0025000)", "[`step0050000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0050000)", "[`step0075000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0075000)", "[`step0100000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0100000)", ..., "[`step0750000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0750000)": every 25000 steps from 25k to 750k steps.
  * "[`step0753851`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/step0753851)": last pre-training step before context extension and annealing.
  * "[`extension_step0000250`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000250)", "[`extension_step0000500`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000500)", "[`extension_step0000750`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0000750)", "[`extension_step0001000`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001000)", "[`extension_step0001220`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/extension_step0001220)": several checkpoints during context extension (with a context length of 32000).
 
 
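For reference, a minimal sketch of loading one of these intermediate checkpoints by passing its branch name as `revision`; the checkpoint chosen here is just one example from the list above:

```python
import transformers

# Example: the last pre-training step before context extension and annealing.
checkpoint = "step0753851"

model = transformers.AutoModelForCausalLM.from_pretrained(
    "OpenLLM-France/Lucie-7B",
    revision=checkpoint,
)
```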

 ![Initial Data Composition](figures/fig_dataset_composition.png)

+ Some of the data was upsampled to balance the training data distribution, yielding the following composition for training:

 ![Training Data Composition](figures/fig_dataset_composition_training.png)
 
 

 Lucie-7B is a causal decoder-only model trained on a causal language modeling task (i.e., predict the next token).

+ It was pre-trained on 512 H100 80GB GPUs for about 550\,000 GPU hours on the [Jean Zay supercomputer](http://www.idris.fr/eng/jean-zay/jean-zay-presentation-eng.html).

 The training code is available at [https://github.com/OpenLLM-France/Lucie-Training](https://github.com/OpenLLM-France/Lucie-Training).
 It is based on [this fork of Megatron-DeepSpeed](https://github.com/OpenLLM-France/Megatron-DeepSpeed).
 
 | Activation | `silu` |
 | RMS norm epsilon | 1e-5 |

+ The "theta" parameter of Rotary Positional Embedding (RoPE) was increased during the training process. Its values are indicated in the tables with training hyperparameters below.
 
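For context, a minimal sketch of how this frequency base enters the standard RoPE computation; the `dim` and `theta` values below are placeholders, and the values actually used in each phase are listed in the tables that follow:

```python
import torch

def rope_inverse_frequencies(dim: int, theta: float) -> torch.Tensor:
    # Standard RoPE schedule: inv_freq[i] = 1 / theta**(2i / dim).
    # Increasing theta slows the lowest-frequency rotations, which is what
    # supports extending the context length.
    return 1.0 / (theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))

# Placeholder values, for illustration only.
print(rope_inverse_frequencies(dim=128, theta=10_000.0)[:4])
```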

 #### Training Hyperparameters

 The training consisted of three main phases:
 1. Main pre-training on 3.1T tokens, with a context length of 4096,
 2. Context extension on 5B tokens, with a context length of 32000,
+ 3. Annealing on 5B tokens of high-quality data composed of a mixture of new data and data seen during training.
+ <!-- perhaps cite the dataset for annealing -->

 The details of each phase are given below.

+ ##### 1. Main Pre-training

+ Training hyperparameters in torch/Megatron-DeepSpeed were as follows:
 | **Hyperparameter** | **Value** |
 |------------------------|------------|
 | Total \# samples| 762 144 586 (3.1T tokens) |
 

 TODO

+ ### Training Logs and Learning Curves

 Training logs can be found in Tensorboard format in:
 * [`metadata/training_logs/`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/main/metadata/training_logs)
 
  in a zip file. Each file in the zip corresponds to a job of at most 20H of training (parallelized over 512 GPUs).
  <br> └── [`2_extension/`](https://huggingface.co/OpenLLM-France/Lucie-7B/tree/main/metadata/training_logs/2_extension) folder containing the training log for the context extension phase, which was done in a single job of around 13H of training (parallelized over 128 GPUs).
 
+ 🚧 TODO: Plot convergence curve (and link CSV?) 🚧
+
+ Evaluation results on benchmark datasets for checkpoints of Lucie-7B throughout the training process are available at
+ [metadata/evaluation_learning_curve_lucie.csv](metadata/evaluation_learning_curve_lucie.csv).
+ Evaluation results of baseline models on the same benchmark datasets are available at
+ [metadata/evaluation_baselines.csv](metadata/evaluation_baselines.csv).
+
+ 🚧 TODO: Plot learning curves 🚧
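A small sketch of how these CSV files could be inspected once downloaded locally; using pandas here is an assumption about the workflow, and the column names are not documented in this diff:

```python
import pandas as pd

# Assumes the two CSV files from the repository have been downloaded locally.
learning_curve = pd.read_csv("metadata/evaluation_learning_curve_lucie.csv")
baselines = pd.read_csv("metadata/evaluation_baselines.csv")

# Inspect the available columns before plotting anything.
print(learning_curve.columns.tolist())
print(baselines.columns.tolist())
```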

+ ## Disclaimer
+
+ Lucie-7B is a language model trained solely to predict the most probable next word in a sequence. Despite efforts to filter the [Lucie Training Dataset](https://huggingface.co/datasets/OpenLLM-France/Lucie-Training-Dataset), it is possible that Lucie-7B encountered strings containing toxic or offensive language during its training and, as a result, it may generate similar strings. To limit such behavior, it is advised to fine-tune Lucie-7B through instruction and/or preference tuning (DPO, RLHF, etc.).
+
+ ## Citation
+
+ TODO
+
+
 ## Acknowledgements

  This work was performed using HPC resources from GENCI–IDRIS (Grant 2024-GC011015444).
 
 Jean-Pierre Lorré (LINAGORA),
 Jérôme Louradour (LINAGORA),
 Michel-Marie Maudet (LINAGORA),
+ Olivier Gouvert (LINAGORA), and
 Yaya Sy (LORIA).

+ We thank
+ Anastasia Stasenko (OpSci/Pleias),
+ Clément Bénesse (Opsci),
+ Guokan Shang (MBZUAI),
+ Ismaïl Harrando (LINAGORA),
+ Joël Gombin (Opsci),
+ Jordan Ricker (Opsci),
+ Olivier Ferret (CEA),
+ Pierre-Carl Langlais (OpSci/Pleias),
+ and
+ Rachel Bawden (INRIA)
+ for their helpful input.
+
 ## Contact

296
model-00001-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cc8b337f3bd69430af103f927c1d838d29c158bd29bdfdc12f69405b37e49441
+ size 4924315872
model-00002-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:efe1c9aaf6bb27991347d4f1fa47eb53ba71ef4163db6b1d5491c48690626b9a
+ size 4983047384
model-00003-of-00003.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5117b407cc109ad9d83616ae7cad460566421acc167668c2c992fc073ff4c113
+ size 3506598760
model.safetensors.index.json ADDED
@@ -0,0 +1,330 @@
+ {
2
+ "metadata": {
3
+ "total_size": 13413924864
4
+ },
5
+ "weight_map": {
6
+ "model.embed_tokens.weight": "model-00001-of-00003.safetensors",
7
+ "model.norm.weight": "model-00001-of-00003.safetensors",
8
+ "lm_head.weight": "model-00001-of-00003.safetensors",
9
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00003.safetensors",
10
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
11
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
12
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
13
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
14
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
15
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
16
+ "model.layers.0.self_attn.rotary_emb.inv_freq": "model-00001-of-00003.safetensors",
17
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00003.safetensors",
18
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
19
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
20
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
21
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
22
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
23
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
24
+ "model.layers.1.self_attn.rotary_emb.inv_freq": "model-00001-of-00003.safetensors",
25
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00003.safetensors",
26
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
27
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
28
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
29
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
30
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
31
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
32
+ "model.layers.2.self_attn.rotary_emb.inv_freq": "model-00001-of-00003.safetensors",
33
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00003.safetensors",
34
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
35
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
36
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
37
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
38
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
39
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
40
+ "model.layers.3.self_attn.rotary_emb.inv_freq": "model-00001-of-00003.safetensors",
41
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00003.safetensors",
42
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
43
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
44
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
45
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
46
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
47
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
48
+ "model.layers.4.self_attn.rotary_emb.inv_freq": "model-00001-of-00003.safetensors",
49
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00003.safetensors",
50
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
51
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
52
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
53
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
54
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
55
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
56
+ "model.layers.5.self_attn.rotary_emb.inv_freq": "model-00001-of-00003.safetensors",
57
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00003.safetensors",
58
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
59
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
60
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
61
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
62
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
63
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
64
+ "model.layers.6.self_attn.rotary_emb.inv_freq": "model-00001-of-00003.safetensors",
65
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00003.safetensors",
66
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
67
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
68
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
69
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
70
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
71
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
72
+ "model.layers.7.self_attn.rotary_emb.inv_freq": "model-00001-of-00003.safetensors",
73
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00003.safetensors",
74
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
75
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
76
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
77
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
78
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
79
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
80
+ "model.layers.8.self_attn.rotary_emb.inv_freq": "model-00001-of-00003.safetensors",
81
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00003.safetensors",
82
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
83
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
84
+ "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
85
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
86
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
87
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
88
+ "model.layers.9.self_attn.rotary_emb.inv_freq": "model-00001-of-00003.safetensors",
89
+ "model.layers.10.input_layernorm.weight": "model-00001-of-00003.safetensors",
90
+ "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
91
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
92
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
93
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
94
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
95
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
96
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
97
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
98
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
99
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
100
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
101
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
102
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
103
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
104
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
105
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
106
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
107
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
108
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
109
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
110
+ "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
111
+ "model.layers.10.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
112
+ "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
113
+ "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
114
+ "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
115
+ "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
116
+ "model.layers.10.self_attn.rotary_emb.inv_freq": "model-00002-of-00003.safetensors",
117
+ "model.layers.11.input_layernorm.weight": "model-00002-of-00003.safetensors",
118
+ "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
119
+ "model.layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
120
+ "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
121
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
122
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
123
+ "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
124
+ "model.layers.11.self_attn.rotary_emb.inv_freq": "model-00002-of-00003.safetensors",
125
+ "model.layers.12.input_layernorm.weight": "model-00002-of-00003.safetensors",
126
+ "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
127
+ "model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
128
+ "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
129
+ "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
130
+ "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
131
+ "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
132
+ "model.layers.12.self_attn.rotary_emb.inv_freq": "model-00002-of-00003.safetensors",
133
+ "model.layers.13.input_layernorm.weight": "model-00002-of-00003.safetensors",
134
+ "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
135
+ "model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
136
+ "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
137
+ "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
138
+ "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
139
+ "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
140
+ "model.layers.13.self_attn.rotary_emb.inv_freq": "model-00002-of-00003.safetensors",
141
+ "model.layers.14.input_layernorm.weight": "model-00002-of-00003.safetensors",
142
+ "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
143
+ "model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
144
+ "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
145
+ "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
146
+ "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
147
+ "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
148
+ "model.layers.14.self_attn.rotary_emb.inv_freq": "model-00002-of-00003.safetensors",
149
+ "model.layers.15.input_layernorm.weight": "model-00002-of-00003.safetensors",
150
+ "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
151
+ "model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
152
+ "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
153
+ "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
154
+ "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
155
+ "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
156
+ "model.layers.15.self_attn.rotary_emb.inv_freq": "model-00002-of-00003.safetensors",
157
+ "model.layers.16.input_layernorm.weight": "model-00002-of-00003.safetensors",
158
+ "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
159
+ "model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
160
+ "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
161
+ "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
162
+ "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
163
+ "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
164
+ "model.layers.16.self_attn.rotary_emb.inv_freq": "model-00002-of-00003.safetensors",
165
+ "model.layers.17.input_layernorm.weight": "model-00002-of-00003.safetensors",
166
+ "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
167
+ "model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
168
+ "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
169
+ "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
170
+ "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
171
+ "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
172
+ "model.layers.17.self_attn.rotary_emb.inv_freq": "model-00002-of-00003.safetensors",
173
+ "model.layers.18.input_layernorm.weight": "model-00002-of-00003.safetensors",
174
+ "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
175
+ "model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
176
+ "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
177
+ "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
178
+ "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
179
+ "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
180
+ "model.layers.18.self_attn.rotary_emb.inv_freq": "model-00002-of-00003.safetensors",
181
+ "model.layers.19.input_layernorm.weight": "model-00002-of-00003.safetensors",
182
+ "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
183
+ "model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
184
+ "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
185
+ "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
186
+ "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
187
+ "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
188
+ "model.layers.19.self_attn.rotary_emb.inv_freq": "model-00002-of-00003.safetensors",
189
+ "model.layers.20.input_layernorm.weight": "model-00002-of-00003.safetensors",
190
+ "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
191
+ "model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
192
+ "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
193
+ "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
194
+ "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
195
+ "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
196
+ "model.layers.20.self_attn.rotary_emb.inv_freq": "model-00002-of-00003.safetensors",
197
+ "model.layers.21.input_layernorm.weight": "model-00002-of-00003.safetensors",
198
+ "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
199
+ "model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
200
+ "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
201
+ "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
202
+ "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
203
+ "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
204
+ "model.layers.21.self_attn.rotary_emb.inv_freq": "model-00002-of-00003.safetensors",
205
+ "model.layers.22.input_layernorm.weight": "model-00002-of-00003.safetensors",
206
+ "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
207
+ "model.layers.22.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
208
+ "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
209
+ "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
210
+ "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
211
+ "model.layers.10.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
212
+ "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
213
+ "model.layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
214
+ "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
215
+ "model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
216
+ "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
217
+ "model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
218
+ "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
219
+ "model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
220
+ "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
221
+ "model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
222
+ "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
223
+ "model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
224
+ "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
225
+ "model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
226
+ "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
227
+ "model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
228
+ "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
229
+ "model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
230
+ "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
231
+ "model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
232
+ "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
233
+ "model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
234
+ "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
235
+ "model.layers.22.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
236
+ "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
237
+ "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
238
+ "model.layers.22.self_attn.rotary_emb.inv_freq": "model-00003-of-00003.safetensors",
239
+ "model.layers.23.input_layernorm.weight": "model-00003-of-00003.safetensors",
240
+ "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
241
+ "model.layers.23.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
242
+ "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
243
+ "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
244
+ "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
245
+ "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
246
+ "model.layers.23.self_attn.rotary_emb.inv_freq": "model-00003-of-00003.safetensors",
247
+ "model.layers.24.input_layernorm.weight": "model-00003-of-00003.safetensors",
248
+ "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
249
+ "model.layers.24.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
250
+ "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
251
+ "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
252
+ "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
253
+ "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
254
+ "model.layers.24.self_attn.rotary_emb.inv_freq": "model-00003-of-00003.safetensors",
255
+ "model.layers.25.input_layernorm.weight": "model-00003-of-00003.safetensors",
256
+ "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
257
+ "model.layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
258
+ "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
259
+ "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
260
+ "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
261
+ "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
262
+ "model.layers.25.self_attn.rotary_emb.inv_freq": "model-00003-of-00003.safetensors",
263
+ "model.layers.26.input_layernorm.weight": "model-00003-of-00003.safetensors",
264
+ "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
265
+ "model.layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
266
+ "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
267
+ "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
268
+ "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
269
+ "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
270
+ "model.layers.26.self_attn.rotary_emb.inv_freq": "model-00003-of-00003.safetensors",
271
+ "model.layers.27.input_layernorm.weight": "model-00003-of-00003.safetensors",
272
+ "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
273
+ "model.layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
274
+ "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
275
+ "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
276
+ "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
277
+ "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
278
+ "model.layers.27.self_attn.rotary_emb.inv_freq": "model-00003-of-00003.safetensors",
279
+ "model.layers.28.input_layernorm.weight": "model-00003-of-00003.safetensors",
280
+ "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
281
+ "model.layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
282
+ "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
283
+ "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
284
+ "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
285
+ "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
286
+ "model.layers.28.self_attn.rotary_emb.inv_freq": "model-00003-of-00003.safetensors",
287
+ "model.layers.29.input_layernorm.weight": "model-00003-of-00003.safetensors",
288
+ "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
289
+ "model.layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
290
+ "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
291
+ "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
292
+ "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
293
+ "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
294
+ "model.layers.29.self_attn.rotary_emb.inv_freq": "model-00003-of-00003.safetensors",
295
+ "model.layers.30.input_layernorm.weight": "model-00003-of-00003.safetensors",
296
+ "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
297
+ "model.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
298
+ "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
299
+ "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
300
+ "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
301
+ "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
302
+ "model.layers.30.self_attn.rotary_emb.inv_freq": "model-00003-of-00003.safetensors",
303
+ "model.layers.31.input_layernorm.weight": "model-00003-of-00003.safetensors",
304
+ "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
305
+ "model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
306
+ "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
307
+ "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
308
+ "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
309
+ "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
310
+ "model.layers.31.self_attn.rotary_emb.inv_freq": "model-00003-of-00003.safetensors",
311
+ "model.layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
312
+ "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
313
+ "model.layers.24.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
314
+ "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
315
+ "model.layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
316
+ "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
317
+ "model.layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
318
+ "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
319
+ "model.layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
320
+ "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
321
+ "model.layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
322
+ "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
323
+ "model.layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
324
+ "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
325
+ "model.layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
326
+ "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
327
+ "model.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
328
+ "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors"
329
+ }
330
+ }