---
license: other
license_name: databricks-open-model-license
license_link: https://www.databricks.com/legal/open-model-license
base_model: databricks/dbrx-base
tags:
- generated_from_trainer
- axolotl
datasets:
- cognitivecomputations/Dolphin-2.9
- teknium/OpenHermes-2.5
- m-a-p/CodeFeedback-Filtered-Instruction
- cognitivecomputations/dolphin-coder
- cognitivecomputations/samantha-data
- microsoft/orca-math-word-problems-200k
- Locutusque/function-calling-chatml
- internlm/Agent-FLAN
---

# Dolphin 2.9.1 DBRX 🐬

Curated and trained by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations.

[![Discord](https://img.shields.io/discord/1156064224225808488?logo=Discord&logoColor=%23ffffff&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FtCMkMDDHwm)](https://discord.gg/cognitivecomputations)
Discord: https://discord.gg/cognitivecomputations

<img src="https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/ldkN1J0WIDQwU4vutGYiD.png" width="600" />

Our appreciation for the sponsors of Dolphin 2.9.1:
- [Crusoe Cloud](https://crusoe.ai/) - provided an excellent on-demand 8xH100 node

This model is based on [databricks/dbrx-base](https://huggingface.co/databricks/dbrx-base) and is governed by the [databricks-open-model-license](https://www.databricks.com/legal/open-model-license).

The base model has a 32k context window; this full-weight fine-tuning was performed with a 4k sequence length.

This model was given a full-weight fine-tune (FFT) on parameters selected by [Laser Scanner](https://github.com/cognitivecomputations/laserRMT/blob/main/laser_scanner.py), using the ChatML prompt template format.

example:

```
<|im_start|>system
You are Dolphin, a helpful AI assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

```

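A minimal usage sketch with Hugging Face Transformers is shown below. The repo id is a placeholder (substitute the actual published model id), and the tokenizer is assumed to ship the ChatML chat template configured for this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cognitivecomputations/dolphin-2.9.1-dbrx"  # placeholder: use the actual repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "system", "content": "You are Dolphin, a helpful AI assistant."},
    {"role": "user", "content": "Write a Python function that reverses a string."},
]

# apply_chat_template renders the ChatML layout shown above and leaves an
# open <|im_start|>assistant turn for the model to complete.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
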
Dolphin-2.9.1 has a variety of instruction-following, conversational, and coding skills. It also has initial agentic abilities and supports function calling.

Dolphin is uncensored. We have filtered the dataset to remove alignment and bias, which makes the model more compliant. You are advised to implement your own alignment layer before exposing the model as a service; it will be highly compliant with any request, even unethical ones. Please read my blog post about uncensored models: https://erichartford.com/uncensored-models. You are responsible for any content you create using this model. Enjoy responsibly.

Dolphin may be used for any purpose, including commercial use, provided that use complies with the [Databricks Open Model License](https://www.databricks.com/legal/open-model-license) that governs the base model. Dolphin was trained on data generated by GPT-4, among other models.

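The "alignment layer" mentioned above is left to the deployer. As a rough sketch (not a recommendation of any particular policy), it can be as simple as a strict system prompt plus a moderation check on both the request and the reply; `violates_policy` here is a hypothetical stand-in for whatever moderation you actually use:

```python
def guarded_chat(model, tokenizer, user_message, violates_policy):
    """Wrap generation with a caller-supplied policy check before and after the model runs."""
    refusal = "I can't help with that request."
    if violates_policy(user_message):
        return refusal
    messages = [
        {"role": "system", "content": "You are Dolphin, a helpful AI assistant. "
                                      "Decline requests that are illegal or harmful."},
        {"role": "user", "content": user_message},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=512)
    reply = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
    return refusal if violates_policy(reply) else reply
```
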
## Evals

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63111b2d88942700629f5771/tVh5xVCGvjPyLgMCqp-IY.png)

## Training

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: /workspace/axolotl/dbrx-checkpoint
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true

load_in_8bit: false
# load_in_4bit: true
strict: false

# adapter: qlora
# lora_modules_to_save: [embed_tokens, lm_head]

# lora_r: 32
# lora_alpha: 16
# lora_dropout: 0.05
# lora_target_linear: false
# lora_fan_in_fan_out:

datasets:
  - path: /workspace/datasets/dolphin-2.9/dolphin201-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  # - path: /workspace/datasets/dolphin-2.9/Ultrachat200kunfiltered.jsonl
  #   type: sharegpt
  #   conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/dolphin-coder-translate-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/dolphin-coder-codegen-sharegpt2.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/m-a-p_Code-Feedback-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/m-a-p_CodeFeedback-Filtered-Instruction-sharegpt-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/not_samantha_norefusals.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/Orca-Math-resort-unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/agent_instruct_react_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_instruct_j1s1_3k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_negative_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_react_10p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/toolbench_tflan_cot_30p_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  - path: /workspace/datasets/dolphin-2.9/openhermes200k_unfiltered.jsonl
    type: sharegpt
    conversation: chatml
  # - path: /workspace/datasets/dolphin-2.9/SystemConversations.jsonl
  #   type: sharegpt
  #   conversation: chatml

chat_template: chatml

133
+ unfrozen_parameters:
134
+ - ^lm_head.weight$
135
+ # ffn.experts.mlp_experts.0.v1 layers
136
+ - transformer.blocks.30.ffn.experts.mlp_experts.0.v1
137
+ - transformer.blocks.32.ffn.experts.mlp_experts.0.v1
138
+ - transformer.blocks.25.ffn.experts.mlp_experts.0.v1
139
+ - transformer.blocks.15.ffn.experts.mlp_experts.0.v1
140
+ - transformer.blocks.22.ffn.experts.mlp_experts.0.v1
141
+ - transformer.blocks.31.ffn.experts.mlp_experts.0.v1
142
+ - transformer.blocks.7.ffn.experts.mlp_experts.0.v1
143
+ - transformer.blocks.21.ffn.experts.mlp_experts.0.v1
144
+ - transformer.blocks.8.ffn.experts.mlp_experts.0.v1
145
+ - transformer.blocks.23.ffn.experts.mlp_experts.0.v1
146
+ # ffn.experts.mlp_experts.0.w1 layers
147
+ - transformer.blocks.7.ffn.experts.mlp_experts.0.w1
148
+ - transformer.blocks.8.ffn.experts.mlp_experts.0.w1
149
+ - transformer.blocks.30.ffn.experts.mlp_experts.0.w1
150
+ - transformer.blocks.4.ffn.experts.mlp_experts.0.w1
151
+ - transformer.blocks.0.ffn.experts.mlp_experts.0.w1
152
+ - transformer.blocks.32.ffn.experts.mlp_experts.0.w1
153
+ - transformer.blocks.6.ffn.experts.mlp_experts.0.w1
154
+ - transformer.blocks.3.ffn.experts.mlp_experts.0.w1
155
+ - transformer.blocks.25.ffn.experts.mlp_experts.0.w1
156
+ - transformer.blocks.5.ffn.experts.mlp_experts.0.w1
157
+ # ffn.experts.mlp_experts.0.w2 layers
158
+ - transformer.blocks.25.ffn.experts.mlp_experts.0.w2
159
+ - transformer.blocks.22.ffn.experts.mlp_experts.0.w2
160
+ - transformer.blocks.27.ffn.experts.mlp_experts.0.w2
161
+ - transformer.blocks.26.ffn.experts.mlp_experts.0.w2
162
+ - transformer.blocks.4.ffn.experts.mlp_experts.0.w2
163
+ - transformer.blocks.29.ffn.experts.mlp_experts.0.w2
164
+ - transformer.blocks.32.ffn.experts.mlp_experts.0.w2
165
+ - transformer.blocks.5.ffn.experts.mlp_experts.0.w2
166
+ - transformer.blocks.7.ffn.experts.mlp_experts.0.w2
167
+ - transformer.blocks.3.ffn.experts.mlp_experts.0.w2
168
+ # ffn.experts.mlp_experts.1.v1 layers
169
+ - transformer.blocks.27.ffn.experts.mlp_experts.1.v1
170
+ - transformer.blocks.25.ffn.experts.mlp_experts.1.v1
171
+ - transformer.blocks.29.ffn.experts.mlp_experts.1.v1
172
+ - transformer.blocks.33.ffn.experts.mlp_experts.1.v1
173
+ - transformer.blocks.23.ffn.experts.mlp_experts.1.v1
174
+ - transformer.blocks.30.ffn.experts.mlp_experts.1.v1
175
+ - transformer.blocks.6.ffn.experts.mlp_experts.1.v1
176
+ - transformer.blocks.21.ffn.experts.mlp_experts.1.v1
177
+ - transformer.blocks.15.ffn.experts.mlp_experts.1.v1
178
+ - transformer.blocks.7.ffn.experts.mlp_experts.1.v1
179
+ # ffn.experts.mlp_experts.1.w1 layers
180
+ - transformer.blocks.0.ffn.experts.mlp_experts.1.w1
181
+ - transformer.blocks.6.ffn.experts.mlp_experts.1.w1
182
+ - transformer.blocks.7.ffn.experts.mlp_experts.1.w1
183
+ - transformer.blocks.4.ffn.experts.mlp_experts.1.w1
184
+ - transformer.blocks.8.ffn.experts.mlp_experts.1.w1
185
+ - transformer.blocks.29.ffn.experts.mlp_experts.1.w1
186
+ - transformer.blocks.33.ffn.experts.mlp_experts.1.w1
187
+ - transformer.blocks.27.ffn.experts.mlp_experts.1.w1
188
+ - transformer.blocks.1.ffn.experts.mlp_experts.1.w1
189
+ - transformer.blocks.10.ffn.experts.mlp_experts.1.w1
190
+ # ffn.experts.mlp_experts.1.w2 layers
191
+ - transformer.blocks.25.ffn.experts.mlp_experts.1.w2
192
+ - transformer.blocks.23.ffn.experts.mlp_experts.1.w2
193
+ - transformer.blocks.27.ffn.experts.mlp_experts.1.w2
194
+ - transformer.blocks.29.ffn.experts.mlp_experts.1.w2
195
+ - transformer.blocks.31.ffn.experts.mlp_experts.1.w2
196
+ - transformer.blocks.4.ffn.experts.mlp_experts.1.w2
197
+ - transformer.blocks.32.ffn.experts.mlp_experts.1.w2
198
+ - transformer.blocks.30.ffn.experts.mlp_experts.1.w2
199
+ - transformer.blocks.21.ffn.experts.mlp_experts.1.w2
200
+ - transformer.blocks.33.ffn.experts.mlp_experts.1.w2
201
# ffn.experts.mlp_experts.10.v1 layers
- transformer.blocks.28.ffn.experts.mlp_experts.10.v1
- transformer.blocks.34.ffn.experts.mlp_experts.10.v1
- transformer.blocks.33.ffn.experts.mlp_experts.10.v1
- transformer.blocks.26.ffn.experts.mlp_experts.10.v1
- transformer.blocks.32.ffn.experts.mlp_experts.10.v1
- transformer.blocks.30.ffn.experts.mlp_experts.10.v1
- transformer.blocks.36.ffn.experts.mlp_experts.10.v1
- transformer.blocks.24.ffn.experts.mlp_experts.10.v1
- transformer.blocks.20.ffn.experts.mlp_experts.10.v1
- transformer.blocks.35.ffn.experts.mlp_experts.10.v1
# ffn.experts.mlp_experts.10.w1 layers
- transformer.blocks.24.ffn.experts.mlp_experts.10.w1
- transformer.blocks.33.ffn.experts.mlp_experts.10.w1
- transformer.blocks.8.ffn.experts.mlp_experts.10.w1
- transformer.blocks.7.ffn.experts.mlp_experts.10.w1
- transformer.blocks.34.ffn.experts.mlp_experts.10.w1
- transformer.blocks.28.ffn.experts.mlp_experts.10.w1
- transformer.blocks.30.ffn.experts.mlp_experts.10.w1
- transformer.blocks.1.ffn.experts.mlp_experts.10.w1
- transformer.blocks.3.ffn.experts.mlp_experts.10.w1
- transformer.blocks.5.ffn.experts.mlp_experts.10.w1
# ffn.experts.mlp_experts.10.w2 layers
- transformer.blocks.24.ffn.experts.mlp_experts.10.w2
- transformer.blocks.28.ffn.experts.mlp_experts.10.w2
- transformer.blocks.23.ffn.experts.mlp_experts.10.w2
- transformer.blocks.30.ffn.experts.mlp_experts.10.w2
- transformer.blocks.32.ffn.experts.mlp_experts.10.w2
- transformer.blocks.3.ffn.experts.mlp_experts.10.w2
- transformer.blocks.33.ffn.experts.mlp_experts.10.w2
- transformer.blocks.26.ffn.experts.mlp_experts.10.w2
- transformer.blocks.2.ffn.experts.mlp_experts.10.w2
- transformer.blocks.20.ffn.experts.mlp_experts.10.w2
# ffn.experts.mlp_experts.11.w1 layers
- transformer.blocks.6.ffn.experts.mlp_experts.11.w1
- transformer.blocks.8.ffn.experts.mlp_experts.11.w1
- transformer.blocks.9.ffn.experts.mlp_experts.11.w1
- transformer.blocks.0.ffn.experts.mlp_experts.11.w1
- transformer.blocks.10.ffn.experts.mlp_experts.11.w1
- transformer.blocks.28.ffn.experts.mlp_experts.11.w1
- transformer.blocks.3.ffn.experts.mlp_experts.11.w1
- transformer.blocks.5.ffn.experts.mlp_experts.11.w1
- transformer.blocks.33.ffn.experts.mlp_experts.11.w1
- transformer.blocks.13.ffn.experts.mlp_experts.11.w1
# ffn.experts.mlp_experts.11.w2 layers
- transformer.blocks.27.ffn.experts.mlp_experts.11.w2
- transformer.blocks.24.ffn.experts.mlp_experts.11.w2
- transformer.blocks.29.ffn.experts.mlp_experts.11.w2
- transformer.blocks.30.ffn.experts.mlp_experts.11.w2
- transformer.blocks.22.ffn.experts.mlp_experts.11.w2
- transformer.blocks.6.ffn.experts.mlp_experts.11.w2
- transformer.blocks.25.ffn.experts.mlp_experts.11.w2
- transformer.blocks.7.ffn.experts.mlp_experts.11.w2
- transformer.blocks.28.ffn.experts.mlp_experts.11.w2
- transformer.blocks.5.ffn.experts.mlp_experts.11.w2
# ffn.experts.mlp_experts.12.v1 layers
- transformer.blocks.30.ffn.experts.mlp_experts.12.v1
- transformer.blocks.21.ffn.experts.mlp_experts.12.v1
- transformer.blocks.27.ffn.experts.mlp_experts.12.v1
- transformer.blocks.28.ffn.experts.mlp_experts.12.v1
- transformer.blocks.29.ffn.experts.mlp_experts.12.v1
- transformer.blocks.8.ffn.experts.mlp_experts.12.v1
- transformer.blocks.10.ffn.experts.mlp_experts.12.v1
- transformer.blocks.23.ffn.experts.mlp_experts.12.v1
- transformer.blocks.6.ffn.experts.mlp_experts.12.v1
- transformer.blocks.20.ffn.experts.mlp_experts.12.v1
# ffn.experts.mlp_experts.12.w1 layers
- transformer.blocks.8.ffn.experts.mlp_experts.12.w1
- transformer.blocks.1.ffn.experts.mlp_experts.12.w1
- transformer.blocks.0.ffn.experts.mlp_experts.12.w1
- transformer.blocks.6.ffn.experts.mlp_experts.12.w1
- transformer.blocks.9.ffn.experts.mlp_experts.12.w1
- transformer.blocks.2.ffn.experts.mlp_experts.12.w1
- transformer.blocks.10.ffn.experts.mlp_experts.12.w1
- transformer.blocks.17.ffn.experts.mlp_experts.12.w1
- transformer.blocks.29.ffn.experts.mlp_experts.12.w1
- transformer.blocks.21.ffn.experts.mlp_experts.12.w1
# ffn.experts.mlp_experts.12.w2 layers
- transformer.blocks.6.ffn.experts.mlp_experts.12.w2
- transformer.blocks.25.ffn.experts.mlp_experts.12.w2
- transformer.blocks.27.ffn.experts.mlp_experts.12.w2
- transformer.blocks.8.ffn.experts.mlp_experts.12.w2
- transformer.blocks.31.ffn.experts.mlp_experts.12.w2
- transformer.blocks.21.ffn.experts.mlp_experts.12.w2
- transformer.blocks.2.ffn.experts.mlp_experts.12.w2
- transformer.blocks.29.ffn.experts.mlp_experts.12.w2
- transformer.blocks.32.ffn.experts.mlp_experts.12.w2
- transformer.blocks.30.ffn.experts.mlp_experts.12.w2
# ffn.experts.mlp_experts.13.v1 layers
- transformer.blocks.31.ffn.experts.mlp_experts.13.v1
- transformer.blocks.24.ffn.experts.mlp_experts.13.v1
- transformer.blocks.30.ffn.experts.mlp_experts.13.v1
- transformer.blocks.29.ffn.experts.mlp_experts.13.v1
- transformer.blocks.8.ffn.experts.mlp_experts.13.v1
- transformer.blocks.10.ffn.experts.mlp_experts.13.v1
- transformer.blocks.11.ffn.experts.mlp_experts.13.v1
- transformer.blocks.27.ffn.experts.mlp_experts.13.v1
- transformer.blocks.25.ffn.experts.mlp_experts.13.v1
- transformer.blocks.36.ffn.experts.mlp_experts.13.v1
# ffn.experts.mlp_experts.13.w1 layers
- transformer.blocks.4.ffn.experts.mlp_experts.13.w1
- transformer.blocks.10.ffn.experts.mlp_experts.13.w1
- transformer.blocks.6.ffn.experts.mlp_experts.13.w1
- transformer.blocks.0.ffn.experts.mlp_experts.13.w1
- transformer.blocks.3.ffn.experts.mlp_experts.13.w1
- transformer.blocks.24.ffn.experts.mlp_experts.13.w1
- transformer.blocks.8.ffn.experts.mlp_experts.13.w1
- transformer.blocks.1.ffn.experts.mlp_experts.13.w1
- transformer.blocks.30.ffn.experts.mlp_experts.13.w1
- transformer.blocks.11.ffn.experts.mlp_experts.13.w1
# ffn.experts.mlp_experts.13.w2 layers
- transformer.blocks.24.ffn.experts.mlp_experts.13.w2
- transformer.blocks.20.ffn.experts.mlp_experts.13.w2
- transformer.blocks.25.ffn.experts.mlp_experts.13.w2
- transformer.blocks.27.ffn.experts.mlp_experts.13.w2
- transformer.blocks.3.ffn.experts.mlp_experts.13.w2
- transformer.blocks.4.ffn.experts.mlp_experts.13.w2
- transformer.blocks.29.ffn.experts.mlp_experts.13.w2
- transformer.blocks.6.ffn.experts.mlp_experts.13.w2
- transformer.blocks.30.ffn.experts.mlp_experts.13.w2
- transformer.blocks.31.ffn.experts.mlp_experts.13.w2
# ffn.experts.mlp_experts.14.v1 layers
- transformer.blocks.28.ffn.experts.mlp_experts.14.v1
- transformer.blocks.26.ffn.experts.mlp_experts.14.v1
- transformer.blocks.29.ffn.experts.mlp_experts.14.v1
- transformer.blocks.35.ffn.experts.mlp_experts.14.v1
- transformer.blocks.24.ffn.experts.mlp_experts.14.v1
- transformer.blocks.8.ffn.experts.mlp_experts.14.v1
- transformer.blocks.32.ffn.experts.mlp_experts.14.v1
- transformer.blocks.15.ffn.experts.mlp_experts.14.v1
- transformer.blocks.11.ffn.experts.mlp_experts.14.v1
- transformer.blocks.22.ffn.experts.mlp_experts.14.v1
# ffn.experts.mlp_experts.14.w1 layers
- transformer.blocks.8.ffn.experts.mlp_experts.14.w1
- transformer.blocks.4.ffn.experts.mlp_experts.14.w1
- transformer.blocks.5.ffn.experts.mlp_experts.14.w1
- transformer.blocks.7.ffn.experts.mlp_experts.14.w1
- transformer.blocks.3.ffn.experts.mlp_experts.14.w1
- transformer.blocks.13.ffn.experts.mlp_experts.14.w1
- transformer.blocks.29.ffn.experts.mlp_experts.14.w1
- transformer.blocks.6.ffn.experts.mlp_experts.14.w1
- transformer.blocks.28.ffn.experts.mlp_experts.14.w1
- transformer.blocks.9.ffn.experts.mlp_experts.14.w1
# ffn.experts.mlp_experts.14.w2 layers
- transformer.blocks.26.ffn.experts.mlp_experts.14.w2
- transformer.blocks.24.ffn.experts.mlp_experts.14.w2
- transformer.blocks.29.ffn.experts.mlp_experts.14.w2
- transformer.blocks.28.ffn.experts.mlp_experts.14.w2
- transformer.blocks.31.ffn.experts.mlp_experts.14.w2
- transformer.blocks.5.ffn.experts.mlp_experts.14.w2
- transformer.blocks.4.ffn.experts.mlp_experts.14.w2
- transformer.blocks.32.ffn.experts.mlp_experts.14.w2
- transformer.blocks.6.ffn.experts.mlp_experts.14.w2
- transformer.blocks.22.ffn.experts.mlp_experts.14.w2
# ffn.experts.mlp_experts.15.v1 layers
- transformer.blocks.33.ffn.experts.mlp_experts.15.v1
- transformer.blocks.26.ffn.experts.mlp_experts.15.v1
- transformer.blocks.31.ffn.experts.mlp_experts.15.v1
- transformer.blocks.28.ffn.experts.mlp_experts.15.v1
- transformer.blocks.9.ffn.experts.mlp_experts.15.v1
- transformer.blocks.34.ffn.experts.mlp_experts.15.v1
- transformer.blocks.29.ffn.experts.mlp_experts.15.v1
- transformer.blocks.7.ffn.experts.mlp_experts.15.v1
- transformer.blocks.17.ffn.experts.mlp_experts.15.v1
- transformer.blocks.15.ffn.experts.mlp_experts.15.v1
# ffn.experts.mlp_experts.15.w1 layers
- transformer.blocks.6.ffn.experts.mlp_experts.15.w1
- transformer.blocks.9.ffn.experts.mlp_experts.15.w1
- transformer.blocks.0.ffn.experts.mlp_experts.15.w1
- transformer.blocks.7.ffn.experts.mlp_experts.15.w1
- transformer.blocks.14.ffn.experts.mlp_experts.15.w1
- transformer.blocks.33.ffn.experts.mlp_experts.15.w1
- transformer.blocks.34.ffn.experts.mlp_experts.15.w1
- transformer.blocks.10.ffn.experts.mlp_experts.15.w1
- transformer.blocks.5.ffn.experts.mlp_experts.15.w1
- transformer.blocks.29.ffn.experts.mlp_experts.15.w1
# ffn.experts.mlp_experts.15.w2 layers
- transformer.blocks.28.ffn.experts.mlp_experts.15.w2
- transformer.blocks.26.ffn.experts.mlp_experts.15.w2
- transformer.blocks.27.ffn.experts.mlp_experts.15.w2
- transformer.blocks.29.ffn.experts.mlp_experts.15.w2
- transformer.blocks.6.ffn.experts.mlp_experts.15.w2
- transformer.blocks.31.ffn.experts.mlp_experts.15.w2
- transformer.blocks.7.ffn.experts.mlp_experts.15.w2
- transformer.blocks.33.ffn.experts.mlp_experts.15.w2
- transformer.blocks.32.ffn.experts.mlp_experts.15.w2
- transformer.blocks.25.ffn.experts.mlp_experts.15.w2
# ffn.experts.mlp_experts.2.v1 layers
- transformer.blocks.31.ffn.experts.mlp_experts.2.v1
- transformer.blocks.27.ffn.experts.mlp_experts.2.v1
- transformer.blocks.28.ffn.experts.mlp_experts.2.v1
- transformer.blocks.30.ffn.experts.mlp_experts.2.v1
- transformer.blocks.23.ffn.experts.mlp_experts.2.v1
- transformer.blocks.32.ffn.experts.mlp_experts.2.v1
- transformer.blocks.35.ffn.experts.mlp_experts.2.v1
- transformer.blocks.7.ffn.experts.mlp_experts.2.v1
- transformer.blocks.21.ffn.experts.mlp_experts.2.v1
- transformer.blocks.15.ffn.experts.mlp_experts.2.v1
# ffn.experts.mlp_experts.2.w1 layers
- transformer.blocks.7.ffn.experts.mlp_experts.2.w1
- transformer.blocks.6.ffn.experts.mlp_experts.2.w1
- transformer.blocks.1.ffn.experts.mlp_experts.2.w1
- transformer.blocks.4.ffn.experts.mlp_experts.2.w1
- transformer.blocks.5.ffn.experts.mlp_experts.2.w1
- transformer.blocks.29.ffn.experts.mlp_experts.2.w1
- transformer.blocks.0.ffn.experts.mlp_experts.2.w1
- transformer.blocks.9.ffn.experts.mlp_experts.2.w1
- transformer.blocks.31.ffn.experts.mlp_experts.2.w1
- transformer.blocks.30.ffn.experts.mlp_experts.2.w1
# ffn.experts.mlp_experts.2.w2 layers
- transformer.blocks.26.ffn.experts.mlp_experts.2.w2
- transformer.blocks.27.ffn.experts.mlp_experts.2.w2
- transformer.blocks.33.ffn.experts.mlp_experts.2.w2
- transformer.blocks.5.ffn.experts.mlp_experts.2.w2
- transformer.blocks.23.ffn.experts.mlp_experts.2.w2
- transformer.blocks.32.ffn.experts.mlp_experts.2.w2
- transformer.blocks.28.ffn.experts.mlp_experts.2.w2
- transformer.blocks.4.ffn.experts.mlp_experts.2.w2
- transformer.blocks.29.ffn.experts.mlp_experts.2.w2
- transformer.blocks.30.ffn.experts.mlp_experts.2.w2
# ffn.experts.mlp_experts.3.v1 layers
- transformer.blocks.28.ffn.experts.mlp_experts.3.v1
- transformer.blocks.33.ffn.experts.mlp_experts.3.v1
- transformer.blocks.36.ffn.experts.mlp_experts.3.v1
- transformer.blocks.29.ffn.experts.mlp_experts.3.v1
- transformer.blocks.30.ffn.experts.mlp_experts.3.v1
- transformer.blocks.7.ffn.experts.mlp_experts.3.v1
- transformer.blocks.14.ffn.experts.mlp_experts.3.v1
- transformer.blocks.10.ffn.experts.mlp_experts.3.v1
- transformer.blocks.31.ffn.experts.mlp_experts.3.v1
- transformer.blocks.21.ffn.experts.mlp_experts.3.v1
# ffn.experts.mlp_experts.3.w1 layers
- transformer.blocks.7.ffn.experts.mlp_experts.3.w1
- transformer.blocks.0.ffn.experts.mlp_experts.3.w1
- transformer.blocks.10.ffn.experts.mlp_experts.3.w1
- transformer.blocks.9.ffn.experts.mlp_experts.3.w1
- transformer.blocks.29.ffn.experts.mlp_experts.3.w1
- transformer.blocks.5.ffn.experts.mlp_experts.3.w1
- transformer.blocks.30.ffn.experts.mlp_experts.3.w1
- transformer.blocks.4.ffn.experts.mlp_experts.3.w1
- transformer.blocks.33.ffn.experts.mlp_experts.3.w1
- transformer.blocks.1.ffn.experts.mlp_experts.3.w1
# ffn.experts.mlp_experts.3.w2 layers
- transformer.blocks.28.ffn.experts.mlp_experts.3.w2
- transformer.blocks.5.ffn.experts.mlp_experts.3.w2
- transformer.blocks.24.ffn.experts.mlp_experts.3.w2
- transformer.blocks.31.ffn.experts.mlp_experts.3.w2
- transformer.blocks.30.ffn.experts.mlp_experts.3.w2
- transformer.blocks.21.ffn.experts.mlp_experts.3.w2
- transformer.blocks.32.ffn.experts.mlp_experts.3.w2
- transformer.blocks.29.ffn.experts.mlp_experts.3.w2
- transformer.blocks.26.ffn.experts.mlp_experts.3.w2
- transformer.blocks.2.ffn.experts.mlp_experts.3.w2
# ffn.experts.mlp_experts.4.v1 layers
- transformer.blocks.34.ffn.experts.mlp_experts.4.v1
- transformer.blocks.31.ffn.experts.mlp_experts.4.v1
- transformer.blocks.26.ffn.experts.mlp_experts.4.v1
- transformer.blocks.24.ffn.experts.mlp_experts.4.v1
- transformer.blocks.14.ffn.experts.mlp_experts.4.v1
- transformer.blocks.32.ffn.experts.mlp_experts.4.v1
- transformer.blocks.7.ffn.experts.mlp_experts.4.v1
- transformer.blocks.6.ffn.experts.mlp_experts.4.v1
- transformer.blocks.20.ffn.experts.mlp_experts.4.v1
- transformer.blocks.9.ffn.experts.mlp_experts.4.v1
# ffn.experts.mlp_experts.4.w1 layers
- transformer.blocks.6.ffn.experts.mlp_experts.4.w1
- transformer.blocks.4.ffn.experts.mlp_experts.4.w1
- transformer.blocks.7.ffn.experts.mlp_experts.4.w1
- transformer.blocks.9.ffn.experts.mlp_experts.4.w1
- transformer.blocks.0.ffn.experts.mlp_experts.4.w1
- transformer.blocks.5.ffn.experts.mlp_experts.4.w1
- transformer.blocks.14.ffn.experts.mlp_experts.4.w1
- transformer.blocks.34.ffn.experts.mlp_experts.4.w1
- transformer.blocks.8.ffn.experts.mlp_experts.4.w1
- transformer.blocks.29.ffn.experts.mlp_experts.4.w1
# ffn.experts.mlp_experts.4.w2 layers
- transformer.blocks.25.ffn.experts.mlp_experts.4.w2
- transformer.blocks.24.ffn.experts.mlp_experts.4.w2
- transformer.blocks.26.ffn.experts.mlp_experts.4.w2
- transformer.blocks.5.ffn.experts.mlp_experts.4.w2
- transformer.blocks.6.ffn.experts.mlp_experts.4.w2
- transformer.blocks.32.ffn.experts.mlp_experts.4.w2
- transformer.blocks.4.ffn.experts.mlp_experts.4.w2
- transformer.blocks.36.ffn.experts.mlp_experts.4.w2
- transformer.blocks.29.ffn.experts.mlp_experts.4.w2
- transformer.blocks.27.ffn.experts.mlp_experts.4.w2
# ffn.experts.mlp_experts.5.v1 layers
- transformer.blocks.35.ffn.experts.mlp_experts.5.v1
- transformer.blocks.30.ffn.experts.mlp_experts.5.v1
- transformer.blocks.28.ffn.experts.mlp_experts.5.v1
- transformer.blocks.32.ffn.experts.mlp_experts.5.v1
- transformer.blocks.27.ffn.experts.mlp_experts.5.v1
- transformer.blocks.26.ffn.experts.mlp_experts.5.v1
- transformer.blocks.33.ffn.experts.mlp_experts.5.v1
- transformer.blocks.29.ffn.experts.mlp_experts.5.v1
- transformer.blocks.8.ffn.experts.mlp_experts.5.v1
- transformer.blocks.7.ffn.experts.mlp_experts.5.v1
# ffn.experts.mlp_experts.5.w1 layers
- transformer.blocks.0.ffn.experts.mlp_experts.5.w1
- transformer.blocks.6.ffn.experts.mlp_experts.5.w1
- transformer.blocks.7.ffn.experts.mlp_experts.5.w1
- transformer.blocks.9.ffn.experts.mlp_experts.5.w1
- transformer.blocks.8.ffn.experts.mlp_experts.5.w1
- transformer.blocks.12.ffn.experts.mlp_experts.5.w1
- transformer.blocks.3.ffn.experts.mlp_experts.5.w1
- transformer.blocks.5.ffn.experts.mlp_experts.5.w1
- transformer.blocks.4.ffn.experts.mlp_experts.5.w1
- transformer.blocks.33.ffn.experts.mlp_experts.5.w1
# ffn.experts.mlp_experts.5.w2 layers
- transformer.blocks.26.ffn.experts.mlp_experts.5.w2
- transformer.blocks.28.ffn.experts.mlp_experts.5.w2
- transformer.blocks.6.ffn.experts.mlp_experts.5.w2
- transformer.blocks.33.ffn.experts.mlp_experts.5.w2
- transformer.blocks.5.ffn.experts.mlp_experts.5.w2
- transformer.blocks.27.ffn.experts.mlp_experts.5.w2
- transformer.blocks.3.ffn.experts.mlp_experts.5.w2
- transformer.blocks.29.ffn.experts.mlp_experts.5.w2
- transformer.blocks.25.ffn.experts.mlp_experts.5.w2
- transformer.blocks.7.ffn.experts.mlp_experts.5.w2
# ffn.experts.mlp_experts.6.v1 layers
- transformer.blocks.34.ffn.experts.mlp_experts.6.v1
- transformer.blocks.31.ffn.experts.mlp_experts.6.v1
- transformer.blocks.30.ffn.experts.mlp_experts.6.v1
- transformer.blocks.26.ffn.experts.mlp_experts.6.v1
- transformer.blocks.35.ffn.experts.mlp_experts.6.v1
- transformer.blocks.20.ffn.experts.mlp_experts.6.v1
- transformer.blocks.15.ffn.experts.mlp_experts.6.v1
- transformer.blocks.29.ffn.experts.mlp_experts.6.v1
- transformer.blocks.10.ffn.experts.mlp_experts.6.v1
- transformer.blocks.24.ffn.experts.mlp_experts.6.v1
# ffn.experts.mlp_experts.6.w1 layers
- transformer.blocks.0.ffn.experts.mlp_experts.6.w1
- transformer.blocks.10.ffn.experts.mlp_experts.6.w1
- transformer.blocks.9.ffn.experts.mlp_experts.6.w1
- transformer.blocks.30.ffn.experts.mlp_experts.6.w1
- transformer.blocks.4.ffn.experts.mlp_experts.6.w1
- transformer.blocks.34.ffn.experts.mlp_experts.6.w1
- transformer.blocks.26.ffn.experts.mlp_experts.6.w1
- transformer.blocks.2.ffn.experts.mlp_experts.6.w1
- transformer.blocks.29.ffn.experts.mlp_experts.6.w1
- transformer.blocks.8.ffn.experts.mlp_experts.6.w1
# ffn.experts.mlp_experts.6.w2 layers
- transformer.blocks.24.ffn.experts.mlp_experts.6.w2
- transformer.blocks.26.ffn.experts.mlp_experts.6.w2
- transformer.blocks.32.ffn.experts.mlp_experts.6.w2
- transformer.blocks.30.ffn.experts.mlp_experts.6.w2
- transformer.blocks.25.ffn.experts.mlp_experts.6.w2
- transformer.blocks.31.ffn.experts.mlp_experts.6.w2
- transformer.blocks.20.ffn.experts.mlp_experts.6.w2
- transformer.blocks.4.ffn.experts.mlp_experts.6.w2
- transformer.blocks.2.ffn.experts.mlp_experts.6.w2
- transformer.blocks.9.ffn.experts.mlp_experts.6.w2
# ffn.experts.mlp_experts.7.v1 layers
- transformer.blocks.27.ffn.experts.mlp_experts.7.v1
- transformer.blocks.28.ffn.experts.mlp_experts.7.v1
- transformer.blocks.33.ffn.experts.mlp_experts.7.v1
- transformer.blocks.29.ffn.experts.mlp_experts.7.v1
- transformer.blocks.24.ffn.experts.mlp_experts.7.v1
- transformer.blocks.11.ffn.experts.mlp_experts.7.v1
- transformer.blocks.12.ffn.experts.mlp_experts.7.v1
- transformer.blocks.10.ffn.experts.mlp_experts.7.v1
- transformer.blocks.23.ffn.experts.mlp_experts.7.v1
- transformer.blocks.34.ffn.experts.mlp_experts.7.v1
# ffn.experts.mlp_experts.7.w1 layers
- transformer.blocks.12.ffn.experts.mlp_experts.7.w1
- transformer.blocks.0.ffn.experts.mlp_experts.7.w1
- transformer.blocks.5.ffn.experts.mlp_experts.7.w1
- transformer.blocks.29.ffn.experts.mlp_experts.7.w1
- transformer.blocks.10.ffn.experts.mlp_experts.7.w1
- transformer.blocks.4.ffn.experts.mlp_experts.7.w1
- transformer.blocks.3.ffn.experts.mlp_experts.7.w1
- transformer.blocks.8.ffn.experts.mlp_experts.7.w1
- transformer.blocks.34.ffn.experts.mlp_experts.7.w1
- transformer.blocks.33.ffn.experts.mlp_experts.7.w1
# ffn.experts.mlp_experts.7.w2 layers
- transformer.blocks.23.ffn.experts.mlp_experts.7.w2
- transformer.blocks.24.ffn.experts.mlp_experts.7.w2
- transformer.blocks.31.ffn.experts.mlp_experts.7.w2
- transformer.blocks.28.ffn.experts.mlp_experts.7.w2
- transformer.blocks.27.ffn.experts.mlp_experts.7.w2
- transformer.blocks.5.ffn.experts.mlp_experts.7.w2
- transformer.blocks.25.ffn.experts.mlp_experts.7.w2
- transformer.blocks.29.ffn.experts.mlp_experts.7.w2
- transformer.blocks.3.ffn.experts.mlp_experts.7.w2
- transformer.blocks.33.ffn.experts.mlp_experts.7.w2
# ffn.experts.mlp_experts.8.v1 layers
- transformer.blocks.30.ffn.experts.mlp_experts.8.v1
- transformer.blocks.27.ffn.experts.mlp_experts.8.v1
- transformer.blocks.20.ffn.experts.mlp_experts.8.v1
- transformer.blocks.32.ffn.experts.mlp_experts.8.v1
- transformer.blocks.34.ffn.experts.mlp_experts.8.v1
- transformer.blocks.33.ffn.experts.mlp_experts.8.v1
- transformer.blocks.9.ffn.experts.mlp_experts.8.v1
- transformer.blocks.7.ffn.experts.mlp_experts.8.v1
- transformer.blocks.6.ffn.experts.mlp_experts.8.v1
- transformer.blocks.24.ffn.experts.mlp_experts.8.v1
# ffn.experts.mlp_experts.8.w1 layers
- transformer.blocks.7.ffn.experts.mlp_experts.8.w1
- transformer.blocks.6.ffn.experts.mlp_experts.8.w1
- transformer.blocks.0.ffn.experts.mlp_experts.8.w1
- transformer.blocks.9.ffn.experts.mlp_experts.8.w1
- transformer.blocks.3.ffn.experts.mlp_experts.8.w1
- transformer.blocks.2.ffn.experts.mlp_experts.8.w1
- transformer.blocks.8.ffn.experts.mlp_experts.8.w1
- transformer.blocks.30.ffn.experts.mlp_experts.8.w1
- transformer.blocks.24.ffn.experts.mlp_experts.8.w1
- transformer.blocks.1.ffn.experts.mlp_experts.8.w1
# ffn.experts.mlp_experts.8.w2 layers
- transformer.blocks.32.ffn.experts.mlp_experts.8.w2
- transformer.blocks.24.ffn.experts.mlp_experts.8.w2
- transformer.blocks.27.ffn.experts.mlp_experts.8.w2
- transformer.blocks.30.ffn.experts.mlp_experts.8.w2
- transformer.blocks.31.ffn.experts.mlp_experts.8.w2
- transformer.blocks.28.ffn.experts.mlp_experts.8.w2
- transformer.blocks.2.ffn.experts.mlp_experts.8.w2
- transformer.blocks.3.ffn.experts.mlp_experts.8.w2
- transformer.blocks.23.ffn.experts.mlp_experts.8.w2
- transformer.blocks.29.ffn.experts.mlp_experts.8.w2
# ffn.experts.mlp_experts.9.v1 layers
- transformer.blocks.31.ffn.experts.mlp_experts.9.v1
- transformer.blocks.27.ffn.experts.mlp_experts.9.v1
- transformer.blocks.29.ffn.experts.mlp_experts.9.v1
- transformer.blocks.33.ffn.experts.mlp_experts.9.v1
- transformer.blocks.25.ffn.experts.mlp_experts.9.v1
- transformer.blocks.14.ffn.experts.mlp_experts.9.v1
- transformer.blocks.32.ffn.experts.mlp_experts.9.v1
- transformer.blocks.7.ffn.experts.mlp_experts.9.v1
- transformer.blocks.9.ffn.experts.mlp_experts.9.v1
- transformer.blocks.34.ffn.experts.mlp_experts.9.v1
# ffn.experts.mlp_experts.9.w1 layers
- transformer.blocks.7.ffn.experts.mlp_experts.9.w1
- transformer.blocks.1.ffn.experts.mlp_experts.9.w1
- transformer.blocks.9.ffn.experts.mlp_experts.9.w1
- transformer.blocks.2.ffn.experts.mlp_experts.9.w1
- transformer.blocks.27.ffn.experts.mlp_experts.9.w1
- transformer.blocks.12.ffn.experts.mlp_experts.9.w1
- transformer.blocks.4.ffn.experts.mlp_experts.9.w1
- transformer.blocks.6.ffn.experts.mlp_experts.9.w1
- transformer.blocks.19.ffn.experts.mlp_experts.9.w1
- transformer.blocks.8.ffn.experts.mlp_experts.9.w1
# ffn.experts.mlp_experts.9.w2 layers
- transformer.blocks.26.ffn.experts.mlp_experts.9.w2
- transformer.blocks.25.ffn.experts.mlp_experts.9.w2
- transformer.blocks.28.ffn.experts.mlp_experts.9.w2
- transformer.blocks.27.ffn.experts.mlp_experts.9.w2
- transformer.blocks.31.ffn.experts.mlp_experts.9.w2
- transformer.blocks.29.ffn.experts.mlp_experts.9.w2
- transformer.blocks.7.ffn.experts.mlp_experts.9.w2
- transformer.blocks.34.ffn.experts.mlp_experts.9.w2
- transformer.blocks.2.ffn.experts.mlp_experts.9.w2
- transformer.blocks.33.ffn.experts.mlp_experts.9.w2
# ffn.router.layer layers
- transformer.blocks.2.ffn.router.layer
- transformer.blocks.3.ffn.router.layer
- transformer.blocks.4.ffn.router.layer
- transformer.blocks.5.ffn.router.layer
- transformer.blocks.6.ffn.router.layer
- transformer.blocks.7.ffn.router.layer
- transformer.blocks.8.ffn.router.layer
- transformer.blocks.9.ffn.router.layer
- transformer.blocks.10.ffn.router.layer
- transformer.blocks.11.ffn.router.layer
# norm_attn_norm.attn.Wqkv layers
- transformer.blocks.16.norm_attn_norm.attn.Wqkv
- transformer.blocks.15.norm_attn_norm.attn.Wqkv
- transformer.blocks.11.norm_attn_norm.attn.Wqkv
- transformer.blocks.14.norm_attn_norm.attn.Wqkv
- transformer.blocks.12.norm_attn_norm.attn.Wqkv
- transformer.blocks.20.norm_attn_norm.attn.Wqkv
- transformer.blocks.10.norm_attn_norm.attn.Wqkv
- transformer.blocks.9.norm_attn_norm.attn.Wqkv
- transformer.blocks.19.norm_attn_norm.attn.Wqkv
- transformer.blocks.18.norm_attn_norm.attn.Wqkv
# norm_attn_norm.attn.out_proj layers
- transformer.blocks.1.norm_attn_norm.attn.out_proj
- transformer.blocks.18.norm_attn_norm.attn.out_proj
- transformer.blocks.2.norm_attn_norm.attn.out_proj
- transformer.blocks.16.norm_attn_norm.attn.out_proj
- transformer.blocks.0.norm_attn_norm.attn.out_proj
- transformer.blocks.39.norm_attn_norm.attn.out_proj
- transformer.blocks.23.norm_attn_norm.attn.out_proj
- transformer.blocks.8.norm_attn_norm.attn.out_proj
- transformer.blocks.24.norm_attn_norm.attn.out_proj
- transformer.blocks.19.norm_attn_norm.attn.out_proj
# norm_attn_norm.norm_1 layers
- transformer.blocks.0.norm_attn_norm.norm_1
- transformer.blocks.1.norm_attn_norm.norm_1
- transformer.blocks.2.norm_attn_norm.norm_1
- transformer.blocks.3.norm_attn_norm.norm_1
- transformer.blocks.4.norm_attn_norm.norm_1
- transformer.blocks.5.norm_attn_norm.norm_1
- transformer.blocks.6.norm_attn_norm.norm_1
- transformer.blocks.7.norm_attn_norm.norm_1
- transformer.blocks.8.norm_attn_norm.norm_1
- transformer.blocks.9.norm_attn_norm.norm_1
# norm_attn_norm.norm_2 layers
- transformer.blocks.0.norm_attn_norm.norm_2
- transformer.blocks.1.norm_attn_norm.norm_2
- transformer.blocks.2.norm_attn_norm.norm_2
- transformer.blocks.3.norm_attn_norm.norm_2
- transformer.blocks.4.norm_attn_norm.norm_2
- transformer.blocks.5.norm_attn_norm.norm_2
- transformer.blocks.6.norm_attn_norm.norm_2
- transformer.blocks.7.norm_attn_norm.norm_2
- transformer.blocks.8.norm_attn_norm.norm_2
- transformer.blocks.9.norm_attn_norm.norm_2
# transformer.norm_f layers
# transformer.wte layers
# ffn.experts.mlp_experts.11.v1 layers
- transformer.blocks.29.ffn.experts.mlp_experts.11.v1
- transformer.blocks.27.ffn.experts.mlp_experts.11.v1
- transformer.blocks.30.ffn.experts.mlp_experts.11.v1
- transformer.blocks.28.ffn.experts.mlp_experts.11.v1
- transformer.blocks.22.ffn.experts.mlp_experts.11.v1
- transformer.blocks.7.ffn.experts.mlp_experts.11.v1
- transformer.blocks.24.ffn.experts.mlp_experts.11.v1
- transformer.blocks.8.ffn.experts.mlp_experts.11.v1
- transformer.blocks.6.ffn.experts.mlp_experts.11.v1
- transformer.blocks.12.ffn.experts.mlp_experts.11.v1

dataset_prepared_path: dbrx2
val_set_size: 0.01
output_dir: ./out

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

wandb_project: dolphin-2.9-Dbrx
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
# resume_from_checkpoint: /workspace/axolotl/dbrx-checkpoint
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 4
save_total_limit: 2
save_steps:
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16_cpuoffload_params.json
weight_decay: 0.05
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<|endoftext|>"
  eos_token: "<|im_end|>"
  pad_token: "<|pad|>"
  unk_token: "<|endoftext|>"
tokens:
  - "<|im_start|>"
  - "<|im_end|>"

```

</details><br>

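For readers wondering how the `unfrozen_parameters` list in the config above turns into a partial full-weight fine-tune: conceptually, every weight is frozen and only parameters whose names match the listed patterns stay trainable (the `^lm_head.weight$` entry reads as a regular expression). Below is a rough PyTorch sketch of that idea, not the exact axolotl implementation:

```python
import re

def apply_unfrozen_parameters(model, unfrozen_patterns):
    """Freeze everything, then re-enable gradients for parameters whose
    fully qualified name matches one of the configured patterns."""
    patterns = [re.compile(p) for p in unfrozen_patterns]
    for name, param in model.named_parameters():
        param.requires_grad = any(p.search(name) for p in patterns)

# Example usage with a slice of the list above:
# apply_unfrozen_parameters(model, [
#     r"^lm_head.weight$",
#     r"transformer.blocks.30.ffn.experts.mlp_experts.0.v1",
# ])
```
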
# out

This model was fine-tuned from a DBRX base checkpoint on the datasets listed above (see the axolotl config for details).
It achieves the following results on the evaluation set:
- Loss: 0.4336

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 1

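The total train batch size is simply the per-device micro batch size times the gradient accumulation steps times the number of devices; a quick sanity check:

```python
train_batch_size = 1             # micro batch per device
gradient_accumulation_steps = 8
num_devices = 8

total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 64  # matches the value reported above
```
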
### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.4009        | 0.0   | 1    | 0.4328          |
| 0.413         | 0.25  | 587  | 0.4408          |
| 0.3626        | 0.5   | 1174 | 0.4368          |
| 0.3896        | 0.75  | 1761 | 0.4336          |

### Framework versions

- Transformers 4.40.0.dev0
- PyTorch 2.2.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.0