nlile committed
Commit 4638d30 (parent: 77d34c4)

Model save
README.md ADDED
@@ -0,0 +1,134 @@
+ ---
+ base_model: stabilityai/StableBeluga-13B
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: PE-13b-full
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # PE-13b-full
+
+ This model is a fine-tuned version of [stabilityai/StableBeluga-13B](https://huggingface.co/stabilityai/StableBeluga-13B) on an unknown dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.0094
+ - Rewards/chosen: -1.2833
+ - Rewards/rejected: -29.7294
+ - Rewards/accuracies: 0.9916
+ - Rewards/margins: 28.4460
+ - Logps/rejected: -121.9200
+ - Logps/chosen: -84.7524
+ - Logits/rejected: -2.1605
+ - Logits/chosen: -2.4403
+
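(Editor's note: these metric names match what TRL's `DPOTrainer` logs, which suggests a DPO-style preference-optimization run, though the card does not say so explicitly. As a sanity check, Rewards/margins is the gap between the two reward columns: -1.2833 - (-29.7294) = 28.4461, which matches the reported 28.4460 up to rounding of the full-precision values.)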
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 3e-07
+ - train_batch_size: 1
+ - eval_batch_size: 2
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 64
+ - total_eval_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 3
+
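(Editor's note: the totals follow from the per-device settings: 1 per-device batch × 8 GPUs × 8 accumulation steps = 64 for training, and 2 × 8 = 16 for eval. Below is a minimal sketch of the equivalent 🤗 `TrainingArguments`, assuming the standard transformers API; this is a hedged reconstruction, not the author's actual script, and `output_dir` is hypothetical.)

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameters listed above.
# Effective train batch: 1 (per device) * 8 (GPUs) * 8 (grad accum) = 64.
args = TrainingArguments(
    output_dir="PE-13b-full",          # hypothetical output path
    learning_rate=3e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=3,
)
```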
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+ |:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+ | 0.5085 | 0.05 | 100 | 0.4978 | 0.1241 | -0.3334 | 0.9525 | 0.4575 | -63.1282 | -81.9376 | -2.0870 | -2.3586 |
+ | 0.1966 | 0.09 | 200 | 0.2003 | 0.5022 | -1.3704 | 0.9804 | 1.8726 | -65.2020 | -81.1812 | -2.0918 | -2.3650 |
+ | 0.0612 | 0.14 | 300 | 0.0656 | 0.8997 | -3.3315 | 0.9888 | 4.2312 | -69.1243 | -80.3863 | -2.0887 | -2.3741 |
+ | 0.029 | 0.18 | 400 | 0.0356 | 0.9536 | -5.0607 | 0.9944 | 6.0143 | -72.5827 | -80.2785 | -2.0905 | -2.3804 |
+ | 0.0187 | 0.23 | 500 | 0.0201 | 0.9079 | -7.5059 | 0.9888 | 8.4139 | -77.4731 | -80.3699 | -2.0974 | -2.3915 |
+ | 0.0112 | 0.27 | 600 | 0.0130 | 0.7188 | -10.4500 | 0.9916 | 11.1688 | -83.3612 | -80.7481 | -2.0987 | -2.3960 |
+ | 0.0066 | 0.32 | 700 | 0.0102 | 0.6639 | -13.1345 | 0.9916 | 13.7984 | -88.7303 | -80.8579 | -2.1111 | -2.4104 |
+ | 0.0088 | 0.37 | 800 | 0.0098 | 0.9128 | -13.1977 | 0.9888 | 14.1105 | -88.8568 | -80.3601 | -2.1031 | -2.4030 |
+ | 0.0054 | 0.41 | 900 | 0.0092 | 0.6109 | -15.6398 | 0.9888 | 16.2507 | -93.7409 | -80.9640 | -2.1158 | -2.4144 |
+ | 0.0044 | 0.46 | 1000 | 0.0094 | 0.9982 | -16.0071 | 0.9916 | 17.0053 | -94.4755 | -80.1893 | -2.0988 | -2.3946 |
+ | 0.0061 | 0.5 | 1100 | 0.0089 | 0.5504 | -18.0125 | 0.9916 | 18.5630 | -98.4864 | -81.0849 | -2.0991 | -2.3955 |
+ | 0.024 | 0.55 | 1200 | 0.0088 | 0.4877 | -16.6683 | 0.9916 | 17.1561 | -95.7980 | -81.2103 | -2.0748 | -2.3633 |
+ | 0.0039 | 0.59 | 1300 | 0.0087 | 0.3755 | -18.5093 | 0.9916 | 18.8848 | -99.4799 | -81.4347 | -2.0746 | -2.3623 |
+ | 0.0051 | 0.64 | 1400 | 0.0086 | 0.1176 | -20.5558 | 0.9916 | 20.6734 | -103.5730 | -81.9506 | -2.0819 | -2.3738 |
+ | 0.0023 | 0.68 | 1500 | 0.0089 | 0.1552 | -20.0740 | 0.9888 | 20.2292 | -102.6092 | -81.8754 | -2.0813 | -2.3667 |
+ | 0.0027 | 0.73 | 1600 | 0.0089 | -0.5025 | -20.7978 | 0.9888 | 20.2953 | -104.0569 | -83.1908 | -2.1179 | -2.4078 |
+ | 0.0031 | 0.78 | 1700 | 0.0085 | -0.6314 | -21.0492 | 0.9916 | 20.4178 | -104.5597 | -83.4485 | -2.0915 | -2.3773 |
+ | 0.0049 | 0.82 | 1800 | 0.0085 | -0.7786 | -21.3333 | 0.9916 | 20.5547 | -105.1278 | -83.7429 | -2.0670 | -2.3504 |
+ | 0.0023 | 0.87 | 1900 | 0.0084 | -0.7496 | -22.3377 | 0.9944 | 21.5880 | -107.1367 | -83.6850 | -2.0729 | -2.3547 |
+ | 0.0067 | 0.91 | 2000 | 0.0086 | -0.8126 | -22.8024 | 0.9916 | 21.9899 | -108.0662 | -83.8109 | -2.0651 | -2.3472 |
+ | 0.0041 | 0.96 | 2100 | 0.0082 | -0.7903 | -21.8379 | 0.9944 | 21.0476 | -106.1371 | -83.7663 | -2.0363 | -2.3137 |
+ | 0.0025 | 1.0 | 2200 | 0.0079 | -0.4489 | -21.4451 | 0.9916 | 20.9963 | -105.3516 | -83.0835 | -2.0303 | -2.3074 |
+ | 0.0023 | 1.05 | 2300 | 0.0082 | -1.1267 | -22.7620 | 0.9944 | 21.6353 | -107.9852 | -84.4391 | -2.0477 | -2.3260 |
+ | 0.0055 | 1.1 | 2400 | 0.0085 | -1.4969 | -24.0568 | 0.9888 | 22.5599 | -110.5749 | -85.1796 | -2.0616 | -2.3384 |
+ | 0.0139 | 1.14 | 2500 | 0.0077 | 0.4564 | -20.3860 | 0.9916 | 20.8424 | -103.2333 | -81.2730 | -2.0453 | -2.3206 |
+ | 0.0023 | 1.19 | 2600 | 0.0081 | 0.0858 | -21.9640 | 0.9916 | 22.0498 | -106.3893 | -82.0141 | -2.0528 | -2.3273 |
+ | 0.0046 | 1.23 | 2700 | 0.0083 | -0.2543 | -23.4016 | 0.9916 | 23.1473 | -109.2646 | -82.6943 | -2.0668 | -2.3457 |
+ | 0.0033 | 1.28 | 2800 | 0.0083 | -0.3317 | -23.7872 | 0.9916 | 23.4555 | -110.0356 | -82.8491 | -2.0884 | -2.3650 |
+ | 0.0023 | 1.32 | 2900 | 0.0084 | -0.2753 | -24.3682 | 0.9916 | 24.0929 | -111.1976 | -82.7362 | -2.1054 | -2.3879 |
+ | 0.0034 | 1.37 | 3000 | 0.0081 | 0.4328 | -23.3162 | 0.9916 | 23.7491 | -109.0938 | -81.3201 | -2.0817 | -2.3565 |
+ | 0.0033 | 1.42 | 3100 | 0.0082 | -0.0254 | -23.7390 | 0.9944 | 23.7136 | -109.9394 | -82.2366 | -2.0706 | -2.3447 |
+ | 0.0033 | 1.46 | 3200 | 0.0086 | -0.7680 | -24.0452 | 0.9916 | 23.2772 | -110.5517 | -83.7218 | -2.0760 | -2.3543 |
+ | 0.0032 | 1.51 | 3300 | 0.0086 | -0.0016 | -23.5161 | 0.9944 | 23.5145 | -109.4934 | -82.1889 | -2.0881 | -2.3655 |
+ | 0.0011 | 1.55 | 3400 | 0.0084 | 0.0195 | -24.2635 | 0.9944 | 24.2831 | -110.9884 | -82.1467 | -2.0878 | -2.3667 |
+ | 0.0002 | 1.6 | 3500 | 0.0087 | 0.0421 | -24.8306 | 0.9916 | 24.8728 | -112.1225 | -82.1015 | -2.0890 | -2.3698 |
+ | 0.0034 | 1.64 | 3600 | 0.0086 | -0.2729 | -25.8106 | 0.9916 | 25.5377 | -114.0825 | -82.7315 | -2.1030 | -2.3851 |
+ | 0.0027 | 1.69 | 3700 | 0.0086 | 0.0339 | -25.0221 | 0.9916 | 25.0560 | -112.5055 | -82.1179 | -2.1300 | -2.4147 |
+ | 0.0056 | 1.73 | 3800 | 0.0082 | 0.1800 | -23.6173 | 0.9916 | 23.7974 | -109.6960 | -81.8257 | -2.1140 | -2.3980 |
+ | 0.0026 | 1.78 | 3900 | 0.0083 | -0.0334 | -24.6060 | 0.9944 | 24.5725 | -111.6733 | -82.2526 | -2.1140 | -2.3965 |
+ | 0.0036 | 1.83 | 4000 | 0.0080 | -0.2511 | -23.0433 | 0.9916 | 22.7923 | -108.5479 | -82.6879 | -2.1348 | -2.4167 |
+ | 0.0044 | 1.87 | 4100 | 0.0084 | -0.4259 | -23.7811 | 0.9916 | 23.3551 | -110.0234 | -83.0376 | -2.1314 | -2.4160 |
+ | 0.0022 | 1.92 | 4200 | 0.0083 | -0.5710 | -23.2360 | 0.9944 | 22.6650 | -108.9332 | -83.3277 | -2.1369 | -2.4196 |
+ | 0.0044 | 1.96 | 4300 | 0.0085 | -0.6363 | -24.6474 | 0.9972 | 24.0111 | -111.7560 | -83.4583 | -2.1307 | -2.4109 |
+ | 0.0023 | 2.01 | 4400 | 0.0085 | -0.6133 | -24.9492 | 0.9916 | 24.3359 | -112.3597 | -83.4124 | -2.1322 | -2.4134 |
+ | 0.0033 | 2.05 | 4500 | 0.0085 | -0.7101 | -25.5054 | 0.9916 | 24.7953 | -113.4721 | -83.6059 | -2.1326 | -2.4142 |
+ | 0.0023 | 2.1 | 4600 | 0.0087 | -0.7855 | -26.0511 | 0.9916 | 25.2656 | -114.5634 | -83.7567 | -2.1333 | -2.4152 |
+ | 0.0011 | 2.15 | 4700 | 0.0088 | -0.9006 | -26.5845 | 0.9944 | 25.6839 | -115.6303 | -83.9870 | -2.1369 | -2.4198 |
+ | 0.0065 | 2.19 | 4800 | 0.0088 | -0.7570 | -26.8960 | 0.9916 | 26.1390 | -116.2533 | -83.6997 | -2.1393 | -2.4198 |
+ | 0.0022 | 2.24 | 4900 | 0.0091 | -0.9581 | -27.9431 | 0.9916 | 26.9850 | -118.3475 | -84.1019 | -2.1428 | -2.4245 |
+ | 0.0026 | 2.28 | 5000 | 0.0091 | -1.2522 | -28.8309 | 0.9944 | 27.5788 | -120.1232 | -84.6901 | -2.1479 | -2.4287 |
+ | 0.0033 | 2.33 | 5100 | 0.0089 | -0.8602 | -28.7323 | 0.9916 | 27.8721 | -119.9259 | -83.9062 | -2.1522 | -2.4328 |
+ | 0.0041 | 2.37 | 5200 | 0.0091 | -1.0405 | -29.2861 | 0.9916 | 28.2456 | -121.0335 | -84.2668 | -2.1536 | -2.4343 |
+ | 0.0023 | 2.42 | 5300 | 0.0093 | -1.1323 | -29.5240 | 0.9916 | 28.3917 | -121.5093 | -84.4504 | -2.1529 | -2.4336 |
+ | 0.0022 | 2.46 | 5400 | 0.0092 | -1.2202 | -29.2127 | 0.9916 | 27.9925 | -120.8866 | -84.6261 | -2.1595 | -2.4416 |
+ | 0.0 | 2.51 | 5500 | 0.0093 | -1.4371 | -29.7063 | 0.9916 | 28.2692 | -121.8739 | -85.0599 | -2.1609 | -2.4404 |
+ | 0.0022 | 2.56 | 5600 | 0.0095 | -1.4397 | -30.0202 | 0.9944 | 28.5804 | -122.5016 | -85.0652 | -2.1584 | -2.4383 |
+ | 0.0011 | 2.6 | 5700 | 0.0096 | -1.6125 | -30.0945 | 0.9916 | 28.4820 | -122.6504 | -85.4108 | -2.1601 | -2.4395 |
+ | 0.0053 | 2.65 | 5800 | 0.0095 | -1.5638 | -30.0025 | 0.9944 | 28.4387 | -122.4663 | -85.3133 | -2.1615 | -2.4398 |
+ | 0.003 | 2.69 | 5900 | 0.0095 | -1.5904 | -30.1980 | 0.9916 | 28.6076 | -122.8572 | -85.3666 | -2.1606 | -2.4406 |
+ | 0.0011 | 2.74 | 6000 | 0.0094 | -1.5286 | -30.0882 | 0.9944 | 28.5596 | -122.6377 | -85.2429 | -2.1615 | -2.4403 |
+ | 0.0008 | 2.78 | 6100 | 0.0095 | -1.4405 | -30.0174 | 0.9916 | 28.5769 | -122.4961 | -85.0667 | -2.1615 | -2.4400 |
+ | 0.0022 | 2.83 | 6200 | 0.0093 | -1.3508 | -29.9317 | 0.9916 | 28.5808 | -122.3246 | -84.8874 | -2.1599 | -2.4395 |
+ | 0.0019 | 2.88 | 6300 | 0.0093 | -1.2416 | -29.6525 | 0.9916 | 28.4109 | -121.7663 | -84.6690 | -2.1620 | -2.4415 |
+ | 0.0034 | 2.92 | 6400 | 0.0093 | -1.2995 | -29.7927 | 0.9916 | 28.4932 | -122.0468 | -84.7848 | -2.1616 | -2.4412 |
+ | 0.0014 | 2.97 | 6500 | 0.0092 | -1.2574 | -29.7200 | 0.9916 | 28.4626 | -121.9014 | -84.7006 | -2.1595 | -2.4408 |
+
+
+ ### Framework versions
+
+ - Transformers 4.35.0
+ - Pytorch 2.1.1+cu121
+ - Datasets 2.14.6
+ - Tokenizers 0.14.1
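(Editor's note: since the card's usage sections are empty, here is a minimal loading sketch. The repo id and the StableBeluga-style prompt format are assumptions, not confirmed by the card.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "nlile/PE-13b-full"  # hypothetical repo id; adjust to the actual Hub path

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.float16, device_map="auto"
)

# StableBeluga models use a "### User: / ### Assistant:" prompt;
# it is assumed (not confirmed) that this fine-tune inherits it.
prompt = "### User:\nWhat is preference optimization?\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```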
all_results.json ADDED
@@ -0,0 +1,21 @@
+ {
+   "epoch": 3.0,
+   "eval_logits/chosen": -2.4403114318847656,
+   "eval_logits/rejected": -2.160489320755005,
+   "eval_logps/chosen": -84.75238800048828,
+   "eval_logps/rejected": -121.9200210571289,
+   "eval_loss": 0.009359963238239288,
+   "eval_rewards/accuracies": 0.9916201233863831,
+   "eval_rewards/chosen": -1.2833248376846313,
+   "eval_rewards/margins": 28.446029663085938,
+   "eval_rewards/rejected": -29.729354858398438,
+   "eval_runtime": 194.4792,
+   "eval_samples": 2862,
+   "eval_samples_per_second": 14.716,
+   "eval_steps_per_second": 0.92,
+   "train_loss": 0.020571088252061828,
+   "train_runtime": 85394.2588,
+   "train_samples": 140201,
+   "train_samples_per_second": 4.925,
+   "train_steps_per_second": 0.077
+ }
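(Editor's note: the throughput figures are internally consistent: 140,201 train samples × 3 epochs ≈ 420,603 examples seen, and 420,603 / 85,394.26 s ≈ 4.925 samples/s; dividing by the effective batch of 64 gives ≈ 0.077 steps/s. Likewise, 2,862 eval samples / 194.48 s ≈ 14.716 samples/s.)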
eval_results.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "epoch": 3.0,
+   "eval_logits/chosen": -2.4403114318847656,
+   "eval_logits/rejected": -2.160489320755005,
+   "eval_logps/chosen": -84.75238800048828,
+   "eval_logps/rejected": -121.9200210571289,
+   "eval_loss": 0.009359963238239288,
+   "eval_rewards/accuracies": 0.9916201233863831,
+   "eval_rewards/chosen": -1.2833248376846313,
+   "eval_rewards/margins": 28.446029663085938,
+   "eval_rewards/rejected": -29.729354858398438,
+   "eval_runtime": 194.4792,
+   "eval_samples": 2862,
+   "eval_samples_per_second": 14.716,
+   "eval_steps_per_second": 0.92
+ }
generation_config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "pad_token_id": 0,
+   "transformers_version": "4.35.0",
+   "use_cache": false
+ }
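(Editor's note: the token ids (pad 0, bos 1, eos 2) are the LLaMA-family defaults inherited from StableBeluga-13B, and `use_cache` is saved as `false`, commonly a side effect of training with gradient checkpointing. A small sketch for re-enabling the KV cache at inference time, assuming the standard transformers API and a hypothetical repo id:)

```python
from transformers import GenerationConfig

# Loads the generation_config.json shown above; repo id is hypothetical.
gen_cfg = GenerationConfig.from_pretrained("nlile/PE-13b-full")
gen_cfg.use_cache = True  # re-enable the KV cache for faster decoding
```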
model-00001-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7f83be9c88f193e544401017425ddd7d3fc94f56dea590675aaea96316e625f
+ size 4978265800
model-00002-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2c8603687c4cac7839703b7c2fc0a87b431757c87986ac6a655ef3382ce2db46
+ size 4970422232
model-00003-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d13c22f6ad52ad30f9c89ce9abea709e1f8d02ff92ac442f9a9e24dbcd3152c4
+ size 4970422256
model-00004-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:926d931e47f68909018d8dae01cda01dfc46edafbe77185ae0595f4c9c44ffb9
+ size 4933701504
model-00005-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cc1f1511124e3d681065b70e3004134690262b6b1bac0d11740301af278fe9e6
+ size 4933722216
model-00006-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0d40584042f6b215794b5681c85b5ce648fc2229a2dcfd7578154fcdffd55763
+ size 1245236920
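(Editor's note: these are Git LFS pointer files; the repository stores only an object id and size while the ~26 GB of weights live in LFS storage. The six shard sizes sum to 26,031,770,928 bytes, slightly more than the index's `total_size` of 26,031,728,640 below, because each safetensors file also carries a small JSON header. A sketch for verifying a downloaded shard against its pointer's checksum:)

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file and return its hex sha256 (comparable to the LFS pointer oid)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            h.update(block)
    return h.hexdigest()

# Expected oid copied from the first pointer above.
expected = "c7f83be9c88f193e544401017425ddd7d3fc94f56dea590675aaea96316e625f"
assert sha256_of("model-00001-of-00006.safetensors") == expected
```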
model.safetensors.index.json ADDED
@@ -0,0 +1,370 @@
+ {
+   "metadata": {
+     "total_size": 26031728640
+   },
+   "weight_map": {
+     "lm_head.weight": "model-00006-of-00006.safetensors",
+     "model.embed_tokens.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.10.input_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.input_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.input_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.13.input_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.13.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.13.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.14.input_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.14.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.14.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.15.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.15.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.15.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.16.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.17.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.18.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.2.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.20.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.20.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.20.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.20.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.21.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.21.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.21.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.22.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.22.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.22.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "model.layers.23.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.23.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.24.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.25.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.26.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.27.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.28.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.28.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.28.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.28.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.28.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.28.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.28.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.28.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.28.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.29.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.29.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.29.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.29.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.29.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "model.layers.29.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.29.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.29.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.29.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.3.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.30.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.30.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.30.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.30.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.30.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.30.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.30.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.30.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.30.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "model.layers.31.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.31.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.32.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.33.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.34.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.35.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.35.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.35.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.35.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.35.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.35.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.35.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.35.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.35.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.36.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.36.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.36.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.36.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.36.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.36.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.36.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.36.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.36.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.37.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.37.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.37.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.37.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.37.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "model.layers.37.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.37.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.37.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.37.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.38.input_layernorm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.38.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.38.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.38.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.38.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.38.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.38.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.38.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.38.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "model.layers.39.input_layernorm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.39.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
+     "model.layers.4.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.5.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.5.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.5.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.6.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.6.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.6.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.7.input_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "model.layers.8.input_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.input_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "model.layers.9.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "model.norm.weight": "model-00006-of-00006.safetensors"
+   }
+ }
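(Editor's note: the `weight_map` tells loaders which shard file holds each tensor; `from_pretrained` consults it automatically, but it can also be read directly. A sketch, assuming the shards and index sit in the current directory:)

```python
import json
from safetensors import safe_open

with open("model.safetensors.index.json") as f:
    index = json.load(f)

# Per the map above, layer 15's q_proj sits in shard 2 even though most of
# layer 15 lives in shard 3: tensors are assigned wherever a shard fills up.
name = "model.layers.15.self_attn.q_proj.weight"
shard_file = index["weight_map"][name]
with safe_open(shard_file, framework="pt") as shard:
    tensor = shard.get_tensor(name)
print(shard_file, tuple(tensor.shape))
```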
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 3.0,
+   "train_loss": 0.020571088252061828,
+   "train_runtime": 85394.2588,
+   "train_samples": 140201,
+   "train_samples_per_second": 4.925,
+   "train_steps_per_second": 0.077
+ }
trainer_state.json ADDED
The diff for this file is too large to render.