UGARIT committed
Commit c18a8a0 · 1 Parent(s): 90569b1

Upload 6 files

Files changed (6)
  1. dev.tsv +0 -0
  2. final-model.pt +3 -0
  3. loss.tsv +11 -0
  4. test.tsv +0 -0
  5. training.log +526 -0
  6. weights.txt +0 -0
dev.tsv ADDED
The diff for this file is too large to render. See raw diff
 
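dev.tsv and test.tsv are the held-out splits for this run (1,425 sentences each, per the training log below). Assuming the files use the two-column CoNLL-style layout that Flair's ColumnCorpus expects — token and BIOES NER tag per line, tab-separated, which this diff does not confirm — a minimal loading sketch would be:

```python
# Minimal sketch: load the uploaded TSV splits with Flair's ColumnCorpus.
# Assumes a two-column CoNLL-style format (token <TAB> BIOES NER tag);
# that layout is NOT confirmed by this diff -- check the raw files first.
from flair.data import Corpus
from flair.datasets import ColumnCorpus

columns = {0: "text", 1: "ner"}  # assumed column layout

corpus: Corpus = ColumnCorpus(
    "data/",                 # folder holding the TSV files (assumed path)
    columns,
    train_file="train.tsv",  # the train split is not part of this commit
    dev_file="dev.tsv",
    test_file="test.tsv",
)
print(corpus)  # should report the train/dev/test sentence counts
```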
final-model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d36b2ad45e1159e1fa953433b4d913a4af9c8103fc1ef8012d9f32645ff00f45
+ size 1140158181
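Note that the three lines above are a Git LFS pointer, not the checkpoint itself: the ~1.1 GB of weights live in LFS storage and are addressed by the sha256 oid. After cloning, `git lfs pull` resolves the pointer; alternatively, a hedged sketch with huggingface_hub (the repo id below is a guess based on the committing user, not stated in this diff):

```python
# Sketch: resolve the LFS-backed checkpoint and load it with Flair.
from huggingface_hub import hf_hub_download
from flair.models import SequenceTagger

model_path = hf_hub_download(
    repo_id="UGARIT/xlmr_ner",   # hypothetical repo id -- substitute the real one
    filename="final-model.pt",
)
tagger = SequenceTagger.load(model_path)  # downloads ~1.1 GB once, then cached
```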
loss.tsv ADDED
@@ -0,0 +1,11 @@
+ EPOCH TIMESTAMP BAD_EPOCHS LEARNING_RATE TRAIN_LOSS DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+ 1 15:50:17 0 0.0100 0.15425708817024494 0.060714565217494965 0.7787 0.8033 0.7908 0.7707
+ 2 16:12:13 0 0.0100 0.08213652963048304 0.05180404335260391 0.8249 0.8572 0.8408 0.8224
+ 3 16:34:11 0 0.0100 0.07016307590417552 0.04575943946838379 0.8564 0.8825 0.8693 0.8513
+ 4 16:55:45 1 0.0100 0.06056221985322536 0.04747875779867172 0.8472 0.8787 0.8627 0.8453
+ 5 17:17:21 0 0.0100 0.052649540430491894 0.03879784420132637 0.8726 0.9006 0.8864 0.8699
+ 6 17:39:07 0 0.0100 0.047476279502849085 0.03874654322862625 0.8813 0.9006 0.8908 0.873
+ 7 18:00:59 0 0.0100 0.04204823572638501 0.04413652420043945 0.8832 0.9108 0.8968 0.8809
+ 8 18:22:46 0 0.0100 0.03738312710290932 0.03726610541343689 0.9065 0.917 0.9117 0.8934
+ 9 18:44:24 1 0.0100 0.03297667390843091 0.04557322338223457 0.8949 0.9224 0.9084 0.8937
+ 10 19:06:06 0 0.0100 0.0303017038951756 0.03892701491713524 0.905 0.9216 0.9132 0.8958
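loss.tsv is Flair's per-epoch summary of the run: train loss falls monotonically from 0.154 to 0.030, dev F1 climbs from 0.7908 to 0.9132, and the two BAD_EPOCHS entries (epochs 4 and 9) mark the dev-loss regressions. A minimal sketch to plot the curves, assuming the file is tab-separated as Flair writes it:

```python
# Sketch: plot train/dev loss and dev F1 from Flair's loss.tsv.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("loss.tsv", sep="\t")  # Flair writes tab-separated columns

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(df["EPOCH"], df["TRAIN_LOSS"], label="train loss")
ax1.plot(df["EPOCH"], df["DEV_LOSS"], label="dev loss")
ax1.set_xlabel("epoch"); ax1.set_ylabel("loss"); ax1.legend()

ax2.plot(df["EPOCH"], df["DEV_F1"], label="dev F1 (micro)")
ax2.set_xlabel("epoch"); ax2.set_ylabel("F1"); ax2.legend()

plt.tight_layout()
plt.show()
```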
test.tsv ADDED
The diff for this file is too large to render. See raw diff
 
training.log ADDED
@@ -0,0 +1,526 @@
+ 2022-10-26 15:28:10,168 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 15:28:10,173 Model: "SequenceTagger(
+   (embeddings): TransformerWordEmbeddings(
+     (model): XLMRobertaModel(
+       (embeddings): RobertaEmbeddings(
+         (word_embeddings): Embedding(250002, 768, padding_idx=1)
+         (position_embeddings): Embedding(514, 768, padding_idx=1)
+         (token_type_embeddings): Embedding(1, 768)
+         (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+         (dropout): Dropout(p=0.1, inplace=False)
+       )
+       (encoder): RobertaEncoder(
+         (layer): ModuleList(
+           (0): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (1): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (2): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (3): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (4): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (5): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (6): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (7): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (8): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (9): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (10): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+           (11): RobertaLayer(
+             (attention): RobertaAttention(
+               (self): RobertaSelfAttention(
+                 (query): Linear(in_features=768, out_features=768, bias=True)
+                 (key): Linear(in_features=768, out_features=768, bias=True)
+                 (value): Linear(in_features=768, out_features=768, bias=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+               (output): RobertaSelfOutput(
+                 (dense): Linear(in_features=768, out_features=768, bias=True)
+                 (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+                 (dropout): Dropout(p=0.1, inplace=False)
+               )
+             )
+             (intermediate): RobertaIntermediate(
+               (dense): Linear(in_features=768, out_features=3072, bias=True)
+               (intermediate_act_fn): GELUActivation()
+             )
+             (output): RobertaOutput(
+               (dense): Linear(in_features=3072, out_features=768, bias=True)
+               (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
+               (dropout): Dropout(p=0.1, inplace=False)
+             )
+           )
+         )
+       )
+       (pooler): RobertaPooler(
+         (dense): Linear(in_features=768, out_features=768, bias=True)
+         (activation): Tanh()
+       )
+     )
+   )
+   (word_dropout): WordDropout(p=0.05)
+   (locked_dropout): LockedDropout(p=0.5)
+   (embedding2nn): Linear(in_features=768, out_features=768, bias=True)
+   (rnn): LSTM(768, 256, batch_first=True, bidirectional=True)
+   (linear): Linear(in_features=512, out_features=15, bias=True)
+   (loss_function): ViterbiLoss()
+   (crf): CRF()
+ )"
+ 2022-10-26 15:28:10,176 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 15:28:10,180 Corpus: "Corpus: 8551 train + 1425 dev + 1425 test sentences"
+ 2022-10-26 15:28:10,182 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 15:28:10,184 Parameters:
+ 2022-10-26 15:28:10,186 - learning_rate: "0.010000"
+ 2022-10-26 15:28:10,187 - mini_batch_size: "8"
+ 2022-10-26 15:28:10,188 - patience: "3"
+ 2022-10-26 15:28:10,189 - anneal_factor: "0.5"
+ 2022-10-26 15:28:10,191 - max_epochs: "10"
+ 2022-10-26 15:28:10,192 - shuffle: "True"
+ 2022-10-26 15:28:10,193 - train_with_dev: "False"
+ 2022-10-26 15:28:10,194 - batch_growth_annealing: "False"
+ 2022-10-26 15:28:10,196 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 15:28:10,197 Model training base path: "/content/model/xlmr_ner"
+ 2022-10-26 15:28:10,198 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 15:28:10,199 Device: cuda:0
+ 2022-10-26 15:28:10,201 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 15:28:10,202 Embeddings storage mode: none
+ 2022-10-26 15:28:10,203 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 15:30:29,962 epoch 1 - iter 106/1069 - loss 0.55101171 - samples/sec: 6.07 - lr: 0.010000
+ 2022-10-26 15:32:28,714 epoch 1 - iter 212/1069 - loss 0.35636418 - samples/sec: 7.14 - lr: 0.010000
+ 2022-10-26 15:34:23,625 epoch 1 - iter 318/1069 - loss 0.28047260 - samples/sec: 7.38 - lr: 0.010000
+ 2022-10-26 15:36:24,015 epoch 1 - iter 424/1069 - loss 0.23890211 - samples/sec: 7.04 - lr: 0.010000
+ 2022-10-26 15:38:21,987 epoch 1 - iter 530/1069 - loss 0.21322222 - samples/sec: 7.19 - lr: 0.010000
+ 2022-10-26 15:40:22,521 epoch 1 - iter 636/1069 - loss 0.19431796 - samples/sec: 7.04 - lr: 0.010000
+ 2022-10-26 15:42:18,754 epoch 1 - iter 742/1069 - loss 0.18084010 - samples/sec: 7.30 - lr: 0.010000
+ 2022-10-26 15:44:18,344 epoch 1 - iter 848/1069 - loss 0.16975329 - samples/sec: 7.09 - lr: 0.010000
+ 2022-10-26 15:46:14,738 epoch 1 - iter 954/1069 - loss 0.16158584 - samples/sec: 7.29 - lr: 0.010000
+ 2022-10-26 15:48:14,067 epoch 1 - iter 1060/1069 - loss 0.15491697 - samples/sec: 7.11 - lr: 0.010000
+ 2022-10-26 15:48:24,569 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 15:48:24,577 EPOCH 1 done: loss 0.1543 - lr 0.010000
+ 2022-10-26 15:50:17,480 Evaluating as a multi-label problem: False
+ 2022-10-26 15:50:17,512 DEV : loss 0.060714565217494965 - f1-score (micro avg) 0.7908
+ 2022-10-26 15:50:17,553 BAD EPOCHS (no improvement): 0
+ 2022-10-26 15:50:17,554 saving best model
+ 2022-10-26 15:50:23,470 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 15:52:24,219 epoch 2 - iter 106/1069 - loss 0.08869057 - samples/sec: 7.02 - lr: 0.010000
+ 2022-10-26 15:54:21,594 epoch 2 - iter 212/1069 - loss 0.08600343 - samples/sec: 7.23 - lr: 0.010000
+ 2022-10-26 15:56:19,809 epoch 2 - iter 318/1069 - loss 0.08546665 - samples/sec: 7.17 - lr: 0.010000
+ 2022-10-26 15:58:17,214 epoch 2 - iter 424/1069 - loss 0.08476718 - samples/sec: 7.22 - lr: 0.010000
+ 2022-10-26 16:00:16,114 epoch 2 - iter 530/1069 - loss 0.08542624 - samples/sec: 7.13 - lr: 0.010000
+ 2022-10-26 16:02:13,540 epoch 2 - iter 636/1069 - loss 0.08522910 - samples/sec: 7.22 - lr: 0.010000
+ 2022-10-26 16:04:12,854 epoch 2 - iter 742/1069 - loss 0.08502467 - samples/sec: 7.11 - lr: 0.010000
+ 2022-10-26 16:06:13,219 epoch 2 - iter 848/1069 - loss 0.08373459 - samples/sec: 7.05 - lr: 0.010000
+ 2022-10-26 16:08:09,808 epoch 2 - iter 954/1069 - loss 0.08316639 - samples/sec: 7.27 - lr: 0.010000
+ 2022-10-26 16:10:11,036 epoch 2 - iter 1060/1069 - loss 0.08215396 - samples/sec: 7.00 - lr: 0.010000
+ 2022-10-26 16:10:21,246 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 16:10:21,249 EPOCH 2 done: loss 0.0821 - lr 0.010000
+ 2022-10-26 16:12:13,875 Evaluating as a multi-label problem: False
+ 2022-10-26 16:12:13,905 DEV : loss 0.05180404335260391 - f1-score (micro avg) 0.8408
+ 2022-10-26 16:12:13,947 BAD EPOCHS (no improvement): 0
+ 2022-10-26 16:12:13,948 saving best model
+ 2022-10-26 16:12:19,344 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 16:14:19,879 epoch 3 - iter 106/1069 - loss 0.06627178 - samples/sec: 7.04 - lr: 0.010000
+ 2022-10-26 16:16:18,272 epoch 3 - iter 212/1069 - loss 0.07094348 - samples/sec: 7.16 - lr: 0.010000
+ 2022-10-26 16:18:18,453 epoch 3 - iter 318/1069 - loss 0.07194093 - samples/sec: 7.06 - lr: 0.010000
+ 2022-10-26 16:20:15,802 epoch 3 - iter 424/1069 - loss 0.07242840 - samples/sec: 7.23 - lr: 0.010000
+ 2022-10-26 16:22:12,248 epoch 3 - iter 530/1069 - loss 0.07171872 - samples/sec: 7.28 - lr: 0.010000
+ 2022-10-26 16:24:12,231 epoch 3 - iter 636/1069 - loss 0.07162092 - samples/sec: 7.07 - lr: 0.010000
+ 2022-10-26 16:26:10,382 epoch 3 - iter 742/1069 - loss 0.07130310 - samples/sec: 7.18 - lr: 0.010000
+ 2022-10-26 16:28:08,953 epoch 3 - iter 848/1069 - loss 0.07050136 - samples/sec: 7.15 - lr: 0.010000
+ 2022-10-26 16:30:09,728 epoch 3 - iter 954/1069 - loss 0.07070517 - samples/sec: 7.02 - lr: 0.010000
+ 2022-10-26 16:32:08,721 epoch 3 - iter 1060/1069 - loss 0.07033198 - samples/sec: 7.13 - lr: 0.010000
+ 2022-10-26 16:32:18,654 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 16:32:18,656 EPOCH 3 done: loss 0.0702 - lr 0.010000
+ 2022-10-26 16:34:10,956 Evaluating as a multi-label problem: False
+ 2022-10-26 16:34:10,986 DEV : loss 0.04575943946838379 - f1-score (micro avg) 0.8693
+ 2022-10-26 16:34:11,026 BAD EPOCHS (no improvement): 0
+ 2022-10-26 16:34:11,029 saving best model
+ 2022-10-26 16:34:16,564 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 16:36:12,350 epoch 4 - iter 106/1069 - loss 0.06432601 - samples/sec: 7.32 - lr: 0.010000
+ 2022-10-26 16:38:08,474 epoch 4 - iter 212/1069 - loss 0.06376094 - samples/sec: 7.30 - lr: 0.010000
+ 2022-10-26 16:40:03,219 epoch 4 - iter 318/1069 - loss 0.06273795 - samples/sec: 7.39 - lr: 0.010000
+ 2022-10-26 16:41:59,110 epoch 4 - iter 424/1069 - loss 0.06153989 - samples/sec: 7.32 - lr: 0.010000
+ 2022-10-26 16:43:57,347 epoch 4 - iter 530/1069 - loss 0.06137878 - samples/sec: 7.17 - lr: 0.010000
+ 2022-10-26 16:45:55,146 epoch 4 - iter 636/1069 - loss 0.06072772 - samples/sec: 7.20 - lr: 0.010000
+ 2022-10-26 16:47:53,049 epoch 4 - iter 742/1069 - loss 0.06031769 - samples/sec: 7.19 - lr: 0.010000
+ 2022-10-26 16:49:50,705 epoch 4 - iter 848/1069 - loss 0.06084099 - samples/sec: 7.21 - lr: 0.010000
+ 2022-10-26 16:51:49,833 epoch 4 - iter 954/1069 - loss 0.06096388 - samples/sec: 7.12 - lr: 0.010000
+ 2022-10-26 16:53:45,640 epoch 4 - iter 1060/1069 - loss 0.06061743 - samples/sec: 7.32 - lr: 0.010000
+ 2022-10-26 16:53:54,974 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 16:53:54,976 EPOCH 4 done: loss 0.0606 - lr 0.010000
+ 2022-10-26 16:55:45,518 Evaluating as a multi-label problem: False
+ 2022-10-26 16:55:45,548 DEV : loss 0.04747875779867172 - f1-score (micro avg) 0.8627
+ 2022-10-26 16:55:45,589 BAD EPOCHS (no improvement): 1
+ 2022-10-26 16:55:45,590 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 16:57:41,259 epoch 5 - iter 106/1069 - loss 0.05285565 - samples/sec: 7.33 - lr: 0.010000
+ 2022-10-26 16:59:40,296 epoch 5 - iter 212/1069 - loss 0.05049977 - samples/sec: 7.12 - lr: 0.010000
+ 2022-10-26 17:01:35,184 epoch 5 - iter 318/1069 - loss 0.05297933 - samples/sec: 7.38 - lr: 0.010000
+ 2022-10-26 17:03:34,028 epoch 5 - iter 424/1069 - loss 0.05293744 - samples/sec: 7.14 - lr: 0.010000
+ 2022-10-26 17:05:29,295 epoch 5 - iter 530/1069 - loss 0.05359386 - samples/sec: 7.36 - lr: 0.010000
+ 2022-10-26 17:07:25,593 epoch 5 - iter 636/1069 - loss 0.05307424 - samples/sec: 7.29 - lr: 0.010000
+ 2022-10-26 17:09:22,893 epoch 5 - iter 742/1069 - loss 0.05323355 - samples/sec: 7.23 - lr: 0.010000
+ 2022-10-26 17:11:22,602 epoch 5 - iter 848/1069 - loss 0.05272547 - samples/sec: 7.08 - lr: 0.010000
+ 2022-10-26 17:13:22,960 epoch 5 - iter 954/1069 - loss 0.05280553 - samples/sec: 7.05 - lr: 0.010000
+ 2022-10-26 17:15:20,527 epoch 5 - iter 1060/1069 - loss 0.05265360 - samples/sec: 7.21 - lr: 0.010000
+ 2022-10-26 17:15:29,931 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 17:15:29,932 EPOCH 5 done: loss 0.0526 - lr 0.010000
+ 2022-10-26 17:17:21,728 Evaluating as a multi-label problem: False
+ 2022-10-26 17:17:21,760 DEV : loss 0.03879784420132637 - f1-score (micro avg) 0.8864
+ 2022-10-26 17:17:21,803 BAD EPOCHS (no improvement): 0
+ 2022-10-26 17:17:21,804 saving best model
+ 2022-10-26 17:17:27,330 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 17:19:26,401 epoch 6 - iter 106/1069 - loss 0.04801558 - samples/sec: 7.12 - lr: 0.010000
+ 2022-10-26 17:21:22,988 epoch 6 - iter 212/1069 - loss 0.05008290 - samples/sec: 7.27 - lr: 0.010000
+ 2022-10-26 17:23:16,794 epoch 6 - iter 318/1069 - loss 0.04925649 - samples/sec: 7.45 - lr: 0.010000
+ 2022-10-26 17:25:15,532 epoch 6 - iter 424/1069 - loss 0.04786643 - samples/sec: 7.14 - lr: 0.010000
+ 2022-10-26 17:27:13,913 epoch 6 - iter 530/1069 - loss 0.04879792 - samples/sec: 7.16 - lr: 0.010000
+ 2022-10-26 17:29:10,114 epoch 6 - iter 636/1069 - loss 0.04800786 - samples/sec: 7.30 - lr: 0.010000
+ 2022-10-26 17:31:07,810 epoch 6 - iter 742/1069 - loss 0.04755361 - samples/sec: 7.21 - lr: 0.010000
+ 2022-10-26 17:33:04,496 epoch 6 - iter 848/1069 - loss 0.04782375 - samples/sec: 7.27 - lr: 0.010000
+ 2022-10-26 17:35:05,834 epoch 6 - iter 954/1069 - loss 0.04776160 - samples/sec: 6.99 - lr: 0.010000
+ 2022-10-26 17:37:03,878 epoch 6 - iter 1060/1069 - loss 0.04743945 - samples/sec: 7.18 - lr: 0.010000
+ 2022-10-26 17:37:14,466 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 17:37:14,468 EPOCH 6 done: loss 0.0475 - lr 0.010000
+ 2022-10-26 17:39:07,562 Evaluating as a multi-label problem: False
+ 2022-10-26 17:39:07,592 DEV : loss 0.03874654322862625 - f1-score (micro avg) 0.8908
+ 2022-10-26 17:39:07,633 BAD EPOCHS (no improvement): 0
+ 2022-10-26 17:39:07,635 saving best model
+ 2022-10-26 17:39:13,242 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 17:41:11,924 epoch 7 - iter 106/1069 - loss 0.04334369 - samples/sec: 7.15 - lr: 0.010000
+ 2022-10-26 17:43:11,382 epoch 7 - iter 212/1069 - loss 0.04192565 - samples/sec: 7.10 - lr: 0.010000
+ 2022-10-26 17:45:08,087 epoch 7 - iter 318/1069 - loss 0.04115627 - samples/sec: 7.27 - lr: 0.010000
+ 2022-10-26 17:47:06,615 epoch 7 - iter 424/1069 - loss 0.04114928 - samples/sec: 7.16 - lr: 0.010000
+ 2022-10-26 17:49:03,863 epoch 7 - iter 530/1069 - loss 0.04105023 - samples/sec: 7.23 - lr: 0.010000
+ 2022-10-26 17:51:02,216 epoch 7 - iter 636/1069 - loss 0.04125208 - samples/sec: 7.17 - lr: 0.010000
+ 2022-10-26 17:53:04,293 epoch 7 - iter 742/1069 - loss 0.04151765 - samples/sec: 6.95 - lr: 0.010000
+ 2022-10-26 17:55:01,446 epoch 7 - iter 848/1069 - loss 0.04170200 - samples/sec: 7.24 - lr: 0.010000
+ 2022-10-26 17:56:59,848 epoch 7 - iter 954/1069 - loss 0.04180177 - samples/sec: 7.16 - lr: 0.010000
+ 2022-10-26 17:58:56,175 epoch 7 - iter 1060/1069 - loss 0.04203413 - samples/sec: 7.29 - lr: 0.010000
+ 2022-10-26 17:59:05,814 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 17:59:05,816 EPOCH 7 done: loss 0.0420 - lr 0.010000
+ 2022-10-26 18:00:59,457 Evaluating as a multi-label problem: False
+ 2022-10-26 18:00:59,486 DEV : loss 0.04413652420043945 - f1-score (micro avg) 0.8968
+ 2022-10-26 18:00:59,527 BAD EPOCHS (no improvement): 0
+ 2022-10-26 18:00:59,529 saving best model
+ 2022-10-26 18:01:05,372 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 18:03:03,422 epoch 8 - iter 106/1069 - loss 0.03592615 - samples/sec: 7.18 - lr: 0.010000
+ 2022-10-26 18:05:00,466 epoch 8 - iter 212/1069 - loss 0.03676863 - samples/sec: 7.25 - lr: 0.010000
+ 2022-10-26 18:06:58,178 epoch 8 - iter 318/1069 - loss 0.03702258 - samples/sec: 7.20 - lr: 0.010000
+ 2022-10-26 18:08:55,170 epoch 8 - iter 424/1069 - loss 0.03704658 - samples/sec: 7.25 - lr: 0.010000
+ 2022-10-26 18:10:52,222 epoch 8 - iter 530/1069 - loss 0.03711348 - samples/sec: 7.25 - lr: 0.010000
+ 2022-10-26 18:12:51,244 epoch 8 - iter 636/1069 - loss 0.03715815 - samples/sec: 7.13 - lr: 0.010000
+ 2022-10-26 18:14:50,229 epoch 8 - iter 742/1069 - loss 0.03708747 - samples/sec: 7.13 - lr: 0.010000
+ 2022-10-26 18:16:47,946 epoch 8 - iter 848/1069 - loss 0.03734575 - samples/sec: 7.20 - lr: 0.010000
+ 2022-10-26 18:18:45,873 epoch 8 - iter 954/1069 - loss 0.03736843 - samples/sec: 7.19 - lr: 0.010000
+ 2022-10-26 18:20:43,504 epoch 8 - iter 1060/1069 - loss 0.03737578 - samples/sec: 7.21 - lr: 0.010000
+ 2022-10-26 18:20:53,262 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 18:20:53,265 EPOCH 8 done: loss 0.0374 - lr 0.010000
+ 2022-10-26 18:22:46,256 Evaluating as a multi-label problem: False
+ 2022-10-26 18:22:46,293 DEV : loss 0.03726610541343689 - f1-score (micro avg) 0.9117
+ 2022-10-26 18:22:46,336 BAD EPOCHS (no improvement): 0
+ 2022-10-26 18:22:46,337 saving best model
+ 2022-10-26 18:22:51,847 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 18:24:50,402 epoch 9 - iter 106/1069 - loss 0.03606101 - samples/sec: 7.15 - lr: 0.010000
+ 2022-10-26 18:26:47,577 epoch 9 - iter 212/1069 - loss 0.03466163 - samples/sec: 7.24 - lr: 0.010000
+ 2022-10-26 18:28:47,029 epoch 9 - iter 318/1069 - loss 0.03420843 - samples/sec: 7.10 - lr: 0.010000
+ 2022-10-26 18:30:43,235 epoch 9 - iter 424/1069 - loss 0.03406325 - samples/sec: 7.30 - lr: 0.010000
+ 2022-10-26 18:32:41,132 epoch 9 - iter 530/1069 - loss 0.03393077 - samples/sec: 7.19 - lr: 0.010000
+ 2022-10-26 18:34:35,953 epoch 9 - iter 636/1069 - loss 0.03438052 - samples/sec: 7.39 - lr: 0.010000
+ 2022-10-26 18:36:33,872 epoch 9 - iter 742/1069 - loss 0.03435922 - samples/sec: 7.19 - lr: 0.010000
+ 2022-10-26 18:38:30,457 epoch 9 - iter 848/1069 - loss 0.03351594 - samples/sec: 7.27 - lr: 0.010000
+ 2022-10-26 18:40:26,775 epoch 9 - iter 954/1069 - loss 0.03363514 - samples/sec: 7.29 - lr: 0.010000
+ 2022-10-26 18:42:26,040 epoch 9 - iter 1060/1069 - loss 0.03301736 - samples/sec: 7.11 - lr: 0.010000
+ 2022-10-26 18:42:34,477 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 18:42:34,480 EPOCH 9 done: loss 0.0330 - lr 0.010000
+ 2022-10-26 18:44:24,572 Evaluating as a multi-label problem: False
+ 2022-10-26 18:44:24,602 DEV : loss 0.04557322338223457 - f1-score (micro avg) 0.9084
+ 2022-10-26 18:44:24,644 BAD EPOCHS (no improvement): 1
+ 2022-10-26 18:44:24,646 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 18:46:21,774 epoch 10 - iter 106/1069 - loss 0.02992093 - samples/sec: 7.24 - lr: 0.010000
+ 2022-10-26 18:48:20,730 epoch 10 - iter 212/1069 - loss 0.02886380 - samples/sec: 7.13 - lr: 0.010000
+ 2022-10-26 18:50:20,679 epoch 10 - iter 318/1069 - loss 0.03109654 - samples/sec: 7.07 - lr: 0.010000
+ 2022-10-26 18:52:14,564 epoch 10 - iter 424/1069 - loss 0.03091892 - samples/sec: 7.45 - lr: 0.010000
+ 2022-10-26 18:54:14,888 epoch 10 - iter 530/1069 - loss 0.02977117 - samples/sec: 7.05 - lr: 0.010000
+ 2022-10-26 18:56:13,992 epoch 10 - iter 636/1069 - loss 0.02969566 - samples/sec: 7.12 - lr: 0.010000
+ 2022-10-26 18:58:12,618 epoch 10 - iter 742/1069 - loss 0.02979601 - samples/sec: 7.15 - lr: 0.010000
+ 2022-10-26 19:00:10,398 epoch 10 - iter 848/1069 - loss 0.03040781 - samples/sec: 7.20 - lr: 0.010000
+ 2022-10-26 19:02:06,063 epoch 10 - iter 954/1069 - loss 0.03029135 - samples/sec: 7.33 - lr: 0.010000
+ 2022-10-26 19:04:05,626 epoch 10 - iter 1060/1069 - loss 0.03035206 - samples/sec: 7.09 - lr: 0.010000
+ 2022-10-26 19:04:15,538 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 19:04:15,540 EPOCH 10 done: loss 0.0303 - lr 0.010000
+ 2022-10-26 19:06:06,586 Evaluating as a multi-label problem: False
+ 2022-10-26 19:06:06,621 DEV : loss 0.03892701491713524 - f1-score (micro avg) 0.9132
+ 2022-10-26 19:06:06,663 BAD EPOCHS (no improvement): 0
+ 2022-10-26 19:06:06,665 saving best model
+ 2022-10-26 19:06:17,597 ----------------------------------------------------------------------------------------------------
+ 2022-10-26 19:06:17,723 loading file /content/model/xlmr_ner/best-model.pt
+ 2022-10-26 19:06:24,597 SequenceTagger predicts: Dictionary with 15 tags: O, S-PER, B-PER, E-PER, I-PER, S-MISC, B-MISC, E-MISC, I-MISC, S-LOC, B-LOC, E-LOC, I-LOC, <START>, <STOP>
+ 2022-10-26 19:08:17,003 Evaluating as a multi-label problem: False
+ 2022-10-26 19:08:17,040 0.9053 0.9316 0.9182 0.8955
+ 2022-10-26 19:08:17,041
+ Results:
+ - F-score (micro) 0.9182
+ - F-score (macro) 0.8875
+ - Accuracy 0.8955
+
+ By class:
+               precision    recall  f1-score   support
+
+          PER     0.9339    0.9633    0.9484      2127
+         MISC     0.8469    0.9250    0.8842       933
+          LOC     0.8955    0.7732    0.8299       388
+
+    micro avg     0.9053    0.9316    0.9182      3448
+    macro avg     0.8921    0.8872    0.8875      3448
+ weighted avg     0.9060    0.9316    0.9177      3448
+
+ 2022-10-26 19:08:17,045 ----------------------------------------------------------------------------------------------------
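Taken together, the log pins down the full recipe: XLM-R transformer embeddings (the 12-layer, 768-dimensional XLMRobertaModel printout matches xlm-roberta-base), a 256-unit BiLSTM, a CRF head trained with ViterbiLoss, SGD at learning rate 0.01 with mini-batch size 8, patience 3, anneal factor 0.5, 10 epochs, and a 15-tag BIOES dictionary over PER/MISC/LOC. A hedged reconstruction in Flair — the embedding checkpoint name is inferred from the printout, not stated in the log, and `corpus` is assumed to come from a loader like the sketch under dev.tsv:

```python
# Sketch: re-create the training run described in training.log.
# Hyperparameters are taken from the log; "xlm-roberta-base" is inferred
# from the 12-layer / 768-dim XLMRobertaModel printout. API as of Flair 0.11/0.12.
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

tag_type = "ner"
# `corpus` is assumed to be the ColumnCorpus from the earlier sketch
tag_dictionary = corpus.make_label_dictionary(label_type=tag_type)

embeddings = TransformerWordEmbeddings("xlm-roberta-base")  # inferred checkpoint

tagger = SequenceTagger(
    hidden_size=256,                # matches LSTM(768, 256, bidirectional=True)
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type=tag_type,
    use_crf=True,                   # matches (crf): CRF() with ViterbiLoss()
)

trainer = ModelTrainer(tagger, corpus)
trainer.train(
    "model/xlmr_ner",               # training base path from the log
    learning_rate=0.01,
    mini_batch_size=8,
    max_epochs=10,
    patience=3,
    anneal_factor=0.5,
)
```

One design note the log makes visible: with patience 3 and anneal factor 0.5, the learning rate would only have halved after three consecutive bad epochs; here at most one bad epoch occurred in a row (epochs 4 and 9), so the rate stayed at 0.01 throughout, exactly as the per-iteration lines show.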
weights.txt ADDED
File without changes