[2023-10-13 02:59:14,478] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-10-13 02:59:16,541] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-10-13 02:59:16,541] [INFO] [runner.py:555:main] cmd = /usr/local/miniconda3/envs/llava/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None llava/train/train_mem_video.py --deepspeed ./scripts/zero2.json --lora_enable True --model_name_or_path /hy-tmp/vicuna-7b-v1.3 --version v1 --data_path ./data/avsd_train_omni.json --video_folder /hy-tmp/Charades_v1_480 --vision_tower /hy-tmp/clip-vit-large-patch14 --pretrain_mm_mlp_adapter /hy-tmp/llava-pretrain-vicuna-7b-v1.3/mm_projector.bin --mm_vision_select_layer -2 --mm_use_im_start_end False --mm_use_im_patch_token False --bf16 True --output_dir /hy-tmp/checkpoints/omni-vicuna-7b-v1.3-finetune_lora --num_train_epochs 8 --per_device_train_batch_size 8 --per_device_eval_batch_size 4 --gradient_accumulation_steps 8 --evaluation_strategy no --save_strategy steps --save_steps 100 --save_total_limit 3 --learning_rate 2e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type cosine --logging_steps 1 --tf32 True --model_max_length 2048 --gradient_checkpointing True --lazy_preprocess True --dataloader_num_workers 8 --report_to wandb
[2023-10-13 02:59:17,802] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-10-13 02:59:19,574] [INFO] [launch.py:138:main] 0 NCCL_P2P_LEVEL=NVL
[2023-10-13 02:59:19,574] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2023-10-13 02:59:19,574] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2023-10-13 02:59:19,574] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2023-10-13 02:59:19,574] [INFO] [launch.py:163:main] dist_world_size=2
[2023-10-13 02:59:19,574] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2023-10-13 02:59:22,389] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-10-13 02:59:22,433] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-10-13 02:59:22,977] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-10-13 02:59:22,977] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-10-13 02:59:22,977] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-10-13 02:59:23,051] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-10-13 02:59:23,051] [INFO] [comm.py:594:init_distributed] cdb=None
You are using a model of type llama to instantiate a model of type omni. This is not supported for all configurations of models and can yield errors.
You are using a model of type llama to instantiate a model of type omni. This is not supported for all configurations of models and can yield errors.

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:17<00:17, 17.74s/it]
Loading checkpoint shards:  50%|█████     | 1/2 [00:24<00:24, 24.14s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:24<00:00, 11.07s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:24<00:00, 12.07s/it]

Loading checkpoint shards: 100%|██████████| 2/2 [00:35<00:00, 16.58s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:35<00:00, 17.72s/it]
Adding LoRA adapters...
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
Formatting inputs...Skip in lazy mode
Rank: 0 partition count [2, 2] and sizes[(82444288, False), (2176, False)] 
Rank: 1 partition count [2, 2] and sizes[(82444288, False), (2176, False)] 
wandb: Currently logged in as: wanghao-cst. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.12
wandb: Run data is saved locally in /root/Omni-LLM/wandb/run-20231013_030309-30lhy90r
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run fiery-dew-9
wandb: ⭐️ View project at https://wandb.ai/wanghao-cst/huggingface
wandb: 🚀 View run at https://wandb.ai/wanghao-cst/huggingface/runs/30lhy90r

  0%|          | 0/616 [00:00<?, ?it/s]
  0%|          | 1/616 [01:39<17:00:22, 99.55s/it]
                                                  
{'loss': 12.2148, 'learning_rate': 1.0526315789473685e-06, 'epoch': 0.01}

  0%|          | 1/616 [01:39<17:00:22, 99.55s/it]
  0%|          | 2/616 [02:35<12:33:55, 73.67s/it]
                                                  
{'loss': 12.0312, 'learning_rate': 2.105263157894737e-06, 'epoch': 0.03}

  0%|          | 2/616 [02:35<12:33:55, 73.67s/it]
  0%|          | 3/616 [03:30<11:05:55, 65.18s/it]
                                                  
{'loss': 12.3086, 'learning_rate': 3.157894736842105e-06, 'epoch': 0.04}

  0%|          | 3/616 [03:30<11:05:55, 65.18s/it]
  1%|          | 4/616 [04:24<10:22:36, 61.04s/it]
                                                  
{'loss': 12.1172, 'learning_rate': 4.210526315789474e-06, 'epoch': 0.05}

  1%|          | 4/616 [04:24<10:22:36, 61.04s/it]
  1%|          | 5/616 [05:20<10:01:41, 59.09s/it]
                                                  
{'loss': 12.0117, 'learning_rate': 5.263157894736842e-06, 'epoch': 0.06}

  1%|          | 5/616 [05:20<10:01:41, 59.09s/it]
  1%|          | 6/616 [06:15<9:47:18, 57.77s/it] 
                                                 
{'loss': 12.2656, 'learning_rate': 6.31578947368421e-06, 'epoch': 0.08}

  1%|          | 6/616 [06:15<9:47:18, 57.77s/it]
  1%|          | 7/616 [07:11<9:39:26, 57.09s/it]
                                                 
{'loss': 12.125, 'learning_rate': 7.368421052631579e-06, 'epoch': 0.09}

  1%|          | 7/616 [07:11<9:39:26, 57.09s/it]
  1%|▏         | 8/616 [08:05<9:30:10, 56.27s/it]
                                                 
{'loss': 11.2266, 'learning_rate': 8.421052631578948e-06, 'epoch': 0.1}

  1%|▏         | 8/616 [08:05<9:30:10, 56.27s/it]
  1%|▏         | 9/616 [09:01<9:28:26, 56.19s/it]
                                                 
{'loss': 11.1523, 'learning_rate': 9.473684210526315e-06, 'epoch': 0.12}

  1%|▏         | 9/616 [09:01<9:28:26, 56.19s/it]
  2%|▏         | 10/616 [09:56<9:23:58, 55.84s/it]
                                                  
{'loss': 9.5234, 'learning_rate': 1.0526315789473684e-05, 'epoch': 0.13}

  2%|▏         | 10/616 [09:56<9:23:58, 55.84s/it]
  2%|▏         | 11/616 [10:52<9:23:12, 55.86s/it]
                                                  
{'loss': 9.4688, 'learning_rate': 1.1578947368421053e-05, 'epoch': 0.14}

  2%|▏         | 11/616 [10:52<9:23:12, 55.86s/it]
  2%|▏         | 12/616 [11:49<9:24:18, 56.06s/it]
                                                  
{'loss': 9.25, 'learning_rate': 1.263157894736842e-05, 'epoch': 0.16}

  2%|▏         | 12/616 [11:49<9:24:18, 56.06s/it]
  2%|▏         | 13/616 [12:44<9:20:17, 55.75s/it]
                                                  
{'loss': 7.7285, 'learning_rate': 1.3684210526315791e-05, 'epoch': 0.17}

  2%|▏         | 13/616 [12:44<9:20:17, 55.75s/it]
  2%|▏         | 14/616 [13:39<9:16:20, 55.45s/it]
                                                  
{'loss': 7.6367, 'learning_rate': 1.4736842105263159e-05, 'epoch': 0.18}

  2%|▏         | 14/616 [13:39<9:16:20, 55.45s/it]
  2%|▏         | 15/616 [14:34<9:16:23, 55.55s/it]
                                                  
{'loss': 7.4844, 'learning_rate': 1.578947368421053e-05, 'epoch': 0.19}

  2%|▏         | 15/616 [14:34<9:16:23, 55.55s/it]
  3%|▎         | 16/616 [15:30<9:16:29, 55.65s/it]
                                                  
{'loss': 7.2422, 'learning_rate': 1.6842105263157896e-05, 'epoch': 0.21}

  3%|▎         | 16/616 [15:30<9:16:29, 55.65s/it]
  3%|▎         | 17/616 [16:27<9:18:45, 55.97s/it]
                                                  
{'loss': 7.0938, 'learning_rate': 1.7894736842105264e-05, 'epoch': 0.22}

  3%|▎         | 17/616 [16:27<9:18:45, 55.97s/it]
  3%|▎         | 18/616 [17:22<9:14:48, 55.67s/it]
                                                  
{'loss': 6.7266, 'learning_rate': 1.894736842105263e-05, 'epoch': 0.23}

  3%|▎         | 18/616 [17:22<9:14:48, 55.67s/it]
  3%|▎         | 19/616 [18:17<9:10:47, 55.36s/it]
                                                  
{'loss': 6.5234, 'learning_rate': 2e-05, 'epoch': 0.25}

  3%|▎         | 19/616 [18:17<9:10:47, 55.36s/it]
  3%|▎         | 20/616 [19:13<9:11:42, 55.54s/it]
                                                  
{'loss': 6.3477, 'learning_rate': 1.9999861541352416e-05, 'epoch': 0.26}

  3%|▎         | 20/616 [19:13<9:11:42, 55.54s/it]
  3%|▎         | 21/616 [20:08<9:10:46, 55.54s/it]
                                                  
{'loss': 6.127, 'learning_rate': 1.9999446169243816e-05, 'epoch': 0.27}

  3%|▎         | 21/616 [20:08<9:10:46, 55.54s/it]
  4%|▎         | 22/616 [21:03<9:07:59, 55.35s/it]
                                                  
{'loss': 5.8555, 'learning_rate': 1.9998753895176576e-05, 'epoch': 0.29}

  4%|▎         | 22/616 [21:03<9:07:59, 55.35s/it]
  4%|▎         | 23/616 [22:00<9:11:42, 55.82s/it]
                                                  
{'loss': 5.7402, 'learning_rate': 1.999778473832096e-05, 'epoch': 0.3}

  4%|▎         | 23/616 [22:00<9:11:42, 55.82s/it]
  4%|▍         | 24/616 [22:54<9:06:29, 55.39s/it]
                                                  
{'loss': 5.5605, 'learning_rate': 1.9996538725514597e-05, 'epoch': 0.31}

  4%|▍         | 24/616 [22:54<9:06:29, 55.39s/it]
  4%|▍         | 25/616 [23:50<9:05:28, 55.38s/it]
                                                  
{'loss': 5.4199, 'learning_rate': 1.999501589126174e-05, 'epoch': 0.32}

  4%|▍         | 25/616 [23:50<9:05:28, 55.38s/it]
  4%|▍         | 26/616 [24:46<9:07:10, 55.65s/it]
                                                  
{'loss': 5.3242, 'learning_rate': 1.9993216277732302e-05, 'epoch': 0.34}

  4%|▍         | 26/616 [24:46<9:07:10, 55.65s/it]
  4%|▍         | 27/616 [25:42<9:05:58, 55.62s/it]
                                                  
{'loss': 5.2148, 'learning_rate': 1.999113993476069e-05, 'epoch': 0.35}

  4%|▍         | 27/616 [25:42<9:05:58, 55.62s/it]
  5%|▍         | 28/616 [26:37<9:04:07, 55.52s/it]
                                                  
{'loss': 5.1016, 'learning_rate': 1.9988786919844437e-05, 'epoch': 0.36}

  5%|▍         | 28/616 [26:37<9:04:07, 55.52s/it]
  5%|▍         | 29/616 [27:34<9:06:56, 55.91s/it]
                                                  
{'loss': 5.0488, 'learning_rate': 1.9986157298142595e-05, 'epoch': 0.38}

  5%|▍         | 29/616 [27:34<9:06:56, 55.91s/it]
  5%|▍         | 30/616 [28:28<9:02:34, 55.55s/it]
                                                  
{'loss': 4.9258, 'learning_rate': 1.9983251142473935e-05, 'epoch': 0.39}

  5%|▍         | 30/616 [28:28<9:02:34, 55.55s/it]
  5%|▌         | 31/616 [29:26<9:06:21, 56.04s/it]
                                                  
{'loss': 4.9531, 'learning_rate': 1.9980068533314937e-05, 'epoch': 0.4}

  5%|▌         | 31/616 [29:26<9:06:21, 56.04s/it]
  5%|▌         | 32/616 [30:21<9:05:01, 56.00s/it]
                                                  
{'loss': 4.8535, 'learning_rate': 1.9976609558797545e-05, 'epoch': 0.42}

  5%|▌         | 32/616 [30:21<9:05:01, 56.00s/it]
  5%|▌         | 33/616 [31:17<9:02:02, 55.79s/it]
                                                  
{'loss': 4.8203, 'learning_rate': 1.9972874314706755e-05, 'epoch': 0.43}

  5%|▌         | 33/616 [31:17<9:02:02, 55.79s/it]
  6%|▌         | 34/616 [32:12<8:58:28, 55.51s/it]
                                                  
{'loss': 4.8535, 'learning_rate': 1.9968862904477936e-05, 'epoch': 0.44}

  6%|▌         | 34/616 [32:12<8:58:28, 55.51s/it]
  6%|▌         | 35/616 [33:07<8:56:38, 55.42s/it]
                                                  
{'loss': 4.7168, 'learning_rate': 1.9964575439193966e-05, 'epoch': 0.45}

  6%|▌         | 35/616 [33:07<8:56:38, 55.42s/it]
  6%|▌         | 36/616 [34:02<8:53:51, 55.23s/it]
                                                  
{'loss': 4.6875, 'learning_rate': 1.996001203758218e-05, 'epoch': 0.47}

  6%|▌         | 36/616 [34:02<8:53:51, 55.23s/it]
  6%|▌         | 37/616 [34:56<8:50:03, 54.93s/it]
                                                  
{'loss': 4.6172, 'learning_rate': 1.995517282601106e-05, 'epoch': 0.48}

  6%|▌         | 37/616 [34:56<8:50:03, 54.93s/it]
  6%|▌         | 38/616 [35:52<8:51:28, 55.17s/it]
                                                  
{'loss': 4.6523, 'learning_rate': 1.9950057938486745e-05, 'epoch': 0.49}

  6%|▌         | 38/616 [35:52<8:51:28, 55.17s/it]
  6%|▋         | 39/616 [36:48<8:54:36, 55.59s/it]
                                                  
{'loss': 4.5195, 'learning_rate': 1.994466751664932e-05, 'epoch': 0.51}

  6%|▋         | 39/616 [36:48<8:54:36, 55.59s/it]
  6%|▋         | 40/616 [37:44<8:54:53, 55.72s/it]
                                                  
{'loss': 4.5117, 'learning_rate': 1.993900170976888e-05, 'epoch': 0.52}

  6%|▋         | 40/616 [37:44<8:54:53, 55.72s/it]
  7%|▋         | 41/616 [38:39<8:52:15, 55.54s/it]
                                                  
{'loss': 4.4141, 'learning_rate': 1.9933060674741422e-05, 'epoch': 0.53}

  7%|▋         | 41/616 [38:39<8:52:15, 55.54s/it]
  7%|▋         | 42/616 [39:35<8:52:13, 55.63s/it]
                                                  
{'loss': 4.3398, 'learning_rate': 1.9926844576084483e-05, 'epoch': 0.55}

  7%|▋         | 42/616 [39:35<8:52:13, 55.63s/it]
  7%|▋         | 43/616 [40:32<8:54:09, 55.93s/it]
                                                  
{'loss': 4.3232, 'learning_rate': 1.992035358593258e-05, 'epoch': 0.56}

  7%|▋         | 43/616 [40:32<8:54:09, 55.93s/it]
  7%|▋         | 44/616 [41:28<8:52:43, 55.88s/it]
                                                  
{'loss': 4.2305, 'learning_rate': 1.991358788403246e-05, 'epoch': 0.57}

  7%|▋         | 44/616 [41:28<8:52:43, 55.88s/it]
  7%|▋         | 45/616 [42:23<8:50:59, 55.80s/it]
                                                  
{'loss': 4.1641, 'learning_rate': 1.990654765773811e-05, 'epoch': 0.58}

  7%|▋         | 45/616 [42:23<8:50:59, 55.80s/it]
  7%|▋         | 46/616 [43:18<8:48:37, 55.65s/it]
                                                  
{'loss': 4.0674, 'learning_rate': 1.9899233102005573e-05, 'epoch': 0.6}

  7%|▋         | 46/616 [43:18<8:48:37, 55.65s/it]
  8%|▊         | 47/616 [44:14<8:48:51, 55.77s/it]
                                                  
{'loss': 3.915, 'learning_rate': 1.9891644419387545e-05, 'epoch': 0.61}

  8%|▊         | 47/616 [44:14<8:48:51, 55.77s/it]
  8%|▊         | 48/616 [45:10<8:46:06, 55.58s/it]
                                                  
{'loss': 3.7822, 'learning_rate': 1.9883781820027777e-05, 'epoch': 0.62}

  8%|▊         | 48/616 [45:10<8:46:06, 55.58s/it]
  8%|▊         | 49/616 [46:06<8:47:09, 55.78s/it]
                                                  
{'loss': 3.709, 'learning_rate': 1.987564552165524e-05, 'epoch': 0.64}

  8%|▊         | 49/616 [46:06<8:47:09, 55.78s/it]
  8%|▊         | 50/616 [47:03<8:49:27, 56.13s/it]
                                                  
{'loss': 3.4131, 'learning_rate': 1.9867235749578108e-05, 'epoch': 0.65}

  8%|▊         | 50/616 [47:03<8:49:27, 56.13s/it]
  8%|▊         | 51/616 [47:59<8:47:39, 56.03s/it]
                                                  
{'loss': 3.1318, 'learning_rate': 1.9858552736677516e-05, 'epoch': 0.66}

  8%|▊         | 51/616 [47:59<8:47:39, 56.03s/it]
  8%|▊         | 52/616 [48:56<8:49:19, 56.31s/it]
                                                  
{'loss': 2.834, 'learning_rate': 1.984959672340111e-05, 'epoch': 0.68}

  8%|▊         | 52/616 [48:56<8:49:19, 56.31s/it]
  9%|▊         | 53/616 [49:52<8:48:34, 56.33s/it]
                                                  
{'loss': 2.5654, 'learning_rate': 1.984036795775638e-05, 'epoch': 0.69}

  9%|▊         | 53/616 [49:52<8:48:34, 56.33s/it]
  9%|▉         | 54/616 [50:48<8:47:14, 56.29s/it]
                                                  
{'loss': 2.417, 'learning_rate': 1.9830866695303817e-05, 'epoch': 0.7}

  9%|▉         | 54/616 [50:48<8:47:14, 56.29s/it]
  9%|▉         | 55/616 [51:45<8:46:52, 56.35s/it]
                                                  
{'loss': 2.1909, 'learning_rate': 1.9821093199149806e-05, 'epoch': 0.71}

  9%|▉         | 55/616 [51:45<8:46:52, 56.35s/it]
  9%|▉         | 56/616 [52:41<8:47:14, 56.49s/it]
                                                  
{'loss': 2.2568, 'learning_rate': 1.981104773993936e-05, 'epoch': 0.73}

  9%|▉         | 56/616 [52:41<8:47:14, 56.49s/it]
  9%|▉         | 57/616 [53:37<8:44:24, 56.29s/it]
                                                  
{'loss': 2.2744, 'learning_rate': 1.980073059584862e-05, 'epoch': 0.74}

  9%|▉         | 57/616 [53:37<8:44:24, 56.29s/it]
  9%|▉         | 58/616 [54:34<8:43:43, 56.31s/it]
                                                  
{'loss': 2.0771, 'learning_rate': 1.9790142052577148e-05, 'epoch': 0.75}

  9%|▉         | 58/616 [54:34<8:43:43, 56.31s/it]
 10%|▉         | 59/616 [55:29<8:41:20, 56.16s/it]
                                                  
{'loss': 2.1729, 'learning_rate': 1.977928240334002e-05, 'epoch': 0.77}

 10%|▉         | 59/616 [55:29<8:41:20, 56.16s/it]
 10%|▉         | 60/616 [56:25<8:37:59, 55.90s/it]
                                                  
{'loss': 2.123, 'learning_rate': 1.9768151948859705e-05, 'epoch': 0.78}

 10%|▉         | 60/616 [56:25<8:37:59, 55.90s/it]
 10%|▉         | 61/616 [57:21<8:38:57, 56.10s/it]
                                                  
{'loss': 2.0356, 'learning_rate': 1.9756750997357738e-05, 'epoch': 0.79}

 10%|▉         | 61/616 [57:21<8:38:57, 56.10s/it]
 10%|█         | 62/616 [58:17<8:37:46, 56.08s/it]
                                                  
{'loss': 2.0142, 'learning_rate': 1.9745079864546184e-05, 'epoch': 0.81}

 10%|█         | 62/616 [58:17<8:37:46, 56.08s/it]
 10%|█         | 63/616 [59:12<8:34:10, 55.79s/it]
                                                  
{'loss': 2.061, 'learning_rate': 1.97331388736189e-05, 'epoch': 0.82}

 10%|█         | 63/616 [59:12<8:34:10, 55.79s/it]
 10%|█         | 64/616 [1:00:08<8:31:47, 55.63s/it]
                                                    
{'loss': 2.0508, 'learning_rate': 1.972092835524257e-05, 'epoch': 0.83}

 10%|█         | 64/616 [1:00:08<8:31:47, 55.63s/it]
 11%|█         | 65/616 [1:01:05<8:36:39, 56.26s/it]
                                                    
{'loss': 2.0171, 'learning_rate': 1.9708448647547575e-05, 'epoch': 0.84}

 11%|█         | 65/616 [1:01:05<8:36:39, 56.26s/it]
 11%|█         | 66/616 [1:02:02<8:37:02, 56.40s/it]
                                                    
{'loss': 2.1284, 'learning_rate': 1.9695700096118594e-05, 'epoch': 0.86}

 11%|█         | 66/616 [1:02:02<8:37:02, 56.40s/it]
 11%|█         | 67/616 [1:02:58<8:35:36, 56.35s/it]
                                                    
{'loss': 2.0166, 'learning_rate': 1.9682683053985073e-05, 'epoch': 0.87}

 11%|█         | 67/616 [1:02:58<8:35:36, 56.35s/it]
 11%|█         | 68/616 [1:03:54<8:33:34, 56.23s/it]
                                                    
{'loss': 2.062, 'learning_rate': 1.966939788161142e-05, 'epoch': 0.88}

 11%|█         | 68/616 [1:03:54<8:33:34, 56.23s/it]
 11%|█         | 69/616 [1:04:50<8:31:20, 56.09s/it]
                                                    
{'loss': 2.0142, 'learning_rate': 1.9655844946887035e-05, 'epoch': 0.9}

 11%|█         | 69/616 [1:04:50<8:31:20, 56.09s/it]
 11%|█▏        | 70/616 [1:05:45<8:27:52, 55.81s/it]
                                                    
{'loss': 2.0103, 'learning_rate': 1.9642024625116117e-05, 'epoch': 0.91}

 11%|█▏        | 70/616 [1:05:45<8:27:52, 55.81s/it]
 12%|█▏        | 71/616 [1:06:41<8:26:19, 55.74s/it]
                                                    
{'loss': 1.9956, 'learning_rate': 1.9627937299007286e-05, 'epoch': 0.92}

 12%|█▏        | 71/616 [1:06:41<8:26:19, 55.74s/it]
 12%|█▏        | 72/616 [1:07:38<8:29:09, 56.16s/it]
                                                    
{'loss': 1.9868, 'learning_rate': 1.961358335866296e-05, 'epoch': 0.94}

 12%|█▏        | 72/616 [1:07:38<8:29:09, 56.16s/it]
 12%|█▏        | 73/616 [1:08:34<8:27:42, 56.10s/it]
                                                    
{'loss': 2.0435, 'learning_rate': 1.959896320156857e-05, 'epoch': 0.95}

 12%|█▏        | 73/616 [1:08:34<8:27:42, 56.10s/it]
 12%|█▏        | 74/616 [1:09:31<8:28:19, 56.27s/it]
                                                    
{'loss': 2.0112, 'learning_rate': 1.958407723258156e-05, 'epoch': 0.96}

 12%|█▏        | 74/616 [1:09:31<8:28:19, 56.27s/it]
 12%|█▏        | 75/616 [1:10:26<8:24:52, 55.99s/it]
                                                    
{'loss': 2.0908, 'learning_rate': 1.9568925863920155e-05, 'epoch': 0.97}

 12%|█▏        | 75/616 [1:10:26<8:24:52, 55.99s/it]
 12%|█▏        | 76/616 [1:11:22<8:25:13, 56.14s/it]
                                                    
{'loss': 1.9795, 'learning_rate': 1.955350951515195e-05, 'epoch': 0.99}

 12%|█▏        | 76/616 [1:11:22<8:25:13, 56.14s/it]
 12%|█▎        | 77/616 [1:12:19<8:25:03, 56.22s/it]
                                                    
{'loss': 2.0112, 'learning_rate': 1.9537828613182314e-05, 'epoch': 1.0}

 12%|█▎        | 77/616 [1:12:19<8:25:03, 56.22s/it]
 13%|█▎        | 78/616 [1:13:47<9:49:34, 65.75s/it]
                                                    
{'loss': 2.0459, 'learning_rate': 1.9521883592242537e-05, 'epoch': 1.01}

 13%|█▎        | 78/616 [1:13:47<9:49:34, 65.75s/it]
 13%|█▎        | 79/616 [1:14:43<9:21:38, 62.75s/it]
                                                    
{'loss': 2.0117, 'learning_rate': 1.950567489387783e-05, 'epoch': 1.03}

 13%|█▎        | 79/616 [1:14:43<9:21:38, 62.75s/it]
 13%|█▎        | 80/616 [1:15:37<8:59:10, 60.35s/it]
                                                    
{'loss': 2.0156, 'learning_rate': 1.9489202966935084e-05, 'epoch': 1.04}

 13%|█▎        | 80/616 [1:15:37<8:59:10, 60.35s/it]
 13%|█▎        | 81/616 [1:16:33<8:45:30, 58.93s/it]
                                                    
{'loss': 2.0547, 'learning_rate': 1.947246826755044e-05, 'epoch': 1.05}

 13%|█▎        | 81/616 [1:16:33<8:45:30, 58.93s/it]
 13%|█▎        | 82/616 [1:17:29<8:37:02, 58.09s/it]
                                                    
{'loss': 1.9639, 'learning_rate': 1.945547125913667e-05, 'epoch': 1.06}

 13%|█▎        | 82/616 [1:17:29<8:37:02, 58.09s/it]
 13%|█▎        | 83/616 [1:18:25<8:29:53, 57.40s/it]
                                                    
{'loss': 2.019, 'learning_rate': 1.943821241237034e-05, 'epoch': 1.08}

 13%|█▎        | 83/616 [1:18:25<8:29:53, 57.40s/it]
 14%|█▎        | 84/616 [1:19:20<8:23:52, 56.83s/it]
                                                    
{'loss': 1.9771, 'learning_rate': 1.9420692205178753e-05, 'epoch': 1.09}

 14%|█▎        | 84/616 [1:19:20<8:23:52, 56.83s/it]
 14%|█▍        | 85/616 [1:20:16<8:21:01, 56.61s/it]
                                                    
{'loss': 1.9492, 'learning_rate': 1.9402911122726756e-05, 'epoch': 1.1}

 14%|█▍        | 85/616 [1:20:16<8:21:01, 56.61s/it]
 14%|█▍        | 86/616 [1:21:11<8:14:46, 56.01s/it]
                                                    
{'loss': 1.9702, 'learning_rate': 1.9384869657403277e-05, 'epoch': 1.12}

 14%|█▍        | 86/616 [1:21:11<8:14:46, 56.01s/it]
 14%|█▍        | 87/616 [1:22:06<8:11:43, 55.77s/it]
                                                    
{'loss': 1.9946, 'learning_rate': 1.9366568308807685e-05, 'epoch': 1.13}

 14%|█▍        | 87/616 [1:22:06<8:11:43, 55.77s/it]
 14%|█▍        | 88/616 [1:23:01<8:09:00, 55.57s/it]
                                                    
{'loss': 1.9854, 'learning_rate': 1.9348007583735985e-05, 'epoch': 1.14}

 14%|█▍        | 88/616 [1:23:01<8:09:00, 55.57s/it]
 14%|█▍        | 89/616 [1:23:57<8:06:59, 55.45s/it]
                                                    
{'loss': 1.959, 'learning_rate': 1.9329187996166747e-05, 'epoch': 1.16}

 14%|█▍        | 89/616 [1:23:57<8:06:59, 55.45s/it]
 15%|█▍        | 90/616 [1:24:52<8:07:03, 55.56s/it]
                                                    
{'loss': 1.9722, 'learning_rate': 1.9310110067246905e-05, 'epoch': 1.17}

 15%|█▍        | 90/616 [1:24:52<8:07:03, 55.56s/it]
 15%|█▍        | 91/616 [1:25:48<8:07:08, 55.67s/it]
                                                    
{'loss': 2.0376, 'learning_rate': 1.9290774325277305e-05, 'epoch': 1.18}

 15%|█▍        | 91/616 [1:25:48<8:07:08, 55.67s/it]
 15%|█▍        | 92/616 [1:26:44<8:06:06, 55.66s/it]
                                                    
{'loss': 1.9834, 'learning_rate': 1.9271181305698084e-05, 'epoch': 1.19}

 15%|█▍        | 92/616 [1:26:44<8:06:06, 55.66s/it]
 15%|█▌        | 93/616 [1:27:40<8:05:12, 55.66s/it]
                                                    
{'loss': 2.0049, 'learning_rate': 1.9251331551073843e-05, 'epoch': 1.21}

 15%|█▌        | 93/616 [1:27:40<8:05:12, 55.66s/it]
 15%|█▌        | 94/616 [1:28:35<8:03:16, 55.55s/it]
                                                    
{'loss': 1.9824, 'learning_rate': 1.923122561107861e-05, 'epoch': 1.22}

 15%|█▌        | 94/616 [1:28:35<8:03:16, 55.55s/it]
 15%|█▌        | 95/616 [1:29:30<8:02:27, 55.56s/it]
                                                    
{'loss': 1.9624, 'learning_rate': 1.9210864042480645e-05, 'epoch': 1.23}

 15%|█▌        | 95/616 [1:29:30<8:02:27, 55.56s/it]
 16%|█▌        | 96/616 [1:30:26<8:02:30, 55.67s/it]
                                                    
{'loss': 1.9395, 'learning_rate': 1.9190247409126993e-05, 'epoch': 1.25}

 16%|█▌        | 96/616 [1:30:26<8:02:30, 55.67s/it]
 16%|█▌        | 97/616 [1:31:22<8:01:13, 55.63s/it]
                                                    
{'loss': 1.9746, 'learning_rate': 1.916937628192789e-05, 'epoch': 1.26}

 16%|█▌        | 97/616 [1:31:22<8:01:13, 55.63s/it]
 16%|█▌        | 98/616 [1:32:17<7:59:31, 55.54s/it]
                                                    
{'loss': 1.9507, 'learning_rate': 1.9148251238840947e-05, 'epoch': 1.27}

 16%|█▌        | 98/616 [1:32:17<7:59:31, 55.54s/it]
 16%|█▌        | 99/616 [1:33:13<7:59:26, 55.64s/it]
                                                    
{'loss': 2.0054, 'learning_rate': 1.9126872864855142e-05, 'epoch': 1.29}

 16%|β–ˆβ–Œ        | 99/616 [1:33:13<7:59:26, 55.64s/it]
 16%|β–ˆβ–Œ        | 100/616 [1:34:09<7:58:14, 55.61s/it]
                                                     
{'loss': 1.9409, 'learning_rate': 1.9105241751974624e-05, 'epoch': 1.3}

 16%|β–ˆβ–Œ        | 100/616 [1:34:09<7:58:14, 55.61s/it]
/usr/local/miniconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(

 16%|β–ˆβ–‹        | 101/616 [1:36:09<10:43:30, 74.97s/it]
                                                      
{'loss': 1.9912, 'learning_rate': 1.9083358499202323e-05, 'epoch': 1.31}

 16%|β–ˆβ–‹        | 101/616 [1:36:09<10:43:30, 74.97s/it]
 17%|β–ˆβ–‹        | 102/616 [1:37:04<9:52:10, 69.12s/it] 
                                                     
{'loss': 1.9404, 'learning_rate': 1.9061223712523352e-05, 'epoch': 1.32}

 17%|β–ˆβ–‹        | 102/616 [1:37:04<9:52:10, 69.12s/it]
 17%|β–ˆβ–‹        | 103/616 [1:38:00<9:16:14, 65.06s/it]
                                                     
{'loss': 1.9102, 'learning_rate': 1.903883800488824e-05, 'epoch': 1.34}

 17%|β–ˆβ–‹        | 103/616 [1:38:00<9:16:14, 65.06s/it]
 17%|β–ˆβ–‹        | 104/616 [1:38:55<8:49:52, 62.09s/it]
                                                     
{'loss': 1.9248, 'learning_rate': 1.9016201996195943e-05, 'epoch': 1.35}

 17%|β–ˆβ–‹        | 104/616 [1:38:55<8:49:52, 62.09s/it]
 17%|β–ˆβ–‹        | 105/616 [1:39:51<8:32:54, 60.22s/it]
                                                     
{'loss': 1.8984, 'learning_rate': 1.8993316313276694e-05, 'epoch': 1.36}

 17%|β–ˆβ–‹        | 105/616 [1:39:51<8:32:54, 60.22s/it]
 17%|β–ˆβ–‹        | 106/616 [1:40:46<8:19:20, 58.75s/it]
                                                     
{'loss': 1.9331, 'learning_rate': 1.8970181589874637e-05, 'epoch': 1.38}

 17%|β–ˆβ–‹        | 106/616 [1:40:46<8:19:20, 58.75s/it]
 17%|β–ˆβ–‹        | 107/616 [1:41:42<8:11:28, 57.93s/it]
                                                     
{'loss': 1.9561, 'learning_rate': 1.894679846663027e-05, 'epoch': 1.39}

 17%|β–ˆβ–‹        | 107/616 [1:41:42<8:11:28, 57.93s/it]
 18%|β–ˆβ–Š        | 108/616 [1:42:38<8:04:50, 57.26s/it]
                                                     
{'loss': 1.8901, 'learning_rate': 1.8923167591062723e-05, 'epoch': 1.4}

 18%|β–ˆβ–Š        | 108/616 [1:42:38<8:04:50, 57.26s/it]
 18%|β–ˆβ–Š        | 109/616 [1:43:34<8:00:58, 56.92s/it]
                                                     
{'loss': 1.9922, 'learning_rate': 1.8899289617551803e-05, 'epoch': 1.42}

 18%|β–ˆβ–Š        | 109/616 [1:43:34<8:00:58, 56.92s/it]
 18%|β–ˆβ–Š        | 110/616 [1:44:29<7:55:58, 56.44s/it]
                                                     
{'loss': 1.9277, 'learning_rate': 1.8875165207319902e-05, 'epoch': 1.43}

 18%|β–ˆβ–Š        | 110/616 [1:44:29<7:55:58, 56.44s/it]
 18%|β–ˆβ–Š        | 111/616 [1:45:25<7:53:10, 56.22s/it]
                                                     
{'loss': 1.9185, 'learning_rate': 1.8850795028413658e-05, 'epoch': 1.44}

 18%|β–ˆβ–Š        | 111/616 [1:45:25<7:53:10, 56.22s/it]
 18%|β–ˆβ–Š        | 112/616 [1:46:21<7:50:46, 56.04s/it]
                                                     
{'loss': 1.9575, 'learning_rate': 1.882617975568547e-05, 'epoch': 1.45}

 18%|β–ˆβ–Š        | 112/616 [1:46:21<7:50:46, 56.04s/it]
 18%|β–ˆβ–Š        | 113/616 [1:47:15<7:46:22, 55.63s/it]
                                                     
{'loss': 1.957, 'learning_rate': 1.880132007077482e-05, 'epoch': 1.47}

 18%|β–ˆβ–Š        | 113/616 [1:47:15<7:46:22, 55.63s/it]
 19%|β–ˆβ–Š        | 114/616 [1:48:11<7:45:53, 55.69s/it]
                                                     
{'loss': 1.8984, 'learning_rate': 1.8776216662089373e-05, 'epoch': 1.48}

 19%|β–ˆβ–Š        | 114/616 [1:48:11<7:45:53, 55.69s/it]
 19%|β–ˆβ–Š        | 115/616 [1:49:08<7:46:53, 55.92s/it]
                                                     
{'loss': 1.9429, 'learning_rate': 1.875087022478594e-05, 'epoch': 1.49}

 19%|β–ˆβ–Š        | 115/616 [1:49:08<7:46:53, 55.92s/it]
 19%|β–ˆβ–‰        | 116/616 [1:50:03<7:45:27, 55.86s/it]
                                                     
{'loss': 1.8701, 'learning_rate': 1.8725281460751198e-05, 'epoch': 1.51}

 19%|β–ˆβ–‰        | 116/616 [1:50:03<7:45:27, 55.86s/it]
 19%|β–ˆβ–‰        | 117/616 [1:50:59<7:43:13, 55.70s/it]
                                                     
{'loss': 1.9497, 'learning_rate': 1.869945107858228e-05, 'epoch': 1.52}

 19%|β–ˆβ–‰        | 117/616 [1:50:59<7:43:13, 55.70s/it]
 19%|β–ˆβ–‰        | 118/616 [1:51:55<7:44:34, 55.97s/it]
                                                     
{'loss': 1.8921, 'learning_rate': 1.867337979356715e-05, 'epoch': 1.53}

 19%|β–ˆβ–‰        | 118/616 [1:51:55<7:44:34, 55.97s/it]
 19%|β–ˆβ–‰        | 119/616 [1:52:51<7:42:44, 55.86s/it]
                                                     
{'loss': 1.8569, 'learning_rate': 1.8647068327664774e-05, 'epoch': 1.55}

 19%|β–ˆβ–‰        | 119/616 [1:52:51<7:42:44, 55.86s/it]
 19%|β–ˆβ–‰        | 120/616 [1:53:47<7:42:20, 55.93s/it]
                                                     
{'loss': 1.8882, 'learning_rate': 1.8620517409485148e-05, 'epoch': 1.56}

 19%|β–ˆβ–‰        | 120/616 [1:53:47<7:42:20, 55.93s/it]
 20%|β–ˆβ–‰        | 121/616 [1:54:43<7:40:16, 55.79s/it]
                                                     
{'loss': 1.8765, 'learning_rate': 1.8593727774269122e-05, 'epoch': 1.57}

 20%|β–ˆβ–‰        | 121/616 [1:54:43<7:40:16, 55.79s/it]
 20%|β–ˆβ–‰        | 122/616 [1:55:36<7:34:53, 55.25s/it]
                                                     
{'loss': 1.9282, 'learning_rate': 1.8566700163868027e-05, 'epoch': 1.58}

 20%|β–ˆβ–‰        | 122/616 [1:55:36<7:34:53, 55.25s/it]
 20%|β–ˆβ–‰        | 123/616 [1:56:32<7:34:34, 55.32s/it]
                                                     
{'loss': 1.8384, 'learning_rate': 1.8539435326723135e-05, 'epoch': 1.6}

 20%|β–ˆβ–‰        | 123/616 [1:56:32<7:34:34, 55.32s/it]
 20%|β–ˆβ–ˆ        | 124/616 [1:57:28<7:35:47, 55.58s/it]
                                                     
{'loss': 1.9185, 'learning_rate': 1.851193401784495e-05, 'epoch': 1.61}

 20%|β–ˆβ–ˆ        | 124/616 [1:57:28<7:35:47, 55.58s/it]
 20%|β–ˆβ–ˆ        | 125/616 [1:58:23<7:32:30, 55.30s/it]
                                                     
{'loss': 1.834, 'learning_rate': 1.848419699879227e-05, 'epoch': 1.62}

 20%|β–ˆβ–ˆ        | 125/616 [1:58:23<7:32:30, 55.30s/it]
 20%|β–ˆβ–ˆ        | 126/616 [1:59:19<7:32:47, 55.44s/it]
                                                     
{'loss': 1.8657, 'learning_rate': 1.845622503765113e-05, 'epoch': 1.64}

 20%|β–ˆβ–ˆ        | 126/616 [1:59:19<7:32:47, 55.44s/it]
 21%|β–ˆβ–ˆ        | 127/616 [2:00:14<7:31:52, 55.44s/it]
                                                     
{'loss': 1.8457, 'learning_rate': 1.842801890901351e-05, 'epoch': 1.65}

 21%|β–ˆβ–ˆ        | 127/616 [2:00:14<7:31:52, 55.44s/it]
 21%|β–ˆβ–ˆ        | 128/616 [2:01:09<7:30:42, 55.41s/it]
                                                     
{'loss': 1.7671, 'learning_rate': 1.8399579393955893e-05, 'epoch': 1.66}

 21%|β–ˆβ–ˆ        | 128/616 [2:01:09<7:30:42, 55.41s/it]
 21%|β–ˆβ–ˆ        | 129/616 [2:02:04<7:28:38, 55.27s/it]
                                                     
{'loss': 1.8462, 'learning_rate': 1.837090728001764e-05, 'epoch': 1.68}

 21%|β–ˆβ–ˆ        | 129/616 [2:02:04<7:28:38, 55.27s/it]
 21%|β–ˆβ–ˆ        | 130/616 [2:03:00<7:28:17, 55.34s/it]
                                                     
{'loss': 1.8296, 'learning_rate': 1.834200336117918e-05, 'epoch': 1.69}

 21%|β–ˆβ–ˆ        | 130/616 [2:03:00<7:28:17, 55.34s/it]
 21%|β–ˆβ–ˆβ–       | 131/616 [2:03:55<7:27:18, 55.34s/it]
                                                     
{'loss': 1.8262, 'learning_rate': 1.8312868437840002e-05, 'epoch': 1.7}

 21%|β–ˆβ–ˆβ–       | 131/616 [2:03:55<7:27:18, 55.34s/it]
 21%|β–ˆβ–ˆβ–       | 132/616 [2:04:50<7:25:41, 55.25s/it]
                                                     
{'loss': 1.835, 'learning_rate': 1.8283503316796536e-05, 'epoch': 1.71}

 21%|β–ˆβ–ˆβ–       | 132/616 [2:04:50<7:25:41, 55.25s/it]
 22%|β–ˆβ–ˆβ–       | 133/616 [2:05:46<7:26:05, 55.42s/it]
                                                     
{'loss': 1.8979, 'learning_rate': 1.8253908811219764e-05, 'epoch': 1.73}

 22%|β–ˆβ–ˆβ–       | 133/616 [2:05:46<7:26:05, 55.42s/it]
 22%|β–ˆβ–ˆβ–       | 134/616 [2:06:43<7:28:04, 55.78s/it]
                                                     
{'loss': 1.8496, 'learning_rate': 1.822408574063273e-05, 'epoch': 1.74}

 22%|β–ˆβ–ˆβ–       | 134/616 [2:06:43<7:28:04, 55.78s/it]
 22%|β–ˆβ–ˆβ–       | 135/616 [2:07:39<7:27:34, 55.83s/it]
                                                     
{'loss': 1.8252, 'learning_rate': 1.8194034930887842e-05, 'epoch': 1.75}

 22%|β–ˆβ–ˆβ–       | 135/616 [2:07:39<7:27:34, 55.83s/it]
 22%|β–ˆβ–ˆβ–       | 136/616 [2:08:34<7:26:39, 55.83s/it]
                                                     
{'loss': 1.7812, 'learning_rate': 1.8163757214143993e-05, 'epoch': 1.77}

 22%|β–ˆβ–ˆβ–       | 136/616 [2:08:34<7:26:39, 55.83s/it]
 22%|β–ˆβ–ˆβ–       | 137/616 [2:09:29<7:23:29, 55.55s/it]
                                                     
{'loss': 1.8364, 'learning_rate': 1.8133253428843524e-05, 'epoch': 1.78}

 22%|β–ˆβ–ˆβ–       | 137/616 [2:09:29<7:23:29, 55.55s/it]
 22%|β–ˆβ–ˆβ–       | 138/616 [2:10:25<7:21:53, 55.47s/it]
                                                     
{'loss': 1.8013, 'learning_rate': 1.810252441968901e-05, 'epoch': 1.79}

 22%|β–ˆβ–ˆβ–       | 138/616 [2:10:25<7:21:53, 55.47s/it]
 23%|β–ˆβ–ˆβ–Ž       | 139/616 [2:11:20<7:21:37, 55.55s/it]
                                                     
{'loss': 1.8203, 'learning_rate': 1.8071571037619856e-05, 'epoch': 1.81}

 23%|β–ˆβ–ˆβ–Ž       | 139/616 [2:11:20<7:21:37, 55.55s/it]
 23%|β–ˆβ–ˆβ–Ž       | 140/616 [2:12:17<7:22:24, 55.77s/it]
                                                     
{'loss': 1.7729, 'learning_rate': 1.804039413978875e-05, 'epoch': 1.82}

 23%|β–ˆβ–ˆβ–Ž       | 140/616 [2:12:17<7:22:24, 55.77s/it]
 23%|β–ˆβ–ˆβ–Ž       | 141/616 [2:13:12<7:20:17, 55.62s/it]
                                                     
{'loss': 1.8491, 'learning_rate': 1.8008994589537913e-05, 'epoch': 1.83}

 23%|β–ˆβ–ˆβ–Ž       | 141/616 [2:13:12<7:20:17, 55.62s/it]
 23%|β–ˆβ–ˆβ–Ž       | 142/616 [2:14:08<7:20:03, 55.70s/it]
                                                     
{'loss': 1.7998, 'learning_rate': 1.7977373256375194e-05, 'epoch': 1.84}

 23%|β–ˆβ–ˆβ–Ž       | 142/616 [2:14:08<7:20:03, 55.70s/it]
 23%|β–ˆβ–ˆβ–Ž       | 143/616 [2:15:03<7:18:17, 55.60s/it]
                                                     
{'loss': 1.8364, 'learning_rate': 1.7945531015950008e-05, 'epoch': 1.86}

 23%|β–ˆβ–ˆβ–Ž       | 143/616 [2:15:03<7:18:17, 55.60s/it]
 23%|β–ˆβ–ˆβ–Ž       | 144/616 [2:16:00<7:19:32, 55.87s/it]
                                                     
{'loss': 1.8125, 'learning_rate': 1.791346875002905e-05, 'epoch': 1.87}

 23%|β–ˆβ–ˆβ–Ž       | 144/616 [2:16:00<7:19:32, 55.87s/it]
 24%|β–ˆβ–ˆβ–Ž       | 145/616 [2:16:56<7:20:03, 56.06s/it]
                                                     
{'loss': 1.832, 'learning_rate': 1.7881187346471924e-05, 'epoch': 1.88}

 24%|β–ˆβ–ˆβ–Ž       | 145/616 [2:16:56<7:20:03, 56.06s/it]
 24%|β–ˆβ–ˆβ–Ž       | 146/616 [2:17:52<7:19:22, 56.09s/it]
                                                     
{'loss': 1.8271, 'learning_rate': 1.784868769920653e-05, 'epoch': 1.9}

 24%|β–ˆβ–ˆβ–Ž       | 146/616 [2:17:52<7:19:22, 56.09s/it]
 24%|β–ˆβ–ˆβ–       | 147/616 [2:18:48<7:18:03, 56.04s/it]
                                                     
{'loss': 1.7959, 'learning_rate': 1.7815970708204296e-05, 'epoch': 1.91}

 24%|β–ˆβ–ˆβ–       | 147/616 [2:18:48<7:18:03, 56.04s/it]
 24%|β–ˆβ–ˆβ–       | 148/616 [2:19:44<7:16:24, 55.95s/it]
                                                     
{'loss': 1.7798, 'learning_rate': 1.77830372794553e-05, 'epoch': 1.92}

 24%|β–ˆβ–ˆβ–       | 148/616 [2:19:44<7:16:24, 55.95s/it]
 24%|β–ˆβ–ˆβ–       | 149/616 [2:20:39<7:14:15, 55.79s/it]
                                                     
{'loss': 1.7651, 'learning_rate': 1.774988832494314e-05, 'epoch': 1.94}

 24%|β–ˆβ–ˆβ–       | 149/616 [2:20:39<7:14:15, 55.79s/it]
 24%|β–ˆβ–ˆβ–       | 150/616 [2:21:34<7:11:42, 55.58s/it]
                                                     
{'loss': 1.8076, 'learning_rate': 1.7716524762619695e-05, 'epoch': 1.95}

 24%|β–ˆβ–ˆβ–       | 150/616 [2:21:34<7:11:42, 55.58s/it]
 25%|β–ˆβ–ˆβ–       | 151/616 [2:22:30<7:09:34, 55.43s/it]
                                                     
{'loss': 1.8379, 'learning_rate': 1.7682947516379706e-05, 'epoch': 1.96}

 25%|β–ˆβ–ˆβ–       | 151/616 [2:22:30<7:09:34, 55.43s/it]
 25%|β–ˆβ–ˆβ–       | 152/616 [2:23:24<7:06:59, 55.21s/it]
                                                     
{'loss': 1.8228, 'learning_rate': 1.7649157516035205e-05, 'epoch': 1.97}

 25%|β–ˆβ–ˆβ–       | 152/616 [2:23:24<7:06:59, 55.21s/it]
 25%|β–ˆβ–ˆβ–       | 153/616 [2:24:20<7:06:55, 55.32s/it]
                                                     
{'loss': 1.7783, 'learning_rate': 1.7615155697289734e-05, 'epoch': 1.99}

 25%|β–ˆβ–ˆβ–       | 153/616 [2:24:20<7:06:55, 55.32s/it]
 25%|β–ˆβ–ˆβ–Œ       | 154/616 [2:25:16<7:07:25, 55.51s/it]
                                                     
{'loss': 1.8188, 'learning_rate': 1.7580943001712457e-05, 'epoch': 2.0}

 25%|β–ˆβ–ˆβ–Œ       | 154/616 [2:25:16<7:07:25, 55.51s/it]
 25%|β–ˆβ–ˆβ–Œ       | 155/616 [2:26:40<8:11:56, 64.03s/it]
                                                     
{'loss': 1.7974, 'learning_rate': 1.7546520376712093e-05, 'epoch': 2.01}

 25%|β–ˆβ–ˆβ–Œ       | 155/616 [2:26:40<8:11:56, 64.03s/it]
 25%|β–ˆβ–ˆβ–Œ       | 156/616 [2:27:36<7:52:12, 61.59s/it]
                                                     
{'loss': 1.7964, 'learning_rate': 1.7511888775510662e-05, 'epoch': 2.03}

 25%|β–ˆβ–ˆβ–Œ       | 156/616 [2:27:36<7:52:12, 61.59s/it]
 25%|β–ˆβ–ˆβ–Œ       | 157/616 [2:28:31<7:36:15, 59.64s/it]
                                                     
{'loss': 1.7515, 'learning_rate': 1.7477049157117093e-05, 'epoch': 2.04}

 25%|β–ˆβ–ˆβ–Œ       | 157/616 [2:28:31<7:36:15, 59.64s/it]
 26%|β–ˆβ–ˆβ–Œ       | 158/616 [2:29:26<7:25:42, 58.39s/it]
                                                     
{'loss': 1.7725, 'learning_rate': 1.744200248630068e-05, 'epoch': 2.05}

 26%|β–ˆβ–ˆβ–Œ       | 158/616 [2:29:26<7:25:42, 58.39s/it]
 26%|β–ˆβ–ˆβ–Œ       | 159/616 [2:30:22<7:18:37, 57.59s/it]
                                                     
{'loss': 1.7534, 'learning_rate': 1.7406749733564344e-05, 'epoch': 2.06}

 26%|β–ˆβ–ˆβ–Œ       | 159/616 [2:30:22<7:18:37, 57.59s/it]
 26%|β–ˆβ–ˆβ–Œ       | 160/616 [2:31:18<7:13:47, 57.08s/it]
                                                     
{'loss': 1.8408, 'learning_rate': 1.737129187511779e-05, 'epoch': 2.08}

 26%|β–ˆβ–ˆβ–Œ       | 160/616 [2:31:18<7:13:47, 57.08s/it]
 26%|β–ˆβ–ˆβ–Œ       | 161/616 [2:32:13<7:09:24, 56.63s/it]
                                                     
{'loss': 1.7686, 'learning_rate': 1.7335629892850436e-05, 'epoch': 2.09}

 26%|β–ˆβ–ˆβ–Œ       | 161/616 [2:32:13<7:09:24, 56.63s/it]
 26%|β–ˆβ–ˆβ–‹       | 162/616 [2:33:10<7:08:30, 56.63s/it]
                                                     
{'loss': 1.7642, 'learning_rate': 1.729976477430425e-05, 'epoch': 2.1}

 26%|β–ˆβ–ˆβ–‹       | 162/616 [2:33:10<7:08:30, 56.63s/it]
 26%|β–ˆβ–ˆβ–‹       | 163/616 [2:34:06<7:05:21, 56.34s/it]
                                                     
{'loss': 1.8047, 'learning_rate': 1.7263697512646397e-05, 'epoch': 2.12}

 26%|β–ˆβ–ˆβ–‹       | 163/616 [2:34:06<7:05:21, 56.34s/it]
 27%|β–ˆβ–ˆβ–‹       | 164/616 [2:35:02<7:03:59, 56.28s/it]
                                                     
{'loss': 1.8301, 'learning_rate': 1.7227429106641726e-05, 'epoch': 2.13}

 27%|β–ˆβ–ˆβ–‹       | 164/616 [2:35:02<7:03:59, 56.28s/it]
 27%|β–ˆβ–ˆβ–‹       | 165/616 [2:35:58<7:02:14, 56.17s/it]
                                                     
{'loss': 1.7588, 'learning_rate': 1.7190960560625127e-05, 'epoch': 2.14}

 27%|β–ˆβ–ˆβ–‹       | 165/616 [2:35:58<7:02:14, 56.17s/it]
 27%|β–ˆβ–ˆβ–‹       | 166/616 [2:36:53<6:59:31, 55.94s/it]
                                                     
{'loss': 1.7749, 'learning_rate': 1.7154292884473712e-05, 'epoch': 2.16}

 27%|β–ˆβ–ˆβ–‹       | 166/616 [2:36:53<6:59:31, 55.94s/it]
 27%|β–ˆβ–ˆβ–‹       | 167/616 [2:37:49<6:58:57, 55.99s/it]
                                                     
{'loss': 1.7251, 'learning_rate': 1.711742709357886e-05, 'epoch': 2.17}

 27%|β–ˆβ–ˆβ–‹       | 167/616 [2:37:49<6:58:57, 55.99s/it]
 27%|β–ˆβ–ˆβ–‹       | 168/616 [2:38:44<6:56:05, 55.73s/it]
                                                     
{'loss': 1.7603, 'learning_rate': 1.708036420881807e-05, 'epoch': 2.18}

 27%|β–ˆβ–ˆβ–‹       | 168/616 [2:38:44<6:56:05, 55.73s/it]
 27%|β–ˆβ–ˆβ–‹       | 169/616 [2:39:41<6:56:55, 55.96s/it]
                                                     
{'loss': 1.7339, 'learning_rate': 1.7043105256526723e-05, 'epoch': 2.19}

 27%|β–ˆβ–ˆβ–‹       | 169/616 [2:39:41<6:56:55, 55.96s/it]
 28%|β–ˆβ–ˆβ–Š       | 170/616 [2:40:38<6:58:19, 56.28s/it]
                                                     
{'loss': 1.731, 'learning_rate': 1.7005651268469652e-05, 'epoch': 2.21}

 28%|β–ˆβ–ˆβ–Š       | 170/616 [2:40:38<6:58:19, 56.28s/it]
 28%|β–ˆβ–ˆβ–Š       | 171/616 [2:41:33<6:54:13, 55.85s/it]
                                                     
{'loss': 1.7598, 'learning_rate': 1.6968003281812563e-05, 'epoch': 2.22}

 28%|β–ˆβ–ˆβ–Š       | 171/616 [2:41:33<6:54:13, 55.85s/it]
 28%|β–ˆβ–ˆβ–Š       | 172/616 [2:42:29<6:53:17, 55.85s/it]
                                                     
{'loss': 1.7007, 'learning_rate': 1.693016233909332e-05, 'epoch': 2.23}

 28%|β–ˆβ–ˆβ–Š       | 172/616 [2:42:29<6:53:17, 55.85s/it]
 28%|β–ˆβ–ˆβ–Š       | 173/616 [2:43:24<6:51:59, 55.80s/it]
                                                     
{'loss': 1.7183, 'learning_rate': 1.689212948819307e-05, 'epoch': 2.25}

 28%|β–ˆβ–ˆβ–Š       | 173/616 [2:43:24<6:51:59, 55.80s/it]
 28%|β–ˆβ–ˆβ–Š       | 174/616 [2:44:18<6:47:09, 55.27s/it]
                                                     
{'loss': 1.7173, 'learning_rate': 1.6853905782307235e-05, 'epoch': 2.26}

 28%|β–ˆβ–ˆβ–Š       | 174/616 [2:44:18<6:47:09, 55.27s/it]
 28%|β–ˆβ–ˆβ–Š       | 175/616 [2:45:16<6:51:57, 56.05s/it]
                                                     
{'loss': 1.7856, 'learning_rate': 1.681549227991634e-05, 'epoch': 2.27}

 28%|β–ˆβ–ˆβ–Š       | 175/616 [2:45:16<6:51:57, 56.05s/it]
 29%|β–ˆβ–ˆβ–Š       | 176/616 [2:46:11<6:48:50, 55.75s/it]
                                                     
{'loss': 1.7329, 'learning_rate': 1.67768900447567e-05, 'epoch': 2.29}

 29%|β–ˆβ–ˆβ–Š       | 176/616 [2:46:11<6:48:50, 55.75s/it]
 29%|β–ˆβ–ˆβ–Š       | 177/616 [2:47:07<6:46:59, 55.63s/it]
                                                     
{'loss': 1.7578, 'learning_rate': 1.6738100145790977e-05, 'epoch': 2.3}

 29%|β–ˆβ–ˆβ–Š       | 177/616 [2:47:07<6:46:59, 55.63s/it]
 29%|β–ˆβ–ˆβ–‰       | 178/616 [2:48:03<6:46:51, 55.73s/it]
                                                     
{'loss': 1.6846, 'learning_rate': 1.6699123657178553e-05, 'epoch': 2.31}

 29%|β–ˆβ–ˆβ–‰       | 178/616 [2:48:03<6:46:51, 55.73s/it]
 29%|β–ˆβ–ˆβ–‰       | 179/616 [2:48:57<6:43:53, 55.45s/it]
                                                     
{'loss': 1.791, 'learning_rate': 1.6659961658245813e-05, 'epoch': 2.32}

 29%|β–ˆβ–ˆβ–‰       | 179/616 [2:48:57<6:43:53, 55.45s/it]
 29%|β–ˆβ–ˆβ–‰       | 180/616 [2:49:53<6:43:27, 55.52s/it]
                                                     
{'loss': 1.7798, 'learning_rate': 1.6620615233456235e-05, 'epoch': 2.34}

 29%|β–ˆβ–ˆβ–‰       | 180/616 [2:49:53<6:43:27, 55.52s/it]
 29%|β–ˆβ–ˆβ–‰       | 181/616 [2:50:49<6:43:06, 55.60s/it]
                                                     
{'loss': 1.6987, 'learning_rate': 1.658108547238038e-05, 'epoch': 2.35}

 29%|β–ˆβ–ˆβ–‰       | 181/616 [2:50:49<6:43:06, 55.60s/it]
 30%|β–ˆβ–ˆβ–‰       | 182/616 [2:51:45<6:42:48, 55.69s/it]
                                                     
{'loss': 1.7202, 'learning_rate': 1.6541373469665688e-05, 'epoch': 2.36}

 30%|β–ˆβ–ˆβ–‰       | 182/616 [2:51:45<6:42:48, 55.69s/it]
 30%|β–ˆβ–ˆβ–‰       | 183/616 [2:52:40<6:40:16, 55.46s/it]
                                                     
{'loss': 1.7285, 'learning_rate': 1.6501480325006206e-05, 'epoch': 2.38}

 30%|β–ˆβ–ˆβ–‰       | 183/616 [2:52:40<6:40:16, 55.46s/it]
 30%|β–ˆβ–ˆβ–‰       | 184/616 [2:53:35<6:38:17, 55.32s/it]
                                                     
{'loss': 1.7417, 'learning_rate': 1.64614071431121e-05, 'epoch': 2.39}

 30%|β–ˆβ–ˆβ–‰       | 184/616 [2:53:35<6:38:17, 55.32s/it]
 30%|β–ˆβ–ˆβ–ˆ       | 185/616 [2:54:31<6:38:58, 55.54s/it]
                                                     
{'loss': 1.79, 'learning_rate': 1.6421155033679085e-05, 'epoch': 2.4}

 30%|β–ˆβ–ˆβ–ˆ       | 185/616 [2:54:31<6:38:58, 55.54s/it]
 30%|β–ˆβ–ˆβ–ˆ       | 186/616 [2:55:27<6:38:52, 55.66s/it]
                                                     
{'loss': 1.7876, 'learning_rate': 1.6380725111357693e-05, 'epoch': 2.42}

 30%|β–ˆβ–ˆβ–ˆ       | 186/616 [2:55:27<6:38:52, 55.66s/it]
 30%|β–ˆβ–ˆβ–ˆ       | 187/616 [2:56:23<6:39:32, 55.88s/it]
                                                     
{'loss': 1.7734, 'learning_rate': 1.634011849572239e-05, 'epoch': 2.43}

 30%|β–ˆβ–ˆβ–ˆ       | 187/616 [2:56:23<6:39:32, 55.88s/it]
 31%|β–ˆβ–ˆβ–ˆ       | 188/616 [2:57:18<6:37:16, 55.69s/it]
                                                     
{'loss': 1.7686, 'learning_rate': 1.6299336311240593e-05, 'epoch': 2.44}

 31%|β–ˆβ–ˆβ–ˆ       | 188/616 [2:57:18<6:37:16, 55.69s/it]
 31%|β–ˆβ–ˆβ–ˆ       | 189/616 [2:58:15<6:38:07, 55.94s/it]
                                                     
{'loss': 1.7993, 'learning_rate': 1.6258379687241533e-05, 'epoch': 2.45}

 31%|β–ˆβ–ˆβ–ˆ       | 189/616 [2:58:15<6:38:07, 55.94s/it]
 31%|β–ˆβ–ˆβ–ˆ       | 190/616 [2:59:09<6:34:19, 55.54s/it]
                                                     
{'loss': 1.708, 'learning_rate': 1.6217249757884954e-05, 'epoch': 2.47}

 31%|β–ˆβ–ˆβ–ˆ       | 190/616 [2:59:09<6:34:19, 55.54s/it]
 31%|β–ˆβ–ˆβ–ˆ       | 191/616 [3:00:05<6:33:15, 55.52s/it]
                                                     
{'loss': 1.7065, 'learning_rate': 1.6175947662129735e-05, 'epoch': 2.48}

 31%|β–ˆβ–ˆβ–ˆ       | 191/616 [3:00:05<6:33:15, 55.52s/it]
 31%|β–ˆβ–ˆβ–ˆ       | 192/616 [3:01:00<6:32:25, 55.53s/it]
                                                     
{'loss': 1.7324, 'learning_rate': 1.6134474543702353e-05, 'epoch': 2.49}

 31%|β–ˆβ–ˆβ–ˆ       | 192/616 [3:01:00<6:32:25, 55.53s/it]
 31%|β–ˆβ–ˆβ–ˆβ–      | 193/616 [3:01:56<6:31:58, 55.60s/it]
                                                     
{'loss': 1.7686, 'learning_rate': 1.609283155106517e-05, 'epoch': 2.51}

 31%|β–ˆβ–ˆβ–ˆβ–      | 193/616 [3:01:56<6:31:58, 55.60s/it]
 31%|β–ˆβ–ˆβ–ˆβ–      | 194/616 [3:02:51<6:30:30, 55.52s/it]
                                                     
{'loss': 1.7563, 'learning_rate': 1.605101983738468e-05, 'epoch': 2.52}

 31%|β–ˆβ–ˆβ–ˆβ–      | 194/616 [3:02:51<6:30:30, 55.52s/it]
 32%|β–ˆβ–ˆβ–ˆβ–      | 195/616 [3:03:48<6:31:31, 55.80s/it]
                                                     
{'loss': 1.7373, 'learning_rate': 1.6009040560499548e-05, 'epoch': 2.53}

 32%|β–ˆβ–ˆβ–ˆβ–      | 195/616 [3:03:48<6:31:31, 55.80s/it]
 32%|β–ˆβ–ˆβ–ˆβ–      | 196/616 [3:04:44<6:32:05, 56.01s/it]
                                                     
{'loss': 1.7104, 'learning_rate': 1.596689488288856e-05, 'epoch': 2.55}

 32%|β–ˆβ–ˆβ–ˆβ–      | 196/616 [3:04:44<6:32:05, 56.01s/it]
 32%|β–ˆβ–ˆβ–ˆβ–      | 197/616 [3:05:40<6:29:58, 55.84s/it]
                                                     
{'loss': 1.7368, 'learning_rate': 1.5924583971638416e-05, 'epoch': 2.56}

 32%|β–ˆβ–ˆβ–ˆβ–      | 197/616 [3:05:40<6:29:58, 55.84s/it]
 32%|β–ˆβ–ˆβ–ˆβ–      | 198/616 [3:06:36<6:30:01, 55.99s/it]
                                                     
{'loss': 1.7886, 'learning_rate': 1.5882108998411427e-05, 'epoch': 2.57}

 32%|β–ˆβ–ˆβ–ˆβ–      | 198/616 [3:06:36<6:30:01, 55.99s/it]
 32%|β–ˆβ–ˆβ–ˆβ–      | 199/616 [3:07:32<6:28:20, 55.88s/it]
                                                     
{'loss': 1.6855, 'learning_rate': 1.5839471139413065e-05, 'epoch': 2.58}

 32%|β–ˆβ–ˆβ–ˆβ–      | 199/616 [3:07:32<6:28:20, 55.88s/it]
 32%|β–ˆβ–ˆβ–ˆβ–      | 200/616 [3:08:27<6:25:31, 55.60s/it]
                                                     
{'loss': 1.7158, 'learning_rate': 1.5796671575359382e-05, 'epoch': 2.6}

 32%|β–ˆβ–ˆβ–ˆβ–      | 200/616 [3:08:27<6:25:31, 55.60s/it]
 33%|β–ˆβ–ˆβ–ˆβ–Ž      | 201/616 [3:10:31<8:46:36, 76.14s/it]
                                                     
{'loss': 1.7144, 'learning_rate': 1.5753711491444336e-05, 'epoch': 2.61}

 33%|β–ˆβ–ˆβ–ˆβ–Ž      | 201/616 [3:10:31<8:46:36, 76.14s/it]
 33%|β–ˆβ–ˆβ–ˆβ–Ž      | 202/616 [3:11:27<8:03:20, 70.05s/it]
                                                     
{'loss': 1.6909, 'learning_rate': 1.571059207730695e-05, 'epoch': 2.62}

 33%|β–ˆβ–ˆβ–ˆβ–Ž      | 202/616 [3:11:27<8:03:20, 70.05s/it]
 33%|β–ˆβ–ˆβ–ˆβ–Ž      | 203/616 [3:12:23<7:33:14, 65.85s/it]
                                                     
{'loss': 1.8003, 'learning_rate': 1.5667314526998373e-05, 'epoch': 2.64}

 33%|β–ˆβ–ˆβ–ˆβ–Ž      | 203/616 [3:12:23<7:33:14, 65.85s/it]
 33%|β–ˆβ–ˆβ–ˆβ–Ž      | 204/616 [3:13:19<7:11:50, 62.89s/it]
                                                     
{'loss': 1.7231, 'learning_rate': 1.5623880038948828e-05, 'epoch': 2.65}

 33%|β–ˆβ–ˆβ–ˆβ–Ž      | 204/616 [3:13:19<7:11:50, 62.89s/it]
 33%|β–ˆβ–ˆβ–ˆβ–Ž      | 205/616 [3:14:14<6:55:21, 60.64s/it]
                                                     
{'loss': 1.6816, 'learning_rate': 1.55802898159344e-05, 'epoch': 2.66}

 33%|β–ˆβ–ˆβ–ˆβ–Ž      | 205/616 [3:14:14<6:55:21, 60.64s/it]
 33%|β–ˆβ–ˆβ–ˆβ–Ž      | 206/616 [3:15:10<6:43:56, 59.11s/it]
                                                     
{'loss': 1.6826, 'learning_rate': 1.553654506504377e-05, 'epoch': 2.68}

 33%|β–ˆβ–ˆβ–ˆβ–Ž      | 206/616 [3:15:10<6:43:56, 59.11s/it]
 34%|β–ˆβ–ˆβ–ˆβ–Ž      | 207/616 [3:16:06<6:36:32, 58.17s/it]
                                                     
{'loss': 1.7085, 'learning_rate': 1.5492646997644737e-05, 'epoch': 2.69}

 34%|β–ˆβ–ˆβ–ˆβ–Ž      | 207/616 [3:16:06<6:36:32, 58.17s/it]
 34%|β–ˆβ–ˆβ–ˆβ–      | 208/616 [3:17:01<6:29:54, 57.34s/it]
                                                     
{'loss': 1.6797, 'learning_rate': 1.5448596829350706e-05, 'epoch': 2.7}

 34%|β–ˆβ–ˆβ–ˆβ–      | 208/616 [3:17:01<6:29:54, 57.34s/it]
 34%|β–ˆβ–ˆβ–ˆβ–      | 209/616 [3:17:56<6:24:38, 56.70s/it]
                                                     
{'loss': 1.708, 'learning_rate': 1.540439577998703e-05, 'epoch': 2.71}

 34%|β–ˆβ–ˆβ–ˆβ–      | 209/616 [3:17:56<6:24:38, 56.70s/it]
 34%|β–ˆβ–ˆβ–ˆβ–      | 210/616 [3:18:51<6:20:13, 56.19s/it]
                                                     
{'loss': 1.7036, 'learning_rate': 1.5360045073557214e-05, 'epoch': 2.73}

 34%|β–ˆβ–ˆβ–ˆβ–      | 210/616 [3:18:51<6:20:13, 56.19s/it]
 34%|β–ˆβ–ˆβ–ˆβ–      | 211/616 [3:19:47<6:17:35, 55.94s/it]
                                                     
{'loss': 1.7129, 'learning_rate': 1.5315545938209016e-05, 'epoch': 2.74}

 34%|β–ˆβ–ˆβ–ˆβ–      | 211/616 [3:19:47<6:17:35, 55.94s/it]
 34%|β–ˆβ–ˆβ–ˆβ–      | 212/616 [3:20:42<6:15:56, 55.83s/it]
                                                     
{'loss': 1.6855, 'learning_rate': 1.527089960620046e-05, 'epoch': 2.75}

 34%|β–ˆβ–ˆβ–ˆβ–      | 212/616 [3:20:42<6:15:56, 55.83s/it]
 35%|β–ˆβ–ˆβ–ˆβ–      | 213/616 [3:21:37<6:12:54, 55.52s/it]
                                                     
{'loss': 1.645, 'learning_rate': 1.5226107313865701e-05, 'epoch': 2.77}

 35%|β–ˆβ–ˆβ–ˆβ–      | 213/616 [3:21:37<6:12:54, 55.52s/it]
 35%|β–ˆβ–ˆβ–ˆβ–      | 214/616 [3:22:32<6:11:06, 55.39s/it]
                                                     
{'loss': 1.6982, 'learning_rate': 1.5181170301580776e-05, 'epoch': 2.78}

 35%|β–ˆβ–ˆβ–ˆβ–      | 214/616 [3:22:32<6:11:06, 55.39s/it]
 35%|β–ˆβ–ˆβ–ˆβ–      | 215/616 [3:23:27<6:09:14, 55.25s/it]
                                                     
{'loss': 1.731, 'learning_rate': 1.5136089813729276e-05, 'epoch': 2.79}

 35%|β–ˆβ–ˆβ–ˆβ–      | 215/616 [3:23:27<6:09:14, 55.25s/it]
 35%|β–ˆβ–ˆβ–ˆβ–Œ      | 216/616 [3:24:22<6:08:42, 55.31s/it]
                                                     
{'loss': 1.7192, 'learning_rate': 1.509086709866788e-05, 'epoch': 2.81}

 35%|β–ˆβ–ˆβ–ˆβ–Œ      | 216/616 [3:24:22<6:08:42, 55.31s/it]
 35%|β–ˆβ–ˆβ–ˆβ–Œ      | 217/616 [3:25:18<6:09:08, 55.51s/it]
                                                     
{'loss': 1.6982, 'learning_rate': 1.5045503408691776e-05, 'epoch': 2.82}

 35%|β–ˆβ–ˆβ–ˆβ–Œ      | 217/616 [3:25:18<6:09:08, 55.51s/it]
 35%|β–ˆβ–ˆβ–ˆβ–Œ      | 218/616 [3:26:15<6:10:32, 55.86s/it]
                                                     
{'loss': 1.7266, 'learning_rate': 1.5000000000000002e-05, 'epoch': 2.83}

 35%|β–ˆβ–ˆβ–ˆβ–Œ      | 218/616 [3:26:15<6:10:32, 55.86s/it]
 36%|β–ˆβ–ˆβ–ˆβ–Œ      | 219/616 [3:27:11<6:08:45, 55.73s/it]
                                                     
{'loss': 1.6958, 'learning_rate': 1.495435813266064e-05, 'epoch': 2.84}

 36%|β–ˆβ–ˆβ–ˆβ–Œ      | 219/616 [3:27:11<6:08:45, 55.73s/it]
 36%|β–ˆβ–ˆβ–ˆβ–Œ      | 220/616 [3:28:06<6:07:56, 55.75s/it]
                                                     
{'loss': 1.7056, 'learning_rate': 1.4908579070575936e-05, 'epoch': 2.86}

 36%|β–ˆβ–ˆβ–ˆβ–Œ      | 220/616 [3:28:06<6:07:56, 55.75s/it]
 36%|β–ˆβ–ˆβ–ˆβ–Œ      | 221/616 [3:29:02<6:07:44, 55.86s/it]
                                                     
{'loss': 1.6943, 'learning_rate': 1.4862664081447297e-05, 'epoch': 2.87}

 36%|β–ˆβ–ˆβ–ˆβ–Œ      | 221/616 [3:29:02<6:07:44, 55.86s/it]
 36%|β–ˆβ–ˆβ–ˆβ–Œ      | 222/616 [3:29:57<6:04:46, 55.55s/it]
                                                     
{'loss': 1.6724, 'learning_rate': 1.4816614436740184e-05, 'epoch': 2.88}

 36%|β–ˆβ–ˆβ–ˆβ–Œ      | 222/616 [3:29:57<6:04:46, 55.55s/it]
 36%|β–ˆβ–ˆβ–ˆβ–Œ      | 223/616 [3:30:52<6:02:26, 55.34s/it]
                                                     
{'loss': 1.6641, 'learning_rate': 1.4770431411648898e-05, 'epoch': 2.9}

 36%|β–ˆβ–ˆβ–ˆβ–Œ      | 223/616 [3:30:52<6:02:26, 55.34s/it]
 36%|β–ˆβ–ˆβ–ˆβ–‹      | 224/616 [3:31:48<6:02:46, 55.53s/it]
                                                     
{'loss': 1.7461, 'learning_rate': 1.4724116285061278e-05, 'epoch': 2.91}

 36%|β–ˆβ–ˆβ–ˆβ–‹      | 224/616 [3:31:48<6:02:46, 55.53s/it]
 37%|β–ˆβ–ˆβ–ˆβ–‹      | 225/616 [3:32:43<5:59:56, 55.23s/it]
                                                     
{'loss': 1.7207, 'learning_rate': 1.4677670339523285e-05, 'epoch': 2.92}

 37%|███▋      | 226/616 [3:33:39<6:02:09, 55.72s/it]
{'loss': 1.7061, 'learning_rate': 1.4631094861203478e-05, 'epoch': 2.94}

 37%|███▋      | 227/616 [3:34:35<6:00:28, 55.60s/it]
{'loss': 1.6758, 'learning_rate': 1.4584391139857407e-05, 'epoch': 2.95}

 37%|███▋      | 228/616 [3:35:31<6:00:26, 55.74s/it]
{'loss': 1.73, 'learning_rate': 1.4537560468791889e-05, 'epoch': 2.96}

 37%|███▋      | 229/616 [3:36:26<5:57:53, 55.49s/it]
{'loss': 1.7314, 'learning_rate': 1.4490604144829204e-05, 'epoch': 2.97}

 37%|███▋      | 230/616 [3:37:21<5:56:16, 55.38s/it]
{'loss': 1.7114, 'learning_rate': 1.4443523468271168e-05, 'epoch': 2.99}

 38%|███▊      | 231/616 [3:38:18<5:58:35, 55.89s/it]
{'loss': 1.7212, 'learning_rate': 1.4396319742863145e-05, 'epoch': 3.0}

 38%|███▊      | 232/616 [3:39:42<6:51:47, 64.34s/it]
{'loss': 1.7036, 'learning_rate': 1.4348994275757933e-05, 'epoch': 3.01}

 38%|███▊      | 233/616 [3:40:38<6:34:52, 61.86s/it]
{'loss': 1.71, 'learning_rate': 1.4301548377479562e-05, 'epoch': 3.03}

 38%|███▊      | 234/616 [3:41:33<6:20:43, 59.80s/it]
{'loss': 1.7432, 'learning_rate': 1.4253983361887017e-05, 'epoch': 3.04}

 38%|███▊      | 235/616 [3:42:29<6:12:23, 58.65s/it]
{'loss': 1.6992, 'learning_rate': 1.4206300546137844e-05, 'epoch': 3.05}

 38%|███▊      | 236/616 [3:43:24<6:05:20, 57.69s/it]
{'loss': 1.7271, 'learning_rate': 1.415850125065168e-05, 'epoch': 3.06}

 38%|███▊      | 237/616 [3:44:19<5:59:01, 56.84s/it]
{'loss': 1.6792, 'learning_rate': 1.4110586799073684e-05, 'epoch': 3.08}

 39%|███▊      | 238/616 [3:45:15<5:56:01, 56.51s/it]
{'loss': 1.73, 'learning_rate': 1.4062558518237893e-05, 'epoch': 3.09}

 39%|███▉      | 239/616 [3:46:11<5:53:55, 56.33s/it]
{'loss': 1.7192, 'learning_rate': 1.4014417738130464e-05, 'epoch': 3.1}

 39%|███▉      | 240/616 [3:47:06<5:50:00, 55.85s/it]
{'loss': 1.7476, 'learning_rate': 1.3966165791852862e-05, 'epoch': 3.12}

 39%|███▉      | 241/616 [3:48:02<5:49:47, 55.97s/it]
{'loss': 1.6958, 'learning_rate': 1.3917804015584932e-05, 'epoch': 3.13}

 39%|███▉      | 242/616 [3:48:57<5:47:38, 55.77s/it]
{'loss': 1.6865, 'learning_rate': 1.3869333748547901e-05, 'epoch': 3.14}

 39%|███▉      | 243/616 [3:49:53<5:46:15, 55.70s/it]
{'loss': 1.668, 'learning_rate': 1.3820756332967294e-05, 'epoch': 3.16}

 40%|███▉      | 244/616 [3:50:48<5:44:15, 55.53s/it]
{'loss': 1.6826, 'learning_rate': 1.3772073114035762e-05, 'epoch': 3.17}

 40%|███▉      | 245/616 [3:51:43<5:42:32, 55.40s/it]
{'loss': 1.7227, 'learning_rate': 1.3723285439875836e-05, 'epoch': 3.18}

 40%|███▉      | 246/616 [3:52:39<5:41:59, 55.46s/it]
{'loss': 1.7163, 'learning_rate': 1.3674394661502595e-05, 'epoch': 3.19}

 40%|████      | 247/616 [3:53:35<5:42:19, 55.66s/it]
{'loss': 1.6606, 'learning_rate': 1.3625402132786247e-05, 'epoch': 3.21}

 40%|████      | 248/616 [3:54:31<5:42:14, 55.80s/it]
{'loss': 1.7085, 'learning_rate': 1.3576309210414646e-05, 'epoch': 3.22}

 40%|████      | 249/616 [3:55:26<5:40:19, 55.64s/it]
{'loss': 1.668, 'learning_rate': 1.352711725385572e-05, 'epoch': 3.23}

 41%|████      | 250/616 [3:56:22<5:39:13, 55.61s/it]
{'loss': 1.7173, 'learning_rate': 1.3477827625319826e-05, 'epoch': 3.25}

 41%|████      | 251/616 [3:57:17<5:38:23, 55.63s/it]
{'loss': 1.7656, 'learning_rate': 1.3428441689722023e-05, 'epoch': 3.26}

 41%|████      | 252/616 [3:58:14<5:38:25, 55.78s/it]
{'loss': 1.6812, 'learning_rate': 1.3378960814644283e-05, 'epoch': 3.27}

 41%|████      | 253/616 [3:59:09<5:36:11, 55.57s/it]
{'loss': 1.6953, 'learning_rate': 1.3329386370297615e-05, 'epoch': 3.29}

 41%|████      | 254/616 [4:00:04<5:35:02, 55.53s/it]
{'loss': 1.665, 'learning_rate': 1.3279719729484117e-05, 'epoch': 3.3}

 41%|████▏     | 255/616 [4:00:59<5:33:43, 55.47s/it]
{'loss': 1.6587, 'learning_rate': 1.3229962267558982e-05, 'epoch': 3.31}

 42%|████▏     | 256/616 [4:01:55<5:33:39, 55.61s/it]
{'loss': 1.6797, 'learning_rate': 1.3180115362392383e-05, 'epoch': 3.32}

 42%|████▏     | 257/616 [4:02:51<5:32:48, 55.62s/it]
{'loss': 1.6992, 'learning_rate': 1.3130180394331335e-05, 'epoch': 3.34}

 42%|████▏     | 258/616 [4:03:47<5:32:16, 55.69s/it]
{'loss': 1.6567, 'learning_rate': 1.3080158746161468e-05, 'epoch': 3.35}

 42%|████▏     | 259/616 [4:04:42<5:31:01, 55.63s/it]
{'loss': 1.6641, 'learning_rate': 1.3030051803068729e-05, 'epoch': 3.36}

 42%|████▏     | 260/616 [4:05:39<5:31:17, 55.84s/it]
{'loss': 1.6841, 'learning_rate': 1.2979860952601038e-05, 'epoch': 3.38}

 42%|████▏     | 261/616 [4:06:33<5:28:37, 55.54s/it]
{'loss': 1.6777, 'learning_rate': 1.2929587584629845e-05, 'epoch': 3.39}

 43%|████▎     | 262/616 [4:07:30<5:29:36, 55.87s/it]
{'loss': 1.7065, 'learning_rate': 1.2879233091311667e-05, 'epoch': 3.4}

 43%|████▎     | 263/616 [4:08:26<5:28:11, 55.78s/it]
{'loss': 1.6997, 'learning_rate': 1.2828798867049504e-05, 'epoch': 3.42}

 43%|████▎     | 264/616 [4:09:21<5:27:20, 55.80s/it]
{'loss': 1.6704, 'learning_rate': 1.2778286308454255e-05, 'epoch': 3.43}

 43%|████▎     | 265/616 [4:10:16<5:24:37, 55.49s/it]
{'loss': 1.6489, 'learning_rate': 1.2727696814306034e-05, 'epoch': 3.44}

 43%|████▎     | 266/616 [4:11:12<5:23:30, 55.46s/it]
{'loss': 1.6777, 'learning_rate': 1.2677031785515423e-05, 'epoch': 3.45}

 43%|████▎     | 267/616 [4:12:07<5:22:50, 55.50s/it]
{'loss': 1.6284, 'learning_rate': 1.26262926250847e-05, 'epoch': 3.47}

 44%|████▎     | 268/616 [4:13:03<5:21:36, 55.45s/it]
{'loss': 1.6445, 'learning_rate': 1.2575480738068971e-05, 'epoch': 3.48}

 44%|████▎     | 269/616 [4:13:58<5:20:21, 55.39s/it]
{'loss': 1.626, 'learning_rate': 1.2524597531537261e-05, 'epoch': 3.49}

 44%|████▍     | 270/616 [4:14:54<5:19:56, 55.48s/it]
{'loss': 1.626, 'learning_rate': 1.2473644414533573e-05, 'epoch': 3.51}

 44%|████▍     | 271/616 [4:15:50<5:20:41, 55.77s/it]
{'loss': 1.6919, 'learning_rate': 1.2422622798037833e-05, 'epoch': 3.52}

 44%|████▍     | 272/616 [4:16:46<5:20:14, 55.86s/it]
{'loss': 1.6602, 'learning_rate': 1.2371534094926852e-05, 'epoch': 3.53}

 44%|████▍     | 273/616 [4:17:42<5:18:58, 55.80s/it]
{'loss': 1.6401, 'learning_rate': 1.232037971993517e-05, 'epoch': 3.55}

 44%|████▍     | 274/616 [4:18:36<5:16:22, 55.50s/it]
{'loss': 1.7026, 'learning_rate': 1.2269161089615902e-05, 'epoch': 3.56}

 45%|████▍     | 275/616 [4:19:32<5:15:51, 55.58s/it]
{'loss': 1.6875, 'learning_rate': 1.2217879622301514e-05, 'epoch': 3.57}

 45%|████▍     | 276/616 [4:20:27<5:14:12, 55.45s/it]
{'loss': 1.6646, 'learning_rate': 1.2166536738064523e-05, 'epoch': 3.58}

 45%|████▍     | 277/616 [4:21:23<5:13:32, 55.49s/it]
{'loss': 1.6631, 'learning_rate': 1.2115133858678192e-05, 'epoch': 3.6}

 45%|████▌     | 278/616 [4:22:19<5:13:43, 55.69s/it]
{'loss': 1.6196, 'learning_rate': 1.2063672407577154e-05, 'epoch': 3.61}

 45%|████▌     | 279/616 [4:23:14<5:11:50, 55.52s/it]
{'loss': 1.6606, 'learning_rate': 1.2012153809817992e-05, 'epoch': 3.62}

 45%|████▌     | 280/616 [4:24:10<5:11:51, 55.69s/it]
{'loss': 1.6719, 'learning_rate': 1.1960579492039783e-05, 'epoch': 3.64}

 46%|████▌     | 281/616 [4:25:07<5:11:43, 55.83s/it]
{'loss': 1.6958, 'learning_rate': 1.1908950882424581e-05, 'epoch': 3.65}

 46%|████▌     | 282/616 [4:26:03<5:12:04, 56.06s/it]
{'loss': 1.645, 'learning_rate': 1.1857269410657883e-05, 'epoch': 3.66}

 46%|████▌     | 283/616 [4:27:01<5:13:38, 56.51s/it]
{'loss': 1.6782, 'learning_rate': 1.1805536507889021e-05, 'epoch': 3.68}

 46%|████▌     | 284/616 [4:27:56<5:10:37, 56.14s/it]
{'loss': 1.6724, 'learning_rate': 1.1753753606691554e-05, 'epoch': 3.69}

 46%|████▋     | 285/616 [4:28:52<5:09:53, 56.17s/it]
{'loss': 1.6108, 'learning_rate': 1.1701922141023566e-05, 'epoch': 3.7}

 46%|████▋     | 286/616 [4:29:47<5:06:06, 55.66s/it]
{'loss': 1.6313, 'learning_rate': 1.1650043546187994e-05, 'epoch': 3.71}

 47%|████▋     | 287/616 [4:30:42<5:05:23, 55.70s/it]
{'loss': 1.647, 'learning_rate': 1.1598119258792848e-05, 'epoch': 3.73}

 47%|████▋     | 288/616 [4:31:38<5:04:18, 55.67s/it]
{'loss': 1.6816, 'learning_rate': 1.1546150716711448e-05, 'epoch': 3.74}

 47%|████▋     | 289/616 [4:32:34<5:03:48, 55.74s/it]
{'loss': 1.6846, 'learning_rate': 1.1494139359042612e-05, 'epoch': 3.75}

 47%|████▋     | 290/616 [4:33:30<5:04:10, 55.98s/it]
{'loss': 1.6602, 'learning_rate': 1.1442086626070781e-05, 'epoch': 3.77}

 47%|████▋     | 291/616 [4:34:26<5:02:43, 55.89s/it]
{'loss': 1.6133, 'learning_rate': 1.1389993959226163e-05, 'epoch': 3.78}

 47%|████▋     | 292/616 [4:35:22<5:01:18, 55.80s/it]
{'loss': 1.6997, 'learning_rate': 1.1337862801044792e-05, 'epoch': 3.79}

 48%|████▊     | 293/616 [4:36:18<5:00:40, 55.85s/it]
{'loss': 1.6172, 'learning_rate': 1.1285694595128606e-05, 'epoch': 3.81}

 48%|████▊     | 294/616 [4:37:13<4:59:35, 55.82s/it]
{'loss': 1.6479, 'learning_rate': 1.123349078610545e-05, 'epoch': 3.82}

 48%|████▊     | 295/616 [4:38:10<4:59:14, 55.93s/it]
{'loss': 1.6851, 'learning_rate': 1.1181252819589081e-05, 'epoch': 3.83}

 48%|████▊     | 296/616 [4:39:06<4:58:40, 56.00s/it]
{'loss': 1.6533, 'learning_rate': 1.1128982142139142e-05, 'epoch': 3.84}

 48%|████▊     | 297/616 [4:40:02<4:58:04, 56.06s/it]
{'loss': 1.6367, 'learning_rate': 1.1076680201221093e-05, 'epoch': 3.86}

 48%|████▊     | 298/616 [4:40:58<4:56:22, 55.92s/it]
{'loss': 1.6426, 'learning_rate': 1.1024348445166133e-05, 'epoch': 3.87}

 49%|████▊     | 299/616 [4:41:54<4:56:48, 56.18s/it]
{'loss': 1.6509, 'learning_rate': 1.0971988323131099e-05, 'epoch': 3.88}

 49%|████▊     | 300/616 [4:42:49<4:53:36, 55.75s/it]
{'loss': 1.6997, 'learning_rate': 1.091960128505833e-05, 'epoch': 3.9}

 49%|████▉     | 301/616 [4:44:56<6:44:24, 77.03s/it]
{'loss': 1.6187, 'learning_rate': 1.086718878163551e-05, 'epoch': 3.91}

 49%|████▉     | 302/616 [4:45:52<6:09:55, 70.69s/it]
{'loss': 1.6914, 'learning_rate': 1.0814752264255508e-05, 'epoch': 3.92}

 49%|████▉     | 303/616 [4:46:47<5:44:48, 66.10s/it]
{'loss': 1.6421, 'learning_rate': 1.0762293184976178e-05, 'epoch': 3.94}

 49%|████▉     | 304/616 [4:47:42<5:26:46, 62.84s/it]
{'loss': 1.6631, 'learning_rate': 1.070981299648016e-05, 'epoch': 3.95}

 50%|████▉     | 305/616 [4:48:38<5:14:34, 60.69s/it]
{'loss': 1.7046, 'learning_rate': 1.0657313152034634e-05, 'epoch': 3.96}

 50%|████▉     | 306/616 [4:49:33<5:04:42, 58.97s/it]
{'loss': 1.5845, 'learning_rate': 1.0604795105451096e-05, 'epoch': 3.97}

 50%|████▉     | 307/616 [4:50:29<4:58:34, 57.97s/it]
{'loss': 1.6621, 'learning_rate': 1.0552260311045082e-05, 'epoch': 3.99}

 50%|█████     | 308/616 [4:51:24<4:53:57, 57.26s/it]
{'loss': 1.6782, 'learning_rate': 1.0499710223595913e-05, 'epoch': 4.0}

 50%|█████     | 309/616 [4:52:56<5:46:30, 67.72s/it]
{'loss': 1.6611, 'learning_rate': 1.0447146298306394e-05, 'epoch': 4.01}

 50%|█████     | 310/616 [4:53:52<5:26:19, 63.98s/it]
{'loss': 1.6626, 'learning_rate': 1.0394569990762528e-05, 'epoch': 4.03}

 50%|█████     | 311/616 [4:54:47<5:11:51, 61.35s/it]
{'loss': 1.6406, 'learning_rate': 1.0341982756893203e-05, 'epoch': 4.04}

 51%|█████     | 312/616 [4:55:42<5:01:12, 59.45s/it]
{'loss': 1.6455, 'learning_rate': 1.0289386052929874e-05, 'epoch': 4.05}

 51%|█████     | 313/616 [4:56:37<4:53:24, 58.10s/it]
{'loss': 1.7051, 'learning_rate': 1.0236781335366239e-05, 'epoch': 4.06}

 51%|█████     | 314/616 [4:57:32<4:47:47, 57.18s/it]
{'loss': 1.5967, 'learning_rate': 1.0184170060917914e-05, 'epoch': 4.08}

 51%|█████     | 315/616 [4:58:28<4:45:07, 56.84s/it]
{'loss': 1.6772, 'learning_rate': 1.0131553686482077e-05, 'epoch': 4.09}

 51%|█████▏    | 316/616 [4:59:24<4:42:42, 56.54s/it]
{'loss': 1.625, 'learning_rate': 1.0078933669097135e-05, 'epoch': 4.1}

 51%|█████▏    | 317/616 [5:00:19<4:40:23, 56.27s/it]
{'loss': 1.6572, 'learning_rate': 1.002631146590238e-05, 'epoch': 4.12}

 52%|█████▏    | 318/616 [5:01:17<4:41:30, 56.68s/it]
{'loss': 1.6694, 'learning_rate': 9.973688534097624e-06, 'epoch': 4.13}

 52%|█████▏    | 319/616 [5:02:13<4:39:49, 56.53s/it]
{'loss': 1.6377, 'learning_rate': 9.92106633090287e-06, 'epoch': 4.14}

 52%|█████▏    | 320/616 [5:03:08<4:36:51, 56.12s/it]
{'loss': 1.6782, 'learning_rate': 9.868446313517927e-06, 'epoch': 4.16}

 52%|█████▏    | 321/616 [5:04:04<4:35:33, 56.04s/it]
{'loss': 1.6147, 'learning_rate': 9.815829939082087e-06, 'epoch': 4.17}

 52%|█████▏    | 322/616 [5:05:00<4:33:42, 55.86s/it]
{'loss': 1.6826, 'learning_rate': 9.763218664633763e-06, 'epoch': 4.18}

 52%|█████▏    | 323/616 [5:05:56<4:32:53, 55.88s/it]
{'loss': 1.7041, 'learning_rate': 9.710613947070127e-06, 'epoch': 4.19}

 53%|█████▎    | 324/616 [5:06:51<4:31:22, 55.76s/it]
{'loss': 1.6343, 'learning_rate': 9.658017243106802e-06, 'epoch': 4.21}

 53%|█████▎    | 325/616 [5:07:47<4:30:07, 55.69s/it]
{'loss': 1.6724, 'learning_rate': 9.605430009237474e-06, 'epoch': 4.22}

 53%|█████▎    | 326/616 [5:08:42<4:28:06, 55.47s/it]
{'loss': 1.6812, 'learning_rate': 9.552853701693606e-06, 'epoch': 4.23}

 53%|█████▎    | 327/616 [5:09:37<4:27:10, 55.47s/it]
{'loss': 1.6289, 'learning_rate': 9.50028977640409e-06, 'epoch': 4.25}

 53%|█████▎    | 328/616 [5:10:33<4:26:54, 55.60s/it]
{'loss': 1.6313, 'learning_rate': 9.44773968895492e-06, 'epoch': 4.26}

 53%|█████▎    | 329/616 [5:11:29<4:26:11, 55.65s/it]
{'loss': 1.6274, 'learning_rate': 9.395204894548907e-06, 'epoch': 4.27}

 54%|█████▎    | 330/616 [5:12:24<4:24:35, 55.51s/it]
{'loss': 1.6572, 'learning_rate': 9.342686847965367e-06, 'epoch': 4.29}

 54%|█████▎    | 331/616 [5:13:20<4:25:05, 55.81s/it]
{'loss': 1.6333, 'learning_rate': 9.290187003519841e-06, 'epoch': 4.3}

 54%|█████▍    | 332/616 [5:14:15<4:22:49, 55.53s/it]
{'loss': 1.687, 'learning_rate': 9.237706815023824e-06, 'epoch': 4.31}

 54%|█████▍    | 333/616 [5:15:11<4:22:17, 55.61s/it]
{'loss': 1.6626, 'learning_rate': 9.185247735744495e-06, 'epoch': 4.32}

 54%|█████▍    | 334/616 [5:16:08<4:22:56, 55.94s/it]
{'loss': 1.6431, 'learning_rate': 9.132811218364494e-06, 'epoch': 4.34}

 54%|█████▍    | 335/616 [5:17:04<4:22:14, 56.00s/it]
{'loss': 1.6562, 'learning_rate': 9.080398714941672e-06, 'epoch': 4.35}

 55%|█████▍    | 336/616 [5:17:59<4:20:36, 55.84s/it]
{'loss': 1.6714, 'learning_rate': 9.028011676868901e-06, 'epoch': 4.36}

 55%|█████▍    | 337/616 [5:18:55<4:20:00, 55.92s/it]
{'loss': 1.604, 'learning_rate': 8.975651554833869e-06, 'epoch': 4.38}

 55%|█████▍    | 338/616 [5:19:51<4:18:46, 55.85s/it]
{'loss': 1.6719, 'learning_rate': 8.92331979877891e-06, 'epoch': 4.39}

 55%|█████▌    | 339/616 [5:20:47<4:18:26, 55.98s/it]
{'loss': 1.707, 'learning_rate': 8.871017857860863e-06, 'epoch': 4.4}

 55%|█████▌    | 340/616 [5:21:42<4:15:46, 55.60s/it]
{'loss': 1.647, 'learning_rate': 8.81874718041092e-06, 'epoch': 4.42}

 55%|█████▌    | 341/616 [5:22:38<4:14:55, 55.62s/it]
{'loss': 1.6675, 'learning_rate': 8.766509213894552e-06, 'epoch': 4.43}

 56%|█████▌    | 342/616 [5:23:34<4:14:39, 55.76s/it]
{'loss': 1.6636, 'learning_rate': 8.714305404871397e-06, 'epoch': 4.44}

 56%|█████▌    | 343/616 [5:24:29<4:12:18, 55.45s/it]
{'loss': 1.6768, 'learning_rate': 8.662137198955211e-06, 'epoch': 4.45}

 56%|█████▌    | 344/616 [5:25:23<4:10:08, 55.18s/it]
{'loss': 1.5864, 'learning_rate': 8.610006040773844e-06, 'epoch': 4.47}

 56%|█████▌    | 345/616 [5:26:19<4:10:41, 55.50s/it]
{'loss': 1.6304, 'learning_rate': 8.557913373929222e-06, 'epoch': 4.48}

 56%|█████▌    | 346/616 [5:27:15<4:10:23, 55.64s/it]
{'loss': 1.6289, 'learning_rate': 8.50586064095739e-06, 'epoch': 4.49}

 56%|█████▋    | 347/616 [5:28:11<4:09:24, 55.63s/it]
{'loss': 1.6436, 'learning_rate': 8.453849283288554e-06, 'epoch': 4.51}

 56%|█████▋    | 348/616 [5:29:07<4:08:28, 55.63s/it]
{'loss': 1.6221, 'learning_rate': 8.401880741207155e-06, 'epoch': 4.52}

 57%|█████▋    | 349/616 [5:30:03<4:08:31, 55.85s/it]
{'loss': 1.6904, 'learning_rate': 8.349956453812009e-06, 'epoch': 4.53}

 57%|█████▋    | 350/616 [5:30:58<4:06:06, 55.51s/it]
{'loss': 1.5898, 'learning_rate': 8.298077858976435e-06, 'epoch': 4.55}

 57%|█████▋    | 351/616 [5:31:53<4:04:15, 55.30s/it]
{'loss': 1.667, 'learning_rate': 8.246246393308448e-06, 'epoch': 4.56}

 57%|█████▋    | 352/616 [5:32:48<4:03:37, 55.37s/it]
{'loss': 1.6543, 'learning_rate': 8.194463492110982e-06, 'epoch': 4.57}

 57%|█████▋    | 353/616 [5:33:44<4:03:10, 55.48s/it]
{'loss': 1.6572, 'learning_rate': 8.142730589342119e-06, 'epoch': 4.58}

 57%|█████▋    | 354/616 [5:34:40<4:03:36, 55.79s/it]
{'loss': 1.6685, 'learning_rate': 8.091049117575424e-06, 'epoch': 4.6}

 58%|█████▊    | 355/616 [5:35:36<4:02:46, 55.81s/it]
{'loss': 1.6484, 'learning_rate': 8.03942050796022e-06, 'epoch': 4.61}

 58%|█████▊    | 356/616 [5:36:31<4:00:57, 55.61s/it]
{'loss': 1.5405, 'learning_rate': 7.98784619018201e-06, 'epoch': 4.62}

 58%|█████▊    | 357/616 [5:37:26<3:59:25, 55.46s/it]
{'loss': 1.644, 'learning_rate': 7.93632759242285e-06, 'epoch': 4.64}

 58%|█████▊    | 358/616 [5:38:22<3:58:25, 55.45s/it]
{'loss': 1.6206, 'learning_rate': 7.884866141321811e-06, 'epoch': 4.65}

 58%|█████▊    | 359/616 [5:39:17<3:57:30, 55.45s/it]
{'loss': 1.6079, 'learning_rate': 7.833463261935482e-06, 'epoch': 4.66}

 58%|█████▊    | 360/616 [5:40:13<3:56:23, 55.40s/it]
{'loss': 1.6108, 'learning_rate': 7.782120377698489e-06, 'epoch': 4.68}

 59%|█████▊    | 361/616 [5:41:09<3:56:28, 55.64s/it]
{'loss': 1.5625, 'learning_rate': 7.730838910384098e-06, 'epoch': 4.69}

 59%|█████▉    | 362/616 [5:42:04<3:54:35, 55.42s/it]
{'loss': 1.647, 'learning_rate': 7.679620280064837e-06, 'epoch': 4.7}

 59%|█████▉    | 363/616 [5:43:00<3:54:24, 55.59s/it]
{'loss': 1.5493, 'learning_rate': 7.6284659050731525e-06, 'epoch': 4.71}

 59%|█████▉    | 364/616 [5:43:55<3:53:37, 55.62s/it]
{'loss': 1.6362, 'learning_rate': 7.57737720196217e-06, 'epoch': 4.73}

 59%|█████▉    | 365/616 [5:44:51<3:53:05, 55.72s/it]
{'loss': 1.6294, 'learning_rate': 7.526355585466432e-06, 'epoch': 4.74}

 59%|█████▉    | 366/616 [5:45:48<3:53:24, 56.02s/it]
{'loss': 1.6675, 'learning_rate': 7.4754024684627405e-06, 'epoch': 4.75}

 60%|█████▉    | 367/616 [5:46:44<3:51:58, 55.90s/it]
{'loss': 1.6519, 'learning_rate': 7.424519261931036e-06, 'epoch': 4.77}

 60%|█████▉    | 368/616 [5:47:39<3:50:52, 55.86s/it]
{'loss': 1.6807, 'learning_rate': 7.373707374915303e-06, 'epoch': 4.78}

 60%|█████▉    | 369/616 [5:48:35<3:49:39, 55.79s/it]
{'loss': 1.6221, 'learning_rate': 7.322968214484583e-06, 'epoch': 4.79}

 60%|██████    | 370/616 [5:49:30<3:48:07, 55.64s/it]
{'loss': 1.6523, 'learning_rate': 7.27230318569397e-06, 'epoch': 4.81}

 60%|██████    | 371/616 [5:50:26<3:47:32, 55.72s/it]
{'loss': 1.6118, 'learning_rate': 7.221713691545746e-06, 'epoch': 4.82}

 60%|██████    | 372/616 [5:51:22<3:46:53, 55.79s/it]
{'loss': 1.6279, 'learning_rate': 7.171201132950502e-06, 'epoch': 4.83}

 61%|██████    | 373/616 [5:52:18<3:45:21, 55.64s/it]
{'loss': 1.6416, 'learning_rate': 7.1207669086883366e-06, 'epoch': 4.84}

 61%|██████    | 374/616 [5:53:13<3:44:41, 55.71s/it]
{'loss': 1.605, 'learning_rate': 7.070412415370158e-06, 'epoch': 4.86}

 61%|██████    | 375/616 [5:54:10<3:44:46, 55.96s/it]
{'loss': 1.627, 'learning_rate': 7.020139047398966e-06, 'epoch': 4.87}

 61%|██████    | 376/616 [5:55:04<3:41:28, 55.37s/it]
{'loss': 1.6123, 'learning_rate': 6.969948196931272e-06, 'epoch': 4.88}

 61%|██████    | 377/616 [5:55:59<3:40:43, 55.41s/it]
{'loss': 1.6333, 'learning_rate': 6.919841253838537e-06, 'epoch': 4.9}

 61%|██████▏   | 378/616 [5:56:56<3:40:57, 55.70s/it]
{'loss': 1.5981, 'learning_rate': 6.869819605668669e-06, 'epoch': 4.91}

 62%|██████▏   | 379/616 [5:57:51<3:39:35, 55.59s/it]
{'loss': 1.646, 'learning_rate': 6.819884637607619e-06, 'epoch': 4.92}

 62%|██████▏   | 380/616 [5:58:46<3:38:23, 55.52s/it]
{'loss': 1.6641, 'learning_rate': 6.770037732441019e-06, 'epoch': 4.94}

 62%|██████▏   | 381/616 [5:59:42<3:36:55, 55.38s/it]
{'loss': 1.6362, 'learning_rate': 6.720280270515882e-06, 'epoch': 4.95}

 62%|██████▏   | 382/616 [6:00:38<3:36:40, 55.56s/it]
{'loss': 1.6562, 'learning_rate': 6.670613629702391e-06, 'epoch': 4.96}

 62%|██████▏   | 383/616 [6:01:33<3:35:24, 55.47s/it]
{'loss': 1.6772, 'learning_rate': 6.62103918535572e-06, 'epoch': 4.97}

 62%|██████▏   | 384/616 [6:02:29<3:34:58, 55.60s/it]
{'loss': 1.6729, 'learning_rate': 6.5715583102779815e-06, 'epoch': 4.99}

 62%|██████▎   | 385/616 [6:03:24<3:33:52, 55.55s/it]
{'loss': 1.6597, 'learning_rate': 6.522172374680177e-06, 'epoch': 5.0}

 63%|██████▎   | 386/616 [6:04:53<4:11:03, 65.49s/it]
{'loss': 1.6348, 'learning_rate': 6.472882746144282e-06, 'epoch': 5.01}

 63%|██████▎   | 387/616 [6:05:49<3:59:32, 62.76s/it]
{'loss': 1.6108, 'learning_rate': 6.423690789585359e-06, 'epoch': 5.03}

 63%|██████▎   | 388/616 [6:06:45<3:50:45, 60.73s/it]
{'loss': 1.6421, 'learning_rate': 6.374597867213756e-06, 'epoch': 5.04}

 63%|██████▎   | 389/616 [6:07:41<3:43:48, 59.16s/it]
{'loss': 1.6455, 'learning_rate': 6.3256053384974105e-06, 'epoch': 5.05}

 63%|██████▎   | 390/616 [6:08:37<3:39:10, 58.19s/it]
{'loss': 1.6616, 'learning_rate': 6.276714560124166e-06, 'epoch': 5.06}

 63%|██████▎   | 391/616 [6:09:31<3:34:03, 57.08s/it]
{'loss': 1.6162, 'learning_rate': 6.2279268859642396e-06, 'epoch': 5.08}

 64%|██████▎   | 392/616 [6:10:27<3:31:15, 56.59s/it]
{'loss': 1.6646, 'learning_rate': 6.179243667032709e-06, 'epoch': 5.09}

 64%|██████▍   | 393/616 [6:11:22<3:29:23, 56.34s/it]
{'loss': 1.6445, 'learning_rate': 6.130666251452102e-06, 'epoch': 5.1}

 64%|██████▍   | 394/616 [6:12:18<3:28:01, 56.22s/it]
{'loss': 1.6299, 'learning_rate': 6.082195984415069e-06, 'epoch': 5.12}

 64%|██████▍   | 395/616 [6:13:13<3:25:56, 55.91s/it]
{'loss': 1.6221, 'learning_rate': 6.03383420814714e-06, 'epoch': 5.13}

 64%|██████▍   | 396/616 [6:14:08<3:24:04, 55.65s/it]
{'loss': 1.647, 'learning_rate': 5.9855822618695385e-06, 'epoch': 5.14}

 64%|██████▍   | 397/616 [6:15:04<3:22:35, 55.50s/it]
{'loss': 1.6147, 'learning_rate': 5.937441481762112e-06, 'epoch': 5.16}

 65%|██████▍   | 398/616 [6:15:59<3:21:06, 55.35s/it]
{'loss': 1.6025, 'learning_rate': 5.889413200926317e-06, 'epoch': 5.17}

 65%|██████▍   | 399/616 [6:16:54<3:19:48, 55.25s/it]
{'loss': 1.6064, 'learning_rate': 5.841498749348322e-06, 'epoch': 5.18}

 65%|██████▍   | 400/616 [6:17:50<3:19:49, 55.50s/it]
{'loss': 1.6587, 'learning_rate': 5.793699453862161e-06, 'epoch': 5.19}

 65%|██████▌   | 401/616 [6:19:54<4:33:18, 76.27s/it]
{'loss': 1.6255, 'learning_rate': 5.746016638112986e-06, 'epoch': 5.21}

 65%|██████▌   | 402/616 [6:20:50<4:09:42, 70.01s/it]
{'loss': 1.6523, 'learning_rate': 5.698451622520442e-06, 'epoch': 5.22}

 65%|██████▌   | 403/616 [6:21:45<3:52:28, 65.49s/it]
{'loss': 1.6367, 'learning_rate': 5.651005724242072e-06, 'epoch': 5.23}

 66%|██████▌   | 404/616 [6:22:40<3:40:53, 62.52s/it]
{'loss': 1.6006, 'learning_rate': 5.603680257136857e-06, 'epoch': 5.25}

 66%|██████▌   | 405/616 [6:23:37<3:33:15, 60.64s/it]
{'loss': 1.6294, 'learning_rate': 5.556476531728836e-06, 'epoch': 5.26}

 66%|██████▌   | 406/616 [6:24:33<3:27:23, 59.26s/it]
{'loss': 1.6284, 'learning_rate': 5.509395855170798e-06, 'epoch': 5.27}

 66%|██████▌   | 407/616 [6:25:29<3:23:21, 58.38s/it]
{'loss': 1.6392, 'learning_rate': 5.4624395312081125e-06, 'epoch': 5.29}

 66%|██████▌   | 408/616 [6:26:25<3:20:23, 57.80s/it]
{'loss': 1.625, 'learning_rate': 5.415608860142593e-06, 'epoch': 5.3}

 66%|██████▋   | 409/616 [6:27:21<3:17:10, 57.15s/it]
{'loss': 1.6162, 'learning_rate': 5.368905138796523e-06, 'epoch': 5.31}

 67%|██████▋   | 410/616 [6:28:17<3:14:41, 56.71s/it]
{'loss': 1.5752, 'learning_rate': 5.322329660476715e-06, 'epoch': 5.32}

 67%|██████▋   | 411/616 [6:29:12<3:12:29, 56.34s/it]
{'loss': 1.6655, 'learning_rate': 5.275883714938726e-06, 'epoch': 5.34}

 67%|██████▋   | 412/616 [6:30:08<3:10:27, 56.02s/it]
{'loss': 1.5972, 'learning_rate': 5.2295685883511086e-06, 'epoch': 5.35}

 67%|██████▋   | 413/616 [6:31:03<3:09:04, 55.88s/it]
{'loss': 1.6421, 'learning_rate': 5.183385563259819e-06, 'epoch': 5.36}

 67%|██████▋   | 414/616 [6:31:59<3:07:42, 55.76s/it]
{'loss': 1.5869, 'learning_rate': 5.137335918552702e-06, 'epoch': 5.38}

 67%|██████▋   | 415/616 [6:32:54<3:06:39, 55.72s/it]
{'loss': 1.6333, 'learning_rate': 5.091420929424065e-06, 'epoch': 5.39}

 68%|██████▊   | 416/616 [6:33:50<3:05:36, 55.68s/it]
{'loss': 1.6445, 'learning_rate': 5.045641867339361e-06, 'epoch': 5.4}

 68%|██████▊   | 417/616 [6:34:47<3:05:53, 56.05s/it]
{'loss': 1.6597, 'learning_rate': 5.000000000000003e-06, 'epoch': 5.42}

 68%|██████▊   | 418/616 [6:35:42<3:04:23, 55.88s/it]
{'loss': 1.6387, 'learning_rate': 4.954496591308227e-06, 'epoch': 5.43}

 68%|██████▊   | 419/616 [6:36:38<3:03:44, 55.96s/it]
{'loss': 1.6489, 'learning_rate': 4.909132901332122e-06, 'epoch': 5.44}

 68%|██████▊   | 420/616 [6:37:35<3:03:39, 56.22s/it]
{'loss': 1.6318, 'learning_rate': 4.863910186270726e-06, 'epoch': 5.45}

 68%|██████▊   | 421/616 [6:38:31<3:02:23, 56.12s/it]
{'loss': 1.6841, 'learning_rate': 4.818829698419225e-06, 'epoch': 5.47}

 69%|██████▊   | 422/616 [6:39:27<3:01:16, 56.07s/it]
{'loss': 1.666, 'learning_rate': 4.773892686134301e-06, 'epoch': 5.48}

 69%|██████▊   | 423/616 [6:40:22<2:59:42, 55.87s/it]
{'loss': 1.6162, 'learning_rate': 4.729100393799538e-06, 'epoch': 5.49}

 69%|██████▉   | 424/616 [6:41:19<2:59:17, 56.03s/it]
{'loss': 1.5957, 'learning_rate': 4.684454061790987e-06, 'epoch': 5.51}

 69%|██████▉   | 425/616 [6:42:13<2:56:55, 55.58s/it]
{'loss': 1.6201, 'learning_rate': 4.639954926442792e-06, 'epoch': 5.52}

 69%|██████▉   | 426/616 [6:43:09<2:55:42, 55.49s/it]
{'loss': 1.6533, 'learning_rate': 4.5956042200129725e-06, 'epoch': 5.53}

 69%|██████▉   | 427/616 [6:44:05<2:55:42, 55.78s/it]
{'loss': 1.624, 'learning_rate': 4.551403170649299e-06, 'epoch': 5.55}

 69%|██████▉   | 428/616 [6:45:00<2:53:57, 55.52s/it]
{'loss': 1.604, 'learning_rate': 4.507353002355269e-06, 'epoch': 5.56}

 70%|██████▉   | 429/616 [6:45:56<2:53:53, 55.80s/it]
{'loss': 1.6089, 'learning_rate': 4.4634549349562315e-06, 'epoch': 5.57}

 70%|██████▉   | 430/616 [6:46:52<2:52:24, 55.62s/it]
{'loss': 1.5962, 'learning_rate': 4.4197101840656e-06, 'epoch': 5.58}

 70%|██████▉   | 431/616 [6:47:48<2:51:48, 55.72s/it]
{'loss': 1.5962, 'learning_rate': 4.376119961051175e-06, 'epoch': 5.6}

 70%|███████   | 432/616 [6:48:43<2:50:43, 55.67s/it]
{'loss': 1.6313, 'learning_rate': 4.33268547300163e-06, 'epoch': 5.61}

 70%|███████   | 433/616 [6:49:38<2:48:53, 55.37s/it]
{'loss': 1.6626, 'learning_rate': 4.289407922693053e-06, 'epoch': 5.62}

 70%|███████   | 434/616 [6:50:34<2:48:44, 55.63s/it]
{'loss': 1.5796, 'learning_rate': 4.2462885085556635e-06, 'epoch': 5.64}

 71%|███████   | 435/616 [6:51:30<2:48:04, 55.72s/it]
{'loss': 1.6836, 'learning_rate': 4.203328424640619e-06, 'epoch': 5.65}

 71%|███████   | 436/616 [6:52:25<2:46:41, 55.56s/it]
{'loss': 1.6675, 'learning_rate': 4.1605288605869365e-06, 'epoch': 5.66}

 71%|███████   | 437/616 [6:53:21<2:46:12, 55.71s/it]
{'loss': 1.6807, 'learning_rate': 4.117891001588574e-06, 'epoch': 5.68}

 71%|███████   | 438/616 [6:54:17<2:45:16, 55.71s/it]
{'loss': 1.6167, 'learning_rate': 4.075416028361584e-06, 'epoch': 5.69}

 71%|███████▏  | 439/616 [6:55:14<2:45:15, 56.02s/it]
{'loss': 1.6851, 'learning_rate': 4.033105117111441e-06, 'epoch': 5.7}

 71%|███████▏  | 440/616 [6:56:09<2:43:49, 55.85s/it]
{'loss': 1.6191, 'learning_rate': 3.9909594395004545e-06, 'epoch': 5.71}

 72%|███████▏  | 441/616 [6:57:05<2:43:19, 56.00s/it]
{'loss': 1.6362, 'learning_rate': 3.948980162615323e-06, 'epoch': 5.73}

 72%|███████▏  | 442/616 [6:58:01<2:41:55, 55.84s/it]
{'loss': 1.5825, 'learning_rate': 3.907168448934836e-06, 'epoch': 5.74}

 72%|███████▏  | 443/616 [6:58:56<2:40:39, 55.72s/it]
{'loss': 1.6182, 'learning_rate': 3.865525456297652e-06, 'epoch': 5.75}

 72%|███████▏  | 444/616 [6:59:51<2:39:13, 55.54s/it]
{'loss': 1.5908, 'learning_rate': 3.824052337870263e-06, 'epoch': 5.77}

 72%|███████▏  | 445/616 [7:00:47<2:38:13, 55.52s/it]
{'loss': 1.6162, 'learning_rate': 3.7827502421150497e-06, 'epoch': 5.78}

 72%|███████▏  | 446/616 [7:01:43<2:38:07, 55.81s/it]
{'loss': 1.6021, 'learning_rate': 3.741620312758469e-06, 'epoch': 5.79}

 73%|███████▎  | 447/616 [7:02:39<2:37:12, 55.81s/it]
{'loss': 1.6479, 'learning_rate': 3.7006636887594095e-06, 'epoch': 5.81}

 73%|███████▎  | 448/616 [7:03:34<2:35:15, 55.45s/it]
{'loss': 1.6294, 'learning_rate': 3.6598815042776135e-06, 'epoch': 5.82}

 73%|███████▎  | 449/616 [7:04:29<2:34:24, 55.48s/it]
{'loss': 1.6914, 'learning_rate': 3.619274888642309e-06, 'epoch': 5.83}

 73%|███████▎  | 450/616 [7:05:25<2:33:44, 55.57s/it]
{'loss': 1.6226, 'learning_rate': 3.578844966320917e-06, 'epoch': 5.84}

 73%|███████▎  | 451/616 [7:06:20<2:32:34, 55.48s/it]
{'loss': 1.6196, 'learning_rate': 3.5385928568879012e-06, 'epoch': 5.86}

 73%|███████▎  | 452/616 [7:07:16<2:31:53, 55.57s/it]
{'loss': 1.5977, 'learning_rate': 3.4985196749937976e-06, 'epoch': 5.87}

 74%|███████▎  | 453/616 [7:08:12<2:31:23, 55.73s/it]
{'loss': 1.5786, 'learning_rate': 3.458626530334316e-06, 'epoch': 5.88}

 74%|███████▎  | 454/616 [7:09:08<2:30:20, 55.68s/it]
{'loss': 1.6113, 'learning_rate': 3.4189145276196244e-06, 'epoch': 5.9}

 74%|███████▍  | 455/616 [7:10:03<2:28:44, 55.43s/it]
{'loss': 1.6025, 'learning_rate': 3.3793847665437674e-06, 'epoch': 5.91}

 74%|███████▍  | 456/616 [7:10:58<2:27:26, 55.29s/it]
{'loss': 1.6191, 'learning_rate': 3.340038341754189e-06, 'epoch': 5.92}

 74%|███████▍  | 457/616 [7:11:53<2:26:08, 55.15s/it]
{'loss': 1.604, 'learning_rate': 3.300876342821451e-06, 'epoch': 5.94}

 74%|███████▍  | 458/616 [7:12:48<2:25:24, 55.22s/it]
{'loss': 1.6274, 'learning_rate': 3.2618998542090263e-06, 'epoch': 5.95}

 75%|███████▍  | 459/616 [7:13:44<2:24:50, 55.36s/it]
{'loss': 1.6543, 'learning_rate': 3.2231099552433e-06, 'epoch': 5.96}

 75%|███████▍  | 460/616 [7:14:40<2:24:28, 55.57s/it]
{'loss': 1.6265, 'learning_rate': 3.1845077200836638e-06, 'epoch': 5.97}

 75%|███████▍  | 461/616 [7:15:35<2:23:23, 55.51s/it]
{'loss': 1.6123, 'learning_rate': 3.1460942176927666e-06, 'epoch': 5.99}

 75%|███████▌  | 462/616 [7:16:31<2:22:50, 55.66s/it]
{'loss': 1.6401, 'learning_rate': 3.107870511806934e-06, 'epoch': 6.0}

 75%|███████▌  | 463/616 [7:17:59<2:46:44, 65.39s/it]
{'loss': 1.6094, 'learning_rate': 3.0698376609066828e-06, 'epoch': 6.01}

 75%|███████▌  | 464/616 [7:18:54<2:37:50, 62.31s/it]
{'loss': 1.5859, 'learning_rate': 3.0319967181874366e-06, 'epoch': 6.03}

 75%|███████▌  | 465/616 [7:19:49<2:31:01, 60.01s/it]
{'loss': 1.6182, 'learning_rate': 2.9943487315303486e-06, 'epoch': 6.04}

 76%|███████▌  | 466/616 [7:20:44<2:26:06, 58.44s/it]
{'loss': 1.6196, 'learning_rate': 2.9568947434732777e-06, 'epoch': 6.05}

 76%|███████▌  | 467/616 [7:21:39<2:22:56, 57.56s/it]
{'loss': 1.6367, 'learning_rate': 2.919635791181934e-06, 'epoch': 6.06}

 76%|███████▌  | 468/616 [7:22:34<2:19:59, 56.76s/it]
{'loss': 1.7124, 'learning_rate': 2.882572906421145e-06, 'epoch': 6.08}

 76%|███████▌  | 469/616 [7:23:29<2:17:48, 56.25s/it]
{'loss': 1.623, 'learning_rate': 2.8457071155262885e-06, 'epoch': 6.09}

 76%|███████▋  | 470/616 [7:24:25<2:16:44, 56.19s/it]
{'loss': 1.5874, 'learning_rate': 2.809039439374878e-06, 'epoch': 6.1}

 76%|███████▋  | 471/616 [7:25:22<2:16:04, 56.31s/it]
{'loss': 1.6362, 'learning_rate': 2.7725708933582785e-06, 'epoch': 6.12}

 77%|███████▋  | 472/616 [7:26:17<2:14:30, 56.05s/it]
{'loss': 1.6221, 'learning_rate': 2.7363024873536093e-06, 'epoch': 6.13}

 77%|███████▋  | 473/616 [7:27:13<2:13:28, 56.01s/it]
{'loss': 1.6416, 'learning_rate': 2.700235225695752e-06, 'epoch': 6.14}

 77%|███████▋  | 474/616 [7:28:08<2:11:56, 55.75s/it]
{'loss': 1.668, 'learning_rate': 2.6643701071495644e-06, 'epoch': 6.16}

 77%|███████▋  | 475/616 [7:29:04<2:10:54, 55.70s/it]
{'loss': 1.5928, 'learning_rate': 2.628708124882212e-06, 'epoch': 6.17}

 77%|███████▋  | 476/616 [7:30:00<2:09:59, 55.71s/it]
{'loss': 1.6172, 'learning_rate': 2.5932502664356553e-06, 'epoch': 6.18}

 77%|███████▋  | 477/616 [7:30:56<2:09:20, 55.83s/it]
{'loss': 1.6162, 'learning_rate': 2.5579975136993253e-06, 'epoch': 6.19}

 78%|███████▊  | 478/616 [7:31:51<2:07:52, 55.60s/it]
{'loss': 1.6636, 'learning_rate': 2.52295084288291e-06, 'epoch': 6.21}

 78%|███████▊  | 479/616 [7:32:47<2:07:11, 55.70s/it]
{'loss': 1.6748, 'learning_rate': 2.4881112244893403e-06, 'epoch': 6.22}

 78%|███████▊  | 480/616 [7:33:42<2:06:17, 55.71s/it]
{'loss': 1.6167, 'learning_rate': 2.453479623287909e-06, 'epoch': 6.23}

 78%|███████▊  | 481/616 [7:34:39<2:05:45, 55.90s/it]
{'loss': 1.6763, 'learning_rate': 2.419056998287547e-06, 'epoch': 6.25}

 78%|███████▊  | 482/616 [7:35:36<2:05:33, 56.22s/it]
{'loss': 1.6587, 'learning_rate': 2.3848443027102706e-06, 'epoch': 6.26}

 78%|███████▊  | 483/616 [7:36:31<2:03:57, 55.92s/it]
{'loss': 1.6538, 'learning_rate': 2.3508424839647994e-06, 'epoch': 6.27}

 79%|███████▊  | 484/616 [7:37:27<2:03:00, 55.91s/it]
{'loss': 1.5952, 'learning_rate': 2.3170524836202936e-06, 'epoch': 6.29}

 79%|███████▊  | 485/616 [7:38:24<2:02:39, 56.18s/it]
{'loss': 1.6348, 'learning_rate': 2.2834752373803094e-06, 'epoch': 6.3}

 79%|███████▉  | 486/616 [7:39:20<2:01:47, 56.21s/it]
{'loss': 1.6074, 'learning_rate': 2.250111675056863e-06, 'epoch': 6.31}

 79%|███████▉  | 487/616 [7:40:17<2:01:07, 56.34s/it]
{'loss': 1.6284, 'learning_rate': 2.216962720544703e-06, 'epoch': 6.32}

 79%|███████▉  | 488/616 [7:41:12<1:59:49, 56.16s/it]
{'loss': 1.6143, 'learning_rate': 2.184029291795705e-06, 'epoch': 6.34}

 79%|███████▉  | 489/616 [7:42:08<1:58:41, 56.07s/it]
{'loss': 1.6323, 'learning_rate': 2.151312300793473e-06, 'epoch': 6.35}

 80%|███████▉  | 490/616 [7:43:04<1:57:44, 56.07s/it]
{'loss': 1.6387, 'learning_rate': 2.118812653528077e-06, 'epoch': 6.36}

 80%|███████▉  | 491/616 [7:43:59<1:56:07, 55.74s/it]
{'loss': 1.6016, 'learning_rate': 2.086531249970952e-06, 'epoch': 6.38}

 80%|███████▉  | 492/616 [7:44:56<1:55:33, 55.91s/it]
{'loss': 1.6616, 'learning_rate': 2.0544689840499988e-06, 'epoch': 6.39}

 80%|████████  | 493/616 [7:45:51<1:54:35, 55.90s/it]
{'loss': 1.6211, 'learning_rate': 2.022626743624807e-06, 'epoch': 6.4}

 80%|████████  | 494/616 [7:46:47<1:53:29, 55.82s/it]
{'loss': 1.6504, 'learning_rate': 1.991005410462089e-06, 'epoch': 6.42}

 80%|████████  | 495/616 [7:47:44<1:52:57, 56.01s/it]
{'loss': 1.6748, 'learning_rate': 1.9596058602112533e-06, 'epoch': 6.43}

 81%|████████  | 496/616 [7:48:41<1:53:07, 56.56s/it]
{'loss': 1.6597, 'learning_rate': 1.928428962380148e-06, 'epoch': 6.44}

 81%|████████  | 497/616 [7:49:37<1:51:38, 56.29s/it]
{'loss': 1.6133, 'learning_rate': 1.8974755803109968e-06, 'epoch': 6.45}

 81%|████████  | 498/616 [7:50:33<1:50:23, 56.14s/it]
{'loss': 1.6294, 'learning_rate': 1.866746571156479e-06, 'epoch': 6.47}

 81%|████████  | 499/616 [7:51:29<1:49:28, 56.14s/it]
{'loss': 1.6074, 'learning_rate': 1.8362427858560094e-06, 'epoch': 6.48}

 81%|████████  | 500/616 [7:52:25<1:48:36, 56.18s/it]
{'loss': 1.645, 'learning_rate': 1.8059650691121611e-06, 'epoch': 6.49}

 81%|████████▏ | 501/616 [7:54:18<2:20:08, 73.11s/it]
{'loss': 1.5884, 'learning_rate': 1.7759142593672707e-06, 'epoch': 6.51}

 81%|████████▏ | 502/616 [7:55:13<2:08:26, 67.60s/it]
{'loss': 1.6245, 'learning_rate': 1.74609118878024e-06, 'epoch': 6.52}

 82%|████████▏ | 503/616 [7:56:08<2:00:33, 64.02s/it]
{'loss': 1.6309, 'learning_rate': 1.7164966832034668e-06, 'epoch': 6.53}

 82%|████████▏ | 504/616 [7:57:05<1:55:25, 61.83s/it]
{'loss': 1.6035, 'learning_rate': 1.6871315621599982e-06, 'epoch': 6.55}

 82%|████████▏ | 505/616 [7:58:01<1:51:20, 60.18s/it]
{'loss': 1.5688, 'learning_rate': 1.6579966388208257e-06, 'epoch': 6.56}

 82%|████████▏ | 506/616 [7:58:57<1:47:46, 58.79s/it]
{'loss': 1.5762, 'learning_rate': 1.6290927199823604e-06, 'epoch': 6.57}

 82%|████████▏ | 507/616 [7:59:52<1:44:51, 57.72s/it]
{'loss': 1.6323, 'learning_rate': 1.6004206060441096e-06, 'epoch': 6.58}

 82%|████████▏ | 508/616 [8:00:48<1:42:45, 57.09s/it]
{'loss': 1.5884, 'learning_rate': 1.5719810909864941e-06, 'epoch': 6.6}

 83%|████████▎ | 509/616 [8:01:44<1:41:20, 56.83s/it]
{'loss': 1.6382, 'learning_rate': 1.543774962348874e-06, 'epoch': 6.61}

 83%|████████▎ | 510/616 [8:02:40<1:39:51, 56.52s/it]
{'loss': 1.6279, 'learning_rate': 1.5158030012077329e-06, 'epoch': 6.62}

 83%|████████▎ | 511/616 [8:03:37<1:39:08, 56.65s/it]
{'loss': 1.6304, 'learning_rate': 1.4880659821550547e-06, 'epoch': 6.64}

 83%|████████▎ | 512/616 [8:04:32<1:37:24, 56.20s/it]
{'loss': 1.6289, 'learning_rate': 1.4605646732768685e-06, 'epoch': 6.65}

 83%|████████▎ | 513/616 [8:05:27<1:36:08, 56.00s/it]
{'loss': 1.5889, 'learning_rate': 1.4332998361319783e-06, 'epoch': 6.66}

 83%|████████▎ | 514/616 [8:06:24<1:35:25, 56.13s/it]
{'loss': 1.6221, 'learning_rate': 1.4062722257308803e-06, 'epoch': 6.68}

 84%|████████▎ | 515/616 [8:07:20<1:34:25, 56.09s/it]
{'loss': 1.604, 'learning_rate': 1.3794825905148557e-06, 'epoch': 6.69}

 84%|████████▍ | 516/616 [8:08:15<1:33:07, 55.88s/it]
{'loss': 1.6099, 'learning_rate': 1.3529316723352303e-06, 'epoch': 6.7}

 84%|████████▍ | 517/616 [8:09:11<1:32:09, 55.85s/it]
{'loss': 1.6045, 'learning_rate': 1.3266202064328548e-06, 'epoch': 6.71}

 84%|████████▍ | 518/616 [8:10:06<1:31:00, 55.72s/it]
{'loss': 1.6289, 'learning_rate': 1.3005489214177213e-06, 'epoch': 6.73}

 84%|████████▍ | 519/616 [8:11:03<1:30:18, 55.86s/it]
{'loss': 1.6519, 'learning_rate': 1.2747185392488048e-06, 'epoch': 6.74}

 84%|████████▍ | 520/616 [8:11:57<1:28:46, 55.48s/it]
{'loss': 1.6338, 'learning_rate': 1.249129775214064e-06, 'epoch': 6.75}

 85%|████████▍ | 521/616 [8:12:53<1:28:06, 55.64s/it]
{'loss': 1.6196, 'learning_rate': 1.2237833379106257e-06, 'epoch': 6.77}

 85%|████████▍ | 522/616 [8:13:48<1:26:42, 55.35s/it]
{'loss': 1.6104, 'learning_rate': 1.1986799292251816e-06, 'epoch': 6.78}

 85%|████████▍ | 523/616 [8:14:44<1:26:06, 55.56s/it]
{'loss': 1.6309, 'learning_rate': 1.1738202443145307e-06, 'epoch': 6.79}

 85%|████████▌ | 524/616 [8:15:40<1:25:39, 55.86s/it]
{'loss': 1.5845, 'learning_rate': 1.1492049715863464e-06, 'epoch': 6.81}

 85%|████████▌ | 525/616 [8:16:35<1:24:11, 55.52s/it]
{'loss': 1.582, 'learning_rate': 1.1248347926801029e-06, 'epoch': 6.82}

 85%|████████▌ | 526/616 [8:17:31<1:23:34, 55.71s/it]
{'loss': 1.6553, 'learning_rate': 1.100710382448198e-06, 'epoch': 6.83}

 86%|████████▌ | 527/616 [8:18:27<1:22:36, 55.69s/it]
{'loss': 1.5771, 'learning_rate': 1.0768324089372816e-06, 'epoch': 6.84}

 86%|████████▌ | 528/616 [8:19:22<1:21:27, 55.54s/it]
{'loss': 1.6611, 'learning_rate': 1.053201533369731e-06, 'epoch': 6.86}

 86%|████████▌ | 529/616 [8:20:18<1:20:36, 55.59s/it]
{'loss': 1.6128, 'learning_rate': 1.029818410125365e-06, 'epoch': 6.87}

 86%|████████▌ | 530/616 [8:21:14<1:19:49, 55.69s/it]
{'loss': 1.5957, 'learning_rate': 1.0066836867233087e-06, 'epoch': 6.88}

 86%|████████▌ | 531/616 [8:22:10<1:19:18, 55.98s/it]
{'loss': 1.6299, 'learning_rate': 9.837980038040607e-07, 'epoch': 6.9}

 86%|████████▋ | 532/616 [8:23:07<1:18:41, 56.21s/it]
{'loss': 1.6147, 'learning_rate': 9.611619951117657e-07, 'epoch': 6.91}

 87%|████████▋ | 533/616 [8:24:03<1:17:41, 56.16s/it]
{'loss': 1.5864, 'learning_rate': 9.387762874766515e-07, 'epoch': 6.92}

 87%|████████▋ | 534/616 [8:24:59<1:16:26, 55.94s/it]
{'loss': 1.6245, 'learning_rate': 9.166415007976803e-07, 'epoch': 6.94}

 87%|████████▋ | 535/616 [8:25:55<1:15:36, 56.01s/it]
{'loss': 1.5781, 'learning_rate': 8.94758248025378e-07, 'epoch': 6.95}

 87%|████████▋ | 536/616 [8:26:51<1:14:55, 56.19s/it]
{'loss': 1.5845, 'learning_rate': 8.7312713514486e-07, 'epoch': 6.96}

 87%|████████▋ | 537/616 [8:27:46<1:13:23, 55.74s/it]
{'loss': 1.624, 'learning_rate': 8.517487611590558e-07, 'epoch': 6.97}

 87%|████████▋ | 538/616 [8:28:41<1:12:04, 55.45s/it]
{'loss': 1.5811, 'learning_rate': 8.306237180721121e-07, 'epoch': 6.99}

 88%|████████▊ | 539/616 [8:29:37<1:11:36, 55.80s/it]
{'loss': 1.5898, 'learning_rate': 8.097525908730108e-07, 'epoch': 7.0}

 88%|████████▊ | 540/616 [8:30:59<1:20:35, 63.62s/it]
{'loss': 1.5542, 'learning_rate': 7.891359575193613e-07, 'epoch': 7.01}

 88%|████████▊ | 541/616 [8:31:55<1:16:28, 61.19s/it]
{'loss': 1.6382, 'learning_rate': 7.687743889213939e-07, 'epoch': 7.03}

 88%|████████▊ | 542/616 [8:32:51<1:13:34, 59.65s/it]
{'loss': 1.6597, 'learning_rate': 7.486684489261609e-07, 'epoch': 7.04}

 88%|████████▊ | 543/616 [8:33:46<1:10:46, 58.18s/it]
{'loss': 1.5918, 'learning_rate': 7.288186943019171e-07, 'epoch': 7.05}

 88%|████████▊ | 544/616 [8:34:42<1:09:00, 57.51s/it]
{'loss': 1.6226, 'learning_rate': 7.092256747226944e-07, 'epoch': 7.06}

 88%|████████▊ | 545/616 [8:35:38<1:07:30, 57.04s/it]
{'loss': 1.563, 'learning_rate': 6.89889932753095e-07, 'epoch': 7.08}

 89%|████████▊ | 546/616 [8:36:33<1:06:06, 56.67s/it]
{'loss': 1.6348, 'learning_rate': 6.708120038332533e-07, 'epoch': 7.09}

 89%|████████▉ | 547/616 [8:37:29<1:04:41, 56.25s/it]
{'loss': 1.6089, 'learning_rate': 6.519924162640168e-07, 'epoch': 7.1}

 89%|████████▉ | 548/616 [8:38:25<1:03:42, 56.22s/it]
{'loss': 1.6143, 'learning_rate': 6.334316911923155e-07, 'epoch': 7.12}

 89%|████████▉ | 549/616 [8:39:21<1:02:51, 56.29s/it]
{'loss': 1.6396, 'learning_rate': 6.151303425967259e-07, 'epoch': 7.13}

 89%|████████▉ | 550/616 [8:40:18<1:01:58, 56.34s/it]
{'loss': 1.6387, 'learning_rate': 5.970888772732453e-07, 'epoch': 7.14}

 89%|████████▉ | 551/616 [8:41:13<1:00:45, 56.08s/it]
{'loss': 1.5835, 'learning_rate': 5.793077948212478e-07, 'epoch': 7.16}

 90%|████████▉ | 552/616 [8:42:09<59:35, 55.87s/it]
{'loss': 1.6489, 'learning_rate': 5.617875876296641e-07, 'epoch': 7.17}

 90%|████████▉ | 553/616 [8:43:05<58:43, 55.92s/it]
{'loss': 1.6318, 'learning_rate': 5.445287408633304e-07, 'epoch': 7.18}

 90%|████████▉ | 554/616 [8:44:00<57:42, 55.85s/it]
{'loss': 1.645, 'learning_rate': 5.27531732449561e-07, 'epoch': 7.19}

 90%|█████████ | 555/616 [8:44:56<56:50, 55.91s/it]
{'loss': 1.5996, 'learning_rate': 5.107970330649204e-07, 'epoch': 7.21}

 90%|█████████ | 556/616 [8:45:52<55:49, 55.83s/it]
{'loss': 1.5962, 'learning_rate': 4.943251061221721e-07, 'epoch': 7.22}

 90%|█████████ | 557/616 [8:46:48<54:52, 55.81s/it]
{'loss': 1.6211, 'learning_rate': 4.78116407757464e-07, 'epoch': 7.23}

 91%|█████████ | 558/616 [8:47:45<54:16, 56.15s/it]
{'loss': 1.6011, 'learning_rate': 4.6217138681769026e-07, 'epoch': 7.25}

 91%|█████████ | 559/616 [8:48:40<53:13, 56.03s/it]
{'loss': 1.6392, 'learning_rate': 4.464904848480522e-07, 'epoch': 7.26}

 91%|█████████ | 560/616 [8:49:38<52:36, 56.37s/it]
{'loss': 1.6265, 'learning_rate': 4.310741360798498e-07, 'epoch': 7.27}

 91%|█████████ | 561/616 [8:50:33<51:24, 56.08s/it]
{'loss': 1.6255, 'learning_rate': 4.1592276741844075e-07, 'epoch': 7.29}

 91%|█████████ | 562/616 [8:51:29<50:22, 55.97s/it]
{'loss': 1.6196, 'learning_rate': 4.0103679843142895e-07, 'epoch': 7.3}

 91%|█████████▏| 563/616 [8:52:26<49:45, 56.33s/it]
{'loss': 1.6201, 'learning_rate': 3.864166413370429e-07, 'epoch': 7.31}

 92%|█████████▏| 564/616 [8:53:22<48:45, 56.25s/it]
{'loss': 1.6396, 'learning_rate': 3.720627009927158e-07, 'epoch': 7.32}

 92%|█████████▏| 565/616 [8:54:18<47:48, 56.25s/it]
{'loss': 1.6553, 'learning_rate': 3.5797537488388326e-07, 'epoch': 7.34}

 92%|█████████▏| 566/616 [8:55:15<46:58, 56.37s/it]
{'loss': 1.6431, 'learning_rate': 3.441550531129667e-07, 'epoch': 7.35}

 92%|█████████▏| 567/616 [8:56:10<45:50, 56.13s/it]
{'loss': 1.6362, 'learning_rate': 3.3060211838858104e-07, 'epoch': 7.36}

 92%|█████████▏| 568/616 [8:57:05<44:38, 55.80s/it]
{'loss': 1.5654, 'learning_rate': 3.1731694601492834e-07, 'epoch': 7.38}

 92%|█████████▏| 569/616 [8:58:01<43:33, 55.61s/it]
{'loss': 1.6074, 'learning_rate': 3.042999038814076e-07, 'epoch': 7.39}

 93%|█████████▎| 570/616 [8:58:56<42:35, 55.55s/it]
{'loss': 1.6094, 'learning_rate': 2.915513524524294e-07, 'epoch': 7.4}

 93%|█████████▎| 571/616 [8:59:52<41:47, 55.72s/it]
{'loss': 1.6758, 'learning_rate': 2.790716447574304e-07, 'epoch': 7.42}

 93%|█████████▎| 572/616 [9:00:49<41:08, 56.11s/it]
{'loss': 1.6313, 'learning_rate': 2.668611263811016e-07, 'epoch': 7.43}

 93%|█████████▎| 573/616 [9:01:45<40:05, 55.95s/it]
{'loss': 1.5835, 'learning_rate': 2.5492013545381666e-07, 'epoch': 7.44}

 93%|█████████▎| 574/616 [9:02:41<39:14, 56.06s/it]
{'loss': 1.5972, 'learning_rate': 2.4324900264226405e-07, 'epoch': 7.45}

 93%|█████████▎| 575/616 [9:03:37<38:16, 56.02s/it]
{'loss': 1.6689, 'learning_rate': 2.3184805114029872e-07, 'epoch': 7.47}

 94%|█████████▎| 576/616 [9:04:32<37:07, 55.68s/it]
{'loss': 1.6304, 'learning_rate': 2.2071759665998282e-07, 'epoch': 7.48}

 94%|█████████▎| 577/616 [9:05:27<36:06, 55.55s/it]
{'loss': 1.6035, 'learning_rate': 2.098579474228546e-07, 'epoch': 7.49}

 94%|█████████▍| 578/616 [9:06:23<35:13, 55.63s/it]
{'loss': 1.5952, 'learning_rate': 1.9926940415138206e-07, 'epoch': 7.51}

 94%|█████████▍| 579/616 [9:07:20<34:30, 55.97s/it]
{'loss': 1.584, 'learning_rate': 1.8895226006064084e-07, 'epoch': 7.52}

 94%|█████████▍| 580/616 [9:08:16<33:42, 56.18s/it]
{'loss': 1.6064, 'learning_rate': 1.7890680085019597e-07, 'epoch': 7.53}

 94%|█████████▍| 581/616 [9:09:13<32:50, 56.30s/it]
{'loss': 1.6235, 'learning_rate': 1.6913330469618628e-07, 'epoch': 7.55}

 94%|█████████▍| 582/616 [9:10:10<32:00, 56.48s/it]
{'loss': 1.6294, 'learning_rate': 1.5963204224362261e-07, 'epoch': 7.56}

 95%|█████████▍| 583/616 [9:11:06<30:57, 56.29s/it]
{'loss': 1.6055, 'learning_rate': 1.504032765988961e-07, 'epoch': 7.57}

 95%|█████████▍| 584/616 [9:12:02<29:59, 56.23s/it]
{'loss': 1.6353, 'learning_rate': 1.4144726332248726e-07, 'epoch': 7.58}

 95%|█████████▍| 585/616 [9:12:58<29:03, 56.25s/it]
{'loss': 1.6108, 'learning_rate': 1.327642504218951e-07, 'epoch': 7.6}

 95%|█████████▌| 586/616 [9:13:54<28:00, 56.00s/it]
{'loss': 1.6201, 'learning_rate': 1.2435447834476254e-07, 'epoch': 7.61}

 95%|█████████▌| 587/616 [9:14:50<27:10, 56.22s/it]
{'loss': 1.6128, 'learning_rate': 1.1621817997222507e-07, 'epoch': 7.62}

 95%|█████████▌| 588/616 [9:15:46<26:06, 55.95s/it]
{'loss': 1.6196, 'learning_rate': 1.0835558061245587e-07, 'epoch': 7.64}

 96%|█████████▌| 589/616 [9:16:42<25:16, 56.17s/it]
{'loss': 1.6621, 'learning_rate': 1.0076689799442874e-07, 'epoch': 7.65}

 96%|█████████▌| 590/616 [9:17:38<24:14, 55.95s/it]
{'loss': 1.6216, 'learning_rate': 9.34523422618916e-08, 'epoch': 7.66}

 96%|█████████▌| 591/616 [9:18:35<23:27, 56.31s/it]
{'loss': 1.6289, 'learning_rate': 8.641211596754129e-08, 'epoch': 7.68}

 96%|█████████▌| 592/616 [9:19:31<22:28, 56.19s/it]
{'loss': 1.6279, 'learning_rate': 7.964641406742135e-08, 'epoch': 7.69}
 96%|█████████▋| 593/616 [9:20:28<21:37, 56.42s/it]
{'loss': 1.6187, 'learning_rate': 7.315542391551966e-08, 'epoch': 7.7}
 96%|█████████▋| 594/616 [9:21:24<20:42, 56.46s/it]
{'loss': 1.6445, 'learning_rate': 6.693932525857927e-08, 'epoch': 7.71}
 97%|█████████▋| 595/616 [9:22:21<19:45, 56.45s/it]
{'loss': 1.6226, 'learning_rate': 6.099829023112236e-08, 'epoch': 7.73}
 97%|█████████▋| 596/616 [9:23:17<18:47, 56.35s/it]
{'loss': 1.6025, 'learning_rate': 5.533248335068409e-08, 'epoch': 7.74}
 97%|█████████▋| 597/616 [9:24:13<17:49, 56.29s/it]
{'loss': 1.5981, 'learning_rate': 4.994206151325509e-08, 'epoch': 7.75}
 97%|█████████▋| 598/616 [9:25:10<16:58, 56.58s/it]
{'loss': 1.6479, 'learning_rate': 4.482717398894165e-08, 'epoch': 7.77}
 97%|█████████▋| 599/616 [9:26:07<16:03, 56.66s/it]
{'loss': 1.6494, 'learning_rate': 3.998796241782232e-08, 'epoch': 7.78}
 97%|█████████▋| 600/616 [9:27:02<14:58, 56.13s/it]
{'loss': 1.6328, 'learning_rate': 3.5424560806036625e-08, 'epoch': 7.79}
 98%|█████████▊| 601/616 [9:28:54<18:13, 72.93s/it]
{'loss': 1.5732, 'learning_rate': 3.1137095522068006e-08, 'epoch': 7.81}
 98%|█████████▊| 602/616 [9:29:49<15:47, 67.67s/it]
{'loss': 1.6196, 'learning_rate': 2.7125685293245552e-08, 'epoch': 7.82}
 98%|█████████▊| 603/616 [9:30:46<13:56, 64.33s/it]
{'loss': 1.5894, 'learning_rate': 2.3390441202455484e-08, 'epoch': 7.83}
 98%|█████████▊| 604/616 [9:31:42<12:22, 61.86s/it]
{'loss': 1.6172, 'learning_rate': 1.993146668506585e-08, 'epoch': 7.84}
 98%|█████████▊| 605/616 [9:32:40<11:07, 60.66s/it]
{'loss': 1.604, 'learning_rate': 1.6748857526066588e-08, 'epoch': 7.86}
 98%|█████████▊| 606/616 [9:33:36<09:53, 59.31s/it]
{'loss': 1.6172, 'learning_rate': 1.3842701857406104e-08, 'epoch': 7.87}
 99%|█████████▊| 607/616 [9:34:32<08:43, 58.22s/it]
{'loss': 1.6377, 'learning_rate': 1.1213080155564327e-08, 'epoch': 7.88}
 99%|█████████▊| 608/616 [9:35:27<07:39, 57.40s/it]
{'loss': 1.6064, 'learning_rate': 8.860065239311155e-09, 'epoch': 7.9}
 99%|█████████▉| 609/616 [9:36:23<06:38, 56.99s/it]
{'loss': 1.6211, 'learning_rate': 6.783722267701409e-09, 'epoch': 7.91}
 99%|█████████▉| 610/616 [9:37:19<05:39, 56.62s/it]
{'loss': 1.6274, 'learning_rate': 4.984108738261828e-09, 'epoch': 7.92}
 99%|█████████▉| 611/616 [9:38:16<04:43, 56.63s/it]
{'loss': 1.6328, 'learning_rate': 3.4612744854045645e-09, 'epoch': 7.94}
 99%|█████████▉| 612/616 [9:39:11<03:45, 56.36s/it]
{'loss': 1.583, 'learning_rate': 2.215261679042735e-09, 'epoch': 7.95}
100%|█████████▉| 613/616 [9:40:08<02:49, 56.35s/it]
{'loss': 1.6318, 'learning_rate': 1.246104823426908e-09, 'epoch': 7.96}
100%|█████████▉| 614/616 [9:41:03<01:52, 56.01s/it]
{'loss': 1.6265, 'learning_rate': 5.538307561858691e-10, 'epoch': 7.97}
100%|█████████▉| 615/616 [9:41:58<00:55, 55.78s/it]
{'loss': 1.605, 'learning_rate': 1.3845864758610384e-10, 'epoch': 7.99}
100%|██████████| 616/616 [9:42:54<00:00, 55.92s/it]
{'loss': 1.6025, 'learning_rate': 0.0, 'epoch': 8.0}
{'train_runtime': 34978.7912, 'train_samples_per_second': 2.252, 'train_steps_per_second': 0.018, 'train_loss': 2.115578391335227, 'epoch': 8.0}
100%|██████████| 616/616 [9:42:54<00:00, 56.78s/it]
Non lora weights:  dict_keys(['base_model.model.model.mm_projector.weight', 'base_model.model.model.mm_projector.bias', 'base_model.model.model.frames_conv.weight', 'base_model.model.model.frames_conv.bias'])
wandb: Waiting for W&B process to finish... (success).
[2023-10-13 12:46:18,400] [INFO] [launch.py:347:main] Process 1707 exits successfully.
wandb: 
wandb: Run history:
wandb:                    train/epoch ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb:              train/global_step ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
wandb:            train/learning_rate ▄▇██████▇▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁
wandb:                     train/loss █▅▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
wandb:               train/total_flos ▁
wandb:               train/train_loss ▁
wandb:            train/train_runtime ▁
wandb: train/train_samples_per_second ▁
wandb:   train/train_steps_per_second ▁
wandb: 
wandb: Run summary:
wandb:                    train/epoch 8.0
wandb:              train/global_step 616
wandb:            train/learning_rate 0.0
wandb:                     train/loss 1.6025
wandb:               train/total_flos 1.5114021399418634e+18
wandb:               train/train_loss 2.11558
wandb:            train/train_runtime 34978.7912
wandb: train/train_samples_per_second 2.252
wandb:   train/train_steps_per_second 0.018
wandb: 
wandb: 🚀 View run fiery-dew-9 at: https://wandb.ai/wanghao-cst/huggingface/runs/30lhy90r
wandb: ⚡ View job at https://wandb.ai/wanghao-cst/huggingface/jobs/QXJ0aWZhY3RDb2xsZWN0aW9uOjEwNTk0Mjk1MA==/version_details/v2
wandb: Synced 5 W&B file(s), 0 media file(s), 2 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20231013_030309-30lhy90r/logs
[2023-10-13 12:46:56,444] [INFO] [launch.py:347:main] Process 1706 exits successfully.