bigscience-bot
committed on
Commit
·
e1345e3
1
Parent(s):
2c0a5a5
new data
- logs/main_log.txt +38 -0
logs/main_log.txt
CHANGED
@@ -66825,3 +66825,41 @@ time (ms)
66825 |
time (ms)
|
66826 |
[2021-09-25 22:11:04] PULSE: tr8-104B is scheduled to start in 17:17:05 (at 2021-09-26T15:28:10) (1188168 on 'gpu_p13' partition)
|
66827 |
[2021-09-25 22:11:04] PULSE: tr8-104B is running for 17:44:03 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
|
66828 |
+
iteration 9400/ 159576 | consumed samples: 616192 | elapsed time per iteration (ms): 19918.8 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66829 |
+
time (ms)
|
66830 |
+
iteration 9410/ 159576 | consumed samples: 618432 | elapsed time per iteration (ms): 19675.6 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66831 |
+
time (ms)
|
66832 |
+
iteration 9420/ 159576 | consumed samples: 620672 | elapsed time per iteration (ms): 19904.3 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66833 |
+
time (ms)
|
66834 |
+
iteration 9430/ 159576 | consumed samples: 622912 | elapsed time per iteration (ms): 19702.9 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66835 |
+
time (ms)
|
66836 |
+
iteration 9440/ 159576 | consumed samples: 625152 | elapsed time per iteration (ms): 19798.2 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66837 |
+
time (ms)
|
66838 |
+
iteration 9450/ 159576 | consumed samples: 627392 | elapsed time per iteration (ms): 19797.6 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66839 |
+
time (ms)
|
66840 |
+
iteration 9460/ 159576 | consumed samples: 629632 | elapsed time per iteration (ms): 20223.0 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66841 |
+
time (ms)
|
66842 |
+
iteration 9470/ 159576 | consumed samples: 631872 | elapsed time per iteration (ms): 19847.6 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66843 |
+
time (ms)
|
66844 |
+
iteration 9480/ 159576 | consumed samples: 634112 | elapsed time per iteration (ms): 19783.5 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66845 |
+
time (ms)
|
66846 |
+
iteration 9490/ 159576 | consumed samples: 636352 | elapsed time per iteration (ms): 19768.8 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66847 |
+
time (ms)
|
66848 |
+
iteration 9500/ 159576 | consumed samples: 638592 | elapsed time per iteration (ms): 19836.7 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66849 |
+
time (ms)
|
66850 |
+
iteration 9510/ 159576 | consumed samples: 640832 | elapsed time per iteration (ms): 19791.2 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66851 |
+
time (ms)
|
66852 |
+
iteration 9520/ 159576 | consumed samples: 643072 | elapsed time per iteration (ms): 19677.8 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66853 |
+
time (ms)
|
66854 |
+
iteration 9530/ 159576 | consumed samples: 645312 | elapsed time per iteration (ms): 19695.3 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66855 |
+
time (ms)
|
66856 |
+
iteration 9540/ 159576 | consumed samples: 647552 | elapsed time per iteration (ms): 19697.0 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66857 |
+
time (ms)
|
66858 |
+
iteration 9550/ 159576 | consumed samples: 649792 | elapsed time per iteration (ms): 19776.4 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66859 |
+
time (ms)
|
66860 |
+
iteration 9560/ 159576 | consumed samples: 652032 | elapsed time per iteration (ms): 19726.6 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66861 |
+
time (ms)
|
66862 |
+
iteration 9570/ 159576 | consumed samples: 654272 | elapsed time per iteration (ms): 19764.1 | learning rate: 6.000E-05 | global batch size: 224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
|
66863 |
+
time (ms)
|
66864 |
+
[2021-09-25 23:11:05] PULSE: tr8-104B is scheduled to start in 18:13:44 (at 2021-09-26T17:24:50) (1188168 on 'gpu_p13' partition)
|
66865 |
+
[2021-09-25 23:11:05] PULSE: tr8-104B is running for 18:44:04 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
|