bigscience/tr8-104B-logs

bigscience-bot committed on Sep 25, 2021

Commit e1345e3 (1 parent: 2c0a5a5)

new data

Files changed (1)

logs/main_log.txt +38 -0

@@ -66825,3 +66825,41 @@ time (ms)
 time (ms)
 [2021-09-25 22:11:04] PULSE: tr8-104B is scheduled to start in 17:17:05 (at 2021-09-26T15:28:10) (1188168 on 'gpu_p13' partition)
 [2021-09-25 22:11:04] PULSE: tr8-104B is running for 17:44:03 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
+ iteration     9400/  159576 | consumed samples:       616192 | elapsed time per iteration (ms): 19918.8 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9410/  159576 | consumed samples:       618432 | elapsed time per iteration (ms): 19675.6 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9420/  159576 | consumed samples:       620672 | elapsed time per iteration (ms): 19904.3 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9430/  159576 | consumed samples:       622912 | elapsed time per iteration (ms): 19702.9 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9440/  159576 | consumed samples:       625152 | elapsed time per iteration (ms): 19798.2 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9450/  159576 | consumed samples:       627392 | elapsed time per iteration (ms): 19797.6 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9460/  159576 | consumed samples:       629632 | elapsed time per iteration (ms): 20223.0 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9470/  159576 | consumed samples:       631872 | elapsed time per iteration (ms): 19847.6 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9480/  159576 | consumed samples:       634112 | elapsed time per iteration (ms): 19783.5 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9490/  159576 | consumed samples:       636352 | elapsed time per iteration (ms): 19768.8 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9500/  159576 | consumed samples:       638592 | elapsed time per iteration (ms): 19836.7 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9510/  159576 | consumed samples:       640832 | elapsed time per iteration (ms): 19791.2 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9520/  159576 | consumed samples:       643072 | elapsed time per iteration (ms): 19677.8 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9530/  159576 | consumed samples:       645312 | elapsed time per iteration (ms): 19695.3 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9540/  159576 | consumed samples:       647552 | elapsed time per iteration (ms): 19697.0 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9550/  159576 | consumed samples:       649792 | elapsed time per iteration (ms): 19776.4 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9560/  159576 | consumed samples:       652032 | elapsed time per iteration (ms): 19726.6 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+ iteration     9570/  159576 | consumed samples:       654272 | elapsed time per iteration (ms): 19764.1 | learning rate: 6.000E-05 | global batch size:   224 | loss scale: 1.0 | grad norm: 5927.930 | num zeros: 0.0 | number of skipped iterations:   0 | number of nan iterations:   0 |
+time (ms)
+[2021-09-25 23:11:05] PULSE: tr8-104B is scheduled to start in 18:13:44 (at 2021-09-26T17:24:50) (1188168 on 'gpu_p13' partition)
+[2021-09-25 23:11:05] PULSE: tr8-104B is running for 18:44:04 since 2021-09-25T04:27:01 (1185639_1 on 'gpu_p13' partition (r6i5n[7-8],r6i6n0,r7i2n[4-5],r7i3n2,r7i6n[2-4],r7i7n[7-8],r8i0n[0,2-3,5-8],r8i1n[0,2-4],r8i2n8,r8i3n[0-2],r8i5n[3-4],r8i7n[3-8],r9i0n[0-5],r9i1n[0-3],r9i2n[3-6,8],r9i3n[0-1,7-8],r9i4n[0-3],r9i5n[3-8],r9i6n0)
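
The iteration lines added in this commit follow a fixed `key: value | key: value` layout, so they can be checked mechanically. The following is a minimal sketch, not part of the repository: the names `LINE_RE`, `parse_iteration`, and `samples_consistent` are hypothetical, and the regular expression assumes the field order shown in the lines above. It reads two consecutive log lines and confirms that the growth in consumed samples equals the number of iterations between them times the global batch size.

```python
import re

# Hypothetical pattern, written against the field order seen in this log.
LINE_RE = re.compile(
    r"iteration\s+(\d+)/\s*(\d+)\s*\|"
    r"\s*consumed samples:\s*(\d+)\s*\|"
    r"\s*elapsed time per iteration \(ms\):\s*([\d.]+)\s*\|"
    r".*?global batch size:\s*(\d+)"
)


def parse_iteration(line):
    """Return the numeric fields of one iteration line, or None if it is not one."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    iteration, total, samples, ms, batch = m.groups()
    return {
        "iteration": int(iteration),
        "total_iterations": int(total),
        "consumed_samples": int(samples),
        "ms_per_iteration": float(ms),
        "global_batch_size": int(batch),
    }


def samples_consistent(prev, curr):
    """Check that the sample counter advanced by (iterations elapsed) * batch size."""
    steps = curr["iteration"] - prev["iteration"]
    expected = steps * curr["global_batch_size"]
    return curr["consumed_samples"] - prev["consumed_samples"] == expected


# Two lines copied from the diff above (leading '+' removed, tail truncated).
LINE_A = (" iteration     9400/  159576 | consumed samples:       616192 | "
          "elapsed time per iteration (ms): 19918.8 | learning rate: 6.000E-05 | "
          "global batch size:   224 | loss scale: 1.0 |")
LINE_B = (" iteration     9410/  159576 | consumed samples:       618432 | "
          "elapsed time per iteration (ms): 19675.6 | learning rate: 6.000E-05 | "
          "global batch size:   224 | loss scale: 1.0 |")

if __name__ == "__main__":
    a, b = parse_iteration(LINE_A), parse_iteration(LINE_B)
    print(samples_consistent(a, b))
```

For the two lines used here, the counter moves from 616192 to 618432, a difference of 2240, which matches 10 iterations at a global batch size of 224. Lines such as `time (ms)` or the `PULSE` status messages do not match the pattern, and `parse_iteration` returns `None` for them.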