I am using a dataset of over 2 TB to train BERT. The tokenization step is complete, and the model has run for 500 training steps. However, after reaching the checkpoint interval (set with `--eval_steps` and `--logging_steps`), training stopped using the GPU and appears to be running additional work on the CPU, and it looks like this work will take a very long time to finish.
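For context, my run is configured roughly like the sketch below (the values and output path are illustrative, not my exact settings); evaluation and logging are both triggered every 500 steps:

```python
from transformers import TrainingArguments

# Illustrative configuration -- only eval_steps/logging_steps match my run.
training_args = TrainingArguments(
    output_dir="./bert-pretraining",   # hypothetical path
    max_steps=100_000,
    evaluation_strategy="steps",       # run evaluation every eval_steps
    eval_steps=500,                    # the point where the slowdown starts
    logging_steps=500,
    save_steps=500,
)
```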
Below is the log. I would appreciate an explanation of what this ongoing process is and any suggestions for speeding it up.
0%| | 499/100000 [152:02:34<24501:24:43, 886.47s/it]
0%| | 500/100000 [152:17:13<24442:08:51, 884.34s/it]
0%| | 500/100000 [152:17:13<24442:08:51, 884.34s/it]
0%| | 0/13564759 [00:00<?, ?it/s]
0%| | 2/13564759 [00:05<10294:25:46, 2.73s/it]
0%| | 3/13564759 [00:12<16387:56:31, 4.35s/it]
0%| | 4/13564759 [00:17<17900:02:18, 4.75s/it]
0%| | 5/13564759 [00:23<19099:21:20, 5.07s/it]