diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644--- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,1095 @@ +[2024-12-28 17:36:29,013][00596] Saving configuration to /content/train_dir/default_experiment/config.json... +[2024-12-28 17:36:29,016][00596] Rollout worker 0 uses device cpu +[2024-12-28 17:36:29,017][00596] Rollout worker 1 uses device cpu +[2024-12-28 17:36:29,019][00596] Rollout worker 2 uses device cpu +[2024-12-28 17:36:29,020][00596] Rollout worker 3 uses device cpu +[2024-12-28 17:36:29,021][00596] Rollout worker 4 uses device cpu +[2024-12-28 17:36:29,023][00596] Rollout worker 5 uses device cpu +[2024-12-28 17:36:29,026][00596] Rollout worker 6 uses device cpu +[2024-12-28 17:36:29,029][00596] Rollout worker 7 uses device cpu +[2024-12-28 17:36:29,181][00596] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-12-28 17:36:29,183][00596] InferenceWorker_p0-w0: min num requests: 2 +[2024-12-28 17:36:29,221][00596] Starting all processes... +[2024-12-28 17:36:29,223][00596] Starting process learner_proc0 +[2024-12-28 17:36:29,276][00596] Starting all processes... +[2024-12-28 17:36:29,285][00596] Starting process inference_proc0-0 +[2024-12-28 17:36:29,286][00596] Starting process rollout_proc0 +[2024-12-28 17:36:29,289][00596] Starting process rollout_proc1 +[2024-12-28 17:36:29,289][00596] Starting process rollout_proc2 +[2024-12-28 17:36:29,289][00596] Starting process rollout_proc3 +[2024-12-28 17:36:29,289][00596] Starting process rollout_proc4 +[2024-12-28 17:36:29,289][00596] Starting process rollout_proc5 +[2024-12-28 17:36:29,289][00596] Starting process rollout_proc6 +[2024-12-28 17:36:29,289][00596] Starting process rollout_proc7 +[2024-12-28 17:36:46,727][03341] Worker 5 uses CPU cores [1] +[2024-12-28 17:36:46,906][03321] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-12-28 17:36:46,910][03321] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-12-28 17:36:47,023][03321] Num visible devices: 1 +[2024-12-28 17:36:47,033][03321] Starting seed is not provided +[2024-12-28 17:36:47,034][03321] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-12-28 17:36:47,034][03321] Initializing actor-critic model on device cuda:0 +[2024-12-28 17:36:47,034][03321] RunningMeanStd input shape: (3, 72, 128) +[2024-12-28 17:36:47,041][03321] RunningMeanStd input shape: (1,) +[2024-12-28 17:36:47,057][03334] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-12-28 17:36:47,064][03334] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-12-28 17:36:47,158][03321] ConvEncoder: input_channels=3 +[2024-12-28 17:36:47,198][03335] Worker 0 uses CPU cores [0] +[2024-12-28 17:36:47,204][03334] Num visible devices: 1 +[2024-12-28 17:36:47,346][03337] Worker 3 uses CPU cores [1] +[2024-12-28 17:36:47,557][03342] Worker 7 uses CPU cores [1] +[2024-12-28 17:36:47,613][03336] Worker 1 uses CPU cores [1] +[2024-12-28 17:36:47,654][03338] Worker 2 uses CPU cores [0] +[2024-12-28 17:36:47,733][03340] Worker 4 uses CPU cores [0] +[2024-12-28 17:36:47,750][03339] Worker 6 uses CPU cores [0] +[2024-12-28 17:36:47,837][03321] Conv encoder output size: 512 +[2024-12-28 17:36:47,837][03321] Policy head output size: 512 +[2024-12-28 17:36:47,887][03321] Created Actor Critic model with architecture: +[2024-12-28 17:36:47,887][03321] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + 
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2024-12-28 17:36:48,269][03321] Using optimizer
+[2024-12-28 17:36:49,174][00596] Heartbeat connected on Batcher_0
+[2024-12-28 17:36:49,182][00596] Heartbeat connected on InferenceWorker_p0-w0
+[2024-12-28 17:36:49,191][00596] Heartbeat connected on RolloutWorker_w0
+[2024-12-28 17:36:49,195][00596] Heartbeat connected on RolloutWorker_w1
+[2024-12-28 17:36:49,202][00596] Heartbeat connected on RolloutWorker_w2
+[2024-12-28 17:36:49,207][00596] Heartbeat connected on RolloutWorker_w3
+[2024-12-28 17:36:49,212][00596] Heartbeat connected on RolloutWorker_w5
+[2024-12-28 17:36:49,216][00596] Heartbeat connected on RolloutWorker_w4
+[2024-12-28 17:36:49,223][00596] Heartbeat connected on RolloutWorker_w6
+[2024-12-28 17:36:49,225][00596] Heartbeat connected on RolloutWorker_w7
+[2024-12-28 17:36:51,597][03321] No checkpoints found
+[2024-12-28 17:36:51,597][03321] Did not load from checkpoint, starting from scratch!
+[2024-12-28 17:36:51,597][03321] Initialized policy 0 weights for model version 0
+[2024-12-28 17:36:51,601][03321] LearnerWorker_p0 finished initialization!
+[2024-12-28 17:36:51,604][03321] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-12-28 17:36:51,601][00596] Heartbeat connected on LearnerWorker_p0
+[2024-12-28 17:36:51,799][03334] RunningMeanStd input shape: (3, 72, 128)
+[2024-12-28 17:36:51,801][03334] RunningMeanStd input shape: (1,)
+[2024-12-28 17:36:51,813][03334] ConvEncoder: input_channels=3
+[2024-12-28 17:36:51,914][03334] Conv encoder output size: 512
+[2024-12-28 17:36:51,914][03334] Policy head output size: 512
+[2024-12-28 17:36:51,964][00596] Inference worker 0-0 is ready!
+[2024-12-28 17:36:51,965][00596] All inference workers are ready! Signal rollout workers to start!
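[Editor's note] The lines above cover startup and model construction: a GPU learner, a GPU inference worker, and eight CPU rollout workers sharing one actor-critic (conv encoder, GRU core, 512-unit heads, 5 discrete actions). The log does not record the launch command, so the following is only a minimal sketch of how a run like this is typically started with Sample Factory 2.x's bundled ViZDoom example; the helper names (register_vizdoom_components, parse_vizdoom_cfg) and the env id are assumptions, not read from this log.

    # Minimal launch sketch (assumes Sample Factory 2.x with its ViZDoom
    # examples installed; helper names and env id are assumptions).
    from sample_factory.train import run_rl
    from sf_examples.vizdoom.train_vizdoom import (
        parse_vizdoom_cfg,
        register_vizdoom_components,
    )

    register_vizdoom_components()  # registers Doom envs and the VizdoomEncoder model

    # --num_workers=8 matches the eight rollout workers seen above;
    # train_dir/experiment match the paths this log reports saving to.
    cfg = parse_vizdoom_cfg(argv=[
        "--env=doom_health_gathering_supreme",  # assumption: env id is not in the log
        "--num_workers=8",
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
    ])
    status = run_rl(cfg)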
+[2024-12-28 17:36:52,169][03340] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-12-28 17:36:52,164][03341] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-12-28 17:36:52,170][03335] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-12-28 17:36:52,168][03337] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-12-28 17:36:52,172][03338] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-12-28 17:36:52,171][03339] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-12-28 17:36:52,167][03336] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-12-28 17:36:52,172][03342] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-12-28 17:36:53,185][03335] Decorrelating experience for 0 frames...
+[2024-12-28 17:36:53,186][03340] Decorrelating experience for 0 frames...
+[2024-12-28 17:36:53,789][03342] Decorrelating experience for 0 frames...
+[2024-12-28 17:36:53,791][03337] Decorrelating experience for 0 frames...
+[2024-12-28 17:36:53,796][03336] Decorrelating experience for 0 frames...
+[2024-12-28 17:36:53,807][03341] Decorrelating experience for 0 frames...
+[2024-12-28 17:36:53,970][03335] Decorrelating experience for 32 frames...
+[2024-12-28 17:36:53,969][03340] Decorrelating experience for 32 frames...
+[2024-12-28 17:36:55,009][03338] Decorrelating experience for 0 frames...
+[2024-12-28 17:36:55,180][03336] Decorrelating experience for 32 frames...
+[2024-12-28 17:36:55,178][03342] Decorrelating experience for 32 frames...
+[2024-12-28 17:36:55,183][03337] Decorrelating experience for 32 frames...
+[2024-12-28 17:36:55,336][03335] Decorrelating experience for 64 frames...
+[2024-12-28 17:36:55,338][03340] Decorrelating experience for 64 frames...
+[2024-12-28 17:36:55,526][03341] Decorrelating experience for 32 frames...
+[2024-12-28 17:36:55,555][00596] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-12-28 17:36:56,445][03336] Decorrelating experience for 64 frames...
+[2024-12-28 17:36:56,455][03337] Decorrelating experience for 64 frames...
+[2024-12-28 17:36:56,715][03339] Decorrelating experience for 0 frames...
+[2024-12-28 17:36:56,917][03338] Decorrelating experience for 32 frames...
+[2024-12-28 17:36:57,008][03340] Decorrelating experience for 96 frames...
+[2024-12-28 17:36:57,010][03335] Decorrelating experience for 96 frames...
+[2024-12-28 17:36:58,049][03341] Decorrelating experience for 64 frames...
+[2024-12-28 17:36:58,240][03336] Decorrelating experience for 96 frames...
+[2024-12-28 17:36:58,255][03337] Decorrelating experience for 96 frames...
+[2024-12-28 17:36:58,333][03339] Decorrelating experience for 32 frames...
+[2024-12-28 17:36:58,761][03342] Decorrelating experience for 64 frames...
+[2024-12-28 17:36:58,950][03338] Decorrelating experience for 64 frames...
+[2024-12-28 17:37:00,226][03341] Decorrelating experience for 96 frames...
+[2024-12-28 17:37:00,559][00596] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 4.0. Samples: 20. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-12-28 17:37:00,560][00596] Avg episode reward: [(0, '1.280')]
+[2024-12-28 17:37:01,731][03339] Decorrelating experience for 64 frames...
+[2024-12-28 17:37:02,029][03338] Decorrelating experience for 96 frames...
+[2024-12-28 17:37:04,536][03321] Signal inference workers to stop experience collection...
+[2024-12-28 17:37:04,566][03334] InferenceWorker_p0-w0: stopping experience collection
+[2024-12-28 17:37:04,907][03339] Decorrelating experience for 96 frames...
+[2024-12-28 17:37:05,019][03342] Decorrelating experience for 96 frames...
+[2024-12-28 17:37:05,555][00596] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 204.4. Samples: 2044. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2024-12-28 17:37:05,562][00596] Avg episode reward: [(0, '2.874')]
+[2024-12-28 17:37:07,563][03321] Signal inference workers to resume experience collection...
+[2024-12-28 17:37:07,564][03334] InferenceWorker_p0-w0: resuming experience collection
+[2024-12-28 17:37:10,555][00596] Fps is (10 sec: 2048.7, 60 sec: 1365.3, 300 sec: 1365.3). Total num frames: 20480. Throughput: 0: 320.7. Samples: 4810. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-28 17:37:10,557][00596] Avg episode reward: [(0, '3.464')]
+[2024-12-28 17:37:14,730][03334] Updated weights for policy 0, policy_version 10 (0.0148)
+[2024-12-28 17:37:15,555][00596] Fps is (10 sec: 4096.1, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 40960. Throughput: 0: 425.6. Samples: 8512. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2024-12-28 17:37:15,558][00596] Avg episode reward: [(0, '4.114')]
+[2024-12-28 17:37:20,555][00596] Fps is (10 sec: 3686.4, 60 sec: 2293.8, 300 sec: 2293.8). Total num frames: 57344. Throughput: 0: 556.8. Samples: 13920. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2024-12-28 17:37:20,560][00596] Avg episode reward: [(0, '4.457')]
+[2024-12-28 17:37:25,555][00596] Fps is (10 sec: 3686.4, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 77824. Throughput: 0: 669.3. Samples: 20078. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
+[2024-12-28 17:37:25,559][00596] Avg episode reward: [(0, '4.429')]
+[2024-12-28 17:37:25,773][03334] Updated weights for policy 0, policy_version 20 (0.0033)
+[2024-12-28 17:37:30,555][00596] Fps is (10 sec: 4505.6, 60 sec: 2925.7, 300 sec: 2925.7). Total num frames: 102400. Throughput: 0: 680.3. Samples: 23812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:37:30,557][00596] Avg episode reward: [(0, '4.195')]
+[2024-12-28 17:37:30,633][03321] Saving new best policy, reward=4.195!
+[2024-12-28 17:37:35,555][00596] Fps is (10 sec: 4096.0, 60 sec: 2969.6, 300 sec: 2969.6). Total num frames: 118784. Throughput: 0: 750.6. Samples: 30022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-28 17:37:35,559][00596] Avg episode reward: [(0, '4.227')]
+[2024-12-28 17:37:35,566][03321] Saving new best policy, reward=4.227!
+[2024-12-28 17:37:36,245][03334] Updated weights for policy 0, policy_version 30 (0.0017)
+[2024-12-28 17:37:40,555][00596] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 135168. Throughput: 0: 764.0. Samples: 34378. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-28 17:37:40,558][00596] Avg episode reward: [(0, '4.409')]
+[2024-12-28 17:37:40,566][03321] Saving new best policy, reward=4.409!
+[2024-12-28 17:37:45,555][00596] Fps is (10 sec: 4096.0, 60 sec: 3194.9, 300 sec: 3194.9). Total num frames: 159744. Throughput: 0: 844.0. Samples: 37996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-28 17:37:45,557][00596] Avg episode reward: [(0, '4.528')]
+[2024-12-28 17:37:45,560][03321] Saving new best policy, reward=4.528!
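[Editor's note] From here on the main process prints a report every five seconds: an "Fps is (...)" line with the frame count, followed by an "Avg episode reward" line. If you want learning curves out of this text file rather than TensorBoard, the reports are regular enough to scrape with the standard library. A small sketch; the sf_log.txt path is an assumption about where this run keeps the file:

    # Sketch: scrape frame counts, FPS and rewards from a Sample Factory
    # text log like this one, using only the stdlib.
    import re

    frames, fps, rewards = [], [], []
    with open("/content/train_dir/default_experiment/sf_log.txt") as log:  # path: assumption
        for line in log:
            m = re.search(r"Fps is \(10 sec: ([0-9.]+).*Total num frames: (\d+)", line)
            if m:  # skips the very first report, where FPS is still 'nan'
                fps.append(float(m.group(1)))
                frames.append(int(m.group(2)))
            m = re.search(r"Avg episode reward: \[\(0, '([-0-9.]+)'\)\]", line)
            if m:
                rewards.append(float(m.group(1)))

    print(f"{frames[-1]} frames, peak 10s FPS {max(fps):.0f}, best avg reward {max(rewards):.3f}")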
+[2024-12-28 17:37:45,842][03334] Updated weights for policy 0, policy_version 40 (0.0022)
+[2024-12-28 17:37:50,555][00596] Fps is (10 sec: 4505.6, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 180224. Throughput: 0: 956.1. Samples: 45070. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:37:50,557][00596] Avg episode reward: [(0, '4.352')]
+[2024-12-28 17:37:55,555][00596] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 196608. Throughput: 0: 995.1. Samples: 49590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:37:55,557][00596] Avg episode reward: [(0, '4.471')]
+[2024-12-28 17:37:56,936][03334] Updated weights for policy 0, policy_version 50 (0.0027)
+[2024-12-28 17:38:00,555][00596] Fps is (10 sec: 4096.0, 60 sec: 3686.6, 300 sec: 3402.8). Total num frames: 221184. Throughput: 0: 993.5. Samples: 53218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:38:00,558][00596] Avg episode reward: [(0, '4.468')]
+[2024-12-28 17:38:05,092][03334] Updated weights for policy 0, policy_version 60 (0.0016)
+[2024-12-28 17:38:05,555][00596] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3510.9). Total num frames: 245760. Throughput: 0: 1039.4. Samples: 60692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:38:05,560][00596] Avg episode reward: [(0, '4.462')]
+[2024-12-28 17:38:10,558][00596] Fps is (10 sec: 3685.4, 60 sec: 3959.3, 300 sec: 3440.5). Total num frames: 258048. Throughput: 0: 1016.3. Samples: 65816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:38:10,562][00596] Avg episode reward: [(0, '4.449')]
+[2024-12-28 17:38:15,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3532.8). Total num frames: 282624. Throughput: 0: 991.9. Samples: 68448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:38:15,560][00596] Avg episode reward: [(0, '4.376')]
+[2024-12-28 17:38:16,074][03334] Updated weights for policy 0, policy_version 70 (0.0036)
+[2024-12-28 17:38:20,555][00596] Fps is (10 sec: 4916.5, 60 sec: 4164.3, 300 sec: 3614.1). Total num frames: 307200. Throughput: 0: 1020.0. Samples: 75924. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:38:20,557][00596] Avg episode reward: [(0, '4.220')]
+[2024-12-28 17:38:20,567][03321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth...
+[2024-12-28 17:38:25,559][00596] Fps is (10 sec: 4094.4, 60 sec: 4095.7, 300 sec: 3595.2). Total num frames: 323584. Throughput: 0: 1055.3. Samples: 81870. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-28 17:38:25,563][00596] Avg episode reward: [(0, '4.382')]
+[2024-12-28 17:38:25,660][03334] Updated weights for policy 0, policy_version 80 (0.0028)
+[2024-12-28 17:38:30,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3621.7). Total num frames: 344064. Throughput: 0: 1025.5. Samples: 84142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:38:30,563][00596] Avg episode reward: [(0, '4.518')]
+[2024-12-28 17:38:35,436][03334] Updated weights for policy 0, policy_version 90 (0.0028)
+[2024-12-28 17:38:35,555][00596] Fps is (10 sec: 4507.4, 60 sec: 4164.3, 300 sec: 3686.4). Total num frames: 368640. Throughput: 0: 1020.9. Samples: 91010. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-28 17:38:35,562][00596] Avg episode reward: [(0, '4.411')]
+[2024-12-28 17:38:40,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 3705.9). Total num frames: 389120. Throughput: 0: 1076.6. Samples: 98038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
+[2024-12-28 17:38:40,562][00596] Avg episode reward: [(0, '4.458')]
+[2024-12-28 17:38:45,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3686.4). Total num frames: 405504. Throughput: 0: 1044.8. Samples: 100232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:38:45,558][00596] Avg episode reward: [(0, '4.584')]
+[2024-12-28 17:38:45,560][03321] Saving new best policy, reward=4.584!
+[2024-12-28 17:38:46,406][03334] Updated weights for policy 0, policy_version 100 (0.0020)
+[2024-12-28 17:38:50,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3739.8). Total num frames: 430080. Throughput: 0: 1013.2. Samples: 106286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:38:50,557][00596] Avg episode reward: [(0, '4.794')]
+[2024-12-28 17:38:50,566][03321] Saving new best policy, reward=4.794!
+[2024-12-28 17:38:54,573][03334] Updated weights for policy 0, policy_version 110 (0.0022)
+[2024-12-28 17:38:55,555][00596] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 3788.8). Total num frames: 454656. Throughput: 0: 1064.4. Samples: 113712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:38:55,561][00596] Avg episode reward: [(0, '4.570')]
+[2024-12-28 17:39:00,561][00596] Fps is (10 sec: 3684.4, 60 sec: 4095.6, 300 sec: 3735.4). Total num frames: 466944. Throughput: 0: 1062.4. Samples: 116262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:39:00,563][00596] Avg episode reward: [(0, '4.333')]
+[2024-12-28 17:39:05,543][03334] Updated weights for policy 0, policy_version 120 (0.0020)
+[2024-12-28 17:39:05,555][00596] Fps is (10 sec: 3686.3, 60 sec: 4096.0, 300 sec: 3780.9). Total num frames: 491520. Throughput: 0: 1010.6. Samples: 121402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:39:05,558][00596] Avg episode reward: [(0, '4.214')]
+[2024-12-28 17:39:10,555][00596] Fps is (10 sec: 4508.1, 60 sec: 4232.7, 300 sec: 3792.6). Total num frames: 512000. Throughput: 0: 1041.4. Samples: 128730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-12-28 17:39:10,557][00596] Avg episode reward: [(0, '4.697')]
+[2024-12-28 17:39:14,971][03334] Updated weights for policy 0, policy_version 130 (0.0029)
+[2024-12-28 17:39:15,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4164.2, 300 sec: 3803.4). Total num frames: 532480. Throughput: 0: 1068.5. Samples: 132224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:39:15,558][00596] Avg episode reward: [(0, '4.808')]
+[2024-12-28 17:39:15,562][03321] Saving new best policy, reward=4.808!
+[2024-12-28 17:39:20,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3785.3). Total num frames: 548864. Throughput: 0: 1016.6. Samples: 136758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-12-28 17:39:20,561][00596] Avg episode reward: [(0, '4.517')]
+[2024-12-28 17:39:25,108][03334] Updated weights for policy 0, policy_version 140 (0.0014)
+[2024-12-28 17:39:25,555][00596] Fps is (10 sec: 4096.2, 60 sec: 4164.5, 300 sec: 3822.9). Total num frames: 573440. Throughput: 0: 1018.8. Samples: 143886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-12-28 17:39:25,557][00596] Avg episode reward: [(0, '4.408')]
+[2024-12-28 17:39:30,555][00596] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 3858.2). Total num frames: 598016. Throughput: 0: 1053.3. Samples: 147630. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:39:30,561][00596] Avg episode reward: [(0, '4.623')]
+[2024-12-28 17:39:35,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3814.4). Total num frames: 610304. Throughput: 0: 1035.4. Samples: 152880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:39:35,561][00596] Avg episode reward: [(0, '4.654')]
+[2024-12-28 17:39:35,746][03334] Updated weights for policy 0, policy_version 150 (0.0021)
+[2024-12-28 17:39:40,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3847.8). Total num frames: 634880. Throughput: 0: 1002.5. Samples: 158826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:39:40,562][00596] Avg episode reward: [(0, '4.642')]
+[2024-12-28 17:39:44,582][03334] Updated weights for policy 0, policy_version 160 (0.0028)
+[2024-12-28 17:39:45,555][00596] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 3879.2). Total num frames: 659456. Throughput: 0: 1027.5. Samples: 162494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-12-28 17:39:45,562][00596] Avg episode reward: [(0, '4.514')]
+[2024-12-28 17:39:50,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3861.9). Total num frames: 675840. Throughput: 0: 1055.7. Samples: 168906. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-28 17:39:50,557][00596] Avg episode reward: [(0, '4.390')]
+[2024-12-28 17:39:55,385][03334] Updated weights for policy 0, policy_version 170 (0.0016)
+[2024-12-28 17:39:55,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3868.4). Total num frames: 696320. Throughput: 0: 1007.0. Samples: 174046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:39:55,561][00596] Avg episode reward: [(0, '4.362')]
+[2024-12-28 17:40:00,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4232.9, 300 sec: 3896.7). Total num frames: 720896. Throughput: 0: 1010.9. Samples: 177712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:40:00,557][00596] Avg episode reward: [(0, '4.613')]
+[2024-12-28 17:40:03,731][03334] Updated weights for policy 0, policy_version 180 (0.0021)
+[2024-12-28 17:40:05,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3902.0). Total num frames: 741376. Throughput: 0: 1073.8. Samples: 185078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-12-28 17:40:05,563][00596] Avg episode reward: [(0, '4.717')]
+[2024-12-28 17:40:10,555][00596] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3864.9). Total num frames: 753664. Throughput: 0: 1013.2. Samples: 189482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:40:10,559][00596] Avg episode reward: [(0, '4.869')]
+[2024-12-28 17:40:10,630][03321] Saving new best policy, reward=4.869!
+[2024-12-28 17:40:14,990][03334] Updated weights for policy 0, policy_version 190 (0.0021)
+[2024-12-28 17:40:15,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3891.2). Total num frames: 778240. Throughput: 0: 1001.6. Samples: 192700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:40:15,557][00596] Avg episode reward: [(0, '4.939')]
+[2024-12-28 17:40:15,559][03321] Saving new best policy, reward=4.939!
+[2024-12-28 17:40:20,555][00596] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 3916.2). Total num frames: 802816. Throughput: 0: 1048.8. Samples: 200074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-12-28 17:40:20,558][00596] Avg episode reward: [(0, '4.904')]
+[2024-12-28 17:40:20,564][03321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000196_802816.pth...
+[2024-12-28 17:40:25,209][03334] Updated weights for policy 0, policy_version 200 (0.0026)
+[2024-12-28 17:40:25,562][00596] Fps is (10 sec: 4093.1, 60 sec: 4095.5, 300 sec: 3900.8). Total num frames: 819200. Throughput: 0: 1032.3. Samples: 205288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:40:25,565][00596] Avg episode reward: [(0, '4.764')]
+[2024-12-28 17:40:30,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3905.5). Total num frames: 839680. Throughput: 0: 1005.7. Samples: 207750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:40:30,557][00596] Avg episode reward: [(0, '4.525')]
+[2024-12-28 17:40:34,273][03334] Updated weights for policy 0, policy_version 210 (0.0027)
+[2024-12-28 17:40:35,555][00596] Fps is (10 sec: 4508.9, 60 sec: 4232.5, 300 sec: 3928.4). Total num frames: 864256. Throughput: 0: 1029.7. Samples: 215244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:40:35,560][00596] Avg episode reward: [(0, '4.925')]
+[2024-12-28 17:40:40,557][00596] Fps is (10 sec: 4504.9, 60 sec: 4164.2, 300 sec: 3932.1). Total num frames: 884736. Throughput: 0: 1054.8. Samples: 221514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:40:40,560][00596] Avg episode reward: [(0, '5.028')]
+[2024-12-28 17:40:40,572][03321] Saving new best policy, reward=5.028!
+[2024-12-28 17:40:45,435][03334] Updated weights for policy 0, policy_version 220 (0.0019)
+[2024-12-28 17:40:45,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3917.9). Total num frames: 901120. Throughput: 0: 1022.8. Samples: 223738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:40:45,557][00596] Avg episode reward: [(0, '4.719')]
+[2024-12-28 17:40:50,555][00596] Fps is (10 sec: 4096.7, 60 sec: 4164.3, 300 sec: 3939.1). Total num frames: 925696. Throughput: 0: 1004.6. Samples: 230286. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:40:50,557][00596] Avg episode reward: [(0, '4.665')]
+[2024-12-28 17:40:53,595][03334] Updated weights for policy 0, policy_version 230 (0.0027)
+[2024-12-28 17:40:55,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3942.4). Total num frames: 946176. Throughput: 0: 1067.8. Samples: 237534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:40:55,563][00596] Avg episode reward: [(0, '4.675')]
+[2024-12-28 17:41:00,558][00596] Fps is (10 sec: 3685.3, 60 sec: 4027.5, 300 sec: 3928.8). Total num frames: 962560. Throughput: 0: 1046.4. Samples: 239792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-28 17:41:00,560][00596] Avg episode reward: [(0, '4.630')]
+[2024-12-28 17:41:04,627][03334] Updated weights for policy 0, policy_version 240 (0.0018)
+[2024-12-28 17:41:05,555][00596] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 3948.5). Total num frames: 987136. Throughput: 0: 1011.8. Samples: 245604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-12-28 17:41:05,561][00596] Avg episode reward: [(0, '4.861')]
+[2024-12-28 17:41:10,555][00596] Fps is (10 sec: 4916.6, 60 sec: 4300.8, 300 sec: 3967.5). Total num frames: 1011712. Throughput: 0: 1060.3. Samples: 252996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:41:10,559][00596] Avg episode reward: [(0, '4.998')]
+[2024-12-28 17:41:13,761][03334] Updated weights for policy 0, policy_version 250 (0.0023)
+[2024-12-28 17:41:15,558][00596] Fps is (10 sec: 4094.9, 60 sec: 4164.1, 300 sec: 3954.2). Total num frames: 1028096. Throughput: 0: 1071.7. Samples: 255982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:41:15,560][00596] Avg episode reward: [(0, '4.962')]
+[2024-12-28 17:41:20,555][00596] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3941.4). Total num frames: 1044480. Throughput: 0: 1011.8. Samples: 260774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:41:20,557][00596] Avg episode reward: [(0, '5.024')]
+[2024-12-28 17:41:23,949][03334] Updated weights for policy 0, policy_version 260 (0.0030)
+[2024-12-28 17:41:25,555][00596] Fps is (10 sec: 4097.1, 60 sec: 4164.7, 300 sec: 3959.5). Total num frames: 1069056. Throughput: 0: 1035.2. Samples: 268098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:41:25,558][00596] Avg episode reward: [(0, '4.952')]
+[2024-12-28 17:41:30,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3961.9). Total num frames: 1089536. Throughput: 0: 1067.2. Samples: 271762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:41:30,558][00596] Avg episode reward: [(0, '4.883')]
+[2024-12-28 17:41:34,805][03334] Updated weights for policy 0, policy_version 270 (0.0023)
+[2024-12-28 17:41:35,555][00596] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3949.7). Total num frames: 1105920. Throughput: 0: 1025.4. Samples: 276430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:41:35,557][00596] Avg episode reward: [(0, '5.188')]
+[2024-12-28 17:41:35,562][03321] Saving new best policy, reward=5.188!
+[2024-12-28 17:41:40,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 3966.7). Total num frames: 1130496. Throughput: 0: 1012.9. Samples: 283116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:41:40,557][00596] Avg episode reward: [(0, '5.302')]
+[2024-12-28 17:41:40,566][03321] Saving new best policy, reward=5.302!
+[2024-12-28 17:41:43,370][03334] Updated weights for policy 0, policy_version 280 (0.0014)
+[2024-12-28 17:41:45,555][00596] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 3983.0). Total num frames: 1155072. Throughput: 0: 1042.4. Samples: 286696. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-28 17:41:45,559][00596] Avg episode reward: [(0, '5.434')]
+[2024-12-28 17:41:45,564][03321] Saving new best policy, reward=5.434!
+[2024-12-28 17:41:50,555][00596] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 1171456. Throughput: 0: 1038.5. Samples: 292336. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-12-28 17:41:50,564][00596] Avg episode reward: [(0, '5.302')]
+[2024-12-28 17:41:54,401][03334] Updated weights for policy 0, policy_version 290 (0.0033)
+[2024-12-28 17:41:55,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 1191936. Throughput: 0: 1004.7. Samples: 298208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:41:55,557][00596] Avg episode reward: [(0, '5.209')]
+[2024-12-28 17:42:00,555][00596] Fps is (10 sec: 4505.7, 60 sec: 4232.7, 300 sec: 4123.8). Total num frames: 1216512. Throughput: 0: 1020.6. Samples: 301908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:42:00,557][00596] Avg episode reward: [(0, '5.465')]
+[2024-12-28 17:42:00,565][03321] Saving new best policy, reward=5.465!
+[2024-12-28 17:42:03,025][03334] Updated weights for policy 0, policy_version 300 (0.0015)
+[2024-12-28 17:42:05,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 1232896. Throughput: 0: 1056.9. Samples: 308334. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-28 17:42:05,561][00596] Avg episode reward: [(0, '5.524')]
+[2024-12-28 17:42:05,563][03321] Saving new best policy, reward=5.524!
+[2024-12-28 17:42:10,555][00596] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4096.0). Total num frames: 1249280. Throughput: 0: 1001.1. Samples: 313146. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-28 17:42:10,560][00596] Avg episode reward: [(0, '5.691')]
+[2024-12-28 17:42:10,645][03321] Saving new best policy, reward=5.691!
+[2024-12-28 17:42:14,032][03334] Updated weights for policy 0, policy_version 310 (0.0026)
+[2024-12-28 17:42:15,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 4123.8). Total num frames: 1273856. Throughput: 0: 999.4. Samples: 316734. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-12-28 17:42:15,560][00596] Avg episode reward: [(0, '5.442')]
+[2024-12-28 17:42:20,555][00596] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4137.7). Total num frames: 1298432. Throughput: 0: 1062.3. Samples: 324232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:42:20,557][00596] Avg episode reward: [(0, '5.465')]
+[2024-12-28 17:42:20,567][03321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000317_1298432.pth...
+[2024-12-28 17:42:20,743][03321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000075_307200.pth
+[2024-12-28 17:42:24,342][03334] Updated weights for policy 0, policy_version 320 (0.0040)
+[2024-12-28 17:42:25,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4096.0). Total num frames: 1310720. Throughput: 0: 1014.2. Samples: 328754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:42:25,565][00596] Avg episode reward: [(0, '5.848')]
+[2024-12-28 17:42:25,567][03321] Saving new best policy, reward=5.848!
+[2024-12-28 17:42:30,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 1335296. Throughput: 0: 1005.1. Samples: 331926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:42:30,557][00596] Avg episode reward: [(0, '6.250')]
+[2024-12-28 17:42:30,566][03321] Saving new best policy, reward=6.250!
+[2024-12-28 17:42:33,326][03334] Updated weights for policy 0, policy_version 330 (0.0018)
+[2024-12-28 17:42:35,555][00596] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4151.5). Total num frames: 1359872. Throughput: 0: 1044.3. Samples: 339328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:42:35,563][00596] Avg episode reward: [(0, '6.255')]
+[2024-12-28 17:42:35,566][03321] Saving new best policy, reward=6.255!
+[2024-12-28 17:42:40,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 1376256. Throughput: 0: 1032.4. Samples: 344664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:42:40,561][00596] Avg episode reward: [(0, '6.300')]
+[2024-12-28 17:42:40,576][03321] Saving new best policy, reward=6.300!
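[Editor's note] By this point the learner has started rotating checkpoints: checkpoint_000000317_1298432.pth is written and the oldest one (checkpoint_000000075_307200.pth) is removed, alongside a separate best-policy save whenever the average reward improves. The .pth files are ordinary torch.save pickles, so they can be inspected directly; a sketch follows, where the dict key names are assumptions about Sample Factory's checkpoint layout rather than anything this log states.

    # Sketch: inspect one of the rotated checkpoints named in the log.
    # Assumes the .pth file is a plain torch.save'd dict; the key names
    # below are assumptions about the Sample Factory checkpoint layout.
    import torch

    path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000317_1298432.pth"
    ckpt = torch.load(path, map_location="cpu")

    print(sorted(ckpt.keys()))
    # The filename encodes training step 317 and env frame 1298432;
    # if these keys are present, they should agree with it:
    print(ckpt.get("train_step"), ckpt.get("env_steps"))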
+[2024-12-28 17:42:44,535][03334] Updated weights for policy 0, policy_version 340 (0.0027)
+[2024-12-28 17:42:45,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4123.8). Total num frames: 1396736. Throughput: 0: 999.8. Samples: 346898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:42:45,557][00596] Avg episode reward: [(0, '6.484')]
+[2024-12-28 17:42:45,562][03321] Saving new best policy, reward=6.484!
+[2024-12-28 17:42:50,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4151.5). Total num frames: 1421312. Throughput: 0: 1017.2. Samples: 354108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:42:50,557][00596] Avg episode reward: [(0, '6.050')]
+[2024-12-28 17:42:52,874][03334] Updated weights for policy 0, policy_version 350 (0.0021)
+[2024-12-28 17:42:55,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 1441792. Throughput: 0: 1060.1. Samples: 360850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:42:55,558][00596] Avg episode reward: [(0, '6.180')]
+[2024-12-28 17:43:00,555][00596] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4096.0). Total num frames: 1454080. Throughput: 0: 1028.0. Samples: 362996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-12-28 17:43:00,560][00596] Avg episode reward: [(0, '6.377')]
+[2024-12-28 17:43:03,874][03334] Updated weights for policy 0, policy_version 360 (0.0041)
+[2024-12-28 17:43:05,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4151.6). Total num frames: 1482752. Throughput: 0: 1003.2. Samples: 369374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:43:05,559][00596] Avg episode reward: [(0, '6.731')]
+[2024-12-28 17:43:05,564][03321] Saving new best policy, reward=6.731!
+[2024-12-28 17:43:10,557][00596] Fps is (10 sec: 4914.4, 60 sec: 4232.4, 300 sec: 4137.6). Total num frames: 1503232. Throughput: 0: 1068.2. Samples: 376826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-12-28 17:43:10,560][00596] Avg episode reward: [(0, '6.942')]
+[2024-12-28 17:43:10,593][03321] Saving new best policy, reward=6.942!
+[2024-12-28 17:43:13,414][03334] Updated weights for policy 0, policy_version 370 (0.0021)
+[2024-12-28 17:43:15,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 1519616. Throughput: 0: 1049.2. Samples: 379138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-12-28 17:43:15,558][00596] Avg episode reward: [(0, '7.174')]
+[2024-12-28 17:43:15,560][03321] Saving new best policy, reward=7.174!
+[2024-12-28 17:43:20,555][00596] Fps is (10 sec: 3687.0, 60 sec: 4027.7, 300 sec: 4123.8). Total num frames: 1540096. Throughput: 0: 998.3. Samples: 384252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:43:20,559][00596] Avg episode reward: [(0, '7.577')]
+[2024-12-28 17:43:20,568][03321] Saving new best policy, reward=7.577!
+[2024-12-28 17:43:23,390][03334] Updated weights for policy 0, policy_version 380 (0.0018)
+[2024-12-28 17:43:25,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4137.7). Total num frames: 1564672. Throughput: 0: 1042.4. Samples: 391572. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:43:25,561][00596] Avg episode reward: [(0, '7.571')]
+[2024-12-28 17:43:30,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 1585152. Throughput: 0: 1069.9. Samples: 395042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-28 17:43:30,564][00596] Avg episode reward: [(0, '8.028')]
+[2024-12-28 17:43:30,574][03321] Saving new best policy, reward=8.028!
+[2024-12-28 17:43:34,388][03334] Updated weights for policy 0, policy_version 390 (0.0032)
+[2024-12-28 17:43:35,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4109.9). Total num frames: 1601536. Throughput: 0: 1009.8. Samples: 399550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:43:35,558][00596] Avg episode reward: [(0, '8.142')]
+[2024-12-28 17:43:35,562][03321] Saving new best policy, reward=8.142!
+[2024-12-28 17:43:40,556][00596] Fps is (10 sec: 4095.5, 60 sec: 4164.2, 300 sec: 4137.6). Total num frames: 1626112. Throughput: 0: 1020.9. Samples: 406792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:43:40,563][00596] Avg episode reward: [(0, '8.527')]
+[2024-12-28 17:43:40,572][03321] Saving new best policy, reward=8.527!
+[2024-12-28 17:43:42,934][03334] Updated weights for policy 0, policy_version 400 (0.0022)
+[2024-12-28 17:43:45,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 1646592. Throughput: 0: 1050.5. Samples: 410270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:43:45,560][00596] Avg episode reward: [(0, '8.535')]
+[2024-12-28 17:43:45,565][03321] Saving new best policy, reward=8.535!
+[2024-12-28 17:43:50,557][00596] Fps is (10 sec: 3685.9, 60 sec: 4027.6, 300 sec: 4096.0). Total num frames: 1662976. Throughput: 0: 1023.9. Samples: 415454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-12-28 17:43:50,560][00596] Avg episode reward: [(0, '8.483')]
+[2024-12-28 17:43:53,977][03334] Updated weights for policy 0, policy_version 410 (0.0016)
+[2024-12-28 17:43:55,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4123.8). Total num frames: 1683456. Throughput: 0: 997.2. Samples: 421700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:43:55,562][00596] Avg episode reward: [(0, '8.315')]
+[2024-12-28 17:44:00,555][00596] Fps is (10 sec: 4916.4, 60 sec: 4300.8, 300 sec: 4137.7). Total num frames: 1712128. Throughput: 0: 1026.2. Samples: 425318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:44:00,561][00596] Avg episode reward: [(0, '9.012')]
+[2024-12-28 17:44:00,571][03321] Saving new best policy, reward=9.012!
+[2024-12-28 17:44:02,809][03334] Updated weights for policy 0, policy_version 420 (0.0027)
+[2024-12-28 17:44:05,561][00596] Fps is (10 sec: 4093.4, 60 sec: 4027.3, 300 sec: 4109.8). Total num frames: 1724416. Throughput: 0: 1050.4. Samples: 431526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:44:05,568][00596] Avg episode reward: [(0, '9.168')]
+[2024-12-28 17:44:05,574][03321] Saving new best policy, reward=9.168!
+[2024-12-28 17:44:10,555][00596] Fps is (10 sec: 3276.8, 60 sec: 4027.8, 300 sec: 4109.9). Total num frames: 1744896. Throughput: 0: 1005.5. Samples: 436818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:44:10,561][00596] Avg episode reward: [(0, '9.974')]
+[2024-12-28 17:44:10,571][03321] Saving new best policy, reward=9.974!
+[2024-12-28 17:44:13,359][03334] Updated weights for policy 0, policy_version 430 (0.0036)
+[2024-12-28 17:44:15,555][00596] Fps is (10 sec: 4508.5, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 1769472. Throughput: 0: 1007.2. Samples: 440368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:44:15,560][00596] Avg episode reward: [(0, '10.177')]
+[2024-12-28 17:44:15,562][03321] Saving new best policy, reward=10.177!
+[2024-12-28 17:44:20,556][00596] Fps is (10 sec: 4505.3, 60 sec: 4164.2, 300 sec: 4123.8). Total num frames: 1789952. Throughput: 0: 1064.1. Samples: 447436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:44:20,558][00596] Avg episode reward: [(0, '9.945')]
+[2024-12-28 17:44:20,570][03321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000437_1789952.pth...
+[2024-12-28 17:44:20,721][03321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000196_802816.pth
+[2024-12-28 17:44:24,020][03334] Updated weights for policy 0, policy_version 440 (0.0023)
+[2024-12-28 17:44:25,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 1806336. Throughput: 0: 1004.0. Samples: 451970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:44:25,557][00596] Avg episode reward: [(0, '9.531')]
+[2024-12-28 17:44:30,555][00596] Fps is (10 sec: 4096.2, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 1830912. Throughput: 0: 1006.3. Samples: 455556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:44:30,558][00596] Avg episode reward: [(0, '9.783')]
+[2024-12-28 17:44:32,756][03334] Updated weights for policy 0, policy_version 450 (0.0016)
+[2024-12-28 17:44:35,555][00596] Fps is (10 sec: 4915.1, 60 sec: 4232.5, 300 sec: 4137.7). Total num frames: 1855488. Throughput: 0: 1054.6. Samples: 462910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:44:35,557][00596] Avg episode reward: [(0, '10.454')]
+[2024-12-28 17:44:35,560][03321] Saving new best policy, reward=10.454!
+[2024-12-28 17:44:40,557][00596] Fps is (10 sec: 3685.9, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 1867776. Throughput: 0: 1028.1. Samples: 467964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:44:40,564][00596] Avg episode reward: [(0, '10.613')]
+[2024-12-28 17:44:40,579][03321] Saving new best policy, reward=10.613!
+[2024-12-28 17:44:43,930][03334] Updated weights for policy 0, policy_version 460 (0.0017)
+[2024-12-28 17:44:45,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 1892352. Throughput: 0: 1004.3. Samples: 470510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:44:45,557][00596] Avg episode reward: [(0, '11.366')]
+[2024-12-28 17:44:45,563][03321] Saving new best policy, reward=11.366!
+[2024-12-28 17:44:50,555][00596] Fps is (10 sec: 4916.0, 60 sec: 4232.7, 300 sec: 4137.7). Total num frames: 1916928. Throughput: 0: 1032.3. Samples: 477974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:44:50,557][00596] Avg episode reward: [(0, '11.475')]
+[2024-12-28 17:44:50,571][03321] Saving new best policy, reward=11.475!
+[2024-12-28 17:44:52,039][03334] Updated weights for policy 0, policy_version 470 (0.0017)
+[2024-12-28 17:44:55,558][00596] Fps is (10 sec: 4095.0, 60 sec: 4164.1, 300 sec: 4109.8). Total num frames: 1933312. Throughput: 0: 1051.8. Samples: 484150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:44:55,565][00596] Avg episode reward: [(0, '12.452')]
+[2024-12-28 17:44:55,568][03321] Saving new best policy, reward=12.452!
+[2024-12-28 17:45:00,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4109.9). Total num frames: 1953792. Throughput: 0: 1022.8. Samples: 486394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:45:00,557][00596] Avg episode reward: [(0, '11.716')]
+[2024-12-28 17:45:02,908][03334] Updated weights for policy 0, policy_version 480 (0.0031)
+[2024-12-28 17:45:05,555][00596] Fps is (10 sec: 4506.8, 60 sec: 4233.0, 300 sec: 4151.5). Total num frames: 1978368. Throughput: 0: 1020.0. Samples: 493336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:45:05,560][00596] Avg episode reward: [(0, '11.621')]
+[2024-12-28 17:45:10,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4137.7). Total num frames: 1998848. Throughput: 0: 1076.0. Samples: 500392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:45:10,557][00596] Avg episode reward: [(0, '11.035')]
+[2024-12-28 17:45:12,783][03334] Updated weights for policy 0, policy_version 490 (0.0019)
+[2024-12-28 17:45:15,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 2015232. Throughput: 0: 1044.0. Samples: 502534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:45:15,557][00596] Avg episode reward: [(0, '11.676')]
+[2024-12-28 17:45:20,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4123.9). Total num frames: 2035712. Throughput: 0: 1010.6. Samples: 508386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:45:20,560][00596] Avg episode reward: [(0, '11.641')]
+[2024-12-28 17:45:22,347][03334] Updated weights for policy 0, policy_version 500 (0.0026)
+[2024-12-28 17:45:25,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4137.7). Total num frames: 2060288. Throughput: 0: 1064.7. Samples: 515876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:45:25,560][00596] Avg episode reward: [(0, '13.642')]
+[2024-12-28 17:45:25,563][03321] Saving new best policy, reward=13.642!
+[2024-12-28 17:45:30,558][00596] Fps is (10 sec: 4094.9, 60 sec: 4095.8, 300 sec: 4109.8). Total num frames: 2076672. Throughput: 0: 1073.3. Samples: 518812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:45:30,560][00596] Avg episode reward: [(0, '13.505')]
+[2024-12-28 17:45:33,335][03334] Updated weights for policy 0, policy_version 510 (0.0019)
+[2024-12-28 17:45:35,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4109.9). Total num frames: 2097152. Throughput: 0: 1015.7. Samples: 523682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-12-28 17:45:35,557][00596] Avg episode reward: [(0, '14.012')]
+[2024-12-28 17:45:35,560][03321] Saving new best policy, reward=14.012!
+[2024-12-28 17:45:40,555][00596] Fps is (10 sec: 4506.8, 60 sec: 4232.6, 300 sec: 4137.7). Total num frames: 2121728. Throughput: 0: 1042.8. Samples: 531072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:45:40,558][00596] Avg episode reward: [(0, '15.603')]
+[2024-12-28 17:45:40,641][03321] Saving new best policy, reward=15.603!
+[2024-12-28 17:45:41,584][03334] Updated weights for policy 0, policy_version 520 (0.0026)
+[2024-12-28 17:45:45,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 2142208. Throughput: 0: 1071.8. Samples: 534624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:45:45,561][00596] Avg episode reward: [(0, '14.237')]
+[2024-12-28 17:45:50,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4109.9). Total num frames: 2158592. Throughput: 0: 1019.3. Samples: 539204. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-28 17:45:50,562][00596] Avg episode reward: [(0, '14.687')]
+[2024-12-28 17:45:52,684][03334] Updated weights for policy 0, policy_version 530 (0.0024)
+[2024-12-28 17:45:55,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4164.4, 300 sec: 4137.7). Total num frames: 2183168. Throughput: 0: 1014.9. Samples: 546062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:45:55,561][00596] Avg episode reward: [(0, '15.903')]
+[2024-12-28 17:45:55,565][03321] Saving new best policy, reward=15.903!
+[2024-12-28 17:46:00,556][00596] Fps is (10 sec: 4914.7, 60 sec: 4232.5, 300 sec: 4137.6). Total num frames: 2207744. Throughput: 0: 1047.8. Samples: 549688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:46:00,562][00596] Avg episode reward: [(0, '15.349')]
+[2024-12-28 17:46:01,559][03334] Updated weights for policy 0, policy_version 540 (0.0018)
+[2024-12-28 17:46:05,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 2220032. Throughput: 0: 1043.7. Samples: 555354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:46:05,557][00596] Avg episode reward: [(0, '15.592')]
+[2024-12-28 17:46:10,555][00596] Fps is (10 sec: 3686.8, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 2244608. Throughput: 0: 1009.2. Samples: 561290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:46:10,560][00596] Avg episode reward: [(0, '16.429')]
+[2024-12-28 17:46:10,574][03321] Saving new best policy, reward=16.429!
+[2024-12-28 17:46:12,069][03334] Updated weights for policy 0, policy_version 550 (0.0021)
+[2024-12-28 17:46:15,555][00596] Fps is (10 sec: 4915.1, 60 sec: 4232.5, 300 sec: 4151.5). Total num frames: 2269184. Throughput: 0: 1022.5. Samples: 564824. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:46:15,557][00596] Avg episode reward: [(0, '16.800')]
+[2024-12-28 17:46:15,561][03321] Saving new best policy, reward=16.800!
+[2024-12-28 17:46:20,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 2285568. Throughput: 0: 1060.3. Samples: 571394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:46:20,559][00596] Avg episode reward: [(0, '18.109')]
+[2024-12-28 17:46:20,582][03321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000558_2285568.pth...
+[2024-12-28 17:46:20,785][03321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000317_1298432.pth
+[2024-12-28 17:46:20,821][03321] Saving new best policy, reward=18.109!
+[2024-12-28 17:46:22,714][03334] Updated weights for policy 0, policy_version 560 (0.0046)
+[2024-12-28 17:46:25,555][00596] Fps is (10 sec: 3276.7, 60 sec: 4027.7, 300 sec: 4109.9). Total num frames: 2301952. Throughput: 0: 999.9. Samples: 576066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:46:25,558][00596] Avg episode reward: [(0, '18.759')]
+[2024-12-28 17:46:25,560][03321] Saving new best policy, reward=18.759!
+[2024-12-28 17:46:30,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4164.4, 300 sec: 4137.7). Total num frames: 2326528. Throughput: 0: 1001.6. Samples: 579694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:46:30,557][00596] Avg episode reward: [(0, '18.185')]
+[2024-12-28 17:46:31,625][03334] Updated weights for policy 0, policy_version 570 (0.0025)
+[2024-12-28 17:46:35,555][00596] Fps is (10 sec: 4915.4, 60 sec: 4232.5, 300 sec: 4137.7). Total num frames: 2351104. Throughput: 0: 1065.6. Samples: 587158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:46:35,560][00596] Avg episode reward: [(0, '17.007')]
+[2024-12-28 17:46:40,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 2363392. Throughput: 0: 1016.7. Samples: 591814. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2024-12-28 17:46:40,562][00596] Avg episode reward: [(0, '15.670')]
+[2024-12-28 17:46:42,442][03334] Updated weights for policy 0, policy_version 580 (0.0013)
+[2024-12-28 17:46:45,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 2387968. Throughput: 0: 1006.8. Samples: 594994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:46:45,557][00596] Avg episode reward: [(0, '16.729')]
+[2024-12-28 17:46:50,555][00596] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4137.7). Total num frames: 2412544. Throughput: 0: 1046.2. Samples: 602434. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-28 17:46:50,557][00596] Avg episode reward: [(0, '16.719')]
+[2024-12-28 17:46:50,837][03334] Updated weights for policy 0, policy_version 590 (0.0015)
+[2024-12-28 17:46:55,555][00596] Fps is (10 sec: 4095.8, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 2428928. Throughput: 0: 1034.8. Samples: 607858. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-28 17:46:55,560][00596] Avg episode reward: [(0, '17.039')]
+[2024-12-28 17:47:00,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4123.8). Total num frames: 2449408. Throughput: 0: 1009.9. Samples: 610270. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-28 17:47:00,558][00596] Avg episode reward: [(0, '18.602')]
+[2024-12-28 17:47:01,768][03334] Updated weights for policy 0, policy_version 600 (0.0036)
+[2024-12-28 17:47:05,555][00596] Fps is (10 sec: 4505.8, 60 sec: 4232.5, 300 sec: 4151.5). Total num frames: 2473984. Throughput: 0: 1028.0. Samples: 617654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:47:05,557][00596] Avg episode reward: [(0, '19.075')]
+[2024-12-28 17:47:05,564][03321] Saving new best policy, reward=19.075!
+[2024-12-28 17:47:10,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 2494464. Throughput: 0: 1067.3. Samples: 624096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2024-12-28 17:47:10,559][00596] Avg episode reward: [(0, '19.175')]
+[2024-12-28 17:47:10,570][03321] Saving new best policy, reward=19.175!
+[2024-12-28 17:47:11,598][03334] Updated weights for policy 0, policy_version 610 (0.0026)
+[2024-12-28 17:47:15,555][00596] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4109.9). Total num frames: 2510848. Throughput: 0: 1034.5. Samples: 626246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:47:15,562][00596] Avg episode reward: [(0, '20.092')]
+[2024-12-28 17:47:15,566][03321] Saving new best policy, reward=20.092!
+[2024-12-28 17:47:20,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4151.5). Total num frames: 2535424. Throughput: 0: 1013.2. Samples: 632750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:47:20,557][00596] Avg episode reward: [(0, '19.535')]
+[2024-12-28 17:47:21,261][03334] Updated weights for policy 0, policy_version 620 (0.0016)
+[2024-12-28 17:47:25,558][00596] Fps is (10 sec: 4504.5, 60 sec: 4232.4, 300 sec: 4137.6). Total num frames: 2555904. Throughput: 0: 1073.4. Samples: 640120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:47:25,560][00596] Avg episode reward: [(0, '18.865')]
+[2024-12-28 17:47:30,555][00596] Fps is (10 sec: 3686.2, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 2572288. Throughput: 0: 1053.3. Samples: 642394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:47:30,562][00596] Avg episode reward: [(0, '19.288')]
+[2024-12-28 17:47:32,124][03334] Updated weights for policy 0, policy_version 630 (0.0023)
+[2024-12-28 17:47:35,555][00596] Fps is (10 sec: 4097.0, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 2596864. Throughput: 0: 1012.7. Samples: 648004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:47:35,557][00596] Avg episode reward: [(0, '20.192')]
+[2024-12-28 17:47:35,564][03321] Saving new best policy, reward=20.192!
+[2024-12-28 17:47:40,364][03334] Updated weights for policy 0, policy_version 640 (0.0016)
+[2024-12-28 17:47:40,555][00596] Fps is (10 sec: 4915.4, 60 sec: 4300.8, 300 sec: 4151.5). Total num frames: 2621440. Throughput: 0: 1056.0. Samples: 655378. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-28 17:47:40,558][00596] Avg episode reward: [(0, '19.759')]
+[2024-12-28 17:47:45,555][00596] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 2637824. Throughput: 0: 1072.8. Samples: 658546. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-28 17:47:45,557][00596] Avg episode reward: [(0, '19.858')]
+[2024-12-28 17:47:50,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 2658304. Throughput: 0: 1011.3. Samples: 663162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:47:50,557][00596] Avg episode reward: [(0, '19.183')]
+[2024-12-28 17:47:51,330][03334] Updated weights for policy 0, policy_version 650 (0.0036)
+[2024-12-28 17:47:55,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4232.6, 300 sec: 4165.4). Total num frames: 2682880. Throughput: 0: 1036.8. Samples: 670754. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2024-12-28 17:47:55,557][00596] Avg episode reward: [(0, '16.656')]
+[2024-12-28 17:47:59,884][03334] Updated weights for policy 0, policy_version 660 (0.0021)
+[2024-12-28 17:48:00,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4137.7). Total num frames: 2703360. Throughput: 0: 1072.1. Samples: 674490. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:48:00,560][00596] Avg episode reward: [(0, '18.073')]
+[2024-12-28 17:48:05,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 2719744. Throughput: 0: 1038.0. Samples: 679460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:48:05,561][00596] Avg episode reward: [(0, '17.697')]
+[2024-12-28 17:48:10,336][03334] Updated weights for policy 0, policy_version 670 (0.0021)
+[2024-12-28 17:48:10,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4151.5). Total num frames: 2744320. Throughput: 0: 1026.0. Samples: 686288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:48:10,559][00596] Avg episode reward: [(0, '18.664')]
+[2024-12-28 17:48:15,555][00596] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 4165.4). Total num frames: 2768896. Throughput: 0: 1058.2. Samples: 690014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:48:15,558][00596] Avg episode reward: [(0, '19.136')]
+[2024-12-28 17:48:20,559][00596] Fps is (10 sec: 3685.1, 60 sec: 4095.8, 300 sec: 4123.7). Total num frames: 2781184. Throughput: 0: 1057.6. Samples: 695598. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:48:20,561][00596] Avg episode reward: [(0, '20.026')]
+[2024-12-28 17:48:20,623][03321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000680_2785280.pth...
+[2024-12-28 17:48:20,631][03334] Updated weights for policy 0, policy_version 680 (0.0019)
+[2024-12-28 17:48:20,803][03321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000437_1789952.pth
+[2024-12-28 17:48:25,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4164.4, 300 sec: 4137.7). Total num frames: 2805760. Throughput: 0: 1020.8. Samples: 701314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:48:25,559][00596] Avg episode reward: [(0, '18.499')]
+[2024-12-28 17:48:29,622][03334] Updated weights for policy 0, policy_version 690 (0.0026)
+[2024-12-28 17:48:30,555][00596] Fps is (10 sec: 4917.0, 60 sec: 4300.8, 300 sec: 4165.4). Total num frames: 2830336. Throughput: 0: 1033.2. Samples: 705038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:48:30,561][00596] Avg episode reward: [(0, '18.462')]
+[2024-12-28 17:48:35,555][00596] Fps is (10 sec: 4095.8, 60 sec: 4164.2, 300 sec: 4137.7). Total num frames: 2846720. Throughput: 0: 1081.4. Samples: 711824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:48:35,558][00596] Avg episode reward: [(0, '18.908')]
+[2024-12-28 17:48:40,468][03334] Updated weights for policy 0, policy_version 700 (0.0024)
+[2024-12-28 17:48:40,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 2867200. Throughput: 0: 1021.9. Samples: 716740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:48:40,560][00596] Avg episode reward: [(0, '19.192')]
+[2024-12-28 17:48:45,555][00596] Fps is (10 sec: 4505.8, 60 sec: 4232.5, 300 sec: 4165.5). Total num frames: 2891776. Throughput: 0: 1021.6. Samples: 720460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:48:45,557][00596] Avg episode reward: [(0, '19.027')]
+[2024-12-28 17:48:48,789][03334] Updated weights for policy 0, policy_version 710 (0.0032)
+[2024-12-28 17:48:50,559][00596] Fps is (10 sec: 4504.0, 60 sec: 4232.3, 300 sec: 4165.4). Total num frames: 2912256. Throughput: 0: 1075.7. Samples: 727870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:48:50,561][00596] Avg episode reward: [(0, '20.141')]
+[2024-12-28 17:48:55,558][00596] Fps is (10 sec: 3685.3, 60 sec: 4095.8, 300 sec: 4123.7). Total num frames: 2928640. Throughput: 0: 1026.3. Samples: 732476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:48:55,563][00596] Avg episode reward: [(0, '20.537')]
+[2024-12-28 17:48:55,565][03321] Saving new best policy, reward=20.537!
+[2024-12-28 17:48:59,950][03334] Updated weights for policy 0, policy_version 720 (0.0032)
+[2024-12-28 17:49:00,555][00596] Fps is (10 sec: 3687.7, 60 sec: 4096.0, 300 sec: 4151.6). Total num frames: 2949120. Throughput: 0: 1011.4. Samples: 735528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:49:00,564][00596] Avg episode reward: [(0, '21.669')]
+[2024-12-28 17:49:00,577][03321] Saving new best policy, reward=21.669!
+[2024-12-28 17:49:05,555][00596] Fps is (10 sec: 4506.9, 60 sec: 4232.5, 300 sec: 4165.4). Total num frames: 2973696. Throughput: 0: 1052.0. Samples: 742934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:49:05,560][00596] Avg episode reward: [(0, '21.544')]
+[2024-12-28 17:49:09,271][03334] Updated weights for policy 0, policy_version 730 (0.0030)
+[2024-12-28 17:49:10,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 2990080. Throughput: 0: 1050.8. Samples: 748602. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:49:10,558][00596] Avg episode reward: [(0, '22.495')]
+[2024-12-28 17:49:10,566][03321] Saving new best policy, reward=22.495!
+[2024-12-28 17:49:15,555][00596] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4137.7). Total num frames: 3010560. Throughput: 0: 1016.6. Samples: 750786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:49:15,557][00596] Avg episode reward: [(0, '21.857')]
+[2024-12-28 17:49:19,116][03334] Updated weights for policy 0, policy_version 740 (0.0026)
+[2024-12-28 17:49:20,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4232.8, 300 sec: 4165.4). Total num frames: 3035136. Throughput: 0: 1029.2. Samples: 758136. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2024-12-28 17:49:20,557][00596] Avg episode reward: [(0, '21.973')]
+[2024-12-28 17:49:25,555][00596] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4151.5). Total num frames: 3055616. Throughput: 0: 1066.5. Samples: 764734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-12-28 17:49:25,558][00596] Avg episode reward: [(0, '21.829')]
+[2024-12-28 17:49:30,085][03334] Updated weights for policy 0, policy_version 750 (0.0022)
+[2024-12-28 17:49:30,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4123.8). Total num frames: 3072000. Throughput: 0: 1033.5. Samples: 766966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:49:30,557][00596] Avg episode reward: [(0, '22.544')]
+[2024-12-28 17:49:30,565][03321] Saving new best policy, reward=22.544!
+[2024-12-28 17:49:35,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 3096576. Throughput: 0: 1011.4. Samples: 773378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:49:35,562][00596] Avg episode reward: [(0, '22.867')]
+[2024-12-28 17:49:35,566][03321] Saving new best policy, reward=22.867!
+[2024-12-28 17:49:38,401][03334] Updated weights for policy 0, policy_version 760 (0.0022)
+[2024-12-28 17:49:40,557][00596] Fps is (10 sec: 4914.4, 60 sec: 4232.4, 300 sec: 4165.4). Total num frames: 3121152. Throughput: 0: 1073.1. Samples: 780764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:49:40,561][00596] Avg episode reward: [(0, '22.838')]
+[2024-12-28 17:49:45,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4123.8). Total num frames: 3133440. Throughput: 0: 1053.8. Samples: 782950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:49:45,560][00596] Avg episode reward: [(0, '22.938')]
+[2024-12-28 17:49:45,676][03321] Saving new best policy, reward=22.938!
+[2024-12-28 17:49:49,402][03334] Updated weights for policy 0, policy_version 770 (0.0029)
+[2024-12-28 17:49:50,555][00596] Fps is (10 sec: 3687.0, 60 sec: 4096.2, 300 sec: 4151.6). Total num frames: 3158016. Throughput: 0: 1014.0. Samples: 788562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-12-28 17:49:50,562][00596] Avg episode reward: [(0, '22.959')]
+[2024-12-28 17:49:50,573][03321] Saving new best policy, reward=22.959!
+[2024-12-28 17:49:55,555][00596] Fps is (10 sec: 4915.1, 60 sec: 4232.7, 300 sec: 4165.4). Total num frames: 3182592.
Throughput: 0: 1054.3. Samples: 796044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-28 17:49:55,562][00596] Avg episode reward: [(0, '22.405')] +[2024-12-28 17:49:58,272][03334] Updated weights for policy 0, policy_version 780 (0.0018) +[2024-12-28 17:50:00,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 3198976. Throughput: 0: 1075.8. Samples: 799198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-28 17:50:00,561][00596] Avg episode reward: [(0, '22.338')] +[2024-12-28 17:50:05,555][00596] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 3219456. Throughput: 0: 1014.9. Samples: 803808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-28 17:50:05,561][00596] Avg episode reward: [(0, '22.647')] +[2024-12-28 17:50:08,667][03334] Updated weights for policy 0, policy_version 790 (0.0030) +[2024-12-28 17:50:10,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4165.4). Total num frames: 3244032. Throughput: 0: 1031.6. Samples: 811156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-28 17:50:10,560][00596] Avg episode reward: [(0, '22.790')] +[2024-12-28 17:50:15,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4232.6, 300 sec: 4165.4). Total num frames: 3264512. Throughput: 0: 1065.2. Samples: 814900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-28 17:50:15,557][00596] Avg episode reward: [(0, '21.384')] +[2024-12-28 17:50:19,038][03334] Updated weights for policy 0, policy_version 800 (0.0020) +[2024-12-28 17:50:20,556][00596] Fps is (10 sec: 3686.1, 60 sec: 4095.9, 300 sec: 4137.6). Total num frames: 3280896. Throughput: 0: 1029.0. Samples: 819684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-28 17:50:20,563][00596] Avg episode reward: [(0, '21.552')] +[2024-12-28 17:50:20,573][03321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000801_3280896.pth... +[2024-12-28 17:50:20,739][03321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000558_2285568.pth +[2024-12-28 17:50:25,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4165.5). Total num frames: 3305472. Throughput: 0: 1012.1. Samples: 826306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-28 17:50:25,563][00596] Avg episode reward: [(0, '20.546')] +[2024-12-28 17:50:27,996][03334] Updated weights for policy 0, policy_version 810 (0.0029) +[2024-12-28 17:50:30,555][00596] Fps is (10 sec: 4096.3, 60 sec: 4164.3, 300 sec: 4151.5). Total num frames: 3321856. Throughput: 0: 1046.5. Samples: 830044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-28 17:50:30,561][00596] Avg episode reward: [(0, '20.736')] +[2024-12-28 17:50:35,555][00596] Fps is (10 sec: 2457.6, 60 sec: 3891.2, 300 sec: 4096.0). Total num frames: 3330048. Throughput: 0: 992.2. Samples: 833210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-28 17:50:35,557][00596] Avg episode reward: [(0, '21.354')] +[2024-12-28 17:50:40,555][00596] Fps is (10 sec: 2867.1, 60 sec: 3823.0, 300 sec: 4096.0). Total num frames: 3350528. Throughput: 0: 944.3. Samples: 838538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-12-28 17:50:40,557][00596] Avg episode reward: [(0, '21.677')] +[2024-12-28 17:50:41,453][03334] Updated weights for policy 0, policy_version 820 (0.0036) +[2024-12-28 17:50:45,555][00596] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 4123.8). Total num frames: 3375104. Throughput: 0: 957.5. Samples: 842284. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-28 17:50:45,563][00596] Avg episode reward: [(0, '22.678')] +[2024-12-28 17:50:50,555][00596] Fps is (10 sec: 4505.6, 60 sec: 3959.4, 300 sec: 4109.9). Total num frames: 3395584. Throughput: 0: 999.1. Samples: 848768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-28 17:50:50,561][00596] Avg episode reward: [(0, '22.609')] +[2024-12-28 17:50:51,234][03334] Updated weights for policy 0, policy_version 830 (0.0022) +[2024-12-28 17:50:55,555][00596] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 4082.1). Total num frames: 3411968. Throughput: 0: 942.3. Samples: 853558. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-28 17:50:55,562][00596] Avg episode reward: [(0, '21.641')] +[2024-12-28 17:51:00,555][00596] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 4123.8). Total num frames: 3436544. Throughput: 0: 943.9. Samples: 857374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-28 17:51:00,568][00596] Avg episode reward: [(0, '21.205')] +[2024-12-28 17:51:00,587][03334] Updated weights for policy 0, policy_version 840 (0.0016) +[2024-12-28 17:51:05,557][00596] Fps is (10 sec: 4914.5, 60 sec: 4027.6, 300 sec: 4123.7). Total num frames: 3461120. Throughput: 0: 1002.9. Samples: 864814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-28 17:51:05,559][00596] Avg episode reward: [(0, '22.298')] +[2024-12-28 17:51:10,555][00596] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 4096.0). Total num frames: 3477504. Throughput: 0: 962.6. Samples: 869624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-28 17:51:10,560][00596] Avg episode reward: [(0, '22.697')] +[2024-12-28 17:51:11,593][03334] Updated weights for policy 0, policy_version 850 (0.0031) +[2024-12-28 17:51:15,555][00596] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 4109.9). Total num frames: 3497984. Throughput: 0: 949.7. Samples: 872780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-28 17:51:15,560][00596] Avg episode reward: [(0, '21.980')] +[2024-12-28 17:51:19,758][03334] Updated weights for policy 0, policy_version 860 (0.0025) +[2024-12-28 17:51:20,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 4137.7). Total num frames: 3522560. Throughput: 0: 1043.2. Samples: 880156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-28 17:51:20,564][00596] Avg episode reward: [(0, '23.242')] +[2024-12-28 17:51:20,575][03321] Saving new best policy, reward=23.242! +[2024-12-28 17:51:25,556][00596] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 4109.9). Total num frames: 3538944. Throughput: 0: 1042.8. Samples: 885466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-12-28 17:51:25,565][00596] Avg episode reward: [(0, '22.440')] +[2024-12-28 17:51:30,555][00596] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4096.0). Total num frames: 3559424. Throughput: 0: 1010.7. Samples: 887766. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-28 17:51:30,562][00596] Avg episode reward: [(0, '20.531')] +[2024-12-28 17:51:31,073][03334] Updated weights for policy 0, policy_version 870 (0.0019) +[2024-12-28 17:51:35,555][00596] Fps is (10 sec: 4505.8, 60 sec: 4232.5, 300 sec: 4137.7). Total num frames: 3584000. Throughput: 0: 1031.1. Samples: 895168. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-28 17:51:35,557][00596] Avg episode reward: [(0, '20.429')] +[2024-12-28 17:51:39,699][03334] Updated weights for policy 0, policy_version 880 (0.0017) +[2024-12-28 17:51:40,555][00596] Fps is (10 sec: 4505.4, 60 sec: 4232.5, 300 sec: 4123.8). Total num frames: 3604480. Throughput: 0: 1074.1. Samples: 901892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-12-28 17:51:40,558][00596] Avg episode reward: [(0, '22.447')] +[2024-12-28 17:51:45,555][00596] Fps is (10 sec: 3686.3, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3620864. Throughput: 0: 1041.0. Samples: 904220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-28 17:51:45,557][00596] Avg episode reward: [(0, '21.629')] +[2024-12-28 17:51:50,166][03334] Updated weights for policy 0, policy_version 890 (0.0025) +[2024-12-28 17:51:50,555][00596] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3645440. Throughput: 0: 1015.6. Samples: 910514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-28 17:51:50,557][00596] Avg episode reward: [(0, '23.673')] +[2024-12-28 17:51:50,564][03321] Saving new best policy, reward=23.673! +[2024-12-28 17:51:55,555][00596] Fps is (10 sec: 4915.3, 60 sec: 4300.8, 300 sec: 4137.7). Total num frames: 3670016. Throughput: 0: 1071.4. Samples: 917836. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-28 17:51:55,564][00596] Avg episode reward: [(0, '25.312')] +[2024-12-28 17:51:55,570][03321] Saving new best policy, reward=25.312! +[2024-12-28 17:52:00,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3682304. Throughput: 0: 1052.0. Samples: 920120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-28 17:52:00,559][00596] Avg episode reward: [(0, '25.527')] +[2024-12-28 17:52:00,577][03321] Saving new best policy, reward=25.527! +[2024-12-28 17:52:00,910][03334] Updated weights for policy 0, policy_version 900 (0.0037) +[2024-12-28 17:52:05,555][00596] Fps is (10 sec: 3276.8, 60 sec: 4027.8, 300 sec: 4096.0). Total num frames: 3702784. Throughput: 0: 1004.7. Samples: 925366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-12-28 17:52:05,562][00596] Avg episode reward: [(0, '25.536')] +[2024-12-28 17:52:05,567][03321] Saving new best policy, reward=25.536! +[2024-12-28 17:52:10,003][03334] Updated weights for policy 0, policy_version 910 (0.0017) +[2024-12-28 17:52:10,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3727360. Throughput: 0: 1042.7. Samples: 932388. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-12-28 17:52:10,563][00596] Avg episode reward: [(0, '25.973')] +[2024-12-28 17:52:10,573][03321] Saving new best policy, reward=25.973! +[2024-12-28 17:52:15,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3747840. Throughput: 0: 1064.8. Samples: 935680. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2024-12-28 17:52:15,564][00596] Avg episode reward: [(0, '25.150')] +[2024-12-28 17:52:20,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3764224. Throughput: 0: 1001.2. Samples: 940224. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2024-12-28 17:52:20,562][00596] Avg episode reward: [(0, '23.676')] +[2024-12-28 17:52:20,571][03321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000919_3764224.pth... 
+[2024-12-28 17:52:20,694][03321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000680_2785280.pth
+[2024-12-28 17:52:21,116][03334] Updated weights for policy 0, policy_version 920 (0.0022)
+[2024-12-28 17:52:25,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3788800. Throughput: 0: 1012.1. Samples: 947434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:52:25,557][00596] Avg episode reward: [(0, '23.346')]
+[2024-12-28 17:52:29,382][03334] Updated weights for policy 0, policy_version 930 (0.0018)
+[2024-12-28 17:52:30,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3809280. Throughput: 0: 1042.3. Samples: 951122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:52:30,561][00596] Avg episode reward: [(0, '22.667')]
+[2024-12-28 17:52:35,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3825664. Throughput: 0: 1020.1. Samples: 956420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:52:35,558][00596] Avg episode reward: [(0, '22.635')]
+[2024-12-28 17:52:40,414][03334] Updated weights for policy 0, policy_version 940 (0.0015)
+[2024-12-28 17:52:40,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3850240. Throughput: 0: 997.1. Samples: 962706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-12-28 17:52:40,562][00596] Avg episode reward: [(0, '23.312')]
+[2024-12-28 17:52:45,555][00596] Fps is (10 sec: 4915.1, 60 sec: 4232.5, 300 sec: 4123.8). Total num frames: 3874816. Throughput: 0: 1028.8. Samples: 966418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:52:45,557][00596] Avg episode reward: [(0, '23.297')]
+[2024-12-28 17:52:50,065][03334] Updated weights for policy 0, policy_version 950 (0.0015)
+[2024-12-28 17:52:50,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3891200. Throughput: 0: 1049.6. Samples: 972600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2024-12-28 17:52:50,557][00596] Avg episode reward: [(0, '23.181')]
+[2024-12-28 17:52:55,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3911680. Throughput: 0: 1013.4. Samples: 977992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2024-12-28 17:52:55,560][00596] Avg episode reward: [(0, '23.976')]
+[2024-12-28 17:52:59,466][03334] Updated weights for policy 0, policy_version 960 (0.0020)
+[2024-12-28 17:53:00,555][00596] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4123.8). Total num frames: 3936256. Throughput: 0: 1024.1. Samples: 981764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:53:00,558][00596] Avg episode reward: [(0, '23.012')]
+[2024-12-28 17:53:05,555][00596] Fps is (10 sec: 4505.7, 60 sec: 4232.5, 300 sec: 4109.9). Total num frames: 3956736. Throughput: 0: 1082.7. Samples: 988944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2024-12-28 17:53:05,562][00596] Avg episode reward: [(0, '22.924')]
+[2024-12-28 17:53:10,297][03334] Updated weights for policy 0, policy_version 970 (0.0027)
+[2024-12-28 17:53:10,555][00596] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3973120. Throughput: 0: 1024.2. Samples: 993524. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-28 17:53:10,557][00596] Avg episode reward: [(0, '22.711')]
+[2024-12-28 17:53:15,555][00596] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3997696. Throughput: 0: 1023.4. Samples: 997176. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2024-12-28 17:53:15,558][00596] Avg episode reward: [(0, '23.793')]
+[2024-12-28 17:53:17,002][03321] Stopping Batcher_0...
+[2024-12-28 17:53:17,002][03321] Loop batcher_evt_loop terminating...
+[2024-12-28 17:53:17,003][03321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-12-28 17:53:17,002][00596] Component Batcher_0 stopped!
+[2024-12-28 17:53:17,066][03334] Weights refcount: 2 0
+[2024-12-28 17:53:17,070][03334] Stopping InferenceWorker_p0-w0...
+[2024-12-28 17:53:17,071][03334] Loop inference_proc0-0_evt_loop terminating...
+[2024-12-28 17:53:17,070][00596] Component InferenceWorker_p0-w0 stopped!
+[2024-12-28 17:53:17,123][03321] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000801_3280896.pth
+[2024-12-28 17:53:17,146][03321] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-12-28 17:53:17,312][00596] Component RolloutWorker_w3 stopped!
+[2024-12-28 17:53:17,313][03337] Stopping RolloutWorker_w3...
+[2024-12-28 17:53:17,323][03337] Loop rollout_proc3_evt_loop terminating...
+[2024-12-28 17:53:17,333][03336] Stopping RolloutWorker_w1...
+[2024-12-28 17:53:17,333][00596] Component RolloutWorker_w1 stopped!
+[2024-12-28 17:53:17,335][03336] Loop rollout_proc1_evt_loop terminating...
+[2024-12-28 17:53:17,344][03341] Stopping RolloutWorker_w5...
+[2024-12-28 17:53:17,344][00596] Component RolloutWorker_w5 stopped!
+[2024-12-28 17:53:17,347][03341] Loop rollout_proc5_evt_loop terminating...
+[2024-12-28 17:53:17,357][03342] Stopping RolloutWorker_w7...
+[2024-12-28 17:53:17,360][03342] Loop rollout_proc7_evt_loop terminating...
+[2024-12-28 17:53:17,356][00596] Component RolloutWorker_w7 stopped!
+[2024-12-28 17:53:17,396][03321] Stopping LearnerWorker_p0...
+[2024-12-28 17:53:17,396][00596] Component LearnerWorker_p0 stopped!
+[2024-12-28 17:53:17,401][03321] Loop learner_proc0_evt_loop terminating...
+[2024-12-28 17:53:17,544][03338] Stopping RolloutWorker_w2...
+[2024-12-28 17:53:17,544][00596] Component RolloutWorker_w2 stopped!
+[2024-12-28 17:53:17,546][03338] Loop rollout_proc2_evt_loop terminating...
+[2024-12-28 17:53:17,580][03340] Stopping RolloutWorker_w4...
+[2024-12-28 17:53:17,580][00596] Component RolloutWorker_w4 stopped!
+[2024-12-28 17:53:17,590][03340] Loop rollout_proc4_evt_loop terminating...
+[2024-12-28 17:53:17,595][00596] Component RolloutWorker_w0 stopped!
+[2024-12-28 17:53:17,595][03335] Stopping RolloutWorker_w0...
+[2024-12-28 17:53:17,600][03335] Loop rollout_proc0_evt_loop terminating...
+[2024-12-28 17:53:17,620][03339] Stopping RolloutWorker_w6...
+[2024-12-28 17:53:17,620][00596] Component RolloutWorker_w6 stopped!
+[2024-12-28 17:53:17,621][00596] Waiting for process learner_proc0 to stop...
+[2024-12-28 17:53:17,640][03339] Loop rollout_proc6_evt_loop terminating...
+[2024-12-28 17:53:18,945][00596] Waiting for process inference_proc0-0 to join...
+[2024-12-28 17:53:18,954][00596] Waiting for process rollout_proc0 to join...
+[2024-12-28 17:53:21,357][00596] Waiting for process rollout_proc1 to join...
+[2024-12-28 17:53:21,364][00596] Waiting for process rollout_proc2 to join...
+[2024-12-28 17:53:21,375][00596] Waiting for process rollout_proc3 to join...
+[2024-12-28 17:53:21,382][00596] Waiting for process rollout_proc4 to join...
+[2024-12-28 17:53:21,388][00596] Waiting for process rollout_proc5 to join...
+[2024-12-28 17:53:21,395][00596] Waiting for process rollout_proc6 to join...
+[2024-12-28 17:53:21,407][00596] Waiting for process rollout_proc7 to join...
+[2024-12-28 17:53:21,413][00596] Batcher 0 profile tree view:
+batching: 26.0621, releasing_batches: 0.0259
+[2024-12-28 17:53:21,415][00596] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0000
+  wait_policy_total: 378.3158
+update_model: 8.2449
+  weight_update: 0.0016
+one_step: 0.0080
+  handle_policy_step: 555.8722
+    deserialize: 14.0902, stack: 3.1317, obs_to_device_normalize: 120.0891, forward: 276.9159, send_messages: 27.5479
+    prepare_outputs: 85.8603
+      to_cpu: 52.4710
+[2024-12-28 17:53:21,418][00596] Learner 0 profile tree view:
+misc: 0.0051, prepare_batch: 13.6461
+train: 73.1424
+  epoch_init: 0.0092, minibatch_init: 0.0062, losses_postprocess: 0.6259, kl_divergence: 0.6239, after_optimizer: 33.4182
+  calculate_losses: 26.1052
+    losses_init: 0.0036, forward_head: 1.3829, bptt_initial: 17.2684, tail: 1.1402, advantages_returns: 0.2247, losses: 3.8147
+    bptt: 1.9756
+      bptt_forward_core: 1.8844
+  update: 11.6268
+    clip: 0.8462
+[2024-12-28 17:53:21,419][00596] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.3111, enqueue_policy_requests: 85.0354, env_step: 778.3076, overhead: 11.8000, complete_rollouts: 6.7189
+save_policy_outputs: 19.8577
+  split_output_tensors: 8.1105
+[2024-12-28 17:53:21,421][00596] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.2282, enqueue_policy_requests: 86.3355, env_step: 769.1470, overhead: 11.7708, complete_rollouts: 7.0217
+save_policy_outputs: 19.4298
+  split_output_tensors: 7.7731
+[2024-12-28 17:53:21,423][00596] Loop Runner_EvtLoop terminating...
+[2024-12-28 17:53:21,427][00596] Runner profile tree view:
+main_loop: 1012.2058
+[2024-12-28 17:53:21,428][00596] Collected {0: 4005888}, FPS: 3957.6
+[2024-12-28 17:53:22,003][00596] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-12-28 17:53:22,005][00596] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-12-28 17:53:22,007][00596] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-12-28 17:53:22,009][00596] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-12-28 17:53:22,011][00596] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-12-28 17:53:22,013][00596] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-12-28 17:53:22,015][00596] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2024-12-28 17:53:22,017][00596] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-12-28 17:53:22,018][00596] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2024-12-28 17:53:22,019][00596] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2024-12-28 17:53:22,020][00596] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-12-28 17:53:22,021][00596] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-12-28 17:53:22,022][00596] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-12-28 17:53:22,024][00596] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-12-28 17:53:22,025][00596] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-12-28 17:53:22,069][00596] Doom resolution: 160x120, resize resolution: (128, 72)
+[2024-12-28 17:53:22,074][00596] RunningMeanStd input shape: (3, 72, 128)
+[2024-12-28 17:53:22,076][00596] RunningMeanStd input shape: (1,)
+[2024-12-28 17:53:22,097][00596] ConvEncoder: input_channels=3
+[2024-12-28 17:53:22,280][00596] Conv encoder output size: 512
+[2024-12-28 17:53:22,282][00596] Policy head output size: 512
+[2024-12-28 17:53:22,622][00596] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-12-28 17:53:23,543][00596] Num frames 100...
+[2024-12-28 17:53:23,663][00596] Num frames 200...
+[2024-12-28 17:53:23,796][00596] Num frames 300...
+[2024-12-28 17:53:23,918][00596] Num frames 400...
+[2024-12-28 17:53:24,039][00596] Num frames 500...
+[2024-12-28 17:53:24,165][00596] Num frames 600...
+[2024-12-28 17:53:24,298][00596] Num frames 700...
+[2024-12-28 17:53:24,424][00596] Num frames 800...
+[2024-12-28 17:53:24,556][00596] Num frames 900...
+[2024-12-28 17:53:24,678][00596] Num frames 1000...
+[2024-12-28 17:53:24,800][00596] Num frames 1100...
+[2024-12-28 17:53:24,942][00596] Avg episode rewards: #0: 29.690, true rewards: #0: 11.690
+[2024-12-28 17:53:24,943][00596] Avg episode reward: 29.690, avg true_objective: 11.690
+[2024-12-28 17:53:24,986][00596] Num frames 1200...
+[2024-12-28 17:53:25,104][00596] Num frames 1300...
+[2024-12-28 17:53:25,233][00596] Num frames 1400...
+[2024-12-28 17:53:25,356][00596] Num frames 1500...
+[2024-12-28 17:53:25,483][00596] Num frames 1600...
+[2024-12-28 17:53:25,614][00596] Num frames 1700...
+[2024-12-28 17:53:25,735][00596] Num frames 1800...
+[2024-12-28 17:53:25,857][00596] Num frames 1900...
+[2024-12-28 17:53:25,981][00596] Num frames 2000...
+[2024-12-28 17:53:26,106][00596] Num frames 2100...
+[2024-12-28 17:53:26,231][00596] Num frames 2200...
+[2024-12-28 17:53:26,357][00596] Num frames 2300...
+[2024-12-28 17:53:26,478][00596] Num frames 2400...
+[2024-12-28 17:53:26,609][00596] Num frames 2500...
+[2024-12-28 17:53:26,731][00596] Num frames 2600...
+[2024-12-28 17:53:26,854][00596] Num frames 2700...
+[2024-12-28 17:53:26,973][00596] Num frames 2800...
+[2024-12-28 17:53:27,098][00596] Num frames 2900...
+[2024-12-28 17:53:27,227][00596] Num frames 3000...
+[2024-12-28 17:53:27,349][00596] Num frames 3100...
+[2024-12-28 17:53:27,470][00596] Num frames 3200...
+[2024-12-28 17:53:27,622][00596] Avg episode rewards: #0: 40.845, true rewards: #0: 16.345
+[2024-12-28 17:53:27,626][00596] Avg episode reward: 40.845, avg true_objective: 16.345
+[2024-12-28 17:53:27,667][00596] Num frames 3300...
+[2024-12-28 17:53:27,786][00596] Num frames 3400...
+[2024-12-28 17:53:27,908][00596] Num frames 3500...
+[2024-12-28 17:53:28,028][00596] Num frames 3600...
+[2024-12-28 17:53:28,149][00596] Num frames 3700...
+[2024-12-28 17:53:28,280][00596] Num frames 3800...
+[2024-12-28 17:53:28,404][00596] Num frames 3900...
+[2024-12-28 17:53:28,528][00596] Num frames 4000...
+[2024-12-28 17:53:28,663][00596] Num frames 4100...
+[2024-12-28 17:53:28,785][00596] Num frames 4200...
+[2024-12-28 17:53:28,905][00596] Num frames 4300...
+[2024-12-28 17:53:29,028][00596] Num frames 4400...
+[2024-12-28 17:53:29,149][00596] Num frames 4500...
+[2024-12-28 17:53:29,277][00596] Num frames 4600...
+[2024-12-28 17:53:29,402][00596] Num frames 4700...
+[2024-12-28 17:53:29,482][00596] Avg episode rewards: #0: 40.400, true rewards: #0: 15.733
+[2024-12-28 17:53:29,483][00596] Avg episode reward: 40.400, avg true_objective: 15.733
+[2024-12-28 17:53:29,583][00596] Num frames 4800...
+[2024-12-28 17:53:29,710][00596] Num frames 4900...
+[2024-12-28 17:53:29,833][00596] Num frames 5000...
+[2024-12-28 17:53:29,951][00596] Num frames 5100...
+[2024-12-28 17:53:30,069][00596] Num frames 5200...
+[2024-12-28 17:53:30,201][00596] Num frames 5300...
+[2024-12-28 17:53:30,295][00596] Avg episode rewards: #0: 33.812, true rewards: #0: 13.312
+[2024-12-28 17:53:30,296][00596] Avg episode reward: 33.812, avg true_objective: 13.312
+[2024-12-28 17:53:30,385][00596] Num frames 5400...
+[2024-12-28 17:53:30,501][00596] Num frames 5500...
+[2024-12-28 17:53:30,634][00596] Num frames 5600...
+[2024-12-28 17:53:30,754][00596] Num frames 5700...
+[2024-12-28 17:53:30,869][00596] Num frames 5800...
+[2024-12-28 17:53:31,014][00596] Avg episode rewards: #0: 28.752, true rewards: #0: 11.752
+[2024-12-28 17:53:31,016][00596] Avg episode reward: 28.752, avg true_objective: 11.752
+[2024-12-28 17:53:31,050][00596] Num frames 5900...
+[2024-12-28 17:53:31,167][00596] Num frames 6000...
+[2024-12-28 17:53:31,298][00596] Num frames 6100...
+[2024-12-28 17:53:31,416][00596] Num frames 6200...
+[2024-12-28 17:53:31,531][00596] Num frames 6300...
+[2024-12-28 17:53:31,664][00596] Num frames 6400...
+[2024-12-28 17:53:31,783][00596] Num frames 6500...
+[2024-12-28 17:53:31,905][00596] Num frames 6600...
+[2024-12-28 17:53:32,023][00596] Num frames 6700...
+[2024-12-28 17:53:32,144][00596] Num frames 6800...
+[2024-12-28 17:53:32,271][00596] Num frames 6900...
+[2024-12-28 17:53:32,393][00596] Num frames 7000...
+[2024-12-28 17:53:32,516][00596] Num frames 7100...
+[2024-12-28 17:53:32,637][00596] Num frames 7200...
+[2024-12-28 17:53:32,766][00596] Num frames 7300...
+[2024-12-28 17:53:32,889][00596] Num frames 7400...
+[2024-12-28 17:53:33,025][00596] Num frames 7500...
+[2024-12-28 17:53:33,192][00596] Num frames 7600...
+[2024-12-28 17:53:33,304][00596] Avg episode rewards: #0: 30.876, true rewards: #0: 12.710
+[2024-12-28 17:53:33,306][00596] Avg episode reward: 30.876, avg true_objective: 12.710
+[2024-12-28 17:53:33,426][00596] Num frames 7700...
+[2024-12-28 17:53:33,590][00596] Num frames 7800...
+[2024-12-28 17:53:33,775][00596] Num frames 7900...
+[2024-12-28 17:53:33,961][00596] Avg episode rewards: #0: 27.398, true rewards: #0: 11.399
+[2024-12-28 17:53:33,965][00596] Avg episode reward: 27.398, avg true_objective: 11.399
+[2024-12-28 17:53:34,003][00596] Num frames 8000...
+[2024-12-28 17:53:34,161][00596] Num frames 8100...
+[2024-12-28 17:53:34,340][00596] Num frames 8200...
+[2024-12-28 17:53:34,513][00596] Num frames 8300...
+[2024-12-28 17:53:34,684][00596] Num frames 8400...
+[2024-12-28 17:53:34,865][00596] Num frames 8500...
+[2024-12-28 17:53:35,043][00596] Num frames 8600...
+[2024-12-28 17:53:35,217][00596] Num frames 8700...
+[2024-12-28 17:53:35,390][00596] Num frames 8800...
+[2024-12-28 17:53:35,530][00596] Num frames 8900...
+[2024-12-28 17:53:35,652][00596] Num frames 9000...
+[2024-12-28 17:53:35,774][00596] Num frames 9100...
+[2024-12-28 17:53:35,905][00596] Num frames 9200...
+[2024-12-28 17:53:36,025][00596] Num frames 9300...
+[2024-12-28 17:53:36,153][00596] Num frames 9400...
+[2024-12-28 17:53:36,315][00596] Avg episode rewards: #0: 29.104, true rewards: #0: 11.854
+[2024-12-28 17:53:36,317][00596] Avg episode reward: 29.104, avg true_objective: 11.854
+[2024-12-28 17:53:36,338][00596] Num frames 9500...
+[2024-12-28 17:53:36,461][00596] Num frames 9600...
+[2024-12-28 17:53:36,578][00596] Num frames 9700...
+[2024-12-28 17:53:36,698][00596] Num frames 9800...
+[2024-12-28 17:53:36,819][00596] Num frames 9900...
+[2024-12-28 17:53:36,946][00596] Num frames 10000...
+[2024-12-28 17:53:37,036][00596] Avg episode rewards: #0: 26.919, true rewards: #0: 11.141
+[2024-12-28 17:53:37,038][00596] Avg episode reward: 26.919, avg true_objective: 11.141
+[2024-12-28 17:53:37,127][00596] Num frames 10100...
+[2024-12-28 17:53:37,260][00596] Num frames 10200...
+[2024-12-28 17:53:37,379][00596] Num frames 10300...
+[2024-12-28 17:53:37,501][00596] Num frames 10400...
+[2024-12-28 17:53:37,625][00596] Num frames 10500...
+[2024-12-28 17:53:37,746][00596] Num frames 10600...
+[2024-12-28 17:53:37,875][00596] Num frames 10700...
+[2024-12-28 17:53:37,997][00596] Num frames 10800...
+[2024-12-28 17:53:38,117][00596] Num frames 10900...
+[2024-12-28 17:53:38,243][00596] Num frames 11000...
+[2024-12-28 17:53:38,364][00596] Num frames 11100...
+[2024-12-28 17:53:38,471][00596] Avg episode rewards: #0: 26.541, true rewards: #0: 11.141
+[2024-12-28 17:53:38,473][00596] Avg episode reward: 26.541, avg true_objective: 11.141
+[2024-12-28 17:54:42,111][00596] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2024-12-28 17:57:38,880][00596] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2024-12-28 17:57:38,882][00596] Overriding arg 'num_workers' with value 1 passed from command line
+[2024-12-28 17:57:38,884][00596] Adding new argument 'no_render'=True that is not in the saved config file!
+[2024-12-28 17:57:38,886][00596] Adding new argument 'save_video'=True that is not in the saved config file!
+[2024-12-28 17:57:38,888][00596] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2024-12-28 17:57:38,889][00596] Adding new argument 'video_name'=None that is not in the saved config file!
+[2024-12-28 17:57:38,891][00596] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2024-12-28 17:57:38,893][00596] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2024-12-28 17:57:38,894][00596] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2024-12-28 17:57:38,895][00596] Adding new argument 'hf_repository'='Ali-HF/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2024-12-28 17:57:38,896][00596] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2024-12-28 17:57:38,897][00596] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2024-12-28 17:57:38,898][00596] Adding new argument 'train_script'=None that is not in the saved config file!
+[2024-12-28 17:57:38,899][00596] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2024-12-28 17:57:38,900][00596] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2024-12-28 17:57:38,928][00596] RunningMeanStd input shape: (3, 72, 128)
+[2024-12-28 17:57:38,929][00596] RunningMeanStd input shape: (1,)
+[2024-12-28 17:57:38,941][00596] ConvEncoder: input_channels=3
+[2024-12-28 17:57:38,976][00596] Conv encoder output size: 512
+[2024-12-28 17:57:38,978][00596] Policy head output size: 512
+[2024-12-28 17:57:38,996][00596] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2024-12-28 17:57:39,431][00596] Num frames 100...
+[2024-12-28 17:57:39,554][00596] Num frames 200...
+[2024-12-28 17:57:39,676][00596] Num frames 300...
+[2024-12-28 17:57:39,797][00596] Num frames 400...
+[2024-12-28 17:57:39,922][00596] Num frames 500...
+[2024-12-28 17:57:40,043][00596] Num frames 600...
+[2024-12-28 17:57:40,174][00596] Num frames 700...
+[2024-12-28 17:57:40,303][00596] Num frames 800...
+[2024-12-28 17:57:40,476][00596] Avg episode rewards: #0: 20.960, true rewards: #0: 8.960
+[2024-12-28 17:57:40,477][00596] Avg episode reward: 20.960, avg true_objective: 8.960
+[2024-12-28 17:57:40,485][00596] Num frames 900...
+[2024-12-28 17:57:40,606][00596] Num frames 1000...
+[2024-12-28 17:57:40,733][00596] Num frames 1100...
+[2024-12-28 17:57:40,853][00596] Num frames 1200...
+[2024-12-28 17:57:40,983][00596] Num frames 1300...
+[2024-12-28 17:57:41,104][00596] Num frames 1400...
+[2024-12-28 17:57:41,248][00596] Num frames 1500...
+[2024-12-28 17:57:41,382][00596] Num frames 1600...
+[2024-12-28 17:57:41,505][00596] Num frames 1700...
+[2024-12-28 17:57:41,627][00596] Num frames 1800...
+[2024-12-28 17:57:41,790][00596] Avg episode rewards: #0: 22.940, true rewards: #0: 9.440
+[2024-12-28 17:57:41,791][00596] Avg episode reward: 22.940, avg true_objective: 9.440
+[2024-12-28 17:57:41,808][00596] Num frames 1900...
+[2024-12-28 17:57:41,931][00596] Num frames 2000...
+[2024-12-28 17:57:42,056][00596] Num frames 2100...
+[2024-12-28 17:57:42,179][00596] Num frames 2200...
+[2024-12-28 17:57:42,319][00596] Num frames 2300...
+[2024-12-28 17:57:42,442][00596] Avg episode rewards: #0: 18.523, true rewards: #0: 7.857
+[2024-12-28 17:57:42,443][00596] Avg episode reward: 18.523, avg true_objective: 7.857
+[2024-12-28 17:57:42,496][00596] Num frames 2400...
+[2024-12-28 17:57:42,673][00596] Num frames 2500...
+[2024-12-28 17:57:42,847][00596] Num frames 2600...
+[2024-12-28 17:57:43,009][00596] Num frames 2700...
+[2024-12-28 17:57:43,174][00596] Num frames 2800...
+[2024-12-28 17:57:43,353][00596] Num frames 2900...
+[2024-12-28 17:57:43,519][00596] Num frames 3000...
+[2024-12-28 17:57:43,676][00596] Avg episode rewards: #0: 17.653, true rewards: #0: 7.652
+[2024-12-28 17:57:43,680][00596] Avg episode reward: 17.653, avg true_objective: 7.652
+[2024-12-28 17:57:43,743][00596] Num frames 3100...
+[2024-12-28 17:57:43,907][00596] Num frames 3200...
+[2024-12-28 17:57:44,077][00596] Num frames 3300...
+[2024-12-28 17:57:44,251][00596] Num frames 3400...
+[2024-12-28 17:57:44,435][00596] Num frames 3500...
+[2024-12-28 17:57:44,607][00596] Num frames 3600...
+[2024-12-28 17:57:44,780][00596] Num frames 3700...
+[2024-12-28 17:57:44,946][00596] Avg episode rewards: #0: 17.330, true rewards: #0: 7.530
+[2024-12-28 17:57:44,948][00596] Avg episode reward: 17.330, avg true_objective: 7.530
+[2024-12-28 17:57:45,009][00596] Num frames 3800...
+[2024-12-28 17:57:45,144][00596] Num frames 3900...
+[2024-12-28 17:57:45,282][00596] Num frames 4000...
+[2024-12-28 17:57:45,410][00596] Num frames 4100...
+[2024-12-28 17:57:45,529][00596] Num frames 4200...
+[2024-12-28 17:57:45,643][00596] Avg episode rewards: #0: 15.908, true rewards: #0: 7.075
+[2024-12-28 17:57:45,644][00596] Avg episode reward: 15.908, avg true_objective: 7.075
+[2024-12-28 17:57:45,711][00596] Num frames 4300...
+[2024-12-28 17:57:45,833][00596] Num frames 4400...
+[2024-12-28 17:57:45,955][00596] Num frames 4500...
+[2024-12-28 17:57:46,078][00596] Num frames 4600...
+[2024-12-28 17:57:46,201][00596] Num frames 4700...
+[2024-12-28 17:57:46,330][00596] Num frames 4800...
+[2024-12-28 17:57:46,460][00596] Num frames 4900...
+[2024-12-28 17:57:46,586][00596] Num frames 5000...
+[2024-12-28 17:57:46,706][00596] Num frames 5100...
+[2024-12-28 17:57:46,831][00596] Num frames 5200...
+[2024-12-28 17:57:46,953][00596] Num frames 5300...
+[2024-12-28 17:57:47,078][00596] Num frames 5400...
+[2024-12-28 17:57:47,239][00596] Avg episode rewards: #0: 17.554, true rewards: #0: 7.840
+[2024-12-28 17:57:47,240][00596] Avg episode reward: 17.554, avg true_objective: 7.840
+[2024-12-28 17:57:47,257][00596] Num frames 5500...
+[2024-12-28 17:57:47,378][00596] Num frames 5600...
+[2024-12-28 17:57:47,509][00596] Num frames 5700...
+[2024-12-28 17:57:47,630][00596] Num frames 5800...
+[2024-12-28 17:57:47,756][00596] Num frames 5900...
+[2024-12-28 17:57:47,877][00596] Num frames 6000...
+[2024-12-28 17:57:48,028][00596] Num frames 6100...
+[2024-12-28 17:57:48,152][00596] Num frames 6200...
+[2024-12-28 17:57:48,287][00596] Num frames 6300...
+[2024-12-28 17:57:48,414][00596] Num frames 6400...
+[2024-12-28 17:57:48,551][00596] Num frames 6500...
+[2024-12-28 17:57:48,673][00596] Num frames 6600...
+[2024-12-28 17:57:48,797][00596] Num frames 6700...
+[2024-12-28 17:57:48,922][00596] Num frames 6800...
+[2024-12-28 17:57:49,045][00596] Num frames 6900...
+[2024-12-28 17:57:49,169][00596] Avg episode rewards: #0: 19.814, true rewards: #0: 8.689
+[2024-12-28 17:57:49,170][00596] Avg episode reward: 19.814, avg true_objective: 8.689
+[2024-12-28 17:57:49,239][00596] Num frames 7000...
+[2024-12-28 17:57:49,361][00596] Num frames 7100...
+[2024-12-28 17:57:49,487][00596] Num frames 7200...
+[2024-12-28 17:57:49,608][00596] Num frames 7300...
+[2024-12-28 17:57:49,735][00596] Num frames 7400...
+[2024-12-28 17:57:49,860][00596] Num frames 7500...
+[2024-12-28 17:57:49,981][00596] Num frames 7600...
+[2024-12-28 17:57:50,105][00596] Num frames 7700...
+[2024-12-28 17:57:50,235][00596] Num frames 7800...
+[2024-12-28 17:57:50,359][00596] Num frames 7900...
+[2024-12-28 17:57:50,480][00596] Num frames 8000...
+[2024-12-28 17:57:50,611][00596] Num frames 8100...
+[2024-12-28 17:57:50,735][00596] Num frames 8200...
+[2024-12-28 17:57:50,858][00596] Num frames 8300...
+[2024-12-28 17:57:50,979][00596] Num frames 8400...
+[2024-12-28 17:57:51,115][00596] Avg episode rewards: #0: 21.960, true rewards: #0: 9.404
+[2024-12-28 17:57:51,117][00596] Avg episode reward: 21.960, avg true_objective: 9.404
+[2024-12-28 17:57:51,164][00596] Num frames 8500...
+[2024-12-28 17:57:51,296][00596] Num frames 8600...
+[2024-12-28 17:57:51,421][00596] Num frames 8700...
+[2024-12-28 17:57:51,549][00596] Num frames 8800...
+[2024-12-28 17:57:51,670][00596] Num frames 8900...
+[2024-12-28 17:57:51,795][00596] Num frames 9000...
+[2024-12-28 17:57:51,919][00596] Num frames 9100...
+[2024-12-28 17:57:52,038][00596] Num frames 9200...
+[2024-12-28 17:57:52,163][00596] Num frames 9300...
+[2024-12-28 17:57:52,291][00596] Num frames 9400...
+[2024-12-28 17:57:52,411][00596] Num frames 9500...
+[2024-12-28 17:57:52,531][00596] Num frames 9600...
+[2024-12-28 17:57:52,692][00596] Avg episode rewards: #0: 22.680, true rewards: #0: 9.680
+[2024-12-28 17:57:52,693][00596] Avg episode reward: 22.680, avg true_objective: 9.680
+[2024-12-28 17:58:45,591][00596] Replay video saved to /content/train_dir/default_experiment/replay.mp4!