[2024-10-31 19:55:23,644][00273] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-10-31 19:55:23,646][00273] Rollout worker 0 uses device cpu
[2024-10-31 19:55:23,651][00273] Rollout worker 1 uses device cpu
[2024-10-31 19:55:23,652][00273] Rollout worker 2 uses device cpu
[2024-10-31 19:55:23,654][00273] Rollout worker 3 uses device cpu
[2024-10-31 19:55:23,655][00273] Rollout worker 4 uses device cpu
[2024-10-31 19:55:23,656][00273] Rollout worker 5 uses device cpu
[2024-10-31 19:55:23,658][00273] Rollout worker 6 uses device cpu
[2024-10-31 19:55:23,659][00273] Rollout worker 7 uses device cpu
[2024-10-31 19:55:23,810][00273] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-31 19:55:23,812][00273] InferenceWorker_p0-w0: min num requests: 2
[2024-10-31 19:55:23,844][00273] Starting all processes...
[2024-10-31 19:55:23,847][00273] Starting process learner_proc0
[2024-10-31 19:55:23,894][00273] Starting all processes...
[2024-10-31 19:55:23,903][00273] Starting process inference_proc0-0
[2024-10-31 19:55:23,903][00273] Starting process rollout_proc0
[2024-10-31 19:55:23,904][00273] Starting process rollout_proc1
[2024-10-31 19:55:23,904][00273] Starting process rollout_proc2
[2024-10-31 19:55:23,904][00273] Starting process rollout_proc3
[2024-10-31 19:55:23,904][00273] Starting process rollout_proc4
[2024-10-31 19:55:23,904][00273] Starting process rollout_proc5
[2024-10-31 19:55:23,904][00273] Starting process rollout_proc6
[2024-10-31 19:55:23,904][00273] Starting process rollout_proc7
[2024-10-31 19:55:34,604][03744] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-31 19:55:34,605][03744] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-10-31 19:55:34,673][03744] Num visible devices: 1
[2024-10-31 19:55:34,705][03744] Starting seed is not provided
[2024-10-31 19:55:34,706][03744] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-31 19:55:34,706][03744] Initializing actor-critic model on device cuda:0
[2024-10-31 19:55:34,707][03744] RunningMeanStd input shape: (3, 72, 128)
[2024-10-31 19:55:34,709][03744] RunningMeanStd input shape: (1,)
[2024-10-31 19:55:34,781][03744] ConvEncoder: input_channels=3
[2024-10-31 19:55:35,128][03763] Worker 5 uses CPU cores [1]
[2024-10-31 19:55:35,151][03757] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-31 19:55:35,151][03757] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-10-31 19:55:35,197][03761] Worker 3 uses CPU cores [1]
[2024-10-31 19:55:35,241][03760] Worker 2 uses CPU cores [0]
[2024-10-31 19:55:35,249][03757] Num visible devices: 1
[2024-10-31 19:55:35,326][03764] Worker 6 uses CPU cores [0]
[2024-10-31 19:55:35,351][03762] Worker 4 uses CPU cores [0]
[2024-10-31 19:55:35,376][03758] Worker 0 uses CPU cores [0]
[2024-10-31 19:55:35,427][03765] Worker 7 uses CPU cores [1]
[2024-10-31 19:55:35,431][03744] Conv encoder output size: 512
[2024-10-31 19:55:35,431][03759] Worker 1 uses CPU cores [1]
[2024-10-31 19:55:35,431][03744] Policy head output size: 512
[2024-10-31 19:55:35,449][03744] Created Actor Critic model with architecture:
[2024-10-31 19:55:35,449][03744] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-10-31 19:55:39,358][03744] Using optimizer
[2024-10-31 19:55:39,359][03744] No checkpoints found
[2024-10-31 19:55:39,359][03744] Did not load from checkpoint, starting from scratch!
[2024-10-31 19:55:39,359][03744] Initialized policy 0 weights for model version 0
[2024-10-31 19:55:39,369][03744] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-10-31 19:55:39,376][03744] LearnerWorker_p0 finished initialization!
[2024-10-31 19:55:39,588][03757] RunningMeanStd input shape: (3, 72, 128)
[2024-10-31 19:55:39,590][03757] RunningMeanStd input shape: (1,)
[2024-10-31 19:55:39,608][03757] ConvEncoder: input_channels=3
[2024-10-31 19:55:39,768][03757] Conv encoder output size: 512
[2024-10-31 19:55:39,768][03757] Policy head output size: 512
[2024-10-31 19:55:40,354][00273] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-31 19:55:41,801][00273] Inference worker 0-0 is ready!
[2024-10-31 19:55:41,802][00273] All inference workers are ready! Signal rollout workers to start!
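The architecture dump above names the layer types but the TorchScript repr hides the Conv2d hyperparameters. As a rough illustration, here is a minimal PyTorch sketch of an equivalent network: the channel/kernel/stride values are assumptions chosen so that a (3, 72, 128) observation flattens into the 512-unit MLP layer, matching the logged "Conv encoder output size: 512", GRU(512, 512) core, and the 1-dim value and 5-dim action heads. It is not Sample Factory's actual model code.

```python
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Hedged sketch of the logged ActorCriticSharedWeights structure."""

    def __init__(self, num_actions=5):
        super().__init__()
        # conv_head: three Conv2d+ELU blocks, as in the logged ConvEncoderImpl.
        # Channel/kernel/stride values here are assumptions, not from the log.
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # Infer the flattened conv size for a (3, 72, 128) observation.
        with torch.no_grad():
            n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
        # mlp_layers: Linear + ELU down to the 512-dim encoder output.
        self.mlp = nn.Sequential(nn.Linear(n_flat, 512), nn.ELU())
        # core: GRU(512, 512), as in ModelCoreRNN.
        self.core = nn.GRU(512, 512)
        # Heads: value estimate and action logits (5 discrete actions).
        self.critic_linear = nn.Linear(512, 1)
        self.distribution_linear = nn.Linear(512, num_actions)

    def forward(self, obs, rnn_state=None):
        x = self.conv_head(obs).flatten(1)
        x = self.mlp(x)
        # Treat the batch as a length-1 sequence for the GRU core.
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

model = ActorCriticSketch()
logits, value, state = model(torch.zeros(4, 3, 72, 128))
```

With a batch of 4 observations, the sketch produces (4, 5) action logits, (4, 1) value estimates, and a (1, 4, 512) recurrent state.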
[2024-10-31 19:55:41,939][03762] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-31 19:55:41,946][03763] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-31 19:55:41,948][03760] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-31 19:55:41,965][03765] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-31 19:55:41,966][03759] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-31 19:55:41,968][03761] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-31 19:55:41,965][03764] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-31 19:55:41,976][03758] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-10-31 19:55:43,077][03759] Decorrelating experience for 0 frames...
[2024-10-31 19:55:43,078][03763] Decorrelating experience for 0 frames...
[2024-10-31 19:55:43,077][03758] Decorrelating experience for 0 frames...
[2024-10-31 19:55:43,802][00273] Heartbeat connected on Batcher_0
[2024-10-31 19:55:43,806][00273] Heartbeat connected on LearnerWorker_p0
[2024-10-31 19:55:43,815][03759] Decorrelating experience for 32 frames...
[2024-10-31 19:55:43,813][03763] Decorrelating experience for 32 frames...
[2024-10-31 19:55:43,842][00273] Heartbeat connected on InferenceWorker_p0-w0
[2024-10-31 19:55:43,859][03764] Decorrelating experience for 0 frames...
[2024-10-31 19:55:43,867][03758] Decorrelating experience for 32 frames...
[2024-10-31 19:55:44,670][03764] Decorrelating experience for 32 frames...
[2024-10-31 19:55:44,694][03759] Decorrelating experience for 64 frames...
[2024-10-31 19:55:44,699][03763] Decorrelating experience for 64 frames...
[2024-10-31 19:55:44,791][03758] Decorrelating experience for 64 frames...
[2024-10-31 19:55:45,161][03759] Decorrelating experience for 96 frames...
[2024-10-31 19:55:45,285][00273] Heartbeat connected on RolloutWorker_w1
[2024-10-31 19:55:45,354][00273] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-31 19:55:45,551][03764] Decorrelating experience for 64 frames...
[2024-10-31 19:55:45,629][03758] Decorrelating experience for 96 frames...
[2024-10-31 19:55:45,793][00273] Heartbeat connected on RolloutWorker_w0
[2024-10-31 19:55:45,808][03763] Decorrelating experience for 96 frames...
[2024-10-31 19:55:45,921][00273] Heartbeat connected on RolloutWorker_w5
[2024-10-31 19:55:46,065][03764] Decorrelating experience for 96 frames...
[2024-10-31 19:55:46,124][00273] Heartbeat connected on RolloutWorker_w6
[2024-10-31 19:55:50,353][00273] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.0. Samples: 20. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-10-31 19:55:50,356][00273] Avg episode reward: [(0, '3.156')]
[2024-10-31 19:55:50,517][03744] Signal inference workers to stop experience collection...
[2024-10-31 19:55:50,525][03757] InferenceWorker_p0-w0: stopping experience collection
[2024-10-31 19:55:52,536][03744] Signal inference workers to resume experience collection...
[2024-10-31 19:55:52,537][03757] InferenceWorker_p0-w0: resuming experience collection
[2024-10-31 19:55:55,357][00273] Fps is (10 sec: 819.2, 60 sec: 546.2, 300 sec: 546.2). Total num frames: 8192. Throughput: 0: 170.9. Samples: 2564. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0)
[2024-10-31 19:55:55,364][00273] Avg episode reward: [(0, '3.506')]
[2024-10-31 19:56:00,355][00273] Fps is (10 sec: 2866.7, 60 sec: 1433.5, 300 sec: 1433.5). Total num frames: 28672. Throughput: 0: 363.3. Samples: 7266. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 19:56:00,360][00273] Avg episode reward: [(0, '4.090')]
[2024-10-31 19:56:02,801][03757] Updated weights for policy 0, policy_version 10 (0.0016)
[2024-10-31 19:56:05,353][00273] Fps is (10 sec: 4096.0, 60 sec: 1966.2, 300 sec: 1966.2). Total num frames: 49152. Throughput: 0: 413.5. Samples: 10336. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 19:56:05,356][00273] Avg episode reward: [(0, '4.519')]
[2024-10-31 19:56:10,353][00273] Fps is (10 sec: 3277.4, 60 sec: 2048.1, 300 sec: 2048.1). Total num frames: 61440. Throughput: 0: 507.9. Samples: 15236. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:56:10,356][00273] Avg episode reward: [(0, '4.550')]
[2024-10-31 19:56:15,353][00273] Fps is (10 sec: 2867.2, 60 sec: 2223.6, 300 sec: 2223.6). Total num frames: 77824. Throughput: 0: 577.3. Samples: 20204. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:56:15,360][00273] Avg episode reward: [(0, '4.608')]
[2024-10-31 19:56:15,388][03757] Updated weights for policy 0, policy_version 20 (0.0016)
[2024-10-31 19:56:20,353][00273] Fps is (10 sec: 4096.0, 60 sec: 2560.1, 300 sec: 2560.1). Total num frames: 102400. Throughput: 0: 582.3. Samples: 23292. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:56:20,359][00273] Avg episode reward: [(0, '4.474')]
[2024-10-31 19:56:20,368][03744] Saving new best policy, reward=4.474!
[2024-10-31 19:56:25,353][00273] Fps is (10 sec: 3686.4, 60 sec: 2548.7, 300 sec: 2548.7). Total num frames: 114688. Throughput: 0: 629.2. Samples: 28312. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 19:56:25,359][00273] Avg episode reward: [(0, '4.277')]
[2024-10-31 19:56:27,528][03757] Updated weights for policy 0, policy_version 30 (0.0012)
[2024-10-31 19:56:30,353][00273] Fps is (10 sec: 2867.2, 60 sec: 2621.5, 300 sec: 2621.5). Total num frames: 131072. Throughput: 0: 743.1. Samples: 33438. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:56:30,356][00273] Avg episode reward: [(0, '4.460')]
[2024-10-31 19:56:35,353][00273] Fps is (10 sec: 3686.4, 60 sec: 2755.5, 300 sec: 2755.5). Total num frames: 151552. Throughput: 0: 810.8. Samples: 36506. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:56:35,358][00273] Avg episode reward: [(0, '4.598')]
[2024-10-31 19:56:35,368][03744] Saving new best policy, reward=4.598!
[2024-10-31 19:56:37,961][03757] Updated weights for policy 0, policy_version 40 (0.0012)
[2024-10-31 19:56:40,353][00273] Fps is (10 sec: 3686.4, 60 sec: 2799.0, 300 sec: 2799.0). Total num frames: 167936. Throughput: 0: 870.8. Samples: 41748. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:56:40,361][00273] Avg episode reward: [(0, '4.487')]
[2024-10-31 19:56:45,354][00273] Fps is (10 sec: 3276.7, 60 sec: 3072.0, 300 sec: 2835.7). Total num frames: 184320. Throughput: 0: 866.6. Samples: 46260. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:56:45,356][00273] Avg episode reward: [(0, '4.502')]
[2024-10-31 19:56:50,354][00273] Fps is (10 sec: 3276.5, 60 sec: 3345.0, 300 sec: 2867.2). Total num frames: 200704. Throughput: 0: 862.9. Samples: 49166. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:56:50,357][00273] Avg episode reward: [(0, '4.600')]
[2024-10-31 19:56:50,368][03744] Saving new best policy, reward=4.600!
[2024-10-31 19:56:51,370][03757] Updated weights for policy 0, policy_version 50 (0.0024)
[2024-10-31 19:56:55,353][00273] Fps is (10 sec: 2867.3, 60 sec: 3413.3, 300 sec: 2839.9). Total num frames: 212992. Throughput: 0: 835.6. Samples: 52840. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:56:55,355][00273] Avg episode reward: [(0, '4.448')]
[2024-10-31 19:57:00,353][00273] Fps is (10 sec: 2867.4, 60 sec: 3345.2, 300 sec: 2867.2). Total num frames: 229376. Throughput: 0: 821.7. Samples: 57182. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:57:00,359][00273] Avg episode reward: [(0, '4.454')]
[2024-10-31 19:57:04,463][03757] Updated weights for policy 0, policy_version 60 (0.0012)
[2024-10-31 19:57:05,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 2891.3). Total num frames: 245760. Throughput: 0: 819.5. Samples: 60168. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:57:05,356][00273] Avg episode reward: [(0, '4.504')]
[2024-10-31 19:57:10,355][00273] Fps is (10 sec: 3685.8, 60 sec: 3413.2, 300 sec: 2958.2). Total num frames: 266240. Throughput: 0: 838.7. Samples: 66056. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:57:10,361][00273] Avg episode reward: [(0, '4.780')]
[2024-10-31 19:57:10,371][03744] Saving new best policy, reward=4.780!
[2024-10-31 19:57:15,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 2931.9). Total num frames: 278528. Throughput: 0: 814.4. Samples: 70088. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:57:15,359][00273] Avg episode reward: [(0, '5.038')]
[2024-10-31 19:57:15,362][03744] Saving new best policy, reward=5.038!
[2024-10-31 19:57:16,808][03757] Updated weights for policy 0, policy_version 70 (0.0012)
[2024-10-31 19:57:20,354][00273] Fps is (10 sec: 3277.3, 60 sec: 3276.8, 300 sec: 2990.1). Total num frames: 299008. Throughput: 0: 809.8. Samples: 72946. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:57:20,360][00273] Avg episode reward: [(0, '5.000')]
[2024-10-31 19:57:20,375][03744] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth...
[2024-10-31 19:57:25,354][00273] Fps is (10 sec: 3686.3, 60 sec: 3345.1, 300 sec: 3003.8). Total num frames: 315392. Throughput: 0: 828.4. Samples: 79028. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:57:25,356][00273] Avg episode reward: [(0, '4.932')]
[2024-10-31 19:57:28,780][03757] Updated weights for policy 0, policy_version 80 (0.0018)
[2024-10-31 19:57:30,353][00273] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3016.2). Total num frames: 331776. Throughput: 0: 816.9. Samples: 83022. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:57:30,356][00273] Avg episode reward: [(0, '4.832')]
[2024-10-31 19:57:35,353][00273] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3063.1). Total num frames: 352256. Throughput: 0: 820.1. Samples: 86068. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:57:35,362][00273] Avg episode reward: [(0, '4.966')]
[2024-10-31 19:57:38,848][03757] Updated weights for policy 0, policy_version 90 (0.0012)
[2024-10-31 19:57:40,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 3106.2). Total num frames: 372736. Throughput: 0: 877.6. Samples: 92334. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:57:40,356][00273] Avg episode reward: [(0, '5.265')]
[2024-10-31 19:57:40,368][03744] Saving new best policy, reward=5.265!
[2024-10-31 19:57:45,354][00273] Fps is (10 sec: 3276.6, 60 sec: 3345.0, 300 sec: 3080.2). Total num frames: 385024. Throughput: 0: 870.7. Samples: 96366. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:57:45,356][00273] Avg episode reward: [(0, '5.240')]
[2024-10-31 19:57:50,354][00273] Fps is (10 sec: 3276.7, 60 sec: 3413.4, 300 sec: 3119.3). Total num frames: 405504. Throughput: 0: 870.2. Samples: 99326. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:57:50,356][00273] Avg episode reward: [(0, '5.222')]
[2024-10-31 19:57:51,182][03757] Updated weights for policy 0, policy_version 100 (0.0015)
[2024-10-31 19:57:55,353][00273] Fps is (10 sec: 4096.3, 60 sec: 3549.9, 300 sec: 3155.5). Total num frames: 425984. Throughput: 0: 875.3. Samples: 105444. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:57:55,356][00273] Avg episode reward: [(0, '5.423')]
[2024-10-31 19:57:55,364][03744] Saving new best policy, reward=5.423!
[2024-10-31 19:58:00,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3130.5). Total num frames: 438272. Throughput: 0: 881.7. Samples: 109764. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:58:00,361][00273] Avg episode reward: [(0, '5.529')]
[2024-10-31 19:58:00,374][03744] Saving new best policy, reward=5.529!
[2024-10-31 19:58:03,194][03757] Updated weights for policy 0, policy_version 110 (0.0019)
[2024-10-31 19:58:05,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3163.8). Total num frames: 458752. Throughput: 0: 879.3. Samples: 112514. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:58:05,363][00273] Avg episode reward: [(0, '5.608')]
[2024-10-31 19:58:05,365][03744] Saving new best policy, reward=5.608!
[2024-10-31 19:58:10,354][00273] Fps is (10 sec: 4095.9, 60 sec: 3550.0, 300 sec: 3194.9). Total num frames: 479232. Throughput: 0: 879.2. Samples: 118590. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:58:10,360][00273] Avg episode reward: [(0, '5.396')]
[2024-10-31 19:58:14,815][03757] Updated weights for policy 0, policy_version 120 (0.0015)
[2024-10-31 19:58:15,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3171.1). Total num frames: 491520. Throughput: 0: 890.2. Samples: 123082. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:58:15,360][00273] Avg episode reward: [(0, '5.417')]
[2024-10-31 19:58:20,353][00273] Fps is (10 sec: 2048.0, 60 sec: 3345.1, 300 sec: 3123.2). Total num frames: 499712. Throughput: 0: 846.5. Samples: 124162. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:58:20,358][00273] Avg episode reward: [(0, '5.098')]
[2024-10-31 19:58:25,353][00273] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3152.7). Total num frames: 520192. Throughput: 0: 822.5. Samples: 129348. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:58:25,361][00273] Avg episode reward: [(0, '5.016')]
[2024-10-31 19:58:28,579][03757] Updated weights for policy 0, policy_version 130 (0.0013)
[2024-10-31 19:58:30,355][00273] Fps is (10 sec: 3685.7, 60 sec: 3413.2, 300 sec: 3156.3). Total num frames: 536576. Throughput: 0: 838.5. Samples: 134100. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:58:30,363][00273] Avg episode reward: [(0, '4.955')]
[2024-10-31 19:58:35,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3159.8). Total num frames: 552960. Throughput: 0: 825.6. Samples: 136476. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:58:35,361][00273] Avg episode reward: [(0, '4.889')]
[2024-10-31 19:58:39,968][03757] Updated weights for policy 0, policy_version 140 (0.0016)
[2024-10-31 19:58:40,354][00273] Fps is (10 sec: 3687.0, 60 sec: 3345.1, 300 sec: 3185.8). Total num frames: 573440. Throughput: 0: 823.4. Samples: 142498. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:58:40,356][00273] Avg episode reward: [(0, '4.638')]
[2024-10-31 19:58:45,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3166.1). Total num frames: 585728. Throughput: 0: 832.9. Samples: 147244. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:58:45,359][00273] Avg episode reward: [(0, '4.731')]
[2024-10-31 19:58:50,353][00273] Fps is (10 sec: 3276.9, 60 sec: 3345.1, 300 sec: 3190.6). Total num frames: 606208. Throughput: 0: 819.2. Samples: 149380. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:58:50,356][00273] Avg episode reward: [(0, '4.970')]
[2024-10-31 19:58:52,276][03757] Updated weights for policy 0, policy_version 150 (0.0018)
[2024-10-31 19:58:55,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3213.8). Total num frames: 626688. Throughput: 0: 820.4. Samples: 155508. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:58:55,356][00273] Avg episode reward: [(0, '5.099')]
[2024-10-31 19:59:00,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3194.9). Total num frames: 638976. Throughput: 0: 834.1. Samples: 160618. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:59:00,356][00273] Avg episode reward: [(0, '4.987')]
[2024-10-31 19:59:04,232][03757] Updated weights for policy 0, policy_version 160 (0.0013)
[2024-10-31 19:59:05,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3216.9). Total num frames: 659456. Throughput: 0: 853.6. Samples: 162574. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:59:05,356][00273] Avg episode reward: [(0, '4.880')]
[2024-10-31 19:59:10,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3218.3). Total num frames: 675840. Throughput: 0: 873.7. Samples: 168666. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:59:10,356][00273] Avg episode reward: [(0, '5.082')]
[2024-10-31 19:59:15,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3219.7). Total num frames: 692224. Throughput: 0: 884.7. Samples: 173908. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:59:15,356][00273] Avg episode reward: [(0, '5.301')]
[2024-10-31 19:59:15,497][03757] Updated weights for policy 0, policy_version 170 (0.0012)
[2024-10-31 19:59:20,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3221.0). Total num frames: 708608. Throughput: 0: 875.2. Samples: 175860. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:59:20,357][00273] Avg episode reward: [(0, '5.132')]
[2024-10-31 19:59:20,411][03744] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000174_712704.pth...
[2024-10-31 19:59:25,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3240.4). Total num frames: 729088. Throughput: 0: 875.1. Samples: 181876. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:59:25,356][00273] Avg episode reward: [(0, '4.911')]
[2024-10-31 19:59:26,457][03757] Updated weights for policy 0, policy_version 180 (0.0013)
[2024-10-31 19:59:30,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3550.0, 300 sec: 3259.0). Total num frames: 749568. Throughput: 0: 894.4. Samples: 187492. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 19:59:30,360][00273] Avg episode reward: [(0, '4.922')]
[2024-10-31 19:59:35,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3259.4). Total num frames: 765952. Throughput: 0: 889.4. Samples: 189402. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:59:35,356][00273] Avg episode reward: [(0, '5.433')]
[2024-10-31 19:59:38,306][03757] Updated weights for policy 0, policy_version 190 (0.0026)
[2024-10-31 19:59:40,354][00273] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 786432. Throughput: 0: 886.4. Samples: 195396. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:59:40,357][00273] Avg episode reward: [(0, '5.545')]
[2024-10-31 19:59:45,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 802816. Throughput: 0: 893.1. Samples: 200806. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:59:45,359][00273] Avg episode reward: [(0, '5.482')]
[2024-10-31 19:59:50,353][00273] Fps is (10 sec: 2867.3, 60 sec: 3481.6, 300 sec: 3260.4). Total num frames: 815104. Throughput: 0: 893.2. Samples: 202770. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 19:59:50,359][00273] Avg episode reward: [(0, '5.797')]
[2024-10-31 19:59:50,370][03744] Saving new best policy, reward=5.797!
[2024-10-31 19:59:50,577][03757] Updated weights for policy 0, policy_version 200 (0.0012)
[2024-10-31 19:59:55,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 835584. Throughput: 0: 885.3. Samples: 208506. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 19:59:55,356][00273] Avg episode reward: [(0, '5.728')]
[2024-10-31 20:00:00,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3292.6). Total num frames: 856064. Throughput: 0: 896.7. Samples: 214258. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 20:00:00,357][00273] Avg episode reward: [(0, '5.599')]
[2024-10-31 20:00:01,268][03757] Updated weights for policy 0, policy_version 210 (0.0012)
[2024-10-31 20:00:05,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3276.8). Total num frames: 868352. Throughput: 0: 895.5. Samples: 216156. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 20:00:05,356][00273] Avg episode reward: [(0, '5.899')]
[2024-10-31 20:00:05,361][03744] Saving new best policy, reward=5.899!
[2024-10-31 20:00:10,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3292.0). Total num frames: 888832. Throughput: 0: 886.4. Samples: 221762. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:00:10,356][00273] Avg episode reward: [(0, '6.098')]
[2024-10-31 20:00:10,363][03744] Saving new best policy, reward=6.098!
[2024-10-31 20:00:12,409][03757] Updated weights for policy 0, policy_version 220 (0.0012)
[2024-10-31 20:00:15,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3306.6). Total num frames: 909312. Throughput: 0: 895.8. Samples: 227804. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:00:15,356][00273] Avg episode reward: [(0, '6.040')]
[2024-10-31 20:00:20,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3291.4). Total num frames: 921600. Throughput: 0: 897.6. Samples: 229796. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:00:20,361][00273] Avg episode reward: [(0, '6.065')]
[2024-10-31 20:00:24,361][03757] Updated weights for policy 0, policy_version 230 (0.0012)
[2024-10-31 20:00:25,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3319.9). Total num frames: 946176. Throughput: 0: 886.7. Samples: 235296. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:00:25,356][00273] Avg episode reward: [(0, '5.867')]
[2024-10-31 20:00:30,360][00273] Fps is (10 sec: 4502.6, 60 sec: 3617.7, 300 sec: 3333.2). Total num frames: 966656. Throughput: 0: 905.6. Samples: 241564. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:00:30,363][00273] Avg episode reward: [(0, '6.431')]
[2024-10-31 20:00:30,373][03744] Saving new best policy, reward=6.431!
[2024-10-31 20:00:35,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3318.5). Total num frames: 978944. Throughput: 0: 906.6. Samples: 243566. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:00:35,363][00273] Avg episode reward: [(0, '6.257')]
[2024-10-31 20:00:36,807][03757] Updated weights for policy 0, policy_version 240 (0.0013)
[2024-10-31 20:00:40,353][00273] Fps is (10 sec: 2459.3, 60 sec: 3413.4, 300 sec: 3360.1). Total num frames: 991232. Throughput: 0: 863.9. Samples: 247380. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:00:40,356][00273] Avg episode reward: [(0, '6.756')]
[2024-10-31 20:00:40,364][03744] Saving new best policy, reward=6.756!
[2024-10-31 20:00:45,354][00273] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 1011712. Throughput: 0: 855.9. Samples: 252774. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:00:45,357][00273] Avg episode reward: [(0, '6.823')]
[2024-10-31 20:00:45,359][03744] Saving new best policy, reward=6.823!
[2024-10-31 20:00:49,568][03757] Updated weights for policy 0, policy_version 250 (0.0011)
[2024-10-31 20:00:50,354][00273] Fps is (10 sec: 3276.5, 60 sec: 3481.5, 300 sec: 3443.4). Total num frames: 1024000. Throughput: 0: 866.7. Samples: 255160. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:00:50,357][00273] Avg episode reward: [(0, '6.569')]
[2024-10-31 20:00:55,353][00273] Fps is (10 sec: 3277.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1044480. Throughput: 0: 848.8. Samples: 259958. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:00:55,358][00273] Avg episode reward: [(0, '6.519')]
[2024-10-31 20:00:59,946][03757] Updated weights for policy 0, policy_version 260 (0.0016)
[2024-10-31 20:01:00,353][00273] Fps is (10 sec: 4096.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1064960. Throughput: 0: 855.6. Samples: 266306. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:01:00,358][00273] Avg episode reward: [(0, '7.251')]
[2024-10-31 20:01:00,369][03744] Saving new best policy, reward=7.251!
[2024-10-31 20:01:05,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1077248. Throughput: 0: 868.8. Samples: 268890. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:01:05,358][00273] Avg episode reward: [(0, '7.729')]
[2024-10-31 20:01:05,366][03744] Saving new best policy, reward=7.729!
[2024-10-31 20:01:10,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 1097728. Throughput: 0: 849.3. Samples: 273516. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:01:10,356][00273] Avg episode reward: [(0, '7.304')]
[2024-10-31 20:01:12,061][03757] Updated weights for policy 0, policy_version 270 (0.0012)
[2024-10-31 20:01:15,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 1118208. Throughput: 0: 847.1. Samples: 279678. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-31 20:01:15,355][00273] Avg episode reward: [(0, '6.625')]
[2024-10-31 20:01:20,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 1134592. Throughput: 0: 864.0. Samples: 282448. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:01:20,356][00273] Avg episode reward: [(0, '6.263')]
[2024-10-31 20:01:20,364][03744] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000277_1134592.pth...
[2024-10-31 20:01:20,490][03744] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth
[2024-10-31 20:01:23,914][03757] Updated weights for policy 0, policy_version 280 (0.0018)
[2024-10-31 20:01:25,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 1150976. Throughput: 0: 878.0. Samples: 286892. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:01:25,356][00273] Avg episode reward: [(0, '6.499')]
[2024-10-31 20:01:30,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3413.7, 300 sec: 3457.3). Total num frames: 1171456. Throughput: 0: 895.8. Samples: 293086. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:01:30,358][00273] Avg episode reward: [(0, '6.476')]
[2024-10-31 20:01:34,808][03757] Updated weights for policy 0, policy_version 290 (0.0032)
[2024-10-31 20:01:35,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 1187840. Throughput: 0: 906.9. Samples: 295968. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:01:35,356][00273] Avg episode reward: [(0, '6.136')]
[2024-10-31 20:01:40,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 1204224. Throughput: 0: 895.8. Samples: 300270. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:01:40,355][00273] Avg episode reward: [(0, '6.180')]
[2024-10-31 20:01:45,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 1224704. Throughput: 0: 893.8. Samples: 306526. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:01:45,358][00273] Avg episode reward: [(0, '6.796')]
[2024-10-31 20:01:45,757][03757] Updated weights for policy 0, policy_version 300 (0.0014)
[2024-10-31 20:01:50,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3485.1). Total num frames: 1241088. Throughput: 0: 904.4. Samples: 309588. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-31 20:01:50,359][00273] Avg episode reward: [(0, '7.772')]
[2024-10-31 20:01:50,373][03744] Saving new best policy, reward=7.772!
[2024-10-31 20:01:55,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1257472. Throughput: 0: 890.7. Samples: 313596. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:01:55,356][00273] Avg episode reward: [(0, '8.357')]
[2024-10-31 20:01:55,361][03744] Saving new best policy, reward=8.357!
[2024-10-31 20:01:57,868][03757] Updated weights for policy 0, policy_version 310 (0.0019)
[2024-10-31 20:02:00,354][00273] Fps is (10 sec: 3686.2, 60 sec: 3549.8, 300 sec: 3499.0). Total num frames: 1277952. Throughput: 0: 887.7. Samples: 319624. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:02:00,357][00273] Avg episode reward: [(0, '8.726')]
[2024-10-31 20:02:00,371][03744] Saving new best policy, reward=8.726!
[2024-10-31 20:02:05,355][00273] Fps is (10 sec: 3685.9, 60 sec: 3618.0, 300 sec: 3485.1). Total num frames: 1294336. Throughput: 0: 891.0. Samples: 322546. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:02:05,357][00273] Avg episode reward: [(0, '8.602')]
[2024-10-31 20:02:10,176][03757] Updated weights for policy 0, policy_version 320 (0.0012)
[2024-10-31 20:02:10,353][00273] Fps is (10 sec: 3277.0, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1310720. Throughput: 0: 883.0. Samples: 326628. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:02:10,360][00273] Avg episode reward: [(0, '7.996')]
[2024-10-31 20:02:15,353][00273] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1331200. Throughput: 0: 881.8. Samples: 332768. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:02:15,358][00273] Avg episode reward: [(0, '7.955')]
[2024-10-31 20:02:20,358][00273] Fps is (10 sec: 3684.7, 60 sec: 3549.6, 300 sec: 3498.9). Total num frames: 1347584. Throughput: 0: 884.7. Samples: 335782. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:02:20,360][00273] Avg episode reward: [(0, '8.305')]
[2024-10-31 20:02:21,091][03757] Updated weights for policy 0, policy_version 330 (0.0019)
[2024-10-31 20:02:25,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1363968. Throughput: 0: 879.6. Samples: 339854. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:02:25,360][00273] Avg episode reward: [(0, '8.857')]
[2024-10-31 20:02:25,364][03744] Saving new best policy, reward=8.857!
[2024-10-31 20:02:30,353][00273] Fps is (10 sec: 3688.1, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1384448. Throughput: 0: 877.4. Samples: 346010. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 20:02:30,358][00273] Avg episode reward: [(0, '10.008')]
[2024-10-31 20:02:30,369][03744] Saving new best policy, reward=10.008!
[2024-10-31 20:02:32,296][03757] Updated weights for policy 0, policy_version 340 (0.0014)
[2024-10-31 20:02:35,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1400832. Throughput: 0: 875.0. Samples: 348964. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:02:35,357][00273] Avg episode reward: [(0, '9.977')]
[2024-10-31 20:02:40,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1417216. Throughput: 0: 885.4. Samples: 353438. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:02:40,357][00273] Avg episode reward: [(0, '9.740')]
[2024-10-31 20:02:44,092][03757] Updated weights for policy 0, policy_version 350 (0.0012)
[2024-10-31 20:02:45,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1437696. Throughput: 0: 884.8. Samples: 359440. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:02:45,356][00273] Avg episode reward: [(0, '8.057')]
[2024-10-31 20:02:50,354][00273] Fps is (10 sec: 4095.6, 60 sec: 3618.1, 300 sec: 3498.9). Total num frames: 1458176. Throughput: 0: 886.9. Samples: 362458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-31 20:02:50,359][00273] Avg episode reward: [(0, '8.382')]
[2024-10-31 20:02:55,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1470464. Throughput: 0: 899.0. Samples: 367082. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:02:55,357][00273] Avg episode reward: [(0, '8.026')]
[2024-10-31 20:02:56,198][03757] Updated weights for policy 0, policy_version 360 (0.0021)
[2024-10-31 20:03:00,353][00273] Fps is (10 sec: 3277.1, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1490944. Throughput: 0: 890.3. Samples: 372830. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:03:00,360][00273] Avg episode reward: [(0, '8.682')]
[2024-10-31 20:03:05,357][00273] Fps is (10 sec: 4094.4, 60 sec: 3618.0, 300 sec: 3498.9). Total num frames: 1511424. Throughput: 0: 892.3. Samples: 375936. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:03:05,361][00273] Avg episode reward: [(0, '8.939')]
[2024-10-31 20:03:06,207][03757] Updated weights for policy 0, policy_version 370 (0.0012)
[2024-10-31 20:03:10,354][00273] Fps is (10 sec: 3276.6, 60 sec: 3549.8, 300 sec: 3498.9). Total num frames: 1523712. Throughput: 0: 909.1. Samples: 380764. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:03:10,357][00273] Avg episode reward: [(0, '8.664')]
[2024-10-31 20:03:15,353][00273] Fps is (10 sec: 3278.1, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1544192. Throughput: 0: 894.2. Samples: 386250. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:03:15,358][00273] Avg episode reward: [(0, '9.078')]
[2024-10-31 20:03:17,933][03757] Updated weights for policy 0, policy_version 380 (0.0012)
[2024-10-31 20:03:20,353][00273] Fps is (10 sec: 4096.3, 60 sec: 3618.4, 300 sec: 3540.6). Total num frames: 1564672. Throughput: 0: 897.4. Samples: 389348. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:03:20,356][00273] Avg episode reward: [(0, '10.074')]
[2024-10-31 20:03:20,368][03744] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000382_1564672.pth...
[2024-10-31 20:03:20,473][03744] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000174_712704.pth
[2024-10-31 20:03:20,484][03744] Saving new best policy, reward=10.074!
[2024-10-31 20:03:25,357][00273] Fps is (10 sec: 3275.6, 60 sec: 3549.6, 300 sec: 3526.7). Total num frames: 1576960. Throughput: 0: 909.2. Samples: 394354. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:03:25,363][00273] Avg episode reward: [(0, '10.859')]
[2024-10-31 20:03:25,366][03744] Saving new best policy, reward=10.859!
[2024-10-31 20:03:29,861][03757] Updated weights for policy 0, policy_version 390 (0.0015)
[2024-10-31 20:03:30,355][00273] Fps is (10 sec: 3276.2, 60 sec: 3549.8, 300 sec: 3540.6). Total num frames: 1597440. Throughput: 0: 893.1. Samples: 399630. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:03:30,364][00273] Avg episode reward: [(0, '11.984')]
[2024-10-31 20:03:30,377][03744] Saving new best policy, reward=11.984!
[2024-10-31 20:03:35,353][00273] Fps is (10 sec: 4097.6, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1617920. Throughput: 0: 892.1. Samples: 402600. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:03:35,358][00273] Avg episode reward: [(0, '13.300')]
[2024-10-31 20:03:35,361][03744] Saving new best policy, reward=13.300!
[2024-10-31 20:03:40,353][00273] Fps is (10 sec: 3687.1, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1634304. Throughput: 0: 907.3. Samples: 407910. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:03:40,360][00273] Avg episode reward: [(0, '13.386')]
[2024-10-31 20:03:40,369][03744] Saving new best policy, reward=13.386!
[2024-10-31 20:03:41,988][03757] Updated weights for policy 0, policy_version 400 (0.0012)
[2024-10-31 20:03:45,354][00273] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1650688. Throughput: 0: 888.5. Samples: 412812. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:03:45,357][00273] Avg episode reward: [(0, '12.122')]
[2024-10-31 20:03:50,355][00273] Fps is (10 sec: 3685.8, 60 sec: 3549.8, 300 sec: 3540.6). Total num frames: 1671168. Throughput: 0: 884.5. Samples: 415738. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:03:50,364][00273] Avg episode reward: [(0, '12.567')]
[2024-10-31 20:03:52,241][03757] Updated weights for policy 0, policy_version 410 (0.0012)
[2024-10-31 20:03:55,353][00273] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 1687552. Throughput: 0: 896.3. Samples: 421098. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:03:55,357][00273] Avg episode reward: [(0, '12.603')]
[2024-10-31 20:04:00,353][00273] Fps is (10 sec: 3277.3, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1703936. Throughput: 0: 881.5. Samples: 425918. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:04:00,356][00273] Avg episode reward: [(0, '12.794')]
[2024-10-31 20:04:04,201][03757] Updated weights for policy 0, policy_version 420 (0.0013)
[2024-10-31 20:04:05,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3550.1, 300 sec: 3554.5). Total num frames: 1724416. Throughput: 0: 881.1. Samples: 428998. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:04:05,356][00273] Avg episode reward: [(0, '14.696')]
[2024-10-31 20:04:05,358][03744] Saving new best policy, reward=14.696!
[2024-10-31 20:04:10,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3554.5). Total num frames: 1740800. Throughput: 0: 896.0. Samples: 434670. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:04:10,356][00273] Avg episode reward: [(0, '15.189')]
[2024-10-31 20:04:10,371][03744] Saving new best policy, reward=15.189!
[2024-10-31 20:04:15,354][00273] Fps is (10 sec: 3276.7, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 1757184. Throughput: 0: 878.7. Samples: 439172. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:04:15,358][00273] Avg episode reward: [(0, '15.371')]
[2024-10-31 20:04:15,361][03744] Saving new best policy, reward=15.371!
[2024-10-31 20:04:16,415][03757] Updated weights for policy 0, policy_version 430 (0.0021)
[2024-10-31 20:04:20,354][00273] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1777664. Throughput: 0: 877.7. Samples: 442096. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:04:20,355][00273] Avg episode reward: [(0, '15.074')]
[2024-10-31 20:04:25,353][00273] Fps is (10 sec: 3276.9, 60 sec: 3550.1, 300 sec: 3526.7). Total num frames: 1789952. Throughput: 0: 873.6. Samples: 447222. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 20:04:25,356][00273] Avg episode reward: [(0, '15.209')]
[2024-10-31 20:04:29,951][03757] Updated weights for policy 0, policy_version 440 (0.0015)
[2024-10-31 20:04:30,353][00273] Fps is (10 sec: 2457.6, 60 sec: 3413.4, 300 sec: 3512.8). Total num frames: 1802240. Throughput: 0: 840.8. Samples: 450646. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:04:30,358][00273] Avg episode reward: [(0, '14.383')]
[2024-10-31 20:04:35,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 1822720. Throughput: 0: 836.7. Samples: 453388. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:04:35,360][00273] Avg episode reward: [(0, '13.789')]
[2024-10-31 20:04:40,199][03757] Updated weights for policy 0, policy_version 450 (0.0014)
[2024-10-31 20:04:40,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1843200. Throughput: 0: 858.1. Samples: 459712. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:04:40,355][00273] Avg episode reward: [(0, '14.437')]
[2024-10-31 20:04:45,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 1855488. Throughput: 0: 853.2. Samples: 464310. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:04:45,357][00273] Avg episode reward: [(0, '13.683')]
[2024-10-31 20:04:50,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3526.7). Total num frames: 1875968. Throughput: 0: 842.0. Samples: 466886. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:04:50,355][00273] Avg episode reward: [(0, '14.414')]
[2024-10-31 20:04:51,987][03757] Updated weights for policy 0, policy_version 460 (0.0012)
[2024-10-31 20:04:55,354][00273] Fps is (10 sec: 4095.9, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1896448. Throughput: 0: 856.2. Samples: 473200. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:04:55,356][00273] Avg episode reward: [(0, '15.082')]
[2024-10-31 20:05:00,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 1908736. Throughput: 0: 861.3. Samples: 477932. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:05:00,360][00273] Avg episode reward: [(0, '13.762')]
[2024-10-31 20:05:03,819][03757] Updated weights for policy 0, policy_version 470 (0.0017)
[2024-10-31 20:05:05,353][00273] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 1929216. Throughput: 0: 853.3. Samples: 480496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:05:05,359][00273] Avg episode reward: [(0, '13.863')]
[2024-10-31 20:05:10,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1949696. Throughput: 0: 875.9. Samples: 486636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:05:10,355][00273] Avg episode reward: [(0, '13.512')]
[2024-10-31 20:05:15,106][03757] Updated weights for policy 0, policy_version 480 (0.0012)
[2024-10-31 20:05:15,354][00273] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 1966080. Throughput: 0: 910.4. Samples: 491614. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 20:05:15,361][00273] Avg episode reward: [(0, '14.135')]
[2024-10-31 20:05:20,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 1982464. Throughput: 0: 899.1. Samples: 493848. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:05:20,361][00273] Avg episode reward: [(0, '14.736')]
[2024-10-31 20:05:20,373][03744] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000484_1982464.pth...
[2024-10-31 20:05:20,378][00273] Components not started: RolloutWorker_w2, RolloutWorker_w3, RolloutWorker_w4, RolloutWorker_w7, wait_time=600.0 seconds
[2024-10-31 20:05:20,473][03744] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000277_1134592.pth
[2024-10-31 20:05:25,353][00273] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3512.9). Total num frames: 2002944. Throughput: 0: 895.6. Samples: 500014. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:05:25,356][00273] Avg episode reward: [(0, '15.135')]
[2024-10-31 20:05:25,677][03757] Updated weights for policy 0, policy_version 490 (0.0014)
[2024-10-31 20:05:30,358][00273] Fps is (10 sec: 3684.7, 60 sec: 3617.8, 300 sec: 3526.7). Total num frames: 2019328. Throughput: 0: 906.8. Samples: 505120. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:05:30,368][00273] Avg episode reward: [(0, '15.486')]
[2024-10-31 20:05:30,376][03744] Saving new best policy, reward=15.486!
[2024-10-31 20:05:35,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2035712. Throughput: 0: 895.7. Samples: 507194. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:05:35,359][00273] Avg episode reward: [(0, '16.378')]
[2024-10-31 20:05:35,362][03744] Saving new best policy, reward=16.378!
[2024-10-31 20:05:37,619][03757] Updated weights for policy 0, policy_version 500 (0.0016)
[2024-10-31 20:05:40,354][00273] Fps is (10 sec: 3688.1, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2056192. Throughput: 0: 895.7. Samples: 513508. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:05:40,358][00273] Avg episode reward: [(0, '15.418')]
[2024-10-31 20:05:45,358][00273] Fps is (10 sec: 3684.6, 60 sec: 3617.8, 300 sec: 3554.4). Total num frames: 2072576. Throughput: 0: 906.5. Samples: 518730. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 20:05:45,361][00273] Avg episode reward: [(0, '15.665')]
[2024-10-31 20:05:49,664][03757] Updated weights for policy 0, policy_version 510 (0.0030)
[2024-10-31 20:05:50,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2088960. Throughput: 0: 891.6. Samples: 520618. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 20:05:50,361][00273] Avg episode reward: [(0, '16.708')]
[2024-10-31 20:05:50,375][03744] Saving new best policy, reward=16.708!
[2024-10-31 20:05:55,353][00273] Fps is (10 sec: 3688.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2109440. Throughput: 0: 893.2. Samples: 526828. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:05:55,356][00273] Avg episode reward: [(0, '17.221')]
[2024-10-31 20:05:55,359][03744] Saving new best policy, reward=17.221!
[2024-10-31 20:06:00,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2125824. Throughput: 0: 903.7. Samples: 532278. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:06:00,356][00273] Avg episode reward: [(0, '18.211')]
[2024-10-31 20:06:00,367][03744] Saving new best policy, reward=18.211!
[2024-10-31 20:06:00,715][03757] Updated weights for policy 0, policy_version 520 (0.0012)
[2024-10-31 20:06:05,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2142208. Throughput: 0: 894.5. Samples: 534102. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:06:05,356][00273] Avg episode reward: [(0, '18.553')]
[2024-10-31 20:06:05,359][03744] Saving new best policy, reward=18.553!
[2024-10-31 20:06:10,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2162688. Throughput: 0: 894.1. Samples: 540250. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:06:10,356][00273] Avg episode reward: [(0, '19.506')]
[2024-10-31 20:06:10,439][03744] Saving new best policy, reward=19.506!
[2024-10-31 20:06:11,438][03757] Updated weights for policy 0, policy_version 530 (0.0013)
[2024-10-31 20:06:15,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2183168. Throughput: 0: 908.4. Samples: 545992. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:06:15,356][00273] Avg episode reward: [(0, '18.614')]
[2024-10-31 20:06:20,354][00273] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2199552. Throughput: 0: 905.6. Samples: 547948. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:06:20,358][00273] Avg episode reward: [(0, '19.149')]
[2024-10-31 20:06:23,096][03757] Updated weights for policy 0, policy_version 540 (0.0019)
[2024-10-31 20:06:25,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2220032. Throughput: 0: 900.0. Samples: 554008. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:06:25,356][00273] Avg episode reward: [(0, '18.874')]
[2024-10-31 20:06:30,359][00273] Fps is (10 sec: 3684.4, 60 sec: 3618.1, 300 sec: 3554.4). Total num frames: 2236416. Throughput: 0: 914.7. Samples: 559894. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:06:30,373][00273] Avg episode reward: [(0, '17.509')]
[2024-10-31 20:06:34,866][03757] Updated weights for policy 0, policy_version 550 (0.0018)
[2024-10-31 20:06:35,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2252800. Throughput: 0: 916.2. Samples: 561846. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:06:35,359][00273] Avg episode reward: [(0, '17.702')]
[2024-10-31 20:06:40,353][00273] Fps is (10 sec: 3688.5, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2273280. Throughput: 0: 908.8. Samples: 567726. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:06:40,356][00273] Avg episode reward: [(0, '17.740')]
[2024-10-31 20:06:44,644][03757] Updated weights for policy 0, policy_version 560 (0.0012)
[2024-10-31 20:06:45,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3686.7, 300 sec: 3568.4). Total num frames: 2293760. Throughput: 0: 923.6. Samples: 573842. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:06:45,362][00273] Avg episode reward: [(0, '17.685')]
[2024-10-31 20:06:50,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 2310144. Throughput: 0: 928.0. Samples: 575862. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:06:50,358][00273] Avg episode reward: [(0, '16.796')]
[2024-10-31 20:06:55,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 2330624. Throughput: 0: 916.4. Samples: 581488. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:06:55,360][00273] Avg episode reward: [(0, '18.346')]
[2024-10-31 20:06:56,152][03757] Updated weights for policy 0, policy_version 570 (0.0012)
[2024-10-31 20:07:00,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3582.3). Total num frames: 2351104. Throughput: 0: 929.1. Samples: 587800. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:07:00,357][00273] Avg episode reward: [(0, '17.835')]
[2024-10-31 20:07:05,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 2363392. Throughput: 0: 928.6. Samples: 589734. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:07:05,355][00273] Avg episode reward: [(0, '18.186')]
[2024-10-31 20:07:07,905][03757] Updated weights for policy 0, policy_version 580 (0.0013)
[2024-10-31 20:07:10,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 2383872. Throughput: 0: 913.9. Samples: 595134. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 20:07:10,359][00273] Avg episode reward: [(0, '18.246')]
[2024-10-31 20:07:15,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 2404352. Throughput: 0: 921.4. Samples: 601352. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:07:15,356][00273] Avg episode reward: [(0, '18.586')]
[2024-10-31 20:07:19,393][03757] Updated weights for policy 0, policy_version 590 (0.0014)
[2024-10-31 20:07:20,354][00273] Fps is (10 sec: 3276.5, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2416640. Throughput: 0: 922.5. Samples: 603360. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 20:07:20,356][00273] Avg episode reward: [(0, '18.160')]
[2024-10-31 20:07:20,371][03744] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000590_2416640.pth...
[2024-10-31 20:07:20,487][03744] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000382_1564672.pth
[2024-10-31 20:07:25,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2437120. Throughput: 0: 905.2. Samples: 608462. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 20:07:25,356][00273] Avg episode reward: [(0, '18.386')]
[2024-10-31 20:07:29,754][03757] Updated weights for policy 0, policy_version 600 (0.0015)
[2024-10-31 20:07:30,353][00273] Fps is (10 sec: 4096.3, 60 sec: 3686.8, 300 sec: 3582.3). Total num frames: 2457600. Throughput: 0: 909.3. Samples: 614760. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:07:30,360][00273] Avg episode reward: [(0, '17.945')]
[2024-10-31 20:07:35,354][00273] Fps is (10 sec: 3276.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2469888. Throughput: 0: 914.6. Samples: 617020. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:07:35,356][00273] Avg episode reward: [(0, '18.449')]
[2024-10-31 20:07:40,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2490368. Throughput: 0: 901.0. Samples: 622032. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:07:40,358][00273] Avg episode reward: [(0, '18.893')]
[2024-10-31 20:07:41,432][03757] Updated weights for policy 0, policy_version 610 (0.0012)
[2024-10-31 20:07:45,353][00273] Fps is (10 sec: 3686.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 2506752. Throughput: 0: 876.0. Samples: 627218. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:07:45,356][00273] Avg episode reward: [(0, '18.325')]
[2024-10-31 20:07:50,356][00273] Fps is (10 sec: 2457.0, 60 sec: 3413.2, 300 sec: 3540.6). Total num frames: 2514944. Throughput: 0: 855.8. Samples: 628246. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:07:50,358][00273] Avg episode reward: [(0, '19.272')]
[2024-10-31 20:07:55,353][00273] Fps is (10 sec: 2867.2, 60 sec: 3413.3, 300 sec: 3540.6). Total num frames: 2535424. Throughput: 0: 840.3. Samples: 632946. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 20:07:55,361][00273] Avg episode reward: [(0, '19.658')]
[2024-10-31 20:07:55,364][03744] Saving new best policy, reward=19.658!
[2024-10-31 20:07:55,802][03757] Updated weights for policy 0, policy_version 620 (0.0012)
[2024-10-31 20:08:00,353][00273] Fps is (10 sec: 4096.9, 60 sec: 3413.3, 300 sec: 3540.7). Total num frames: 2555904. Throughput: 0: 842.3. Samples: 639256. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:08:00,359][00273] Avg episode reward: [(0, '19.939')]
[2024-10-31 20:08:00,371][03744] Saving new best policy, reward=19.939!
[2024-10-31 20:08:05,355][00273] Fps is (10 sec: 3685.6, 60 sec: 3481.5, 300 sec: 3554.5). Total num frames: 2572288. Throughput: 0: 853.0. Samples: 641744. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:08:05,363][00273] Avg episode reward: [(0, '19.511')]
[2024-10-31 20:08:08,280][03757] Updated weights for policy 0, policy_version 630 (0.0020)
[2024-10-31 20:08:10,355][00273] Fps is (10 sec: 2866.9, 60 sec: 3345.0, 300 sec: 3526.7). Total num frames: 2584576. Throughput: 0: 828.8. Samples: 645760. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:08:10,360][00273] Avg episode reward: [(0, '18.902')]
[2024-10-31 20:08:15,353][00273] Fps is (10 sec: 2867.8, 60 sec: 3276.8, 300 sec: 3512.8). Total num frames: 2600960. Throughput: 0: 795.0. Samples: 650534. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:08:15,361][00273] Avg episode reward: [(0, '20.642')]
[2024-10-31 20:08:15,364][03744] Saving new best policy, reward=20.642!
[2024-10-31 20:08:20,353][00273] Fps is (10 sec: 3277.2, 60 sec: 3345.1, 300 sec: 3526.8). Total num frames: 2617344. Throughput: 0: 808.7. Samples: 653410. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:08:20,357][00273] Avg episode reward: [(0, '19.701')]
[2024-10-31 20:08:21,253][03757] Updated weights for policy 0, policy_version 640 (0.0015)
[2024-10-31 20:08:25,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3512.9). Total num frames: 2633728. Throughput: 0: 792.8. Samples: 657710. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:08:25,355][00273] Avg episode reward: [(0, '18.483')]
[2024-10-31 20:08:30,354][00273] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3512.8). Total num frames: 2654208. Throughput: 0: 818.7. Samples: 664058. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:08:30,356][00273] Avg episode reward: [(0, '18.882')]
[2024-10-31 20:08:31,586][03757] Updated weights for policy 0, policy_version 650 (0.0012)
[2024-10-31 20:08:35,354][00273] Fps is (10 sec: 3686.3, 60 sec: 3345.1, 300 sec: 3512.8). Total num frames: 2670592. Throughput: 0: 863.8. Samples: 667116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-31 20:08:35,361][00273] Avg episode reward: [(0, '19.575')]
[2024-10-31 20:08:40,353][00273] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3512.8). Total num frames: 2686976. Throughput: 0: 853.9. Samples: 671372. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-31 20:08:40,356][00273] Avg episode reward: [(0, '20.558')]
[2024-10-31 20:08:43,376][03757] Updated weights for policy 0, policy_version 660 (0.0012)
[2024-10-31 20:08:45,353][00273] Fps is (10 sec: 4096.1, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 2711552. Throughput: 0: 855.5. Samples: 677752. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:08:45,356][00273] Avg episode reward: [(0, '19.868')]
[2024-10-31 20:08:50,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3550.0, 300 sec: 3526.7). Total num frames: 2727936. Throughput: 0: 871.1. Samples: 680940. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:08:50,358][00273] Avg episode reward: [(0, '19.421')]
[2024-10-31 20:08:55,112][03757] Updated weights for policy 0, policy_version 670 (0.0013)
[2024-10-31 20:08:55,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2744320. Throughput: 0: 870.6. Samples: 684938. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:08:55,356][00273] Avg episode reward: [(0, '18.857')]
[2024-10-31 20:09:00,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2764800. Throughput: 0: 906.1. Samples: 691310. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:09:00,358][00273] Avg episode reward: [(0, '18.810')]
[2024-10-31 20:09:05,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3481.7, 300 sec: 3526.7). Total num frames: 2781184. Throughput: 0: 912.4. Samples: 694468. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:09:05,356][00273] Avg episode reward: [(0, '17.916')]
[2024-10-31 20:09:05,378][03757] Updated weights for policy 0, policy_version 680 (0.0016)
[2024-10-31 20:09:10,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 2797568. Throughput: 0: 912.1. Samples: 698754. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:09:10,355][00273] Avg episode reward: [(0, '18.247')]
[2024-10-31 20:09:15,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3526.7). Total num frames: 2818048. Throughput: 0: 910.3. Samples: 705020. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:09:15,359][00273] Avg episode reward: [(0, '19.419')]
[2024-10-31 20:09:16,555][03757] Updated weights for policy 0, policy_version 690 (0.0016)
[2024-10-31 20:09:20,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 2838528. Throughput: 0: 911.8. Samples: 708148. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:09:20,360][00273] Avg episode reward: [(0, '19.693')]
[2024-10-31 20:09:20,371][03744] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000693_2838528.pth...
[2024-10-31 20:09:20,519][03744] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000484_1982464.pth
[2024-10-31 20:09:25,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2850816. Throughput: 0: 913.5. Samples: 712478. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2024-10-31 20:09:25,364][00273] Avg episode reward: [(0, '19.166')]
[2024-10-31 20:09:28,428][03757] Updated weights for policy 0, policy_version 700 (0.0012)
[2024-10-31 20:09:30,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2871296. Throughput: 0: 905.7. Samples: 718510. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:09:30,356][00273] Avg episode reward: [(0, '19.438')]
[2024-10-31 20:09:35,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3554.5). Total num frames: 2891776. Throughput: 0: 904.1. Samples: 721624. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-31 20:09:35,356][00273] Avg episode reward: [(0, '19.057')]
[2024-10-31 20:09:40,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 2904064. Throughput: 0: 919.2. Samples: 726304. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-10-31 20:09:40,355][00273] Avg episode reward: [(0, '18.855')]
[2024-10-31 20:09:40,426][03757] Updated weights for policy 0, policy_version 710 (0.0013)
[2024-10-31 20:09:45,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2928640. Throughput: 0: 904.8. Samples: 732026. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:09:45,357][00273] Avg episode reward: [(0, '19.485')]
[2024-10-31 20:09:50,200][03757] Updated weights for policy 0, policy_version 720 (0.0013)
[2024-10-31 20:09:50,354][00273] Fps is (10 sec: 4505.5, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 2949120. Throughput: 0: 903.9. Samples: 735142. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:09:50,360][00273] Avg episode reward: [(0, '19.896')]
[2024-10-31 20:09:55,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2961408. Throughput: 0: 916.3. Samples: 739986. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:09:55,356][00273] Avg episode reward: [(0, '20.624')]
[2024-10-31 20:10:00,353][00273] Fps is (10 sec: 3276.9, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 2981888. Throughput: 0: 900.8. Samples: 745558. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:10:00,359][00273] Avg episode reward: [(0, '20.613')]
[2024-10-31 20:10:01,925][03757] Updated weights for policy 0, policy_version 730 (0.0018)
[2024-10-31 20:10:05,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 3002368. Throughput: 0: 901.6. Samples: 748720. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:10:05,360][00273] Avg episode reward: [(0, '20.428')]
[2024-10-31 20:10:10,354][00273] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3554.5). Total num frames: 3014656. Throughput: 0: 919.9. Samples: 753872. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:10:10,359][00273] Avg episode reward: [(0, '20.183')]
[2024-10-31 20:10:13,562][03757] Updated weights for policy 0, policy_version 740 (0.0025)
[2024-10-31 20:10:15,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3035136. Throughput: 0: 905.3. Samples: 759250. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:10:15,358][00273] Avg episode reward: [(0, '18.538')]
[2024-10-31 20:10:20,353][00273] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3055616. Throughput: 0: 906.0. Samples: 762396. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:10:20,356][00273] Avg episode reward: [(0, '18.175')]
[2024-10-31 20:10:24,770][03757] Updated weights for policy 0, policy_version 750 (0.0013)
[2024-10-31 20:10:25,355][00273] Fps is (10 sec: 3685.7, 60 sec: 3686.3, 300 sec: 3568.4). Total num frames: 3072000. Throughput: 0: 916.7. Samples: 767558. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:10:25,357][00273] Avg episode reward: [(0, '18.106')]
[2024-10-31 20:10:30,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3568.4). Total num frames: 3088384. Throughput: 0: 903.7. Samples: 772692. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:10:30,357][00273] Avg episode reward: [(0, '18.720')]
[2024-10-31 20:10:35,348][03757] Updated weights for policy 0, policy_version 760 (0.0012)
[2024-10-31 20:10:35,356][00273] Fps is (10 sec: 4095.5, 60 sec: 3686.2, 300 sec: 3582.2). Total num frames: 3112960. Throughput: 0: 901.8. Samples: 775724. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2024-10-31 20:10:35,360][00273] Avg episode reward: [(0, '18.444')]
[2024-10-31 20:10:40,355][00273] Fps is (10 sec: 3685.7, 60 sec: 3686.3, 300 sec: 3568.4). Total num frames: 3125248. Throughput: 0: 918.9. Samples: 781336. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-10-31 20:10:40,358][00273] Avg episode reward: [(0, '18.764')]
[2024-10-31 20:10:45,353][00273] Fps is (10 sec: 3277.8, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3145728. Throughput: 0: 906.4. Samples: 786348.
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:10:45,356][00273] Avg episode reward: [(0, '19.203')] [2024-10-31 20:10:47,004][03757] Updated weights for policy 0, policy_version 770 (0.0024) [2024-10-31 20:10:50,353][00273] Fps is (10 sec: 4096.7, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3166208. Throughput: 0: 907.0. Samples: 789534. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-10-31 20:10:50,360][00273] Avg episode reward: [(0, '19.899')] [2024-10-31 20:10:55,355][00273] Fps is (10 sec: 3685.9, 60 sec: 3686.3, 300 sec: 3582.2). Total num frames: 3182592. Throughput: 0: 916.0. Samples: 795092. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:10:55,363][00273] Avg episode reward: [(0, '20.002')] [2024-10-31 20:10:58,817][03757] Updated weights for policy 0, policy_version 780 (0.0012) [2024-10-31 20:11:00,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3198976. Throughput: 0: 905.9. Samples: 800014. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:11:00,361][00273] Avg episode reward: [(0, '19.460')] [2024-10-31 20:11:05,353][00273] Fps is (10 sec: 3686.9, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3219456. Throughput: 0: 905.0. Samples: 803122. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:11:05,360][00273] Avg episode reward: [(0, '21.819')] [2024-10-31 20:11:05,364][03744] Saving new best policy, reward=21.819! [2024-10-31 20:11:09,595][03757] Updated weights for policy 0, policy_version 790 (0.0013) [2024-10-31 20:11:10,354][00273] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 3235840. Throughput: 0: 912.9. Samples: 808638. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:11:10,359][00273] Avg episode reward: [(0, '23.291')] [2024-10-31 20:11:10,373][03744] Saving new best policy, reward=23.291! [2024-10-31 20:11:15,353][00273] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3554.5). 
Total num frames: 3248128. Throughput: 0: 887.3. Samples: 812622. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:11:15,360][00273] Avg episode reward: [(0, '22.919')] [2024-10-31 20:11:20,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 3268608. Throughput: 0: 882.6. Samples: 815440. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:11:20,356][00273] Avg episode reward: [(0, '22.990')] [2024-10-31 20:11:20,375][03744] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000798_3268608.pth... [2024-10-31 20:11:20,482][03744] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000590_2416640.pth [2024-10-31 20:11:22,274][03757] Updated weights for policy 0, policy_version 800 (0.0014) [2024-10-31 20:11:25,354][00273] Fps is (10 sec: 3686.1, 60 sec: 3549.9, 300 sec: 3554.6). Total num frames: 3284992. Throughput: 0: 877.0. Samples: 820798. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:11:25,357][00273] Avg episode reward: [(0, '20.824')] [2024-10-31 20:11:30,355][00273] Fps is (10 sec: 2866.9, 60 sec: 3481.5, 300 sec: 3540.6). Total num frames: 3297280. Throughput: 0: 849.7. Samples: 824586. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-10-31 20:11:30,364][00273] Avg episode reward: [(0, '21.933')] [2024-10-31 20:11:35,025][03757] Updated weights for policy 0, policy_version 810 (0.0020) [2024-10-31 20:11:35,353][00273] Fps is (10 sec: 3277.1, 60 sec: 3413.5, 300 sec: 3540.6). Total num frames: 3317760. Throughput: 0: 845.7. Samples: 827590. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:11:35,356][00273] Avg episode reward: [(0, '21.437')] [2024-10-31 20:11:40,353][00273] Fps is (10 sec: 3686.8, 60 sec: 3481.7, 300 sec: 3526.7). Total num frames: 3334144. Throughput: 0: 855.0. Samples: 833566. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:11:40,360][00273] Avg episode reward: [(0, '21.451')] [2024-10-31 20:11:45,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 3350528. Throughput: 0: 836.1. Samples: 837640. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:11:45,356][00273] Avg episode reward: [(0, '21.815')] [2024-10-31 20:11:47,081][03757] Updated weights for policy 0, policy_version 820 (0.0013) [2024-10-31 20:11:50,356][00273] Fps is (10 sec: 3685.4, 60 sec: 3413.2, 300 sec: 3526.7). Total num frames: 3371008. Throughput: 0: 835.3. Samples: 840712. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:11:50,361][00273] Avg episode reward: [(0, '23.875')] [2024-10-31 20:11:50,369][03744] Saving new best policy, reward=23.875! [2024-10-31 20:11:55,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3499.0). Total num frames: 3383296. Throughput: 0: 826.7. Samples: 845838. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:11:55,356][00273] Avg episode reward: [(0, '24.833')] [2024-10-31 20:11:55,362][03744] Saving new best policy, reward=24.833! [2024-10-31 20:12:00,353][00273] Fps is (10 sec: 2458.3, 60 sec: 3276.8, 300 sec: 3499.0). Total num frames: 3395584. Throughput: 0: 818.3. Samples: 849444. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:12:00,357][00273] Avg episode reward: [(0, '26.254')] [2024-10-31 20:12:00,369][03744] Saving new best policy, reward=26.254! [2024-10-31 20:12:00,908][03757] Updated weights for policy 0, policy_version 830 (0.0024) [2024-10-31 20:12:05,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3499.0). Total num frames: 3416064. Throughput: 0: 813.8. Samples: 852062. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-10-31 20:12:05,356][00273] Avg episode reward: [(0, '24.858')] [2024-10-31 20:12:10,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3499.0). Total num frames: 3436544. 
Throughput: 0: 833.7. Samples: 858316. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-10-31 20:12:10,359][00273] Avg episode reward: [(0, '24.902')] [2024-10-31 20:12:10,658][03757] Updated weights for policy 0, policy_version 840 (0.0012) [2024-10-31 20:12:15,357][00273] Fps is (10 sec: 3685.0, 60 sec: 3413.1, 300 sec: 3512.8). Total num frames: 3452928. Throughput: 0: 860.7. Samples: 863318. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-10-31 20:12:15,359][00273] Avg episode reward: [(0, '22.840')] [2024-10-31 20:12:20,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3499.0). Total num frames: 3469312. Throughput: 0: 850.0. Samples: 865838. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:12:20,356][00273] Avg episode reward: [(0, '20.400')] [2024-10-31 20:12:22,359][03757] Updated weights for policy 0, policy_version 850 (0.0012) [2024-10-31 20:12:25,353][00273] Fps is (10 sec: 4097.6, 60 sec: 3481.7, 300 sec: 3512.8). Total num frames: 3493888. Throughput: 0: 857.2. Samples: 872138. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:12:25,356][00273] Avg episode reward: [(0, '20.263')] [2024-10-31 20:12:30,354][00273] Fps is (10 sec: 3686.3, 60 sec: 3481.7, 300 sec: 3512.9). Total num frames: 3506176. Throughput: 0: 878.5. Samples: 877174. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:12:30,356][00273] Avg episode reward: [(0, '19.333')] [2024-10-31 20:12:34,050][03757] Updated weights for policy 0, policy_version 860 (0.0013) [2024-10-31 20:12:35,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 3526656. Throughput: 0: 861.0. Samples: 879456. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-10-31 20:12:35,357][00273] Avg episode reward: [(0, '20.249')] [2024-10-31 20:12:40,354][00273] Fps is (10 sec: 4096.0, 60 sec: 3549.8, 300 sec: 3526.7). Total num frames: 3547136. Throughput: 0: 889.0. Samples: 885842. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:12:40,360][00273] Avg episode reward: [(0, '21.234')] [2024-10-31 20:12:44,810][03757] Updated weights for policy 0, policy_version 870 (0.0013) [2024-10-31 20:12:45,354][00273] Fps is (10 sec: 3686.2, 60 sec: 3549.8, 300 sec: 3554.5). Total num frames: 3563520. Throughput: 0: 924.5. Samples: 891046. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-10-31 20:12:45,360][00273] Avg episode reward: [(0, '21.586')] [2024-10-31 20:12:50,355][00273] Fps is (10 sec: 3276.4, 60 sec: 3481.7, 300 sec: 3540.6). Total num frames: 3579904. Throughput: 0: 914.8. Samples: 893228. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:12:50,357][00273] Avg episode reward: [(0, '23.524')] [2024-10-31 20:12:55,353][00273] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 3600384. Throughput: 0: 915.6. Samples: 899518. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:12:55,357][00273] Avg episode reward: [(0, '23.766')] [2024-10-31 20:12:55,541][03757] Updated weights for policy 0, policy_version 880 (0.0013) [2024-10-31 20:13:00,353][00273] Fps is (10 sec: 3687.0, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 3616768. Throughput: 0: 926.3. Samples: 904998. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) [2024-10-31 20:13:00,360][00273] Avg episode reward: [(0, '24.785')] [2024-10-31 20:13:05,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3568.4). Total num frames: 3637248. Throughput: 0: 916.5. Samples: 907080. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:13:05,356][00273] Avg episode reward: [(0, '26.458')] [2024-10-31 20:13:05,370][03744] Saving new best policy, reward=26.458! [2024-10-31 20:13:07,107][03757] Updated weights for policy 0, policy_version 890 (0.0022) [2024-10-31 20:13:10,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3657728. Throughput: 0: 918.1. Samples: 913454. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:13:10,360][00273] Avg episode reward: [(0, '25.582')] [2024-10-31 20:13:15,354][00273] Fps is (10 sec: 3686.3, 60 sec: 3686.6, 300 sec: 3582.3). Total num frames: 3674112. Throughput: 0: 930.3. Samples: 919038. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:13:15,359][00273] Avg episode reward: [(0, '25.490')] [2024-10-31 20:13:18,651][03757] Updated weights for policy 0, policy_version 900 (0.0029) [2024-10-31 20:13:20,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3690496. Throughput: 0: 924.1. Samples: 921042. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:13:20,360][00273] Avg episode reward: [(0, '23.954')] [2024-10-31 20:13:20,371][03744] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000901_3690496.pth... [2024-10-31 20:13:20,472][03744] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000693_2838528.pth [2024-10-31 20:13:25,353][00273] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3710976. Throughput: 0: 917.7. Samples: 927136. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-31 20:13:25,359][00273] Avg episode reward: [(0, '22.876')] [2024-10-31 20:13:28,617][03757] Updated weights for policy 0, policy_version 910 (0.0012) [2024-10-31 20:13:30,354][00273] Fps is (10 sec: 4095.6, 60 sec: 3754.6, 300 sec: 3596.1). Total num frames: 3731456. Throughput: 0: 930.3. Samples: 932910. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:13:30,358][00273] Avg episode reward: [(0, '21.578')] [2024-10-31 20:13:35,354][00273] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3747840. Throughput: 0: 926.1. Samples: 934900. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:13:35,358][00273] Avg episode reward: [(0, '20.739')] [2024-10-31 20:13:40,064][03757] Updated weights for policy 0, policy_version 920 (0.0017) [2024-10-31 20:13:40,353][00273] Fps is (10 sec: 3686.7, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3768320. Throughput: 0: 922.0. Samples: 941006. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:13:40,356][00273] Avg episode reward: [(0, '20.615')] [2024-10-31 20:13:45,353][00273] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3784704. Throughput: 0: 930.6. Samples: 946874. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:13:45,357][00273] Avg episode reward: [(0, '20.903')] [2024-10-31 20:13:50,354][00273] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3582.3). Total num frames: 3801088. Throughput: 0: 929.0. Samples: 948884. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:13:50,357][00273] Avg episode reward: [(0, '22.085')] [2024-10-31 20:13:51,838][03757] Updated weights for policy 0, policy_version 930 (0.0012) [2024-10-31 20:13:55,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3821568. Throughput: 0: 913.8. Samples: 954574. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:13:55,360][00273] Avg episode reward: [(0, '23.228')] [2024-10-31 20:14:00,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3596.1). Total num frames: 3842048. Throughput: 0: 926.0. Samples: 960706. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:14:00,356][00273] Avg episode reward: [(0, '22.236')] [2024-10-31 20:14:03,041][03757] Updated weights for policy 0, policy_version 940 (0.0018) [2024-10-31 20:14:05,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 3854336. Throughput: 0: 926.5. Samples: 962734. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:14:05,356][00273] Avg episode reward: [(0, '22.787')] [2024-10-31 20:14:10,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3878912. Throughput: 0: 919.3. Samples: 968506. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:14:10,360][00273] Avg episode reward: [(0, '21.567')] [2024-10-31 20:14:13,154][03757] Updated weights for policy 0, policy_version 950 (0.0013) [2024-10-31 20:14:15,353][00273] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 3895296. Throughput: 0: 928.4. Samples: 974686. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:14:15,356][00273] Avg episode reward: [(0, '19.789')] [2024-10-31 20:14:20,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3911680. Throughput: 0: 929.1. Samples: 976710. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:14:20,356][00273] Avg episode reward: [(0, '19.620')] [2024-10-31 20:14:24,790][03757] Updated weights for policy 0, policy_version 960 (0.0016) [2024-10-31 20:14:25,353][00273] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.2). Total num frames: 3932160. Throughput: 0: 917.0. Samples: 982272. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:14:25,359][00273] Avg episode reward: [(0, '20.870')] [2024-10-31 20:14:30,354][00273] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3952640. Throughput: 0: 925.9. Samples: 988538. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:14:30,360][00273] Avg episode reward: [(0, '21.068')] [2024-10-31 20:14:35,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3596.1). Total num frames: 3964928. Throughput: 0: 925.5. Samples: 990532. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:14:35,359][00273] Avg episode reward: [(0, '21.618')] [2024-10-31 20:14:36,471][03757] Updated weights for policy 0, policy_version 970 (0.0013) [2024-10-31 20:14:40,353][00273] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 3989504. Throughput: 0: 923.5. Samples: 996130. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:14:40,359][00273] Avg episode reward: [(0, '22.882')] [2024-10-31 20:14:45,355][00273] Fps is (10 sec: 4504.8, 60 sec: 3754.6, 300 sec: 3596.1). Total num frames: 4009984. Throughput: 0: 930.7. Samples: 1002590. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-10-31 20:14:45,357][00273] Avg episode reward: [(0, '21.520')] [2024-10-31 20:14:46,493][03757] Updated weights for policy 0, policy_version 980 (0.0014) [2024-10-31 20:14:50,353][00273] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 4022272. Throughput: 0: 931.2. Samples: 1004640. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2024-10-31 20:14:50,355][00273] Avg episode reward: [(0, '21.163')] [2024-10-31 20:14:55,353][00273] Fps is (10 sec: 3277.4, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 4042752. Throughput: 0: 921.4. Samples: 1009970. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-10-31 20:14:55,358][00273] Avg episode reward: [(0, '20.461')] [2024-10-31 20:14:57,576][03744] Stopping Batcher_0... [2024-10-31 20:14:57,576][03744] Loop batcher_evt_loop terminating... [2024-10-31 20:14:57,579][03744] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000990_4055040.pth... [2024-10-31 20:14:57,578][00273] Component Batcher_0 stopped! [2024-10-31 20:14:57,581][00273] Component RolloutWorker_w2 process died already! Don't wait for it. [2024-10-31 20:14:57,586][00273] Component RolloutWorker_w3 process died already! Don't wait for it. [2024-10-31 20:14:57,589][00273] Component RolloutWorker_w4 process died already! 
Don't wait for it. [2024-10-31 20:14:57,592][00273] Component RolloutWorker_w7 process died already! Don't wait for it. [2024-10-31 20:14:57,604][03757] Updated weights for policy 0, policy_version 990 (0.0021) [2024-10-31 20:14:57,627][03757] Weights refcount: 2 0 [2024-10-31 20:14:57,631][03757] Stopping InferenceWorker_p0-w0... [2024-10-31 20:14:57,632][03757] Loop inference_proc0-0_evt_loop terminating... [2024-10-31 20:14:57,632][00273] Component InferenceWorker_p0-w0 stopped! [2024-10-31 20:14:57,685][03744] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000798_3268608.pth [2024-10-31 20:14:57,699][03744] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000990_4055040.pth... [2024-10-31 20:14:57,847][00273] Component LearnerWorker_p0 stopped! [2024-10-31 20:14:57,847][03744] Stopping LearnerWorker_p0... [2024-10-31 20:14:57,852][03744] Loop learner_proc0_evt_loop terminating... [2024-10-31 20:14:57,920][03763] Stopping RolloutWorker_w5... [2024-10-31 20:14:57,920][00273] Component RolloutWorker_w5 stopped! [2024-10-31 20:14:57,929][03759] Stopping RolloutWorker_w1... [2024-10-31 20:14:57,930][03763] Loop rollout_proc5_evt_loop terminating... [2024-10-31 20:14:57,929][00273] Component RolloutWorker_w1 stopped! [2024-10-31 20:14:57,931][03759] Loop rollout_proc1_evt_loop terminating... [2024-10-31 20:14:58,018][00273] Component RolloutWorker_w0 stopped! [2024-10-31 20:14:58,019][03758] Stopping RolloutWorker_w0... [2024-10-31 20:14:58,026][03758] Loop rollout_proc0_evt_loop terminating... [2024-10-31 20:14:58,069][00273] Component RolloutWorker_w6 stopped! [2024-10-31 20:14:58,069][03764] Stopping RolloutWorker_w6... [2024-10-31 20:14:58,072][00273] Waiting for process learner_proc0 to stop... [2024-10-31 20:14:58,083][03764] Loop rollout_proc6_evt_loop terminating... [2024-10-31 20:14:59,228][00273] Waiting for process inference_proc0-0 to join... 
[2024-10-31 20:14:59,345][00273] Waiting for process rollout_proc0 to join...
[2024-10-31 20:14:59,907][00273] Waiting for process rollout_proc1 to join...
[2024-10-31 20:14:59,912][00273] Waiting for process rollout_proc2 to join...
[2024-10-31 20:14:59,915][00273] Waiting for process rollout_proc3 to join...
[2024-10-31 20:14:59,917][00273] Waiting for process rollout_proc4 to join...
[2024-10-31 20:14:59,919][00273] Waiting for process rollout_proc5 to join...
[2024-10-31 20:14:59,925][00273] Waiting for process rollout_proc6 to join...
[2024-10-31 20:14:59,929][00273] Waiting for process rollout_proc7 to join...
[2024-10-31 20:14:59,931][00273] Batcher 0 profile tree view:
batching: 21.7163, releasing_batches: 0.0239
[2024-10-31 20:14:59,932][00273] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0019
  wait_policy_total: 497.0076
update_model: 8.8386
  weight_update: 0.0021
one_step: 0.0025
  handle_policy_step: 602.5132
    deserialize: 15.6766, stack: 3.5788, obs_to_device_normalize: 131.5233, forward: 307.1245, send_messages: 23.0432
    prepare_outputs: 90.3386
      to_cpu: 56.8814
[2024-10-31 20:14:59,933][00273] Learner 0 profile tree view:
misc: 0.0062, prepare_batch: 15.1695
train: 69.9771
  epoch_init: 0.0164, minibatch_init: 0.0062, losses_postprocess: 0.4794, kl_divergence: 0.4779, after_optimizer: 33.3584
  calculate_losses: 22.1077
    losses_init: 0.0035, forward_head: 1.5071, bptt_initial: 14.7955, tail: 0.9730, advantages_returns: 0.2391, losses: 2.4373
    bptt: 1.8655
      bptt_forward_core: 1.7841
  update: 13.0236
    clip: 1.4075
[2024-10-31 20:14:59,935][00273] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.5294, enqueue_policy_requests: 188.8101, env_step: 795.5349, overhead: 20.7603, complete_rollouts: 6.2970
save_policy_outputs: 34.9887
  split_output_tensors: 12.2059
[2024-10-31 20:14:59,936][00273] Loop Runner_EvtLoop terminating...
[2024-10-31 20:14:59,938][00273] Runner profile tree view:
main_loop: 1176.0936
[2024-10-31 20:14:59,939][00273] Collected {0: 4055040}, FPS: 3447.9
[2024-10-31 20:15:00,395][00273] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-10-31 20:15:00,398][00273] Overriding arg 'num_workers' with value 1 passed from command line
[2024-10-31 20:15:00,400][00273] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-10-31 20:15:00,403][00273] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-10-31 20:15:00,405][00273] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-10-31 20:15:00,407][00273] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-10-31 20:15:00,410][00273] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-10-31 20:15:00,412][00273] Adding new argument 'max_num_episodes'=15 that is not in the saved config file!
[2024-10-31 20:15:00,413][00273] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-10-31 20:15:00,414][00273] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-10-31 20:15:00,416][00273] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-10-31 20:15:00,417][00273] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-10-31 20:15:00,418][00273] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-10-31 20:15:00,419][00273] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-10-31 20:15:00,421][00273] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-10-31 20:15:00,434][00273] Doom resolution: 160x120, resize resolution: (128, 72) [2024-10-31 20:15:00,441][00273] RunningMeanStd input shape: (3, 72, 128) [2024-10-31 20:15:00,443][00273] RunningMeanStd input shape: (1,) [2024-10-31 20:15:00,470][00273] ConvEncoder: input_channels=3 [2024-10-31 20:15:00,670][00273] Conv encoder output size: 512 [2024-10-31 20:15:00,675][00273] Policy head output size: 512 [2024-10-31 20:15:02,775][00273] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000990_4055040.pth... [2024-10-31 20:15:03,618][00273] Num frames 100... [2024-10-31 20:15:03,748][00273] Num frames 200... [2024-10-31 20:15:03,864][00273] Num frames 300... [2024-10-31 20:15:03,982][00273] Num frames 400... [2024-10-31 20:15:04,099][00273] Num frames 500... [2024-10-31 20:15:04,244][00273] Avg episode rewards: #0: 9.760, true rewards: #0: 5.760 [2024-10-31 20:15:04,245][00273] Avg episode reward: 9.760, avg true_objective: 5.760 [2024-10-31 20:15:04,276][00273] Num frames 600... [2024-10-31 20:15:04,392][00273] Num frames 700... [2024-10-31 20:15:04,508][00273] Num frames 800... [2024-10-31 20:15:04,632][00273] Num frames 900... [2024-10-31 20:15:04,755][00273] Num frames 1000... [2024-10-31 20:15:04,870][00273] Num frames 1100... [2024-10-31 20:15:04,985][00273] Num frames 1200... [2024-10-31 20:15:05,101][00273] Num frames 1300... [2024-10-31 20:15:05,214][00273] Num frames 1400... [2024-10-31 20:15:05,330][00273] Num frames 1500... [2024-10-31 20:15:05,450][00273] Num frames 1600... [2024-10-31 20:15:05,578][00273] Num frames 1700... [2024-10-31 20:15:05,696][00273] Num frames 1800... [2024-10-31 20:15:05,820][00273] Num frames 1900... [2024-10-31 20:15:05,936][00273] Num frames 2000... 
[2024-10-31 20:15:06,056][00273] Avg episode rewards: #0: 25.765, true rewards: #0: 10.265 [2024-10-31 20:15:06,057][00273] Avg episode reward: 25.765, avg true_objective: 10.265 [2024-10-31 20:15:06,113][00273] Num frames 2100... [2024-10-31 20:15:06,228][00273] Num frames 2200... [2024-10-31 20:15:06,345][00273] Num frames 2300... [2024-10-31 20:15:06,464][00273] Num frames 2400... [2024-10-31 20:15:06,590][00273] Num frames 2500... [2024-10-31 20:15:06,708][00273] Num frames 2600... [2024-10-31 20:15:06,832][00273] Num frames 2700... [2024-10-31 20:15:06,951][00273] Num frames 2800... [2024-10-31 20:15:07,069][00273] Num frames 2900... [2024-10-31 20:15:07,188][00273] Num frames 3000... [2024-10-31 20:15:07,306][00273] Num frames 3100... [2024-10-31 20:15:07,426][00273] Num frames 3200... [2024-10-31 20:15:07,563][00273] Avg episode rewards: #0: 25.230, true rewards: #0: 10.897 [2024-10-31 20:15:07,565][00273] Avg episode reward: 25.230, avg true_objective: 10.897 [2024-10-31 20:15:07,605][00273] Num frames 3300... [2024-10-31 20:15:07,719][00273] Num frames 3400... [2024-10-31 20:15:07,843][00273] Num frames 3500... [2024-10-31 20:15:07,957][00273] Num frames 3600... [2024-10-31 20:15:08,070][00273] Num frames 3700... [2024-10-31 20:15:08,184][00273] Num frames 3800... [2024-10-31 20:15:08,302][00273] Num frames 3900... [2024-10-31 20:15:08,425][00273] Num frames 4000... [2024-10-31 20:15:08,549][00273] Num frames 4100... [2024-10-31 20:15:08,669][00273] Num frames 4200... [2024-10-31 20:15:08,791][00273] Num frames 4300... [2024-10-31 20:15:08,912][00273] Num frames 4400... [2024-10-31 20:15:09,029][00273] Num frames 4500... [2024-10-31 20:15:09,148][00273] Num frames 4600... [2024-10-31 20:15:09,270][00273] Num frames 4700... [2024-10-31 20:15:09,389][00273] Num frames 4800... [2024-10-31 20:15:09,507][00273] Num frames 4900... [2024-10-31 20:15:09,637][00273] Num frames 5000... [2024-10-31 20:15:09,754][00273] Num frames 5100... 
[2024-10-31 20:15:09,878][00273] Num frames 5200... [2024-10-31 20:15:09,998][00273] Num frames 5300... [2024-10-31 20:15:10,139][00273] Avg episode rewards: #0: 34.422, true rewards: #0: 13.422 [2024-10-31 20:15:10,140][00273] Avg episode reward: 34.422, avg true_objective: 13.422 [2024-10-31 20:15:10,182][00273] Num frames 5400... [2024-10-31 20:15:10,298][00273] Num frames 5500... [2024-10-31 20:15:10,416][00273] Num frames 5600... [2024-10-31 20:15:10,542][00273] Num frames 5700... [2024-10-31 20:15:10,660][00273] Num frames 5800... [2024-10-31 20:15:10,775][00273] Num frames 5900... [2024-10-31 20:15:10,899][00273] Num frames 6000... [2024-10-31 20:15:11,020][00273] Num frames 6100... [2024-10-31 20:15:11,136][00273] Num frames 6200... [2024-10-31 20:15:11,250][00273] Num frames 6300... [2024-10-31 20:15:11,375][00273] Avg episode rewards: #0: 31.522, true rewards: #0: 12.722 [2024-10-31 20:15:11,376][00273] Avg episode reward: 31.522, avg true_objective: 12.722 [2024-10-31 20:15:11,427][00273] Num frames 6400... [2024-10-31 20:15:11,548][00273] Num frames 6500... [2024-10-31 20:15:11,666][00273] Num frames 6600... [2024-10-31 20:15:11,783][00273] Num frames 6700... [2024-10-31 20:15:11,906][00273] Num frames 6800... [2024-10-31 20:15:12,026][00273] Num frames 6900... [2024-10-31 20:15:12,157][00273] Avg episode rewards: #0: 28.275, true rewards: #0: 11.608 [2024-10-31 20:15:12,159][00273] Avg episode reward: 28.275, avg true_objective: 11.608 [2024-10-31 20:15:12,200][00273] Num frames 7000... [2024-10-31 20:15:12,319][00273] Num frames 7100... [2024-10-31 20:15:12,437][00273] Num frames 7200... [2024-10-31 20:15:12,564][00273] Num frames 7300... [2024-10-31 20:15:12,710][00273] Num frames 7400... [2024-10-31 20:15:12,872][00273] Num frames 7500... [2024-10-31 20:15:13,038][00273] Num frames 7600... [2024-10-31 20:15:13,197][00273] Num frames 7700... [2024-10-31 20:15:13,354][00273] Num frames 7800... [2024-10-31 20:15:13,516][00273] Num frames 7900... 
[2024-10-31 20:15:13,682][00273] Num frames 8000...
[2024-10-31 20:15:13,851][00273] Num frames 8100...
[2024-10-31 20:15:13,927][00273] Avg episode rewards: #0: 28.014, true rewards: #0: 11.586
[2024-10-31 20:15:13,930][00273] Avg episode reward: 28.014, avg true_objective: 11.586
[2024-10-31 20:15:14,087][00273] Num frames 8200...
[2024-10-31 20:15:14,245][00273] Num frames 8300...
[2024-10-31 20:15:14,413][00273] Num frames 8400...
[2024-10-31 20:15:14,580][00273] Num frames 8500...
[2024-10-31 20:15:14,750][00273] Num frames 8600...
[2024-10-31 20:15:14,916][00273] Num frames 8700...
[2024-10-31 20:15:15,014][00273] Avg episode rewards: #0: 26.151, true rewards: #0: 10.901
[2024-10-31 20:15:15,016][00273] Avg episode reward: 26.151, avg true_objective: 10.901
[2024-10-31 20:15:15,111][00273] Num frames 8800...
[2024-10-31 20:15:15,225][00273] Num frames 8900...
[2024-10-31 20:15:15,349][00273] Num frames 9000...
[2024-10-31 20:15:15,476][00273] Num frames 9100...
[2024-10-31 20:15:15,605][00273] Avg episode rewards: #0: 24.178, true rewards: #0: 10.178
[2024-10-31 20:15:15,606][00273] Avg episode reward: 24.178, avg true_objective: 10.178
[2024-10-31 20:15:15,655][00273] Num frames 9200...
[2024-10-31 20:15:15,766][00273] Num frames 9300...
[2024-10-31 20:15:15,883][00273] Num frames 9400...
[2024-10-31 20:15:16,004][00273] Num frames 9500...
[2024-10-31 20:15:16,119][00273] Num frames 9600...
[2024-10-31 20:15:16,236][00273] Num frames 9700...
[2024-10-31 20:15:16,350][00273] Num frames 9800...
[2024-10-31 20:15:16,468][00273] Num frames 9900...
[2024-10-31 20:15:16,592][00273] Num frames 10000...
[2024-10-31 20:15:16,711][00273] Num frames 10100...
[2024-10-31 20:15:16,830][00273] Num frames 10200...
[2024-10-31 20:15:16,949][00273] Num frames 10300...
[2024-10-31 20:15:17,074][00273] Num frames 10400...
[2024-10-31 20:15:17,213][00273] Avg episode rewards: #0: 24.772, true rewards: #0: 10.472
[2024-10-31 20:15:17,215][00273] Avg episode reward: 24.772, avg true_objective: 10.472
[2024-10-31 20:15:17,249][00273] Num frames 10500...
[2024-10-31 20:15:17,366][00273] Num frames 10600...
[2024-10-31 20:15:17,483][00273] Num frames 10700...
[2024-10-31 20:15:17,605][00273] Num frames 10800...
[2024-10-31 20:15:17,719][00273] Num frames 10900...
[2024-10-31 20:15:17,835][00273] Num frames 11000...
[2024-10-31 20:15:17,949][00273] Num frames 11100...
[2024-10-31 20:15:18,071][00273] Num frames 11200...
[2024-10-31 20:15:18,186][00273] Num frames 11300...
[2024-10-31 20:15:18,306][00273] Num frames 11400...
[2024-10-31 20:15:18,422][00273] Num frames 11500...
[2024-10-31 20:15:18,545][00273] Num frames 11600...
[2024-10-31 20:15:18,661][00273] Num frames 11700...
[2024-10-31 20:15:18,781][00273] Num frames 11800...
[2024-10-31 20:15:18,896][00273] Num frames 11900...
[2024-10-31 20:15:19,042][00273] Avg episode rewards: #0: 25.614, true rewards: #0: 10.887
[2024-10-31 20:15:19,044][00273] Avg episode reward: 25.614, avg true_objective: 10.887
[2024-10-31 20:15:19,081][00273] Num frames 12000...
[2024-10-31 20:15:19,197][00273] Num frames 12100...
[2024-10-31 20:15:19,318][00273] Num frames 12200...
[2024-10-31 20:15:19,433][00273] Num frames 12300...
[2024-10-31 20:15:19,553][00273] Num frames 12400...
[2024-10-31 20:15:19,667][00273] Num frames 12500...
[2024-10-31 20:15:19,783][00273] Num frames 12600...
[2024-10-31 20:15:19,896][00273] Num frames 12700...
[2024-10-31 20:15:20,014][00273] Num frames 12800...
[2024-10-31 20:15:20,137][00273] Num frames 12900...
[2024-10-31 20:15:20,250][00273] Num frames 13000...
[2024-10-31 20:15:20,372][00273] Num frames 13100...
[2024-10-31 20:15:20,493][00273] Num frames 13200...
[2024-10-31 20:15:20,616][00273] Num frames 13300...
[2024-10-31 20:15:20,740][00273] Num frames 13400...
[2024-10-31 20:15:20,870][00273] Num frames 13500...
[2024-10-31 20:15:20,993][00273] Num frames 13600...
[2024-10-31 20:15:21,114][00273] Num frames 13700...
[2024-10-31 20:15:21,237][00273] Num frames 13800...
[2024-10-31 20:15:21,356][00273] Num frames 13900...
[2024-10-31 20:15:21,479][00273] Num frames 14000...
[2024-10-31 20:15:21,630][00273] Avg episode rewards: #0: 27.980, true rewards: #0: 11.730
[2024-10-31 20:15:21,632][00273] Avg episode reward: 27.980, avg true_objective: 11.730
[2024-10-31 20:15:21,665][00273] Num frames 14100...
[2024-10-31 20:15:21,786][00273] Num frames 14200...
[2024-10-31 20:15:21,925][00273] Num frames 14300...
[2024-10-31 20:15:22,041][00273] Num frames 14400...
[2024-10-31 20:15:22,167][00273] Num frames 14500...
[2024-10-31 20:15:22,285][00273] Num frames 14600...
[2024-10-31 20:15:22,364][00273] Avg episode rewards: #0: 26.708, true rewards: #0: 11.246
[2024-10-31 20:15:22,366][00273] Avg episode reward: 26.708, avg true_objective: 11.246
[2024-10-31 20:15:22,464][00273] Num frames 14700...
[2024-10-31 20:15:22,592][00273] Num frames 14800...
[2024-10-31 20:15:22,709][00273] Num frames 14900...
[2024-10-31 20:15:22,828][00273] Num frames 15000...
[2024-10-31 20:15:22,943][00273] Num frames 15100...
[2024-10-31 20:15:23,060][00273] Num frames 15200...
[2024-10-31 20:15:23,184][00273] Num frames 15300...
[2024-10-31 20:15:23,301][00273] Num frames 15400...
[2024-10-31 20:15:23,419][00273] Num frames 15500...
[2024-10-31 20:15:23,531][00273] Avg episode rewards: #0: 26.177, true rewards: #0: 11.106
[2024-10-31 20:15:23,534][00273] Avg episode reward: 26.177, avg true_objective: 11.106
[2024-10-31 20:15:23,596][00273] Num frames 15600...
[2024-10-31 20:15:23,710][00273] Num frames 15700...
[2024-10-31 20:15:23,826][00273] Num frames 15800...
[2024-10-31 20:15:23,942][00273] Num frames 15900...
[2024-10-31 20:15:24,059][00273] Num frames 16000...
[2024-10-31 20:15:24,173][00273] Num frames 16100...
[2024-10-31 20:15:24,297][00273] Num frames 16200...
[2024-10-31 20:15:24,414][00273] Num frames 16300...
[2024-10-31 20:15:24,540][00273] Num frames 16400...
[2024-10-31 20:15:24,659][00273] Num frames 16500...
[2024-10-31 20:15:24,772][00273] Num frames 16600...
[2024-10-31 20:15:24,892][00273] Num frames 16700...
[2024-10-31 20:15:25,011][00273] Num frames 16800...
[2024-10-31 20:15:25,174][00273] Num frames 16900...
[2024-10-31 20:15:25,356][00273] Num frames 17000...
[2024-10-31 20:15:25,524][00273] Num frames 17100...
[2024-10-31 20:15:25,684][00273] Num frames 17200...
[2024-10-31 20:15:25,848][00273] Num frames 17300...
[2024-10-31 20:15:26,011][00273] Num frames 17400...
[2024-10-31 20:15:26,172][00273] Num frames 17500...
[2024-10-31 20:15:26,241][00273] Avg episode rewards: #0: 27.737, true rewards: #0: 11.671
[2024-10-31 20:15:26,244][00273] Avg episode reward: 27.737, avg true_objective: 11.671
[2024-10-31 20:17:11,351][00273] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-10-31 20:18:53,311][00273] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-10-31 20:18:53,317][00273] Overriding arg 'num_workers' with value 1 passed from command line
[2024-10-31 20:18:53,320][00273] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-10-31 20:18:53,324][00273] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-10-31 20:18:53,325][00273] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-10-31 20:18:53,327][00273] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-10-31 20:18:53,329][00273] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-10-31 20:18:53,330][00273] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-10-31 20:18:53,331][00273] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-10-31 20:18:53,332][00273] Adding new argument 'hf_repository'='atharv-16/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-10-31 20:18:53,333][00273] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-10-31 20:18:53,334][00273] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-10-31 20:18:53,335][00273] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-10-31 20:18:53,336][00273] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-10-31 20:18:53,337][00273] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-10-31 20:18:53,349][00273] RunningMeanStd input shape: (3, 72, 128)
[2024-10-31 20:18:53,353][00273] RunningMeanStd input shape: (1,)
[2024-10-31 20:18:53,377][00273] ConvEncoder: input_channels=3
[2024-10-31 20:18:53,432][00273] Conv encoder output size: 512
[2024-10-31 20:18:53,434][00273] Policy head output size: 512
[2024-10-31 20:18:53,461][00273] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000990_4055040.pth...
[2024-10-31 20:18:54,202][00273] Num frames 100...
[2024-10-31 20:18:54,371][00273] Num frames 200...
[2024-10-31 20:18:54,537][00273] Num frames 300...
[2024-10-31 20:18:54,699][00273] Num frames 400...
[2024-10-31 20:18:54,893][00273] Avg episode rewards: #0: 11.870, true rewards: #0: 4.870
[2024-10-31 20:18:54,895][00273] Avg episode reward: 11.870, avg true_objective: 4.870
[2024-10-31 20:18:54,920][00273] Num frames 500...
[2024-10-31 20:18:55,092][00273] Num frames 600...
[2024-10-31 20:18:55,211][00273] Num frames 700...
[2024-10-31 20:18:55,330][00273] Num frames 800...
[2024-10-31 20:18:55,447][00273] Num frames 900...
[2024-10-31 20:18:55,568][00273] Num frames 1000...
[2024-10-31 20:18:55,681][00273] Num frames 1100...
[2024-10-31 20:18:55,802][00273] Num frames 1200...
[2024-10-31 20:18:55,924][00273] Num frames 1300...
[2024-10-31 20:18:56,043][00273] Num frames 1400...
[2024-10-31 20:18:56,168][00273] Num frames 1500...
[2024-10-31 20:18:56,282][00273] Num frames 1600...
[2024-10-31 20:18:56,384][00273] Avg episode rewards: #0: 18.195, true rewards: #0: 8.195
[2024-10-31 20:18:56,386][00273] Avg episode reward: 18.195, avg true_objective: 8.195
[2024-10-31 20:18:56,458][00273] Num frames 1700...
[2024-10-31 20:18:56,585][00273] Num frames 1800...
[2024-10-31 20:18:56,703][00273] Num frames 1900...
[2024-10-31 20:18:56,818][00273] Num frames 2000...
[2024-10-31 20:18:56,937][00273] Num frames 2100...
[2024-10-31 20:18:57,051][00273] Num frames 2200...
[2024-10-31 20:18:57,166][00273] Avg episode rewards: #0: 16.490, true rewards: #0: 7.490
[2024-10-31 20:18:57,167][00273] Avg episode reward: 16.490, avg true_objective: 7.490
[2024-10-31 20:18:57,229][00273] Num frames 2300...
[2024-10-31 20:18:57,343][00273] Num frames 2400...
[2024-10-31 20:18:57,469][00273] Num frames 2500...
[2024-10-31 20:18:57,591][00273] Num frames 2600...
[2024-10-31 20:18:57,709][00273] Num frames 2700...
[2024-10-31 20:18:57,828][00273] Num frames 2800...
[2024-10-31 20:18:57,946][00273] Num frames 2900...
[2024-10-31 20:18:58,062][00273] Num frames 3000...
[2024-10-31 20:18:58,188][00273] Num frames 3100...
[2024-10-31 20:18:58,305][00273] Num frames 3200...
[2024-10-31 20:18:58,436][00273] Avg episode rewards: #0: 18.413, true rewards: #0: 8.162
[2024-10-31 20:18:58,440][00273] Avg episode reward: 18.413, avg true_objective: 8.162
[2024-10-31 20:18:58,481][00273] Num frames 3300...
[2024-10-31 20:18:58,605][00273] Num frames 3400...
[2024-10-31 20:18:58,721][00273] Num frames 3500...
[2024-10-31 20:18:58,837][00273] Num frames 3600...
[2024-10-31 20:18:58,958][00273] Num frames 3700...
[2024-10-31 20:18:59,074][00273] Num frames 3800...
[2024-10-31 20:18:59,196][00273] Num frames 3900...
[2024-10-31 20:18:59,309][00273] Num frames 4000...
[2024-10-31 20:18:59,428][00273] Num frames 4100...
[2024-10-31 20:18:59,555][00273] Num frames 4200...
[2024-10-31 20:18:59,674][00273] Num frames 4300...
[2024-10-31 20:18:59,765][00273] Avg episode rewards: #0: 19.058, true rewards: #0: 8.658
[2024-10-31 20:18:59,766][00273] Avg episode reward: 19.058, avg true_objective: 8.658
[2024-10-31 20:18:59,847][00273] Num frames 4400...
[2024-10-31 20:18:59,967][00273] Num frames 4500...
[2024-10-31 20:19:00,090][00273] Num frames 4600...
[2024-10-31 20:19:00,230][00273] Num frames 4700...
[2024-10-31 20:19:00,364][00273] Num frames 4800...
[2024-10-31 20:19:00,499][00273] Num frames 4900...
[2024-10-31 20:19:00,633][00273] Num frames 5000...
[2024-10-31 20:19:00,752][00273] Num frames 5100...
[2024-10-31 20:19:00,864][00273] Num frames 5200...
[2024-10-31 20:19:00,979][00273] Num frames 5300...
[2024-10-31 20:19:01,094][00273] Num frames 5400...
[2024-10-31 20:19:01,226][00273] Num frames 5500...
[2024-10-31 20:19:01,347][00273] Num frames 5600...
[2024-10-31 20:19:01,480][00273] Num frames 5700...
[2024-10-31 20:19:01,632][00273] Avg episode rewards: #0: 21.458, true rewards: #0: 9.625
[2024-10-31 20:19:01,634][00273] Avg episode reward: 21.458, avg true_objective: 9.625
[2024-10-31 20:19:01,667][00273] Num frames 5800...
[2024-10-31 20:19:01,779][00273] Num frames 5900...
[2024-10-31 20:19:01,893][00273] Num frames 6000...
[2024-10-31 20:19:02,020][00273] Num frames 6100...
[2024-10-31 20:19:02,145][00273] Num frames 6200...
[2024-10-31 20:19:02,280][00273] Num frames 6300...
[2024-10-31 20:19:02,403][00273] Num frames 6400...
[2024-10-31 20:19:02,523][00273] Num frames 6500...
[2024-10-31 20:19:02,597][00273] Avg episode rewards: #0: 20.874, true rewards: #0: 9.303
[2024-10-31 20:19:02,598][00273] Avg episode reward: 20.874, avg true_objective: 9.303
[2024-10-31 20:19:02,701][00273] Num frames 6600...
[2024-10-31 20:19:02,817][00273] Num frames 6700...
[2024-10-31 20:19:02,932][00273] Num frames 6800...
[2024-10-31 20:19:03,052][00273] Num frames 6900...
[2024-10-31 20:19:03,168][00273] Num frames 7000...
[2024-10-31 20:19:03,292][00273] Num frames 7100...
[2024-10-31 20:19:03,409][00273] Num frames 7200...
[2024-10-31 20:19:03,529][00273] Num frames 7300...
[2024-10-31 20:19:03,658][00273] Num frames 7400...
[2024-10-31 20:19:03,774][00273] Num frames 7500...
[2024-10-31 20:19:03,891][00273] Num frames 7600...
[2024-10-31 20:19:04,013][00273] Num frames 7700...
[2024-10-31 20:19:04,128][00273] Num frames 7800...
[2024-10-31 20:19:04,249][00273] Num frames 7900...
[2024-10-31 20:19:04,376][00273] Num frames 8000...
[2024-10-31 20:19:04,495][00273] Num frames 8100...
[2024-10-31 20:19:04,626][00273] Num frames 8200...
[2024-10-31 20:19:04,760][00273] Avg episode rewards: #0: 23.830, true rewards: #0: 10.330
[2024-10-31 20:19:04,761][00273] Avg episode reward: 23.830, avg true_objective: 10.330
[2024-10-31 20:19:04,805][00273] Num frames 8300...
[2024-10-31 20:19:04,920][00273] Num frames 8400...
[2024-10-31 20:19:05,035][00273] Num frames 8500...
[2024-10-31 20:19:05,171][00273] Num frames 8600...
[2024-10-31 20:19:05,333][00273] Num frames 8700...
[2024-10-31 20:19:05,498][00273] Num frames 8800...
[2024-10-31 20:19:05,668][00273] Num frames 8900...
[2024-10-31 20:19:05,827][00273] Num frames 9000...
[2024-10-31 20:19:05,991][00273] Num frames 9100...
[2024-10-31 20:19:06,147][00273] Num frames 9200...
[2024-10-31 20:19:06,307][00273] Num frames 9300...
[2024-10-31 20:19:06,493][00273] Num frames 9400...
[2024-10-31 20:19:06,665][00273] Num frames 9500...
[2024-10-31 20:19:06,825][00273] Num frames 9600...
[2024-10-31 20:19:06,998][00273] Num frames 9700...
[2024-10-31 20:19:07,113][00273] Avg episode rewards: #0: 24.929, true rewards: #0: 10.818
[2024-10-31 20:19:07,115][00273] Avg episode reward: 24.929, avg true_objective: 10.818
[2024-10-31 20:19:07,226][00273] Num frames 9800...
[2024-10-31 20:19:07,390][00273] Num frames 9900...
[2024-10-31 20:19:07,581][00273] Num frames 10000...
[2024-10-31 20:19:07,723][00273] Num frames 10100...
[2024-10-31 20:19:07,840][00273] Num frames 10200...
[2024-10-31 20:19:07,959][00273] Num frames 10300...
[2024-10-31 20:19:08,076][00273] Num frames 10400...
[2024-10-31 20:19:08,177][00273] Avg episode rewards: #0: 23.640, true rewards: #0: 10.440
[2024-10-31 20:19:08,179][00273] Avg episode reward: 23.640, avg true_objective: 10.440
[2024-10-31 20:20:11,448][00273] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
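The `Avg episode reward` / `avg true_objective` summary lines above follow a fixed format, so the running averages can be recovered from the raw log with a short script. A minimal sketch, assuming the Sample Factory log format shown in this file (the `parse_eval_log` helper and the sample lines are illustrative, not part of Sample Factory itself; the last pair in the output is the running average over all evaluated episodes):

```python
import re

# Matches evaluation summary lines of the form seen in this log, e.g.:
#   [2024-10-31 20:19:08,179][00273] Avg episode reward: 23.640, avg true_objective: 10.440
# The trailing colon after "reward" keeps it from matching the
# "Avg episode rewards: #0: ..." variant of the same summary.
SUMMARY_RE = re.compile(
    r"Avg episode reward: (?P<reward>[\d.]+), avg true_objective: (?P<true>[\d.]+)"
)

def parse_eval_log(lines):
    """Return one (avg_reward, avg_true_objective) pair per summary line."""
    results = []
    for line in lines:
        match = SUMMARY_RE.search(line)
        if match:
            results.append((float(match.group("reward")), float(match.group("true"))))
    return results

# Usage on two lines copied from the log above:
log = [
    "[2024-10-31 20:19:07,115][00273] Avg episode reward: 24.929, avg true_objective: 10.818",
    "[2024-10-31 20:19:08,179][00273] Avg episode reward: 23.640, avg true_objective: 10.440",
]
print(parse_eval_log(log))
```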