[2024-12-30 19:56:32,381][00338] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-12-30 19:56:32,384][00338] Rollout worker 0 uses device cpu
[2024-12-30 19:56:32,385][00338] Rollout worker 1 uses device cpu
[2024-12-30 19:56:32,387][00338] Rollout worker 2 uses device cpu
[2024-12-30 19:56:32,389][00338] Rollout worker 3 uses device cpu
[2024-12-30 19:56:32,390][00338] Rollout worker 4 uses device cpu
[2024-12-30 19:56:32,391][00338] Rollout worker 5 uses device cpu
[2024-12-30 19:56:32,392][00338] Rollout worker 6 uses device cpu
[2024-12-30 19:56:32,394][00338] Rollout worker 7 uses device cpu
[2024-12-30 19:56:32,561][00338] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-30 19:56:32,563][00338] InferenceWorker_p0-w0: min num requests: 2
[2024-12-30 19:56:32,596][00338] Starting all processes...
[2024-12-30 19:56:32,598][00338] Starting process learner_proc0
[2024-12-30 19:56:32,642][00338] Starting all processes...
[2024-12-30 19:56:32,650][00338] Starting process inference_proc0-0
[2024-12-30 19:56:32,651][00338] Starting process rollout_proc0
[2024-12-30 19:56:32,652][00338] Starting process rollout_proc1
[2024-12-30 19:56:32,652][00338] Starting process rollout_proc2
[2024-12-30 19:56:32,652][00338] Starting process rollout_proc3
[2024-12-30 19:56:32,652][00338] Starting process rollout_proc4
[2024-12-30 19:56:32,652][00338] Starting process rollout_proc5
[2024-12-30 19:56:32,652][00338] Starting process rollout_proc6
[2024-12-30 19:56:32,652][00338] Starting process rollout_proc7
[2024-12-30 19:56:50,740][02943] Worker 6 uses CPU cores [0]
[2024-12-30 19:56:50,796][02942] Worker 4 uses CPU cores [0]
[2024-12-30 19:56:50,880][02939] Worker 1 uses CPU cores [1]
[2024-12-30 19:56:50,883][02938] Worker 0 uses CPU cores [0]
[2024-12-30 19:56:50,899][02924] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-30 19:56:50,899][02924] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-12-30 19:56:50,908][02941] Worker 3 uses CPU cores [1]
[2024-12-30 19:56:50,946][02924] Num visible devices: 1
[2024-12-30 19:56:50,952][02944] Worker 5 uses CPU cores [1]
[2024-12-30 19:56:50,961][02924] Starting seed is not provided
[2024-12-30 19:56:50,962][02924] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-30 19:56:50,962][02924] Initializing actor-critic model on device cuda:0
[2024-12-30 19:56:50,963][02924] RunningMeanStd input shape: (3, 72, 128)
[2024-12-30 19:56:50,966][02924] RunningMeanStd input shape: (1,)
[2024-12-30 19:56:50,985][02924] ConvEncoder: input_channels=3
[2024-12-30 19:56:51,008][02940] Worker 2 uses CPU cores [0]
[2024-12-30 19:56:51,016][02945] Worker 7 uses CPU cores [1]
[2024-12-30 19:56:51,086][02937] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-30 19:56:51,086][02937] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-12-30 19:56:51,102][02937] Num visible devices: 1
[2024-12-30 19:56:51,250][02924] Conv encoder output size: 512
[2024-12-30 19:56:51,250][02924] Policy head output size: 512
[2024-12-30 19:56:51,302][02924] Created Actor Critic model with architecture:
[2024-12-30 19:56:51,302][02924] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-12-30 19:56:51,687][02924] Using optimizer
[2024-12-30 19:56:52,560][00338] Heartbeat connected on Batcher_0
[2024-12-30 19:56:52,563][00338] Heartbeat connected on InferenceWorker_p0-w0
[2024-12-30 19:56:52,574][00338] Heartbeat connected on RolloutWorker_w1
[2024-12-30 19:56:52,575][00338] Heartbeat connected on RolloutWorker_w0
[2024-12-30 19:56:52,579][00338] Heartbeat connected on RolloutWorker_w2
[2024-12-30 19:56:52,581][00338] Heartbeat connected on RolloutWorker_w3
[2024-12-30 19:56:52,585][00338] Heartbeat connected on RolloutWorker_w4
[2024-12-30 19:56:52,593][00338] Heartbeat connected on RolloutWorker_w5
[2024-12-30 19:56:52,594][00338] Heartbeat connected on RolloutWorker_w6
[2024-12-30 19:56:52,597][00338] Heartbeat connected on RolloutWorker_w7
[2024-12-30 19:56:55,612][02924] No checkpoints found
[2024-12-30 19:56:55,612][02924] Did not load from checkpoint, starting from scratch!
[2024-12-30 19:56:55,613][02924] Initialized policy 0 weights for model version 0
[2024-12-30 19:56:55,626][02924] LearnerWorker_p0 finished initialization!
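The architecture dump above reports "Conv encoder output size: 512" for (3, 72, 128) observations, but the TorchScript repr hides the conv kernel sizes. The figure can be sanity-checked with a small sketch; the filter spec below (32 8x8/4, 64 4x4/2, 128 3x3/2) is an assumption, since the log does not print it:

```python
def conv_out(n: int, kernel: int, stride: int) -> int:
    """Output length of a valid (no-padding) convolution along one axis."""
    return (n - kernel) // stride + 1

# Assumed (channels, kernel, stride) per conv layer; NOT printed in the log above.
conv_head = [(32, 8, 4), (64, 4, 2), (128, 3, 2)]

h, w = 72, 128  # resized Doom frame from the log: resize resolution (128, 72)
for channels, k, s in conv_head:
    h, w = conv_out(h, k, s), conv_out(w, k, s)

flat = channels * h * w  # flattened conv features fed into the MLP layer
print(h, w, flat)  # -> 3 6 2304
```

Under these assumptions the final Linear in `mlp_layers` maps 2304 flattened features to the 512-dim encoder output the log reports.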
[2024-12-30 19:56:55,626][02924] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-30 19:56:55,629][00338] Heartbeat connected on LearnerWorker_p0
[2024-12-30 19:56:55,852][02937] RunningMeanStd input shape: (3, 72, 128)
[2024-12-30 19:56:55,856][02937] RunningMeanStd input shape: (1,)
[2024-12-30 19:56:55,876][02937] ConvEncoder: input_channels=3
[2024-12-30 19:56:56,039][02937] Conv encoder output size: 512
[2024-12-30 19:56:56,040][02937] Policy head output size: 512
[2024-12-30 19:56:56,120][00338] Inference worker 0-0 is ready!
[2024-12-30 19:56:56,124][00338] All inference workers are ready! Signal rollout workers to start!
[2024-12-30 19:56:56,356][02942] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:56,357][02938] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:56,359][02940] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:56,364][02943] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:56,425][02945] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:56,430][02944] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:56,427][02939] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:56,430][02941] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 19:56:57,000][00338] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-30 19:56:57,806][02940] Decorrelating experience for 0 frames...
[2024-12-30 19:56:57,806][02944] Decorrelating experience for 0 frames...
[2024-12-30 19:56:57,810][02943] Decorrelating experience for 0 frames...
[2024-12-30 19:56:57,808][02945] Decorrelating experience for 0 frames...
[2024-12-30 19:56:58,540][02944] Decorrelating experience for 32 frames...
[2024-12-30 19:56:58,637][02939] Decorrelating experience for 0 frames...
[2024-12-30 19:56:59,265][02940] Decorrelating experience for 32 frames...
[2024-12-30 19:56:59,270][02943] Decorrelating experience for 32 frames...
[2024-12-30 19:56:59,298][02938] Decorrelating experience for 0 frames...
[2024-12-30 19:56:59,314][02942] Decorrelating experience for 0 frames...
[2024-12-30 19:56:59,890][02939] Decorrelating experience for 32 frames...
[2024-12-30 19:56:59,932][02941] Decorrelating experience for 0 frames...
[2024-12-30 19:57:00,145][02944] Decorrelating experience for 64 frames...
[2024-12-30 19:57:00,280][02938] Decorrelating experience for 32 frames...
[2024-12-30 19:57:00,522][02940] Decorrelating experience for 64 frames...
[2024-12-30 19:57:00,791][02945] Decorrelating experience for 32 frames...
[2024-12-30 19:57:01,151][02943] Decorrelating experience for 64 frames...
[2024-12-30 19:57:01,465][02941] Decorrelating experience for 32 frames...
[2024-12-30 19:57:01,893][02944] Decorrelating experience for 96 frames...
[2024-12-30 19:57:01,994][00338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-30 19:57:02,114][02938] Decorrelating experience for 64 frames...
[2024-12-30 19:57:02,122][02940] Decorrelating experience for 96 frames...
[2024-12-30 19:57:02,304][02939] Decorrelating experience for 64 frames...
[2024-12-30 19:57:02,423][02942] Decorrelating experience for 32 frames...
[2024-12-30 19:57:03,598][02943] Decorrelating experience for 96 frames...
[2024-12-30 19:57:03,693][02945] Decorrelating experience for 64 frames...
[2024-12-30 19:57:04,163][02941] Decorrelating experience for 64 frames...
[2024-12-30 19:57:04,255][02938] Decorrelating experience for 96 frames...
[2024-12-30 19:57:04,952][02945] Decorrelating experience for 96 frames...
[2024-12-30 19:57:05,596][02942] Decorrelating experience for 64 frames...
[2024-12-30 19:57:05,670][02941] Decorrelating experience for 96 frames...
[2024-12-30 19:57:06,996][00338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 49.6. Samples: 496. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-30 19:57:06,998][00338] Avg episode reward: [(0, '2.065')]
[2024-12-30 19:57:10,949][02924] Signal inference workers to stop experience collection...
[2024-12-30 19:57:11,001][02937] InferenceWorker_p0-w0: stopping experience collection
[2024-12-30 19:57:11,205][02939] Decorrelating experience for 96 frames...
[2024-12-30 19:57:11,787][02942] Decorrelating experience for 96 frames...
[2024-12-30 19:57:11,994][00338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 159.8. Samples: 2396. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-30 19:57:11,997][00338] Avg episode reward: [(0, '2.759')]
[2024-12-30 19:57:15,485][02924] Signal inference workers to resume experience collection...
[2024-12-30 19:57:15,486][02937] InferenceWorker_p0-w0: resuming experience collection
[2024-12-30 19:57:16,994][00338] Fps is (10 sec: 1229.0, 60 sec: 614.6, 300 sec: 614.6). Total num frames: 12288. Throughput: 0: 119.8. Samples: 2396. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2024-12-30 19:57:17,000][00338] Avg episode reward: [(0, '2.821')]
[2024-12-30 19:57:21,994][00338] Fps is (10 sec: 3276.8, 60 sec: 1311.0, 300 sec: 1311.0). Total num frames: 32768. Throughput: 0: 304.5. Samples: 7610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:57:21,996][00338] Avg episode reward: [(0, '3.809')]
[2024-12-30 19:57:22,794][02937] Updated weights for policy 0, policy_version 10 (0.0035)
[2024-12-30 19:57:26,997][00338] Fps is (10 sec: 4095.0, 60 sec: 1775.1, 300 sec: 1775.1). Total num frames: 53248. Throughput: 0: 460.9. Samples: 13826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:57:26,999][00338] Avg episode reward: [(0, '4.409')]
[2024-12-30 19:57:31,994][00338] Fps is (10 sec: 3686.4, 60 sec: 1989.8, 300 sec: 1989.8). Total num frames: 69632. Throughput: 0: 457.0. Samples: 15994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:57:31,999][00338] Avg episode reward: [(0, '4.518')]
[2024-12-30 19:57:33,914][02937] Updated weights for policy 0, policy_version 20 (0.0018)
[2024-12-30 19:57:36,994][00338] Fps is (10 sec: 4096.9, 60 sec: 2355.5, 300 sec: 2355.5). Total num frames: 94208. Throughput: 0: 566.6. Samples: 22662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:57:36,997][00338] Avg episode reward: [(0, '4.469')]
[2024-12-30 19:57:41,994][00338] Fps is (10 sec: 4505.6, 60 sec: 2548.9, 300 sec: 2548.9). Total num frames: 114688. Throughput: 0: 661.3. Samples: 29756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:57:42,003][00338] Avg episode reward: [(0, '4.295')]
[2024-12-30 19:57:42,098][02924] Saving new best policy, reward=4.295!
[2024-12-30 19:57:43,494][02937] Updated weights for policy 0, policy_version 30 (0.0013)
[2024-12-30 19:57:46,996][00338] Fps is (10 sec: 3686.0, 60 sec: 2621.6, 300 sec: 2621.6). Total num frames: 131072. Throughput: 0: 708.3. Samples: 31876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 19:57:46,998][00338] Avg episode reward: [(0, '4.366')]
[2024-12-30 19:57:47,014][02924] Saving new best policy, reward=4.366!
[2024-12-30 19:57:51,994][00338] Fps is (10 sec: 4096.0, 60 sec: 2830.2, 300 sec: 2830.2). Total num frames: 155648. Throughput: 0: 824.3. Samples: 37590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:57:51,999][00338] Avg episode reward: [(0, '4.384')]
[2024-12-30 19:57:52,002][02924] Saving new best policy, reward=4.384!
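The "Fps is (10 sec: …, 60 sec: …, 300 sec: …)" entries above report frame throughput averaged over trailing time windows. A rough illustration of how such windowed rates can be derived from (timestamp, total_frames) samples; `windowed_fps` is a hypothetical helper, not Sample Factory's actual reporter:

```python
def windowed_fps(samples, window):
    """FPS over the trailing `window` seconds, given (time, total_frames) samples."""
    t_now, f_now = samples[-1]
    # Walk forward to the oldest sample still inside the window.
    for t_old, f_old in samples:
        if t_now - t_old <= window:
            break
    if t_now == t_old:
        return float("nan")  # not enough history yet
    return (f_now - f_old) / (t_now - t_old)

# Idealized history resembling the start of the run: 12288 frames in ~20 s.
history = [(0.0, 0), (10.0, 0), (20.0, 12288)]
print(windowed_fps(history, 10))   # last 10 s -> 1228.8
print(windowed_fps(history, 60))   # whole 20 s of history -> 614.4
```

These idealized figures land close to the 1229.0 / 614.6 values the log prints, which additionally reflect sub-second timing jitter.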
[2024-12-30 19:57:53,655][02937] Updated weights for policy 0, policy_version 40 (0.0018)
[2024-12-30 19:57:56,994][00338] Fps is (10 sec: 4915.9, 60 sec: 3004.0, 300 sec: 3004.0). Total num frames: 180224. Throughput: 0: 942.0. Samples: 44786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:57:56,998][00338] Avg episode reward: [(0, '4.362')]
[2024-12-30 19:58:01,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3025.0). Total num frames: 196608. Throughput: 0: 1009.2. Samples: 47810. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 19:58:02,000][00338] Avg episode reward: [(0, '4.379')]
[2024-12-30 19:58:04,722][02937] Updated weights for policy 0, policy_version 50 (0.0032)
[2024-12-30 19:58:06,995][00338] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3043.0). Total num frames: 212992. Throughput: 0: 997.4. Samples: 52492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 19:58:06,996][00338] Avg episode reward: [(0, '4.299')]
[2024-12-30 19:58:11,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3167.8). Total num frames: 237568. Throughput: 0: 1019.2. Samples: 59688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 19:58:12,001][00338] Avg episode reward: [(0, '4.459')]
[2024-12-30 19:58:12,006][02924] Saving new best policy, reward=4.459!
[2024-12-30 19:58:13,203][02937] Updated weights for policy 0, policy_version 60 (0.0031)
[2024-12-30 19:58:16,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3225.8). Total num frames: 258048. Throughput: 0: 1050.8. Samples: 63282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:58:16,996][00338] Avg episode reward: [(0, '4.540')]
[2024-12-30 19:58:17,010][02924] Saving new best policy, reward=4.540!
[2024-12-30 19:58:21,994][00338] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3228.8). Total num frames: 274432. Throughput: 0: 1005.2. Samples: 67898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:58:21,997][00338] Avg episode reward: [(0, '4.342')]
[2024-12-30 19:58:24,360][02937] Updated weights for policy 0, policy_version 70 (0.0030)
[2024-12-30 19:58:26,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 3322.5). Total num frames: 299008. Throughput: 0: 995.9. Samples: 74570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 19:58:26,997][00338] Avg episode reward: [(0, '4.116')]
[2024-12-30 19:58:27,002][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth...
[2024-12-30 19:58:31,994][00338] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 3406.3). Total num frames: 323584. Throughput: 0: 1030.5. Samples: 78246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:58:31,998][00338] Avg episode reward: [(0, '4.390')]
[2024-12-30 19:58:33,127][02937] Updated weights for policy 0, policy_version 80 (0.0022)
[2024-12-30 19:58:36,995][00338] Fps is (10 sec: 3686.2, 60 sec: 4027.7, 300 sec: 3358.9). Total num frames: 335872. Throughput: 0: 1025.2. Samples: 83726. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2024-12-30 19:58:37,001][00338] Avg episode reward: [(0, '4.442')]
[2024-12-30 19:58:41,994][00338] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 3394.0). Total num frames: 356352. Throughput: 0: 991.1. Samples: 89384. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-12-30 19:58:42,000][00338] Avg episode reward: [(0, '4.380')]
[2024-12-30 19:58:43,877][02937] Updated weights for policy 0, policy_version 90 (0.0034)
[2024-12-30 19:58:46,994][00338] Fps is (10 sec: 4505.8, 60 sec: 4164.4, 300 sec: 3463.2). Total num frames: 380928. Throughput: 0: 1002.4. Samples: 92916. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 19:58:46,996][00338] Avg episode reward: [(0, '4.438')]
[2024-12-30 19:58:51,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3455.0). Total num frames: 397312. Throughput: 0: 1045.8. Samples: 99554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:58:52,001][00338] Avg episode reward: [(0, '4.431')]
[2024-12-30 19:58:55,299][02937] Updated weights for policy 0, policy_version 100 (0.0042)
[2024-12-30 19:58:56,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3447.6). Total num frames: 413696. Throughput: 0: 982.9. Samples: 103918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:58:56,997][00338] Avg episode reward: [(0, '4.544')]
[2024-12-30 19:58:57,004][02924] Saving new best policy, reward=4.544!
[2024-12-30 19:59:01,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3506.3). Total num frames: 438272. Throughput: 0: 982.0. Samples: 107472. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:59:02,000][00338] Avg episode reward: [(0, '4.671')]
[2024-12-30 19:59:02,004][02924] Saving new best policy, reward=4.671!
[2024-12-30 19:59:04,195][02937] Updated weights for policy 0, policy_version 110 (0.0014)
[2024-12-30 19:59:07,001][00338] Fps is (10 sec: 4502.7, 60 sec: 4095.6, 300 sec: 3528.8). Total num frames: 458752. Throughput: 0: 1032.4. Samples: 114362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:59:07,005][00338] Avg episode reward: [(0, '4.383')]
[2024-12-30 19:59:12,000][00338] Fps is (10 sec: 3684.2, 60 sec: 3959.1, 300 sec: 3519.5). Total num frames: 475136. Throughput: 0: 986.7. Samples: 118978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:59:12,003][00338] Avg episode reward: [(0, '4.294')]
[2024-12-30 19:59:15,460][02937] Updated weights for policy 0, policy_version 120 (0.0037)
[2024-12-30 19:59:16,995][00338] Fps is (10 sec: 3688.6, 60 sec: 3959.4, 300 sec: 3540.2). Total num frames: 495616. Throughput: 0: 970.0. Samples: 121898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:59:16,997][00338] Avg episode reward: [(0, '4.443')]
[2024-12-30 19:59:21,994][00338] Fps is (10 sec: 4508.3, 60 sec: 4096.0, 300 sec: 3587.7). Total num frames: 520192. Throughput: 0: 1009.4. Samples: 129150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 19:59:21,999][00338] Avg episode reward: [(0, '4.674')]
[2024-12-30 19:59:22,001][02924] Saving new best policy, reward=4.674!
[2024-12-30 19:59:24,720][02937] Updated weights for policy 0, policy_version 130 (0.0032)
[2024-12-30 19:59:26,994][00338] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3577.3). Total num frames: 536576. Throughput: 0: 1009.2. Samples: 134798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:59:26,997][00338] Avg episode reward: [(0, '4.609')]
[2024-12-30 19:59:31,996][00338] Fps is (10 sec: 3685.7, 60 sec: 3891.1, 300 sec: 3594.0). Total num frames: 557056. Throughput: 0: 980.1. Samples: 137022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:59:32,002][00338] Avg episode reward: [(0, '4.451')]
[2024-12-30 19:59:34,902][02937] Updated weights for policy 0, policy_version 140 (0.0035)
[2024-12-30 19:59:36,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3635.3). Total num frames: 581632. Throughput: 0: 992.0. Samples: 144192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:59:37,001][00338] Avg episode reward: [(0, '4.458')]
[2024-12-30 19:59:41,994][00338] Fps is (10 sec: 4096.8, 60 sec: 4027.7, 300 sec: 3624.5). Total num frames: 598016. Throughput: 0: 1037.1. Samples: 150586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 19:59:41,998][00338] Avg episode reward: [(0, '4.380')]
[2024-12-30 19:59:46,068][02937] Updated weights for policy 0, policy_version 150 (0.0043)
[2024-12-30 19:59:46,996][00338] Fps is (10 sec: 3685.9, 60 sec: 3959.4, 300 sec: 3638.3). Total num frames: 618496. Throughput: 0: 1007.2. Samples: 152798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 19:59:46,999][00338] Avg episode reward: [(0, '4.460')]
[2024-12-30 19:59:51,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3674.8). Total num frames: 643072. Throughput: 0: 996.4. Samples: 159192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:59:51,997][00338] Avg episode reward: [(0, '4.586')]
[2024-12-30 19:59:54,505][02937] Updated weights for policy 0, policy_version 160 (0.0021)
[2024-12-30 19:59:56,996][00338] Fps is (10 sec: 4505.6, 60 sec: 4164.2, 300 sec: 3686.5). Total num frames: 663552. Throughput: 0: 1056.2. Samples: 166500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 19:59:57,001][00338] Avg episode reward: [(0, '4.649')]
[2024-12-30 20:00:01,994][00338] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3675.4). Total num frames: 679936. Throughput: 0: 1038.9. Samples: 168646. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:00:01,997][00338] Avg episode reward: [(0, '4.582')]
[2024-12-30 20:00:05,900][02937] Updated weights for policy 0, policy_version 170 (0.0025)
[2024-12-30 20:00:06,995][00338] Fps is (10 sec: 3686.8, 60 sec: 4028.1, 300 sec: 3686.5). Total num frames: 700416. Throughput: 0: 992.3. Samples: 173802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:00:07,000][00338] Avg episode reward: [(0, '4.337')]
[2024-12-30 20:00:11,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4164.7, 300 sec: 3718.0). Total num frames: 724992. Throughput: 0: 1026.3. Samples: 180980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:00:11,997][00338] Avg episode reward: [(0, '4.365')]
[2024-12-30 20:00:14,915][02937] Updated weights for policy 0, policy_version 180 (0.0021)
[2024-12-30 20:00:16,994][00338] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3707.0). Total num frames: 741376. Throughput: 0: 1048.8. Samples: 184214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:00:16,999][00338] Avg episode reward: [(0, '4.686')]
[2024-12-30 20:00:17,009][02924] Saving new best policy, reward=4.686!
[2024-12-30 20:00:21,994][00338] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3716.5). Total num frames: 761856. Throughput: 0: 988.8. Samples: 188686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:00:22,001][00338] Avg episode reward: [(0, '4.703')]
[2024-12-30 20:00:22,004][02924] Saving new best policy, reward=4.703!
[2024-12-30 20:00:25,486][02937] Updated weights for policy 0, policy_version 190 (0.0033)
[2024-12-30 20:00:26,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3725.5). Total num frames: 782336. Throughput: 0: 1008.4. Samples: 195962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2024-12-30 20:00:27,001][00338] Avg episode reward: [(0, '4.813')]
[2024-12-30 20:00:27,014][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000191_782336.pth...
[2024-12-30 20:00:27,167][02924] Saving new best policy, reward=4.813!
[2024-12-30 20:00:31,995][00338] Fps is (10 sec: 4095.8, 60 sec: 4096.1, 300 sec: 3734.1). Total num frames: 802816. Throughput: 0: 1034.9. Samples: 199366. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:00:31,997][00338] Avg episode reward: [(0, '4.806')]
[2024-12-30 20:00:36,730][02937] Updated weights for policy 0, policy_version 200 (0.0027)
[2024-12-30 20:00:36,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3723.7). Total num frames: 819200. Throughput: 0: 1000.0. Samples: 204194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:00:37,001][00338] Avg episode reward: [(0, '4.730')]
[2024-12-30 20:00:41,994][00338] Fps is (10 sec: 4096.2, 60 sec: 4096.0, 300 sec: 3750.2). Total num frames: 843776. Throughput: 0: 978.9. Samples: 210548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:00:41,999][00338] Avg episode reward: [(0, '4.432')]
[2024-12-30 20:00:45,320][02937] Updated weights for policy 0, policy_version 210 (0.0026)
[2024-12-30 20:00:46,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.1, 300 sec: 3757.7). Total num frames: 864256. Throughput: 0: 1014.0. Samples: 214278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:00:47,001][00338] Avg episode reward: [(0, '4.501')]
[2024-12-30 20:00:51,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3747.5). Total num frames: 880640. Throughput: 0: 1021.8. Samples: 219782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:00:51,997][00338] Avg episode reward: [(0, '4.499')]
[2024-12-30 20:00:56,580][02937] Updated weights for policy 0, policy_version 220 (0.0015)
[2024-12-30 20:00:56,994][00338] Fps is (10 sec: 3686.3, 60 sec: 3959.6, 300 sec: 3754.7). Total num frames: 901120. Throughput: 0: 986.9. Samples: 225390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 20:00:56,999][00338] Avg episode reward: [(0, '4.562')]
[2024-12-30 20:01:01,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3778.4). Total num frames: 925696. Throughput: 0: 993.1. Samples: 228902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:01:01,999][00338] Avg episode reward: [(0, '4.806')]
[2024-12-30 20:01:06,200][02937] Updated weights for policy 0, policy_version 230 (0.0024)
[2024-12-30 20:01:06,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3768.4). Total num frames: 942080. Throughput: 0: 1037.1. Samples: 235354. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:01:06,998][00338] Avg episode reward: [(0, '4.764')]
[2024-12-30 20:01:11,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3758.8). Total num frames: 958464. Throughput: 0: 972.6. Samples: 239728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:01:12,001][00338] Avg episode reward: [(0, '4.859')]
[2024-12-30 20:01:12,005][02924] Saving new best policy, reward=4.859!
[2024-12-30 20:01:16,683][02937] Updated weights for policy 0, policy_version 240 (0.0020)
[2024-12-30 20:01:16,994][00338] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3781.0). Total num frames: 983040. Throughput: 0: 975.2. Samples: 243250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:01:16,997][00338] Avg episode reward: [(0, '4.698')]
[2024-12-30 20:01:21,999][00338] Fps is (10 sec: 4503.6, 60 sec: 4027.4, 300 sec: 3786.9). Total num frames: 1003520. Throughput: 0: 1029.1. Samples: 250506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:01:22,004][00338] Avg episode reward: [(0, '4.792')]
[2024-12-30 20:01:26,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3777.5). Total num frames: 1019904. Throughput: 0: 991.9. Samples: 255182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 20:01:27,004][00338] Avg episode reward: [(0, '4.838')]
[2024-12-30 20:01:28,014][02937] Updated weights for policy 0, policy_version 250 (0.0016)
[2024-12-30 20:01:31,994][00338] Fps is (10 sec: 3688.0, 60 sec: 3959.5, 300 sec: 3783.3). Total num frames: 1040384. Throughput: 0: 970.7. Samples: 257960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:01:31,996][00338] Avg episode reward: [(0, '4.748')]
[2024-12-30 20:01:36,543][02937] Updated weights for policy 0, policy_version 260 (0.0024)
[2024-12-30 20:01:36,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3803.5). Total num frames: 1064960. Throughput: 0: 1006.9. Samples: 265094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:01:36,997][00338] Avg episode reward: [(0, '4.880')]
[2024-12-30 20:01:37,009][02924] Saving new best policy, reward=4.880!
[2024-12-30 20:01:41,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3794.3). Total num frames: 1081344. Throughput: 0: 1000.7. Samples: 270422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:01:41,998][00338] Avg episode reward: [(0, '4.857')]
[2024-12-30 20:01:46,994][00338] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3785.3). Total num frames: 1097728. Throughput: 0: 969.8. Samples: 272544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:01:47,001][00338] Avg episode reward: [(0, '4.671')]
[2024-12-30 20:01:48,248][02937] Updated weights for policy 0, policy_version 270 (0.0017)
[2024-12-30 20:01:51,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3804.5). Total num frames: 1122304. Throughput: 0: 977.9. Samples: 279358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 20:01:51,997][00338] Avg episode reward: [(0, '4.866')]
[2024-12-30 20:01:56,994][00338] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 1142784. Throughput: 0: 1028.5. Samples: 286012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:01:56,997][00338] Avg episode reward: [(0, '4.958')]
[2024-12-30 20:01:57,006][02924] Saving new best policy, reward=4.958!
[2024-12-30 20:01:57,832][02937] Updated weights for policy 0, policy_version 280 (0.0037)
[2024-12-30 20:02:01,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 1155072. Throughput: 0: 997.0. Samples: 288116. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:02:02,004][00338] Avg episode reward: [(0, '5.116')]
[2024-12-30 20:02:02,028][02924] Saving new best policy, reward=5.116!
[2024-12-30 20:02:06,994][00338] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1179648. Throughput: 0: 966.2. Samples: 293980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:02:07,000][00338] Avg episode reward: [(0, '5.230')]
[2024-12-30 20:02:07,008][02924] Saving new best policy, reward=5.230!
[2024-12-30 20:02:08,177][02937] Updated weights for policy 0, policy_version 290 (0.0030)
[2024-12-30 20:02:11,996][00338] Fps is (10 sec: 4914.1, 60 sec: 4095.8, 300 sec: 4040.4). Total num frames: 1204224. Throughput: 0: 1017.2. Samples: 300958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:02:11,999][00338] Avg episode reward: [(0, '5.130')]
[2024-12-30 20:02:16,994][00338] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 1216512. Throughput: 0: 1010.8. Samples: 303444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:02:16,999][00338] Avg episode reward: [(0, '5.145')]
[2024-12-30 20:02:19,703][02937] Updated weights for policy 0, policy_version 300 (0.0027)
[2024-12-30 20:02:21,996][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 4012.7). Total num frames: 1236992. Throughput: 0: 961.6. Samples: 308368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:02:21,999][00338] Avg episode reward: [(0, '5.304')]
[2024-12-30 20:02:22,007][02924] Saving new best policy, reward=5.304!
[2024-12-30 20:02:26,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1261568. Throughput: 0: 1002.7. Samples: 315542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:02:26,997][00338] Avg episode reward: [(0, '5.349')]
[2024-12-30 20:02:27,007][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_1261568.pth...
[2024-12-30 20:02:27,118][02924] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth
[2024-12-30 20:02:27,145][02924] Saving new best policy, reward=5.349!
[2024-12-30 20:02:28,324][02937] Updated weights for policy 0, policy_version 310 (0.0027)
[2024-12-30 20:02:31,995][00338] Fps is (10 sec: 4096.7, 60 sec: 3959.4, 300 sec: 4012.7). Total num frames: 1277952. Throughput: 0: 1025.2. Samples: 318680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:02:31,997][00338] Avg episode reward: [(0, '5.290')]
[2024-12-30 20:02:36,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3998.8). Total num frames: 1294336. Throughput: 0: 970.5. Samples: 323032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:02:37,005][00338] Avg episode reward: [(0, '5.480')]
[2024-12-30 20:02:37,021][02924] Saving new best policy, reward=5.480!
[2024-12-30 20:02:39,799][02937] Updated weights for policy 0, policy_version 320 (0.0035)
[2024-12-30 20:02:41,994][00338] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1318912. Throughput: 0: 969.7. Samples: 329650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:02:41,996][00338] Avg episode reward: [(0, '5.449')]
[2024-12-30 20:02:46,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1339392. Throughput: 0: 1002.8. Samples: 333242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:02:47,000][00338] Avg episode reward: [(0, '5.286')]
[2024-12-30 20:02:50,302][02937] Updated weights for policy 0, policy_version 330 (0.0014)
[2024-12-30 20:02:51,995][00338] Fps is (10 sec: 3686.1, 60 sec: 3891.1, 300 sec: 3984.9). Total num frames: 1355776. Throughput: 0: 985.6. Samples: 338332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:02:52,003][00338] Avg episode reward: [(0, '5.247')]
[2024-12-30 20:02:56,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 1376256. Throughput: 0: 963.2. Samples: 344302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:02:56,997][00338] Avg episode reward: [(0, '5.163')]
[2024-12-30 20:02:59,869][02937] Updated weights for policy 0, policy_version 340 (0.0020)
[2024-12-30 20:03:01,994][00338] Fps is (10 sec: 4506.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1400832. Throughput: 0: 987.6. Samples: 347886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:03:01,997][00338] Avg episode reward: [(0, '5.257')]
[2024-12-30 20:03:06,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1417216. Throughput: 0: 1010.8. Samples: 353850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:03:06,996][00338] Avg episode reward: [(0, '5.457')]
[2024-12-30 20:03:11,159][02937] Updated weights for policy 0, policy_version 350 (0.0023)
[2024-12-30 20:03:11,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3998.8). Total num frames: 1437696. Throughput: 0: 962.0. Samples: 358834. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:03:11,996][00338] Avg episode reward: [(0, '5.737')]
[2024-12-30 20:03:12,004][02924] Saving new best policy, reward=5.737!
[2024-12-30 20:03:16,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1458176. Throughput: 0: 971.4. Samples: 362394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:03:16,998][00338] Avg episode reward: [(0, '5.619')]
[2024-12-30 20:03:19,730][02937] Updated weights for policy 0, policy_version 360 (0.0015)
[2024-12-30 20:03:21,998][00338] Fps is (10 sec: 4094.6, 60 sec: 4027.6, 300 sec: 3998.8). Total num frames: 1478656. Throughput: 0: 1028.1. Samples: 369300. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:03:22,000][00338] Avg episode reward: [(0, '5.545')]
[2024-12-30 20:03:26,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1495040. Throughput: 0: 979.3. Samples: 373718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:03:27,001][00338] Avg episode reward: [(0, '5.747')]
[2024-12-30 20:03:27,010][02924] Saving new best policy, reward=5.747!
[2024-12-30 20:03:31,160][02937] Updated weights for policy 0, policy_version 370 (0.0026)
[2024-12-30 20:03:31,994][00338] Fps is (10 sec: 3687.7, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1515520. Throughput: 0: 975.1. Samples: 377120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:03:32,002][00338] Avg episode reward: [(0, '6.008')]
[2024-12-30 20:03:32,020][02924] Saving new best policy, reward=6.008!
[2024-12-30 20:03:36,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1540096. Throughput: 0: 1019.9. Samples: 384226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:03:36,997][00338] Avg episode reward: [(0, '5.952')] [2024-12-30 20:03:41,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1552384. Throughput: 0: 994.4. Samples: 389050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-30 20:03:42,005][00338] Avg episode reward: [(0, '5.880')] [2024-12-30 20:03:42,102][02937] Updated weights for policy 0, policy_version 380 (0.0022) [2024-12-30 20:03:46,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1576960. Throughput: 0: 970.5. Samples: 391558. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 20:03:46,997][00338] Avg episode reward: [(0, '6.338')] [2024-12-30 20:03:47,007][02924] Saving new best policy, reward=6.338! [2024-12-30 20:03:51,236][02937] Updated weights for policy 0, policy_version 390 (0.0024) [2024-12-30 20:03:51,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 4012.7). Total num frames: 1597440. Throughput: 0: 996.6. Samples: 398696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 20:03:51,998][00338] Avg episode reward: [(0, '6.392')] [2024-12-30 20:03:52,002][02924] Saving new best policy, reward=6.392! [2024-12-30 20:03:56,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1617920. Throughput: 0: 1015.8. Samples: 404546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:03:56,997][00338] Avg episode reward: [(0, '6.642')] [2024-12-30 20:03:57,011][02924] Saving new best policy, reward=6.642! [2024-12-30 20:04:01,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3985.0). Total num frames: 1634304. Throughput: 0: 983.4. Samples: 406648. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:04:01,996][00338] Avg episode reward: [(0, '7.200')] [2024-12-30 20:04:02,004][02924] Saving new best policy, reward=7.200! [2024-12-30 20:04:02,639][02937] Updated weights for policy 0, policy_version 400 (0.0018) [2024-12-30 20:04:06,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.8). Total num frames: 1658880. Throughput: 0: 978.4. Samples: 413324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:04:06,999][00338] Avg episode reward: [(0, '7.688')] [2024-12-30 20:04:07,008][02924] Saving new best policy, reward=7.688! [2024-12-30 20:04:11,503][02937] Updated weights for policy 0, policy_version 410 (0.0017) [2024-12-30 20:04:11,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1679360. Throughput: 0: 1030.6. Samples: 420096. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-30 20:04:11,996][00338] Avg episode reward: [(0, '7.982')] [2024-12-30 20:04:12,002][02924] Saving new best policy, reward=7.982! [2024-12-30 20:04:17,001][00338] Fps is (10 sec: 3274.7, 60 sec: 3890.8, 300 sec: 3970.9). Total num frames: 1691648. Throughput: 0: 998.4. Samples: 422054. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 20:04:17,003][00338] Avg episode reward: [(0, '8.522')] [2024-12-30 20:04:17,020][02924] Saving new best policy, reward=8.522! [2024-12-30 20:04:21,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3998.8). Total num frames: 1716224. Throughput: 0: 965.6. Samples: 427678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:04:21,999][00338] Avg episode reward: [(0, '9.253')] [2024-12-30 20:04:22,002][02924] Saving new best policy, reward=9.253! [2024-12-30 20:04:22,797][02937] Updated weights for policy 0, policy_version 420 (0.0027) [2024-12-30 20:04:26,994][00338] Fps is (10 sec: 4508.5, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1736704. Throughput: 0: 1015.6. Samples: 434750. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-30 20:04:26,997][00338] Avg episode reward: [(0, '8.960')] [2024-12-30 20:04:27,012][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000424_1736704.pth... [2024-12-30 20:04:27,144][02924] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000191_782336.pth [2024-12-30 20:04:31,997][00338] Fps is (10 sec: 3685.5, 60 sec: 3959.3, 300 sec: 3971.0). Total num frames: 1753088. Throughput: 0: 1018.9. Samples: 437412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-30 20:04:31,999][00338] Avg episode reward: [(0, '8.909')] [2024-12-30 20:04:34,001][02937] Updated weights for policy 0, policy_version 430 (0.0016) [2024-12-30 20:04:36,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 1773568. Throughput: 0: 964.6. Samples: 442104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-30 20:04:36,997][00338] Avg episode reward: [(0, '9.415')] [2024-12-30 20:04:37,005][02924] Saving new best policy, reward=9.415! [2024-12-30 20:04:41,994][00338] Fps is (10 sec: 4506.7, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 1798144. Throughput: 0: 992.6. Samples: 449214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 20:04:41,997][00338] Avg episode reward: [(0, '10.134')] [2024-12-30 20:04:42,001][02924] Saving new best policy, reward=10.134! [2024-12-30 20:04:42,824][02937] Updated weights for policy 0, policy_version 440 (0.0017) [2024-12-30 20:04:46,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1814528. Throughput: 0: 1023.1. Samples: 452686. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 20:04:47,000][00338] Avg episode reward: [(0, '10.608')] [2024-12-30 20:04:47,008][02924] Saving new best policy, reward=10.608! [2024-12-30 20:04:51,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1830912. Throughput: 0: 972.9. 
Samples: 457104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:04:51,997][00338] Avg episode reward: [(0, '10.380')] [2024-12-30 20:04:54,308][02937] Updated weights for policy 0, policy_version 450 (0.0014) [2024-12-30 20:04:56,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1855488. Throughput: 0: 968.1. Samples: 463662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:04:57,001][00338] Avg episode reward: [(0, '11.428')] [2024-12-30 20:04:57,011][02924] Saving new best policy, reward=11.428! [2024-12-30 20:05:01,995][00338] Fps is (10 sec: 4505.4, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1875968. Throughput: 0: 1004.5. Samples: 467252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:05:01,997][00338] Avg episode reward: [(0, '11.505')] [2024-12-30 20:05:01,999][02924] Saving new best policy, reward=11.505! [2024-12-30 20:05:03,896][02937] Updated weights for policy 0, policy_version 460 (0.0018) [2024-12-30 20:05:06,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1892352. Throughput: 0: 996.4. Samples: 472518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-30 20:05:07,001][00338] Avg episode reward: [(0, '12.124')] [2024-12-30 20:05:07,016][02924] Saving new best policy, reward=12.124! [2024-12-30 20:05:11,994][00338] Fps is (10 sec: 3686.6, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 1912832. Throughput: 0: 965.6. Samples: 478202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-30 20:05:12,001][00338] Avg episode reward: [(0, '12.660')] [2024-12-30 20:05:12,005][02924] Saving new best policy, reward=12.660! [2024-12-30 20:05:14,296][02937] Updated weights for policy 0, policy_version 470 (0.0036) [2024-12-30 20:05:16,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.4, 300 sec: 3984.9). Total num frames: 1937408. Throughput: 0: 982.1. Samples: 481604. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 20:05:17,001][00338] Avg episode reward: [(0, '10.789')] [2024-12-30 20:05:21,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 1953792. Throughput: 0: 1017.4. Samples: 487886. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-30 20:05:21,996][00338] Avg episode reward: [(0, '11.828')] [2024-12-30 20:05:25,580][02937] Updated weights for policy 0, policy_version 480 (0.0029) [2024-12-30 20:05:26,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 1970176. Throughput: 0: 966.0. Samples: 492682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 20:05:26,997][00338] Avg episode reward: [(0, '11.629')] [2024-12-30 20:05:31,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 3984.9). Total num frames: 1994752. Throughput: 0: 968.5. Samples: 496270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:05:32,001][00338] Avg episode reward: [(0, '13.302')] [2024-12-30 20:05:32,005][02924] Saving new best policy, reward=13.302! [2024-12-30 20:05:34,144][02937] Updated weights for policy 0, policy_version 490 (0.0021) [2024-12-30 20:05:36,998][00338] Fps is (10 sec: 4504.1, 60 sec: 4027.5, 300 sec: 3971.0). Total num frames: 2015232. Throughput: 0: 1031.3. Samples: 503514. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 20:05:37,004][00338] Avg episode reward: [(0, '14.132')] [2024-12-30 20:05:37,024][02924] Saving new best policy, reward=14.132! [2024-12-30 20:05:41,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 2031616. Throughput: 0: 981.9. Samples: 507848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:05:42,001][00338] Avg episode reward: [(0, '14.532')] [2024-12-30 20:05:42,002][02924] Saving new best policy, reward=14.532! 
[2024-12-30 20:05:45,620][02937] Updated weights for policy 0, policy_version 500 (0.0016) [2024-12-30 20:05:46,994][00338] Fps is (10 sec: 3687.7, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2052096. Throughput: 0: 968.1. Samples: 510814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:05:46,999][00338] Avg episode reward: [(0, '15.120')] [2024-12-30 20:05:47,008][02924] Saving new best policy, reward=15.120! [2024-12-30 20:05:51,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 2076672. Throughput: 0: 1010.8. Samples: 518006. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-30 20:05:51,997][00338] Avg episode reward: [(0, '16.101')] [2024-12-30 20:05:52,005][02924] Saving new best policy, reward=16.101! [2024-12-30 20:05:55,774][02937] Updated weights for policy 0, policy_version 510 (0.0024) [2024-12-30 20:05:56,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2088960. Throughput: 0: 997.8. Samples: 523104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-30 20:05:56,999][00338] Avg episode reward: [(0, '16.200')] [2024-12-30 20:05:57,014][02924] Saving new best policy, reward=16.200! [2024-12-30 20:06:01,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 2109440. Throughput: 0: 970.5. Samples: 525276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 20:06:02,001][00338] Avg episode reward: [(0, '16.456')] [2024-12-30 20:06:02,005][02924] Saving new best policy, reward=16.456! [2024-12-30 20:06:05,619][02937] Updated weights for policy 0, policy_version 520 (0.0016) [2024-12-30 20:06:06,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2134016. Throughput: 0: 991.6. Samples: 532508. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 20:06:06,997][00338] Avg episode reward: [(0, '16.886')] [2024-12-30 20:06:07,003][02924] Saving new best policy, reward=16.886! 
[2024-12-30 20:06:11,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2154496. Throughput: 0: 1021.0. Samples: 538628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 20:06:11,998][00338] Avg episode reward: [(0, '17.220')] [2024-12-30 20:06:12,004][02924] Saving new best policy, reward=17.220! [2024-12-30 20:06:16,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 2166784. Throughput: 0: 987.9. Samples: 540724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:06:16,997][00338] Avg episode reward: [(0, '16.939')] [2024-12-30 20:06:17,005][02937] Updated weights for policy 0, policy_version 530 (0.0023) [2024-12-30 20:06:21,995][00338] Fps is (10 sec: 3686.2, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 2191360. Throughput: 0: 967.4. Samples: 547044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:06:21,997][00338] Avg episode reward: [(0, '16.967')] [2024-12-30 20:06:25,523][02937] Updated weights for policy 0, policy_version 540 (0.0017) [2024-12-30 20:06:26,994][00338] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 2215936. Throughput: 0: 1033.2. Samples: 554340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 20:06:26,999][00338] Avg episode reward: [(0, '16.302')] [2024-12-30 20:06:27,011][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000541_2215936.pth... [2024-12-30 20:06:27,172][02924] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_1261568.pth [2024-12-30 20:06:31,997][00338] Fps is (10 sec: 3685.7, 60 sec: 3891.0, 300 sec: 3943.2). Total num frames: 2228224. Throughput: 0: 1013.8. Samples: 556436. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 20:06:32,001][00338] Avg episode reward: [(0, '15.203')] [2024-12-30 20:06:36,671][02937] Updated weights for policy 0, policy_version 550 (0.0030) [2024-12-30 20:06:36,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 3971.0). Total num frames: 2252800. Throughput: 0: 971.6. Samples: 561728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 20:06:36,997][00338] Avg episode reward: [(0, '14.382')] [2024-12-30 20:06:41,994][00338] Fps is (10 sec: 4916.4, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2277376. Throughput: 0: 1022.2. Samples: 569104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 20:06:42,001][00338] Avg episode reward: [(0, '15.511')] [2024-12-30 20:06:46,810][02937] Updated weights for policy 0, policy_version 560 (0.0020) [2024-12-30 20:06:46,995][00338] Fps is (10 sec: 4095.8, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 2293760. Throughput: 0: 1040.2. Samples: 572086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 20:06:46,998][00338] Avg episode reward: [(0, '17.318')] [2024-12-30 20:06:47,008][02924] Saving new best policy, reward=17.318! [2024-12-30 20:06:51,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 2310144. Throughput: 0: 976.2. Samples: 576438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-30 20:06:51,999][00338] Avg episode reward: [(0, '16.867')] [2024-12-30 20:06:56,627][02937] Updated weights for policy 0, policy_version 570 (0.0017) [2024-12-30 20:06:56,994][00338] Fps is (10 sec: 4096.2, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2334720. Throughput: 0: 999.2. Samples: 583590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:06:57,001][00338] Avg episode reward: [(0, '17.922')] [2024-12-30 20:06:57,011][02924] Saving new best policy, reward=17.922! [2024-12-30 20:07:01,996][00338] Fps is (10 sec: 4504.6, 60 sec: 4095.8, 300 sec: 3984.9). 
Total num frames: 2355200. Throughput: 0: 1032.3. Samples: 587180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 20:07:01,999][00338] Avg episode reward: [(0, '18.309')] [2024-12-30 20:07:02,003][02924] Saving new best policy, reward=18.309! [2024-12-30 20:07:06,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2367488. Throughput: 0: 996.1. Samples: 591868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 20:07:06,997][00338] Avg episode reward: [(0, '18.389')] [2024-12-30 20:07:07,046][02924] Saving new best policy, reward=18.389! [2024-12-30 20:07:07,978][02937] Updated weights for policy 0, policy_version 580 (0.0021) [2024-12-30 20:07:11,994][00338] Fps is (10 sec: 3687.2, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2392064. Throughput: 0: 974.8. Samples: 598208. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-30 20:07:12,002][00338] Avg episode reward: [(0, '17.932')] [2024-12-30 20:07:16,716][02937] Updated weights for policy 0, policy_version 590 (0.0019) [2024-12-30 20:07:16,994][00338] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3998.8). Total num frames: 2416640. Throughput: 0: 1005.3. Samples: 601672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-30 20:07:17,001][00338] Avg episode reward: [(0, '18.134')] [2024-12-30 20:07:21,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 2428928. Throughput: 0: 1016.3. Samples: 607462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-30 20:07:21,997][00338] Avg episode reward: [(0, '18.246')] [2024-12-30 20:07:26,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 2449408. Throughput: 0: 972.1. Samples: 612850. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:07:27,002][00338] Avg episode reward: [(0, '17.406')] [2024-12-30 20:07:27,854][02937] Updated weights for policy 0, policy_version 600 (0.0014) [2024-12-30 20:07:31,994][00338] Fps is (10 sec: 4505.7, 60 sec: 4096.2, 300 sec: 3998.8). Total num frames: 2473984. Throughput: 0: 985.1. Samples: 616416. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 20:07:31,996][00338] Avg episode reward: [(0, '18.745')] [2024-12-30 20:07:32,007][02924] Saving new best policy, reward=18.745! [2024-12-30 20:07:36,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2494464. Throughput: 0: 1036.3. Samples: 623070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:07:37,001][00338] Avg episode reward: [(0, '17.921')] [2024-12-30 20:07:38,266][02937] Updated weights for policy 0, policy_version 610 (0.0018) [2024-12-30 20:07:41,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 2510848. Throughput: 0: 975.3. Samples: 627478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:07:41,997][00338] Avg episode reward: [(0, '17.921')] [2024-12-30 20:07:46,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 2531328. Throughput: 0: 974.2. Samples: 631018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:07:46,999][00338] Avg episode reward: [(0, '20.292')] [2024-12-30 20:07:47,008][02924] Saving new best policy, reward=20.292! [2024-12-30 20:07:47,989][02937] Updated weights for policy 0, policy_version 620 (0.0041) [2024-12-30 20:07:52,000][00338] Fps is (10 sec: 4503.1, 60 sec: 4095.6, 300 sec: 3998.7). Total num frames: 2555904. Throughput: 0: 1027.8. Samples: 638126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:07:52,002][00338] Avg episode reward: [(0, '18.989')] [2024-12-30 20:07:56,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). 
Total num frames: 2568192. Throughput: 0: 989.0. Samples: 642714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:07:57,007][00338] Avg episode reward: [(0, '17.805')] [2024-12-30 20:07:59,125][02937] Updated weights for policy 0, policy_version 630 (0.0022) [2024-12-30 20:08:01,994][00338] Fps is (10 sec: 3688.4, 60 sec: 3959.6, 300 sec: 3984.9). Total num frames: 2592768. Throughput: 0: 976.4. Samples: 645610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 20:08:01,997][00338] Avg episode reward: [(0, '19.331')] [2024-12-30 20:08:06,994][00338] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3998.8). Total num frames: 2617344. Throughput: 0: 1008.0. Samples: 652822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 20:08:06,997][00338] Avg episode reward: [(0, '18.677')] [2024-12-30 20:08:07,736][02937] Updated weights for policy 0, policy_version 640 (0.0030) [2024-12-30 20:08:11,995][00338] Fps is (10 sec: 3686.1, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 2629632. Throughput: 0: 1010.8. Samples: 658336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 20:08:12,002][00338] Avg episode reward: [(0, '18.678')] [2024-12-30 20:08:16,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.1). Total num frames: 2650112. Throughput: 0: 978.7. Samples: 660456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:08:17,001][00338] Avg episode reward: [(0, '19.203')] [2024-12-30 20:08:19,133][02937] Updated weights for policy 0, policy_version 650 (0.0027) [2024-12-30 20:08:21,994][00338] Fps is (10 sec: 4505.9, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2674688. Throughput: 0: 984.6. Samples: 667378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-30 20:08:22,001][00338] Avg episode reward: [(0, '19.413')] [2024-12-30 20:08:26,995][00338] Fps is (10 sec: 4505.4, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2695168. Throughput: 0: 1031.7. Samples: 673904. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:08:26,997][00338] Avg episode reward: [(0, '20.996')] [2024-12-30 20:08:27,011][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000658_2695168.pth... [2024-12-30 20:08:27,178][02924] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000424_1736704.pth [2024-12-30 20:08:27,195][02924] Saving new best policy, reward=20.996! [2024-12-30 20:08:29,924][02937] Updated weights for policy 0, policy_version 660 (0.0015) [2024-12-30 20:08:31,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 2707456. Throughput: 0: 999.0. Samples: 675974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:08:31,999][00338] Avg episode reward: [(0, '20.689')] [2024-12-30 20:08:36,994][00338] Fps is (10 sec: 3686.6, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2732032. Throughput: 0: 973.6. Samples: 681934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:08:37,000][00338] Avg episode reward: [(0, '21.298')] [2024-12-30 20:08:37,009][02924] Saving new best policy, reward=21.298! [2024-12-30 20:08:39,203][02937] Updated weights for policy 0, policy_version 670 (0.0028) [2024-12-30 20:08:41,994][00338] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2756608. Throughput: 0: 1030.3. Samples: 689078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:08:41,997][00338] Avg episode reward: [(0, '23.269')] [2024-12-30 20:08:42,002][02924] Saving new best policy, reward=23.269! [2024-12-30 20:08:46,998][00338] Fps is (10 sec: 3685.1, 60 sec: 3959.2, 300 sec: 3971.0). Total num frames: 2768896. Throughput: 0: 1019.4. Samples: 691486. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-30 20:08:47,000][00338] Avg episode reward: [(0, '22.685')] [2024-12-30 20:08:50,625][02937] Updated weights for policy 0, policy_version 680 (0.0017) [2024-12-30 20:08:51,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.6, 300 sec: 3971.0). Total num frames: 2789376. Throughput: 0: 968.8. Samples: 696420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:08:52,001][00338] Avg episode reward: [(0, '20.595')] [2024-12-30 20:08:56,994][00338] Fps is (10 sec: 4507.1, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2813952. Throughput: 0: 1005.6. Samples: 703588. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-30 20:08:56,999][00338] Avg episode reward: [(0, '22.092')] [2024-12-30 20:08:59,380][02937] Updated weights for policy 0, policy_version 690 (0.0018) [2024-12-30 20:09:01,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 2830336. Throughput: 0: 1035.9. Samples: 707070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:09:02,000][00338] Avg episode reward: [(0, '21.348')] [2024-12-30 20:09:06,995][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3957.1). Total num frames: 2846720. Throughput: 0: 977.8. Samples: 711378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-30 20:09:06,997][00338] Avg episode reward: [(0, '21.252')] [2024-12-30 20:09:10,362][02937] Updated weights for policy 0, policy_version 700 (0.0021) [2024-12-30 20:09:11,994][00338] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3998.9). Total num frames: 2871296. Throughput: 0: 987.4. Samples: 718338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:09:11,996][00338] Avg episode reward: [(0, '23.068')] [2024-12-30 20:09:16,994][00338] Fps is (10 sec: 4915.4, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2895872. Throughput: 0: 1019.7. Samples: 721860. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:09:17,002][00338] Avg episode reward: [(0, '23.341')] [2024-12-30 20:09:17,015][02924] Saving new best policy, reward=23.341! [2024-12-30 20:09:21,210][02937] Updated weights for policy 0, policy_version 710 (0.0019) [2024-12-30 20:09:21,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 2908160. Throughput: 0: 995.5. Samples: 726732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:09:22,003][00338] Avg episode reward: [(0, '23.111')] [2024-12-30 20:09:26,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 2932736. Throughput: 0: 971.6. Samples: 732802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:09:27,001][00338] Avg episode reward: [(0, '22.742')] [2024-12-30 20:09:30,330][02937] Updated weights for policy 0, policy_version 720 (0.0015) [2024-12-30 20:09:31,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 2953216. Throughput: 0: 998.7. Samples: 736422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:09:31,996][00338] Avg episode reward: [(0, '23.523')] [2024-12-30 20:09:32,044][02924] Saving new best policy, reward=23.523! [2024-12-30 20:09:36,995][00338] Fps is (10 sec: 3686.1, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 2969600. Throughput: 0: 1020.5. Samples: 742344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:09:36,998][00338] Avg episode reward: [(0, '22.349')] [2024-12-30 20:09:41,680][02937] Updated weights for policy 0, policy_version 730 (0.0016) [2024-12-30 20:09:41,994][00338] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 2990080. Throughput: 0: 972.5. Samples: 747352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:09:41,999][00338] Avg episode reward: [(0, '22.614')] [2024-12-30 20:09:46,994][00338] Fps is (10 sec: 4096.3, 60 sec: 4028.0, 300 sec: 3998.8). 
Total num frames: 3010560. Throughput: 0: 974.6. Samples: 750926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-30 20:09:46,997][00338] Avg episode reward: [(0, '22.465')] [2024-12-30 20:09:51,116][02937] Updated weights for policy 0, policy_version 740 (0.0013) [2024-12-30 20:09:52,001][00338] Fps is (10 sec: 4093.2, 60 sec: 4027.3, 300 sec: 3984.8). Total num frames: 3031040. Throughput: 0: 1027.5. Samples: 757624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-30 20:09:52,004][00338] Avg episode reward: [(0, '24.615')] [2024-12-30 20:09:52,006][02924] Saving new best policy, reward=24.615! [2024-12-30 20:09:56,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3047424. Throughput: 0: 969.0. Samples: 761942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 20:09:56,998][00338] Avg episode reward: [(0, '24.300')] [2024-12-30 20:10:01,882][02937] Updated weights for policy 0, policy_version 750 (0.0027) [2024-12-30 20:10:01,994][00338] Fps is (10 sec: 4098.8, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3072000. Throughput: 0: 963.7. Samples: 765226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 20:10:01,998][00338] Avg episode reward: [(0, '25.283')] [2024-12-30 20:10:02,001][02924] Saving new best policy, reward=25.283! [2024-12-30 20:10:06,999][00338] Fps is (10 sec: 4503.6, 60 sec: 4095.7, 300 sec: 3998.7). Total num frames: 3092480. Throughput: 0: 1014.2. Samples: 772376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-30 20:10:07,003][00338] Avg episode reward: [(0, '24.958')] [2024-12-30 20:10:11,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3108864. Throughput: 0: 989.6. Samples: 777334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-30 20:10:11,997][00338] Avg episode reward: [(0, '25.480')] [2024-12-30 20:10:12,001][02924] Saving new best policy, reward=25.480! 
[2024-12-30 20:10:13,253][02937] Updated weights for policy 0, policy_version 760 (0.0022)
[2024-12-30 20:10:16,994][00338] Fps is (10 sec: 3688.0, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 3129344. Throughput: 0: 962.3. Samples: 779724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:10:17,000][00338] Avg episode reward: [(0, '24.827')]
[2024-12-30 20:10:21,938][02937] Updated weights for policy 0, policy_version 770 (0.0020)
[2024-12-30 20:10:21,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 3153920. Throughput: 0: 989.7. Samples: 786878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:10:21,999][00338] Avg episode reward: [(0, '22.796')]
[2024-12-30 20:10:26,996][00338] Fps is (10 sec: 4095.4, 60 sec: 3959.4, 300 sec: 3984.9). Total num frames: 3170304. Throughput: 0: 1011.4. Samples: 792868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:10:27,005][00338] Avg episode reward: [(0, '22.265')]
[2024-12-30 20:10:27,013][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000774_3170304.pth...
[2024-12-30 20:10:27,167][02924] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000541_2215936.pth
[2024-12-30 20:10:31,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.1). Total num frames: 3186688. Throughput: 0: 980.0. Samples: 795026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:10:31,999][00338] Avg episode reward: [(0, '21.560')]
[2024-12-30 20:10:33,191][02937] Updated weights for policy 0, policy_version 780 (0.0039)
[2024-12-30 20:10:36,997][00338] Fps is (10 sec: 4095.6, 60 sec: 4027.6, 300 sec: 3998.8). Total num frames: 3211264. Throughput: 0: 979.2. Samples: 801684. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:10:37,004][00338] Avg episode reward: [(0, '20.763')]
[2024-12-30 20:10:41,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3231744. Throughput: 0: 1039.7. Samples: 808730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:10:42,001][00338] Avg episode reward: [(0, '21.018')]
[2024-12-30 20:10:42,184][02937] Updated weights for policy 0, policy_version 790 (0.0013)
[2024-12-30 20:10:46,994][00338] Fps is (10 sec: 3687.3, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3248128. Throughput: 0: 1014.0. Samples: 810854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:10:47,001][00338] Avg episode reward: [(0, '22.977')]
[2024-12-30 20:10:51,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.9, 300 sec: 3998.8). Total num frames: 3268608. Throughput: 0: 978.8. Samples: 816418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:10:51,997][00338] Avg episode reward: [(0, '22.164')]
[2024-12-30 20:10:53,032][02937] Updated weights for policy 0, policy_version 800 (0.0013)
[2024-12-30 20:10:56,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 3293184. Throughput: 0: 1023.3. Samples: 823384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:10:57,003][00338] Avg episode reward: [(0, '22.531')]
[2024-12-30 20:11:01,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3309568. Throughput: 0: 1032.1. Samples: 826168. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:11:01,999][00338] Avg episode reward: [(0, '22.995')]
[2024-12-30 20:11:04,581][02937] Updated weights for policy 0, policy_version 810 (0.0024)
[2024-12-30 20:11:06,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.5, 300 sec: 3971.0). Total num frames: 3325952. Throughput: 0: 972.4. Samples: 830638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:11:07,002][00338] Avg episode reward: [(0, '22.243')]
[2024-12-30 20:11:11,995][00338] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3350528. Throughput: 0: 996.3. Samples: 837702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:11:12,002][00338] Avg episode reward: [(0, '21.920')]
[2024-12-30 20:11:13,216][02937] Updated weights for policy 0, policy_version 820 (0.0014)
[2024-12-30 20:11:16,998][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3371008. Throughput: 0: 1027.6. Samples: 841266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:11:17,005][00338] Avg episode reward: [(0, '22.273')]
[2024-12-30 20:11:21,994][00338] Fps is (10 sec: 3276.9, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 3383296. Throughput: 0: 978.1. Samples: 845694. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:11:21,996][00338] Avg episode reward: [(0, '22.489')]
[2024-12-30 20:11:24,626][02937] Updated weights for policy 0, policy_version 830 (0.0036)
[2024-12-30 20:11:26,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3998.8). Total num frames: 3407872. Throughput: 0: 966.4. Samples: 852216. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:11:27,003][00338] Avg episode reward: [(0, '22.932')]
[2024-12-30 20:11:31,994][00338] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 3432448. Throughput: 0: 995.7. Samples: 855660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:11:32,002][00338] Avg episode reward: [(0, '24.577')]
[2024-12-30 20:11:34,413][02937] Updated weights for policy 0, policy_version 840 (0.0019)
[2024-12-30 20:11:36,997][00338] Fps is (10 sec: 3685.5, 60 sec: 3891.2, 300 sec: 3957.1). Total num frames: 3444736. Throughput: 0: 995.6. Samples: 861224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:11:36,999][00338] Avg episode reward: [(0, '24.389')]
[2024-12-30 20:11:41,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3465216. Throughput: 0: 957.8. Samples: 866486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:11:41,997][00338] Avg episode reward: [(0, '25.368')]
[2024-12-30 20:11:45,022][02937] Updated weights for policy 0, policy_version 850 (0.0031)
[2024-12-30 20:11:46,994][00338] Fps is (10 sec: 4506.7, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3489792. Throughput: 0: 972.2. Samples: 869916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:11:46,997][00338] Avg episode reward: [(0, '25.328')]
[2024-12-30 20:11:51,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3506176. Throughput: 0: 1009.8. Samples: 876080. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2024-12-30 20:11:51,999][00338] Avg episode reward: [(0, '24.922')]
[2024-12-30 20:11:56,841][02937] Updated weights for policy 0, policy_version 860 (0.0026)
[2024-12-30 20:11:56,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 3522560. Throughput: 0: 950.8. Samples: 880488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:11:56,997][00338] Avg episode reward: [(0, '24.741')]
[2024-12-30 20:12:01,995][00338] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 3543040. Throughput: 0: 948.5. Samples: 883948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:02,001][00338] Avg episode reward: [(0, '23.888')]
[2024-12-30 20:12:05,482][02937] Updated weights for policy 0, policy_version 870 (0.0020)
[2024-12-30 20:12:06,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3567616. Throughput: 0: 1007.9. Samples: 891048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:06,996][00338] Avg episode reward: [(0, '24.077')]
[2024-12-30 20:12:11,994][00338] Fps is (10 sec: 3686.5, 60 sec: 3823.0, 300 sec: 3943.3). Total num frames: 3579904. Throughput: 0: 962.8. Samples: 895542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:12:12,000][00338] Avg episode reward: [(0, '24.367')]
[2024-12-30 20:12:16,855][02937] Updated weights for policy 0, policy_version 880 (0.0017)
[2024-12-30 20:12:16,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3984.9). Total num frames: 3604480. Throughput: 0: 954.3. Samples: 898602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:17,001][00338] Avg episode reward: [(0, '24.123')]
[2024-12-30 20:12:21,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3624960. Throughput: 0: 989.0. Samples: 905728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:21,996][00338] Avg episode reward: [(0, '23.791')]
[2024-12-30 20:12:26,999][00338] Fps is (10 sec: 3684.8, 60 sec: 3890.9, 300 sec: 3957.1). Total num frames: 3641344. Throughput: 0: 989.9. Samples: 911036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:12:27,001][00338] Avg episode reward: [(0, '24.124')]
[2024-12-30 20:12:27,014][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000889_3641344.pth...
[2024-12-30 20:12:27,224][02924] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000658_2695168.pth
[2024-12-30 20:12:27,341][02937] Updated weights for policy 0, policy_version 890 (0.0057)
[2024-12-30 20:12:31,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3957.2). Total num frames: 3661824. Throughput: 0: 962.4. Samples: 913224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:12:32,002][00338] Avg episode reward: [(0, '24.495')]
[2024-12-30 20:12:36,784][02937] Updated weights for policy 0, policy_version 900 (0.0032)
[2024-12-30 20:12:36,994][00338] Fps is (10 sec: 4507.6, 60 sec: 4027.9, 300 sec: 3984.9). Total num frames: 3686400. Throughput: 0: 981.6. Samples: 920250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:36,997][00338] Avg episode reward: [(0, '24.720')]
[2024-12-30 20:12:41,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3702784. Throughput: 0: 1025.9. Samples: 926654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:41,997][00338] Avg episode reward: [(0, '26.380')]
[2024-12-30 20:12:42,056][02924] Saving new best policy, reward=26.380!
[2024-12-30 20:12:46,994][00338] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 3719168. Throughput: 0: 995.7. Samples: 928754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:46,999][00338] Avg episode reward: [(0, '26.367')]
[2024-12-30 20:12:48,161][02937] Updated weights for policy 0, policy_version 910 (0.0028)
[2024-12-30 20:12:51,994][00338] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3743744. Throughput: 0: 972.2. Samples: 934796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:12:51,999][00338] Avg episode reward: [(0, '27.231')]
[2024-12-30 20:12:52,002][02924] Saving new best policy, reward=27.231!
[2024-12-30 20:12:56,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3764224. Throughput: 0: 1031.8. Samples: 941974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:12:57,001][00338] Avg episode reward: [(0, '28.804')]
[2024-12-30 20:12:57,012][02924] Saving new best policy, reward=28.804!
[2024-12-30 20:12:57,338][02937] Updated weights for policy 0, policy_version 920 (0.0017)
[2024-12-30 20:13:01,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3780608. Throughput: 0: 1008.8. Samples: 943998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:13:01,996][00338] Avg episode reward: [(0, '28.385')]
[2024-12-30 20:13:06,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3801088. Throughput: 0: 971.0. Samples: 949424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:13:06,996][00338] Avg episode reward: [(0, '28.096')]
[2024-12-30 20:13:08,158][02937] Updated weights for policy 0, policy_version 930 (0.0017)
[2024-12-30 20:13:11,994][00338] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 3825664. Throughput: 0: 1011.9. Samples: 956566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2024-12-30 20:13:12,001][00338] Avg episode reward: [(0, '26.923')]
[2024-12-30 20:13:16,996][00338] Fps is (10 sec: 4095.3, 60 sec: 3959.3, 300 sec: 3957.1). Total num frames: 3842048. Throughput: 0: 1028.8. Samples: 959520. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:13:17,002][00338] Avg episode reward: [(0, '26.471')]
[2024-12-30 20:13:19,174][02937] Updated weights for policy 0, policy_version 940 (0.0021)
[2024-12-30 20:13:22,001][00338] Fps is (10 sec: 3274.7, 60 sec: 3890.8, 300 sec: 3943.2). Total num frames: 3858432. Throughput: 0: 968.8. Samples: 963854. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:13:22,006][00338] Avg episode reward: [(0, '25.451')]
[2024-12-30 20:13:26,994][00338] Fps is (10 sec: 4096.7, 60 sec: 4028.0, 300 sec: 3984.9). Total num frames: 3883008. Throughput: 0: 982.4. Samples: 970860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2024-12-30 20:13:26,997][00338] Avg episode reward: [(0, '24.357')]
[2024-12-30 20:13:28,320][02937] Updated weights for policy 0, policy_version 950 (0.0017)
[2024-12-30 20:13:31,994][00338] Fps is (10 sec: 4508.5, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3903488. Throughput: 0: 1014.9. Samples: 974426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2024-12-30 20:13:31,997][00338] Avg episode reward: [(0, '23.261')]
[2024-12-30 20:13:36,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3919872. Throughput: 0: 984.4. Samples: 979094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:13:36,996][00338] Avg episode reward: [(0, '23.994')]
[2024-12-30 20:13:39,413][02937] Updated weights for policy 0, policy_version 960 (0.0029)
[2024-12-30 20:13:41,994][00338] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 3940352. Throughput: 0: 967.6. Samples: 985516. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:13:42,005][00338] Avg episode reward: [(0, '22.942')]
[2024-12-30 20:13:46,996][00338] Fps is (10 sec: 4504.9, 60 sec: 4095.9, 300 sec: 3984.9). Total num frames: 3964928. Throughput: 0: 1002.0. Samples: 989090. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2024-12-30 20:13:47,003][00338] Avg episode reward: [(0, '22.463')]
[2024-12-30 20:13:48,826][02937] Updated weights for policy 0, policy_version 970 (0.0039)
[2024-12-30 20:13:51,995][00338] Fps is (10 sec: 4095.7, 60 sec: 3959.4, 300 sec: 3957.1). Total num frames: 3981312. Throughput: 0: 1004.5. Samples: 994628. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2024-12-30 20:13:51,998][00338] Avg episode reward: [(0, '22.844')]
[2024-12-30 20:13:56,994][00338] Fps is (10 sec: 3686.9, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 4001792. Throughput: 0: 965.9. Samples: 1000032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2024-12-30 20:13:56,997][00338] Avg episode reward: [(0, '22.690')]
[2024-12-30 20:13:57,802][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-30 20:13:57,807][00338] Component Batcher_0 stopped!
[2024-12-30 20:13:57,803][02924] Stopping Batcher_0...
[2024-12-30 20:13:57,816][02924] Loop batcher_evt_loop terminating...
[2024-12-30 20:13:57,880][02937] Weights refcount: 2 0
[2024-12-30 20:13:57,885][00338] Component InferenceWorker_p0-w0 stopped!
[2024-12-30 20:13:57,893][02937] Stopping InferenceWorker_p0-w0...
[2024-12-30 20:13:57,894][02937] Loop inference_proc0-0_evt_loop terminating...
[2024-12-30 20:13:57,962][02924] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000774_3170304.pth
[2024-12-30 20:13:57,984][02924] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-30 20:13:58,149][02943] Stopping RolloutWorker_w6...
[2024-12-30 20:13:58,148][00338] Component RolloutWorker_w6 stopped!
[2024-12-30 20:13:58,150][02943] Loop rollout_proc6_evt_loop terminating...
[2024-12-30 20:13:58,197][02940] Stopping RolloutWorker_w2...
[2024-12-30 20:13:58,197][00338] Component RolloutWorker_w2 stopped!
[2024-12-30 20:13:58,205][00338] Component RolloutWorker_w0 stopped!
[2024-12-30 20:13:58,205][02938] Stopping RolloutWorker_w0...
[2024-12-30 20:13:58,223][02938] Loop rollout_proc0_evt_loop terminating...
[2024-12-30 20:13:58,226][02924] Stopping LearnerWorker_p0...
[2024-12-30 20:13:58,200][02940] Loop rollout_proc2_evt_loop terminating...
[2024-12-30 20:13:58,229][02924] Loop learner_proc0_evt_loop terminating...
[2024-12-30 20:13:58,226][00338] Component LearnerWorker_p0 stopped!
[2024-12-30 20:13:58,246][02942] Stopping RolloutWorker_w4...
[2024-12-30 20:13:58,247][02942] Loop rollout_proc4_evt_loop terminating...
[2024-12-30 20:13:58,246][00338] Component RolloutWorker_w4 stopped!
[2024-12-30 20:13:58,287][00338] Component RolloutWorker_w7 stopped!
[2024-12-30 20:13:58,292][02945] Stopping RolloutWorker_w7...
[2024-12-30 20:13:58,295][02945] Loop rollout_proc7_evt_loop terminating...
[2024-12-30 20:13:58,313][00338] Component RolloutWorker_w5 stopped!
[2024-12-30 20:13:58,319][02944] Stopping RolloutWorker_w5...
[2024-12-30 20:13:58,323][00338] Component RolloutWorker_w1 stopped!
[2024-12-30 20:13:58,327][02939] Stopping RolloutWorker_w1...
[2024-12-30 20:13:58,320][02944] Loop rollout_proc5_evt_loop terminating...
[2024-12-30 20:13:58,328][02939] Loop rollout_proc1_evt_loop terminating...
[2024-12-30 20:13:58,345][00338] Component RolloutWorker_w3 stopped!
[2024-12-30 20:13:58,351][00338] Waiting for process learner_proc0 to stop...
[2024-12-30 20:13:58,357][02941] Stopping RolloutWorker_w3...
[2024-12-30 20:13:58,358][02941] Loop rollout_proc3_evt_loop terminating...
[2024-12-30 20:13:59,931][00338] Waiting for process inference_proc0-0 to join...
[2024-12-30 20:13:59,938][00338] Waiting for process rollout_proc0 to join...
[2024-12-30 20:14:01,834][00338] Waiting for process rollout_proc1 to join...
[2024-12-30 20:14:01,843][00338] Waiting for process rollout_proc2 to join...
[2024-12-30 20:14:01,846][00338] Waiting for process rollout_proc3 to join...
[2024-12-30 20:14:01,849][00338] Waiting for process rollout_proc4 to join...
[2024-12-30 20:14:01,852][00338] Waiting for process rollout_proc5 to join...
[2024-12-30 20:14:01,856][00338] Waiting for process rollout_proc6 to join...
[2024-12-30 20:14:01,861][00338] Waiting for process rollout_proc7 to join...
[2024-12-30 20:14:01,865][00338] Batcher 0 profile tree view:
batching: 26.6346, releasing_batches: 0.0307
[2024-12-30 20:14:01,866][00338] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 408.7284
update_model: 8.2431
  weight_update: 0.0032
one_step: 0.0058
  handle_policy_step: 559.6663
    deserialize: 14.1682, stack: 3.0056, obs_to_device_normalize: 120.3126, forward: 280.5517, send_messages: 27.2702
    prepare_outputs: 85.7343
      to_cpu: 52.2131
[2024-12-30 20:14:01,869][00338] Learner 0 profile tree view:
misc: 0.0060, prepare_batch: 15.4795
train: 74.7115
  epoch_init: 0.0062, minibatch_init: 0.0061, losses_postprocess: 0.6784, kl_divergence: 0.7140, after_optimizer: 33.3833
  calculate_losses: 27.4171
    losses_init: 0.0037, forward_head: 1.4711, bptt_initial: 18.7289, tail: 1.0654, advantages_returns: 0.2509, losses: 3.7698
    bptt: 1.8192
      bptt_forward_core: 1.7231
  update: 11.8961
    clip: 0.8663
[2024-12-30 20:14:01,871][00338] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3129, enqueue_policy_requests: 96.5137, env_step: 798.1487, overhead: 11.9352, complete_rollouts: 7.0398
save_policy_outputs: 20.1138
  split_output_tensors: 7.8855
[2024-12-30 20:14:01,873][00338] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3270, enqueue_policy_requests: 96.3087, env_step: 797.6427, overhead: 12.5089, complete_rollouts: 6.8090
save_policy_outputs: 19.2943
  split_output_tensors: 7.8342
[2024-12-30 20:14:01,874][00338] Loop Runner_EvtLoop terminating...
[2024-12-30 20:14:01,877][00338] Runner profile tree view:
main_loop: 1049.2810
[2024-12-30 20:14:01,881][00338] Collected {0: 4005888}, FPS: 3817.7
[2024-12-30 20:14:02,287][00338] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-30 20:14:02,289][00338] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-30 20:14:02,292][00338] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-30 20:14:02,294][00338] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-30 20:14:02,296][00338] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-30 20:14:02,298][00338] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-30 20:14:02,300][00338] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2024-12-30 20:14:02,302][00338] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-30 20:14:02,303][00338] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2024-12-30 20:14:02,304][00338] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2024-12-30 20:14:02,305][00338] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-30 20:14:02,305][00338] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-30 20:14:02,306][00338] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-30 20:14:02,307][00338] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-30 20:14:02,308][00338] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-30 20:14:02,340][00338] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-30 20:14:02,344][00338] RunningMeanStd input shape: (3, 72, 128)
[2024-12-30 20:14:02,346][00338] RunningMeanStd input shape: (1,)
[2024-12-30 20:14:02,362][00338] ConvEncoder: input_channels=3
[2024-12-30 20:14:02,462][00338] Conv encoder output size: 512
[2024-12-30 20:14:02,464][00338] Policy head output size: 512
[2024-12-30 20:14:02,728][00338] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-30 20:14:03,546][00338] Num frames 100...
[2024-12-30 20:14:03,666][00338] Num frames 200...
[2024-12-30 20:14:03,841][00338] Num frames 300...
[2024-12-30 20:14:04,014][00338] Num frames 400...
[2024-12-30 20:14:04,185][00338] Num frames 500...
[2024-12-30 20:14:04,360][00338] Num frames 600...
[2024-12-30 20:14:04,525][00338] Num frames 700...
[2024-12-30 20:14:04,692][00338] Num frames 800...
[2024-12-30 20:14:04,865][00338] Num frames 900...
[2024-12-30 20:14:05,029][00338] Num frames 1000...
[2024-12-30 20:14:05,209][00338] Num frames 1100...
[2024-12-30 20:14:05,382][00338] Num frames 1200...
[2024-12-30 20:14:05,558][00338] Num frames 1300...
[2024-12-30 20:14:05,733][00338] Num frames 1400...
[2024-12-30 20:14:05,863][00338] Avg episode rewards: #0: 36.400, true rewards: #0: 14.400
[2024-12-30 20:14:05,864][00338] Avg episode reward: 36.400, avg true_objective: 14.400
[2024-12-30 20:14:05,968][00338] Num frames 1500...
[2024-12-30 20:14:06,150][00338] Num frames 1600...
[2024-12-30 20:14:06,304][00338] Num frames 1700...
[2024-12-30 20:14:06,422][00338] Num frames 1800...
[2024-12-30 20:14:06,541][00338] Num frames 1900...
[2024-12-30 20:14:06,661][00338] Num frames 2000...
[2024-12-30 20:14:06,780][00338] Num frames 2100...
[2024-12-30 20:14:06,911][00338] Num frames 2200...
[2024-12-30 20:14:07,034][00338] Num frames 2300...
[2024-12-30 20:14:07,155][00338] Num frames 2400...
[2024-12-30 20:14:07,288][00338] Num frames 2500...
[2024-12-30 20:14:07,411][00338] Num frames 2600...
[2024-12-30 20:14:07,532][00338] Num frames 2700...
[2024-12-30 20:14:07,653][00338] Num frames 2800...
[2024-12-30 20:14:07,772][00338] Num frames 2900...
[2024-12-30 20:14:07,904][00338] Num frames 3000...
[2024-12-30 20:14:08,028][00338] Num frames 3100...
[2024-12-30 20:14:08,167][00338] Num frames 3200...
[2024-12-30 20:14:08,301][00338] Num frames 3300...
[2024-12-30 20:14:08,422][00338] Num frames 3400...
[2024-12-30 20:14:08,507][00338] Avg episode rewards: #0: 46.619, true rewards: #0: 17.120
[2024-12-30 20:14:08,509][00338] Avg episode reward: 46.619, avg true_objective: 17.120
[2024-12-30 20:14:08,604][00338] Num frames 3500...
[2024-12-30 20:14:08,725][00338] Num frames 3600...
[2024-12-30 20:14:08,856][00338] Num frames 3700...
[2024-12-30 20:14:08,980][00338] Num frames 3800...
[2024-12-30 20:14:09,105][00338] Num frames 3900...
[2024-12-30 20:14:09,224][00338] Num frames 4000...
[2024-12-30 20:14:09,356][00338] Num frames 4100...
[2024-12-30 20:14:09,480][00338] Num frames 4200...
[2024-12-30 20:14:09,601][00338] Num frames 4300...
[2024-12-30 20:14:09,720][00338] Num frames 4400...
[2024-12-30 20:14:09,851][00338] Num frames 4500...
[2024-12-30 20:14:09,974][00338] Num frames 4600...
[2024-12-30 20:14:10,098][00338] Num frames 4700...
[2024-12-30 20:14:10,224][00338] Num frames 4800...
[2024-12-30 20:14:10,277][00338] Avg episode rewards: #0: 42.000, true rewards: #0: 16.000
[2024-12-30 20:14:10,280][00338] Avg episode reward: 42.000, avg true_objective: 16.000
[2024-12-30 20:14:10,408][00338] Num frames 4900...
[2024-12-30 20:14:10,531][00338] Num frames 5000...
[2024-12-30 20:14:10,651][00338] Num frames 5100...
[2024-12-30 20:14:10,772][00338] Num frames 5200...
[2024-12-30 20:14:10,899][00338] Num frames 5300...
[2024-12-30 20:14:11,027][00338] Num frames 5400...
[2024-12-30 20:14:11,153][00338] Num frames 5500...
[2024-12-30 20:14:11,267][00338] Avg episode rewards: #0: 35.862, true rewards: #0: 13.862
[2024-12-30 20:14:11,268][00338] Avg episode reward: 35.862, avg true_objective: 13.862
[2024-12-30 20:14:11,339][00338] Num frames 5600...
[2024-12-30 20:14:11,463][00338] Num frames 5700...
[2024-12-30 20:14:11,586][00338] Num frames 5800...
[2024-12-30 20:14:11,706][00338] Num frames 5900...
[2024-12-30 20:14:11,827][00338] Num frames 6000...
[2024-12-30 20:14:11,954][00338] Num frames 6100...
[2024-12-30 20:14:12,078][00338] Num frames 6200...
[2024-12-30 20:14:12,200][00338] Num frames 6300...
[2024-12-30 20:14:12,318][00338] Num frames 6400...
[2024-12-30 20:14:12,447][00338] Num frames 6500...
[2024-12-30 20:14:12,628][00338] Avg episode rewards: #0: 33.598, true rewards: #0: 13.198
[2024-12-30 20:14:12,630][00338] Avg episode reward: 33.598, avg true_objective: 13.198
[2024-12-30 20:14:12,634][00338] Num frames 6600...
[2024-12-30 20:14:12,754][00338] Num frames 6700...
[2024-12-30 20:14:12,884][00338] Num frames 6800...
[2024-12-30 20:14:13,003][00338] Num frames 6900...
[2024-12-30 20:14:13,126][00338] Num frames 7000...
[2024-12-30 20:14:13,247][00338] Num frames 7100...
[2024-12-30 20:14:13,370][00338] Num frames 7200...
[2024-12-30 20:14:13,497][00338] Num frames 7300...
[2024-12-30 20:14:13,620][00338] Num frames 7400...
[2024-12-30 20:14:13,738][00338] Num frames 7500...
[2024-12-30 20:14:13,908][00338] Avg episode rewards: #0: 31.818, true rewards: #0: 12.652
[2024-12-30 20:14:13,910][00338] Avg episode reward: 31.818, avg true_objective: 12.652
[2024-12-30 20:14:13,924][00338] Num frames 7600...
[2024-12-30 20:14:14,048][00338] Num frames 7700...
[2024-12-30 20:14:14,172][00338] Num frames 7800...
[2024-12-30 20:14:14,293][00338] Num frames 7900...
[2024-12-30 20:14:14,411][00338] Num frames 8000...
[2024-12-30 20:14:14,543][00338] Num frames 8100...
[2024-12-30 20:14:14,716][00338] Avg episode rewards: #0: 28.998, true rewards: #0: 11.713
[2024-12-30 20:14:14,718][00338] Avg episode reward: 28.998, avg true_objective: 11.713
[2024-12-30 20:14:14,724][00338] Num frames 8200...
[2024-12-30 20:14:14,849][00338] Num frames 8300...
[2024-12-30 20:14:14,976][00338] Num frames 8400...
[2024-12-30 20:14:15,100][00338] Num frames 8500...
[2024-12-30 20:14:15,220][00338] Num frames 8600...
[2024-12-30 20:14:15,340][00338] Num frames 8700...
[2024-12-30 20:14:15,472][00338] Num frames 8800...
[2024-12-30 20:14:15,590][00338] Num frames 8900...
[2024-12-30 20:14:15,688][00338] Avg episode rewards: #0: 27.044, true rewards: #0: 11.169
[2024-12-30 20:14:15,690][00338] Avg episode reward: 27.044, avg true_objective: 11.169
[2024-12-30 20:14:15,770][00338] Num frames 9000...
[2024-12-30 20:14:15,901][00338] Num frames 9100...
[2024-12-30 20:14:16,024][00338] Num frames 9200...
[2024-12-30 20:14:16,146][00338] Num frames 9300...
[2024-12-30 20:14:16,286][00338] Num frames 9400...
[2024-12-30 20:14:16,461][00338] Num frames 9500...
[2024-12-30 20:14:16,644][00338] Num frames 9600...
[2024-12-30 20:14:16,811][00338] Num frames 9700...
[2024-12-30 20:14:16,978][00338] Num frames 9800...
[2024-12-30 20:14:17,150][00338] Num frames 9900...
[2024-12-30 20:14:17,236][00338] Avg episode rewards: #0: 26.240, true rewards: #0: 11.018
[2024-12-30 20:14:17,238][00338] Avg episode reward: 26.240, avg true_objective: 11.018
[2024-12-30 20:14:17,374][00338] Num frames 10000...
[2024-12-30 20:14:17,574][00338] Num frames 10100...
[2024-12-30 20:14:17,741][00338] Num frames 10200...
[2024-12-30 20:14:17,924][00338] Num frames 10300...
[2024-12-30 20:14:18,090][00338] Num frames 10400...
[2024-12-30 20:14:18,276][00338] Num frames 10500...
[2024-12-30 20:14:18,459][00338] Num frames 10600...
[2024-12-30 20:14:18,634][00338] Num frames 10700...
[2024-12-30 20:14:18,804][00338] Num frames 10800...
[2024-12-30 20:14:18,979][00338] Num frames 10900...
[2024-12-30 20:14:19,045][00338] Avg episode rewards: #0: 26.108, true rewards: #0: 10.908
[2024-12-30 20:14:19,048][00338] Avg episode reward: 26.108, avg true_objective: 10.908
[2024-12-30 20:15:18,940][00338] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-12-30 20:15:19,581][00338] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-30 20:15:19,583][00338] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-30 20:15:19,584][00338] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-30 20:15:19,586][00338] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-30 20:15:19,587][00338] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-30 20:15:19,589][00338] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-30 20:15:19,590][00338] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-12-30 20:15:19,591][00338] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-30 20:15:19,592][00338] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-12-30 20:15:19,593][00338] Adding new argument 'hf_repository'='qbbian/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-12-30 20:15:19,595][00338] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-30 20:15:19,596][00338] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-30 20:15:19,597][00338] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-30 20:15:19,597][00338] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-30 20:15:19,598][00338] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-30 20:15:19,636][00338] RunningMeanStd input shape: (3, 72, 128)
[2024-12-30 20:15:19,638][00338] RunningMeanStd input shape: (1,)
[2024-12-30 20:15:19,654][00338] ConvEncoder: input_channels=3
[2024-12-30 20:15:19,716][00338] Conv encoder output size: 512
[2024-12-30 20:15:19,720][00338] Policy head output size: 512
[2024-12-30 20:15:19,748][00338] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-30 20:15:20,363][00338] Num frames 100...
[2024-12-30 20:15:20,521][00338] Num frames 200...
[2024-12-30 20:15:20,675][00338] Num frames 300...
[2024-12-30 20:15:20,852][00338] Num frames 400...
[2024-12-30 20:15:21,062][00338] Num frames 500...
[2024-12-30 20:15:21,240][00338] Num frames 600...
[2024-12-30 20:15:21,434][00338] Num frames 700...
[2024-12-30 20:15:21,599][00338] Num frames 800...
[2024-12-30 20:15:21,804][00338] Avg episode rewards: #0: 21.730, true rewards: #0: 8.730
[2024-12-30 20:15:21,806][00338] Avg episode reward: 21.730, avg true_objective: 8.730
[2024-12-30 20:15:21,871][00338] Num frames 900...
[2024-12-30 20:15:22,052][00338] Num frames 1000...
[2024-12-30 20:15:22,212][00338] Num frames 1100...
[2024-12-30 20:15:22,370][00338] Num frames 1200...
[2024-12-30 20:15:22,552][00338] Num frames 1300...
[2024-12-30 20:15:22,706][00338] Num frames 1400...
[2024-12-30 20:15:22,885][00338] Num frames 1500...
[2024-12-30 20:15:23,054][00338] Num frames 1600...
[2024-12-30 20:15:23,216][00338] Num frames 1700...
[2024-12-30 20:15:23,388][00338] Num frames 1800...
[2024-12-30 20:15:23,569][00338] Avg episode rewards: #0: 20.325, true rewards: #0: 9.325
[2024-12-30 20:15:23,571][00338] Avg episode reward: 20.325, avg true_objective: 9.325
[2024-12-30 20:15:23,639][00338] Num frames 1900...
[2024-12-30 20:15:23,838][00338] Num frames 2000...
[2024-12-30 20:15:24,033][00338] Num frames 2100...
[2024-12-30 20:15:24,221][00338] Num frames 2200...
[2024-12-30 20:15:24,427][00338] Num frames 2300...
[2024-12-30 20:15:24,642][00338] Num frames 2400...
[2024-12-30 20:15:24,874][00338] Num frames 2500...
[2024-12-30 20:15:25,093][00338] Num frames 2600...
[2024-12-30 20:15:25,294][00338] Num frames 2700...
[2024-12-30 20:15:25,474][00338] Num frames 2800...
[2024-12-30 20:15:25,683][00338] Num frames 2900...
[2024-12-30 20:15:25,884][00338] Num frames 3000...
[2024-12-30 20:15:26,099][00338] Num frames 3100...
[2024-12-30 20:15:26,191][00338] Avg episode rewards: #0: 23.377, true rewards: #0: 10.377
[2024-12-30 20:15:26,193][00338] Avg episode reward: 23.377, avg true_objective: 10.377
[2024-12-30 20:15:26,351][00338] Num frames 3200...
[2024-12-30 20:15:26,538][00338] Num frames 3300...
[2024-12-30 20:15:26,710][00338] Num frames 3400...
[2024-12-30 20:15:26,895][00338] Num frames 3500...
[2024-12-30 20:15:27,123][00338] Num frames 3600...
[2024-12-30 20:15:27,312][00338] Num frames 3700...
[2024-12-30 20:15:27,483][00338] Num frames 3800...
[2024-12-30 20:15:27,652][00338] Num frames 3900...
[2024-12-30 20:15:27,819][00338] Num frames 4000...
[2024-12-30 20:15:27,959][00338] Num frames 4100...
[2024-12-30 20:15:28,080][00338] Num frames 4200...
[2024-12-30 20:15:28,202][00338] Num frames 4300...
[2024-12-30 20:15:28,329][00338] Num frames 4400...
[2024-12-30 20:15:28,450][00338] Num frames 4500...
[2024-12-30 20:15:28,572][00338] Num frames 4600...
[2024-12-30 20:15:28,691][00338] Num frames 4700...
[2024-12-30 20:15:28,812][00338] Num frames 4800...
[2024-12-30 20:15:28,953][00338] Num frames 4900...
[2024-12-30 20:15:29,078][00338] Num frames 5000...
[2024-12-30 20:15:29,198][00338] Num frames 5100...
[2024-12-30 20:15:29,326][00338] Num frames 5200...
[2024-12-30 20:15:29,400][00338] Avg episode rewards: #0: 31.782, true rewards: #0: 13.032
[2024-12-30 20:15:29,402][00338] Avg episode reward: 31.782, avg true_objective: 13.032
[2024-12-30 20:15:29,508][00338] Num frames 5300...
[2024-12-30 20:15:29,637][00338] Num frames 5400...
[2024-12-30 20:15:29,754][00338] Num frames 5500...
[2024-12-30 20:15:29,879][00338] Num frames 5600...
[2024-12-30 20:15:29,997][00338] Num frames 5700...
[2024-12-30 20:15:30,117][00338] Num frames 5800...
[2024-12-30 20:15:30,239][00338] Num frames 5900...
[2024-12-30 20:15:30,358][00338] Avg episode rewards: #0: 28.096, true rewards: #0: 11.896
[2024-12-30 20:15:30,360][00338] Avg episode reward: 28.096, avg true_objective: 11.896
[2024-12-30 20:15:30,423][00338] Num frames 6000...
[2024-12-30 20:15:30,542][00338] Num frames 6100...
[2024-12-30 20:15:30,662][00338] Num frames 6200...
[2024-12-30 20:15:30,780][00338] Num frames 6300...
[2024-12-30 20:15:30,907][00338] Num frames 6400...
[2024-12-30 20:15:31,026][00338] Num frames 6500...
[2024-12-30 20:15:31,148][00338] Num frames 6600...
[2024-12-30 20:15:31,270][00338] Num frames 6700...
[2024-12-30 20:15:31,397][00338] Num frames 6800...
[2024-12-30 20:15:31,517][00338] Num frames 6900...
[2024-12-30 20:15:31,636][00338] Num frames 7000...
[2024-12-30 20:15:31,732][00338] Avg episode rewards: #0: 27.560, true rewards: #0: 11.727
[2024-12-30 20:15:31,734][00338] Avg episode reward: 27.560, avg true_objective: 11.727
[2024-12-30 20:15:31,810][00338] Num frames 7100...
[2024-12-30 20:15:31,933][00338] Num frames 7200...
[2024-12-30 20:15:32,053][00338] Num frames 7300...
[2024-12-30 20:15:32,171][00338] Num frames 7400...
[2024-12-30 20:15:32,290][00338] Num frames 7500...
[2024-12-30 20:15:32,416][00338] Num frames 7600...
[2024-12-30 20:15:32,535][00338] Num frames 7700...
[2024-12-30 20:15:32,661][00338] Num frames 7800...
[2024-12-30 20:15:32,782][00338] Num frames 7900...
[2024-12-30 20:15:32,910][00338] Num frames 8000...
[2024-12-30 20:15:33,027][00338] Num frames 8100...
[2024-12-30 20:15:33,147][00338] Num frames 8200...
[2024-12-30 20:15:33,269][00338] Num frames 8300...
[2024-12-30 20:15:33,398][00338] Num frames 8400...
[2024-12-30 20:15:33,521][00338] Num frames 8500...
[2024-12-30 20:15:33,641][00338] Num frames 8600...
[2024-12-30 20:15:33,759][00338] Num frames 8700...
[2024-12-30 20:15:33,885][00338] Num frames 8800...
[2024-12-30 20:15:34,009][00338] Num frames 8900...
[2024-12-30 20:15:34,074][00338] Avg episode rewards: #0: 30.581, true rewards: #0: 12.724
[2024-12-30 20:15:34,075][00338] Avg episode reward: 30.581, avg true_objective: 12.724
[2024-12-30 20:15:34,189][00338] Num frames 9000...
[2024-12-30 20:15:34,307][00338] Num frames 9100...
[2024-12-30 20:15:34,437][00338] Num frames 9200...
[2024-12-30 20:15:34,600][00338] Avg episode rewards: #0: 27.614, true rewards: #0: 11.614
[2024-12-30 20:15:34,602][00338] Avg episode reward: 27.614, avg true_objective: 11.614
[2024-12-30 20:15:34,616][00338] Num frames 9300...
[2024-12-30 20:15:34,732][00338] Num frames 9400...
[2024-12-30 20:15:34,854][00338] Num frames 9500...
[2024-12-30 20:15:34,973][00338] Num frames 9600...
[2024-12-30 20:15:35,097][00338] Num frames 9700...
[2024-12-30 20:15:35,218][00338] Num frames 9800...
[2024-12-30 20:15:35,361][00338] Num frames 9900...
[2024-12-30 20:15:35,546][00338] Num frames 10000...
[2024-12-30 20:15:35,712][00338] Num frames 10100...
[2024-12-30 20:15:35,880][00338] Num frames 10200...
[2024-12-30 20:15:35,982][00338] Avg episode rewards: #0: 26.918, true rewards: #0: 11.362
[2024-12-30 20:15:35,987][00338] Avg episode reward: 26.918, avg true_objective: 11.362
[2024-12-30 20:15:36,112][00338] Num frames 10300...
[2024-12-30 20:15:36,273][00338] Num frames 10400...
[2024-12-30 20:15:36,435][00338] Num frames 10500...
[2024-12-30 20:15:36,626][00338] Num frames 10600...
[2024-12-30 20:15:36,795][00338] Num frames 10700...
[2024-12-30 20:15:36,970][00338] Num frames 10800...
[2024-12-30 20:15:37,143][00338] Num frames 10900...
[2024-12-30 20:15:37,313][00338] Num frames 11000...
[2024-12-30 20:15:37,490][00338] Num frames 11100...
[2024-12-30 20:15:37,678][00338] Num frames 11200...
[2024-12-30 20:15:37,803][00338] Num frames 11300...
[2024-12-30 20:15:37,930][00338] Num frames 11400...
[2024-12-30 20:15:38,049][00338] Num frames 11500...
[2024-12-30 20:15:38,170][00338] Num frames 11600...
[2024-12-30 20:15:38,229][00338] Avg episode rewards: #0: 27.202, true rewards: #0: 11.602
[2024-12-30 20:15:38,230][00338] Avg episode reward: 27.202, avg true_objective: 11.602
[2024-12-30 20:16:42,066][00338] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2024-12-30 20:49:40,252][00338] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2024-12-30 20:49:40,253][00338] Overriding arg 'num_workers' with value 1 passed from command line
[2024-12-30 20:49:40,255][00338] Adding new argument 'no_render'=True that is not in the saved config file!
[2024-12-30 20:49:40,257][00338] Adding new argument 'save_video'=True that is not in the saved config file!
[2024-12-30 20:49:40,258][00338] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2024-12-30 20:49:40,260][00338] Adding new argument 'video_name'=None that is not in the saved config file!
[2024-12-30 20:49:40,261][00338] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2024-12-30 20:49:40,262][00338] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2024-12-30 20:49:40,263][00338] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2024-12-30 20:49:40,264][00338] Adding new argument 'hf_repository'='qbbian/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2024-12-30 20:49:40,265][00338] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2024-12-30 20:49:40,265][00338] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2024-12-30 20:49:40,266][00338] Adding new argument 'train_script'=None that is not in the saved config file!
[2024-12-30 20:49:40,267][00338] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2024-12-30 20:49:40,268][00338] Using frameskip 1 and render_action_repeat=4 for evaluation
[2024-12-30 20:49:40,302][00338] RunningMeanStd input shape: (3, 72, 128)
[2024-12-30 20:49:40,304][00338] RunningMeanStd input shape: (1,)
[2024-12-30 20:49:40,317][00338] ConvEncoder: input_channels=3
[2024-12-30 20:49:40,357][00338] Conv encoder output size: 512
[2024-12-30 20:49:40,358][00338] Policy head output size: 512
[2024-12-30 20:49:40,377][00338] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2024-12-30 20:49:40,789][00338] Num frames 100...
[2024-12-30 20:49:40,929][00338] Num frames 200...
[2024-12-30 20:49:41,048][00338] Num frames 300...
[2024-12-30 20:49:41,165][00338] Num frames 400...
[2024-12-30 20:49:41,285][00338] Num frames 500...
[2024-12-30 20:49:41,406][00338] Num frames 600...
[2024-12-30 20:49:41,538][00338] Num frames 700...
[2024-12-30 20:49:41,656][00338] Num frames 800...
[2024-12-30 20:49:41,777][00338] Num frames 900...
[2024-12-30 20:49:41,906][00338] Num frames 1000...
[2024-12-30 20:49:42,031][00338] Num frames 1100...
[2024-12-30 20:49:42,156][00338] Num frames 1200...
[2024-12-30 20:49:42,277][00338] Num frames 1300...
[2024-12-30 20:49:42,400][00338] Num frames 1400...
[2024-12-30 20:49:42,533][00338] Num frames 1500...
[2024-12-30 20:49:42,655][00338] Num frames 1600...
[2024-12-30 20:49:42,771][00338] Avg episode rewards: #0: 41.490, true rewards: #0: 16.490
[2024-12-30 20:49:42,773][00338] Avg episode reward: 41.490, avg true_objective: 16.490
[2024-12-30 20:49:42,837][00338] Num frames 1700...
[2024-12-30 20:49:42,965][00338] Num frames 1800...
[2024-12-30 20:49:43,083][00338] Num frames 1900...
[2024-12-30 20:49:43,204][00338] Num frames 2000...
[2024-12-30 20:49:43,324][00338] Num frames 2100...
[2024-12-30 20:49:43,446][00338] Num frames 2200...
[2024-12-30 20:49:43,575][00338] Num frames 2300...
[2024-12-30 20:49:43,697][00338] Num frames 2400...
[2024-12-30 20:49:43,819][00338] Num frames 2500...
[2024-12-30 20:49:43,946][00338] Num frames 2600...
[2024-12-30 20:49:44,067][00338] Num frames 2700...
[2024-12-30 20:49:44,190][00338] Num frames 2800...
[2024-12-30 20:49:44,312][00338] Num frames 2900...
[2024-12-30 20:49:44,435][00338] Num frames 3000...
[2024-12-30 20:49:44,560][00338] Num frames 3100...
[2024-12-30 20:49:44,686][00338] Num frames 3200...
[2024-12-30 20:49:44,808][00338] Num frames 3300...
[2024-12-30 20:49:44,941][00338] Num frames 3400...
[2024-12-30 20:49:45,065][00338] Num frames 3500...
[2024-12-30 20:49:45,185][00338] Num frames 3600...
[2024-12-30 20:49:45,311][00338] Num frames 3700...
[2024-12-30 20:49:45,429][00338] Avg episode rewards: #0: 48.244, true rewards: #0: 18.745
[2024-12-30 20:49:45,432][00338] Avg episode reward: 48.244, avg true_objective: 18.745
[2024-12-30 20:49:45,520][00338] Num frames 3800...
[2024-12-30 20:49:45,689][00338] Num frames 3900...
[2024-12-30 20:49:45,852][00338] Num frames 4000...
[2024-12-30 20:49:46,018][00338] Num frames 4100...
[2024-12-30 20:49:46,181][00338] Num frames 4200...
[2024-12-30 20:49:46,356][00338] Num frames 4300...
[2024-12-30 20:49:46,526][00338] Num frames 4400...
[2024-12-30 20:49:46,695][00338] Num frames 4500...
[2024-12-30 20:49:46,870][00338] Num frames 4600...
[2024-12-30 20:49:47,042][00338] Num frames 4700...
[2024-12-30 20:49:47,178][00338] Avg episode rewards: #0: 39.470, true rewards: #0: 15.803
[2024-12-30 20:49:47,179][00338] Avg episode reward: 39.470, avg true_objective: 15.803
[2024-12-30 20:49:47,276][00338] Num frames 4800...
[2024-12-30 20:49:47,453][00338] Num frames 4900...
[2024-12-30 20:49:47,628][00338] Num frames 5000...
[2024-12-30 20:49:47,816][00338] Num frames 5100...
[2024-12-30 20:49:47,987][00338] Num frames 5200...
[2024-12-30 20:49:48,121][00338] Num frames 5300...
[2024-12-30 20:49:48,242][00338] Num frames 5400...
[2024-12-30 20:49:48,368][00338] Num frames 5500...
[2024-12-30 20:49:48,495][00338] Num frames 5600...
[2024-12-30 20:49:48,562][00338] Avg episode rewards: #0: 34.262, true rewards: #0: 14.012
[2024-12-30 20:49:48,563][00338] Avg episode reward: 34.262, avg true_objective: 14.012
[2024-12-30 20:49:48,673][00338] Num frames 5700...
[2024-12-30 20:49:48,801][00338] Num frames 5800...
[2024-12-30 20:49:48,932][00338] Num frames 5900...
[2024-12-30 20:49:49,061][00338] Num frames 6000...
[2024-12-30 20:49:49,187][00338] Num frames 6100...
[2024-12-30 20:49:49,311][00338] Num frames 6200...
[2024-12-30 20:49:49,447][00338] Num frames 6300...
[2024-12-30 20:49:49,567][00338] Num frames 6400...
[2024-12-30 20:49:49,693][00338] Num frames 6500...
[2024-12-30 20:49:49,827][00338] Num frames 6600...
[2024-12-30 20:49:49,953][00338] Num frames 6700...
[2024-12-30 20:49:50,079][00338] Num frames 6800...
[2024-12-30 20:49:50,216][00338] Num frames 6900...
[2024-12-30 20:49:50,337][00338] Num frames 7000...
[2024-12-30 20:49:50,460][00338] Num frames 7100...
[2024-12-30 20:49:50,582][00338] Num frames 7200...
[2024-12-30 20:49:50,704][00338] Num frames 7300...
[2024-12-30 20:49:50,834][00338] Num frames 7400...
[2024-12-30 20:49:50,946][00338] Avg episode rewards: #0: 37.682, true rewards: #0: 14.882
[2024-12-30 20:49:50,948][00338] Avg episode reward: 37.682, avg true_objective: 14.882
[2024-12-30 20:49:51,022][00338] Num frames 7500...
[2024-12-30 20:49:51,143][00338] Num frames 7600...
[2024-12-30 20:49:51,266][00338] Num frames 7700...
[2024-12-30 20:49:51,385][00338] Num frames 7800...
[2024-12-30 20:49:51,510][00338] Num frames 7900...
[2024-12-30 20:49:51,630][00338] Num frames 8000...
[2024-12-30 20:49:51,763][00338] Num frames 8100...
[2024-12-30 20:49:51,900][00338] Num frames 8200...
[2024-12-30 20:49:52,027][00338] Num frames 8300...
[2024-12-30 20:49:52,154][00338] Num frames 8400...
[2024-12-30 20:49:52,278][00338] Num frames 8500...
[2024-12-30 20:49:52,371][00338] Avg episode rewards: #0: 35.548, true rewards: #0: 14.215
[2024-12-30 20:49:52,372][00338] Avg episode reward: 35.548, avg true_objective: 14.215
[2024-12-30 20:49:52,461][00338] Num frames 8600...
[2024-12-30 20:49:52,581][00338] Num frames 8700...
[2024-12-30 20:49:52,701][00338] Num frames 8800...
[2024-12-30 20:49:52,835][00338] Num frames 8900...
[2024-12-30 20:49:52,960][00338] Num frames 9000...
[2024-12-30 20:49:53,079][00338] Num frames 9100...
[2024-12-30 20:49:53,206][00338] Num frames 9200...
[2024-12-30 20:49:53,323][00338] Num frames 9300...
[2024-12-30 20:49:53,446][00338] Num frames 9400...
[2024-12-30 20:49:53,522][00338] Avg episode rewards: #0: 34.024, true rewards: #0: 13.453
[2024-12-30 20:49:53,524][00338] Avg episode reward: 34.024, avg true_objective: 13.453
[2024-12-30 20:49:53,622][00338] Num frames 9500...
[2024-12-30 20:49:53,742][00338] Num frames 9600...
[2024-12-30 20:49:53,878][00338] Num frames 9700...
[2024-12-30 20:49:53,995][00338] Num frames 9800...
[2024-12-30 20:49:54,117][00338] Num frames 9900...
[2024-12-30 20:49:54,237][00338] Num frames 10000...
[2024-12-30 20:49:54,361][00338] Num frames 10100...
[2024-12-30 20:49:54,456][00338] Avg episode rewards: #0: 31.536, true rewards: #0: 12.661
[2024-12-30 20:49:54,457][00338] Avg episode reward: 31.536, avg true_objective: 12.661
[2024-12-30 20:49:54,564][00338] Num frames 10200...
[2024-12-30 20:49:54,698][00338] Num frames 10300...
[2024-12-30 20:49:54,817][00338] Num frames 10400...
[2024-12-30 20:49:54,956][00338] Num frames 10500...
[2024-12-30 20:49:55,074][00338] Num frames 10600...
[2024-12-30 20:49:55,194][00338] Num frames 10700...
[2024-12-30 20:49:55,317][00338] Num frames 10800...
[2024-12-30 20:49:55,376][00338] Avg episode rewards: #0: 29.556, true rewards: #0: 12.001
[2024-12-30 20:49:55,378][00338] Avg episode reward: 29.556, avg true_objective: 12.001
[2024-12-30 20:49:55,496][00338] Num frames 10900...
[2024-12-30 20:49:55,617][00338] Num frames 11000...
[2024-12-30 20:49:55,735][00338] Num frames 11100...
[2024-12-30 20:49:55,865][00338] Num frames 11200...
[2024-12-30 20:49:55,998][00338] Num frames 11300...
[2024-12-30 20:49:56,119][00338] Num frames 11400...
[2024-12-30 20:49:56,240][00338] Num frames 11500...
[2024-12-30 20:49:56,362][00338] Num frames 11600...
[2024-12-30 20:49:56,491][00338] Num frames 11700...
[2024-12-30 20:49:56,610][00338] Num frames 11800...
[2024-12-30 20:49:56,729][00338] Num frames 11900...
[2024-12-30 20:49:56,856][00338] Num frames 12000...
[2024-12-30 20:49:56,984][00338] Num frames 12100...
[2024-12-30 20:49:57,106][00338] Num frames 12200...
[2024-12-30 20:49:57,174][00338] Avg episode rewards: #0: 30.109, true rewards: #0: 12.209
[2024-12-30 20:49:57,176][00338] Avg episode reward: 30.109, avg true_objective: 12.209
[2024-12-30 20:51:06,640][00338] Replay video saved to /content/train_dir/default_experiment/replay.mp4!