[2024-12-25 18:38:30,964][00412] Saving configuration to /content/train_dir/default_experiment/config.json...
[2024-12-25 18:38:30,967][00412] Rollout worker 0 uses device cpu
[2024-12-25 18:38:30,969][00412] Rollout worker 1 uses device cpu
[2024-12-25 18:38:30,970][00412] Rollout worker 2 uses device cpu
[2024-12-25 18:38:30,971][00412] Rollout worker 3 uses device cpu
[2024-12-25 18:38:30,972][00412] Rollout worker 4 uses device cpu
[2024-12-25 18:38:30,974][00412] Rollout worker 5 uses device cpu
[2024-12-25 18:38:30,975][00412] Rollout worker 6 uses device cpu
[2024-12-25 18:38:30,976][00412] Rollout worker 7 uses device cpu
[2024-12-25 18:38:31,132][00412] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-25 18:38:31,134][00412] InferenceWorker_p0-w0: min num requests: 2
[2024-12-25 18:38:31,169][00412] Starting all processes...
[2024-12-25 18:38:31,171][00412] Starting process learner_proc0
[2024-12-25 18:38:31,218][00412] Starting all processes...
[2024-12-25 18:38:31,227][00412] Starting process inference_proc0-0
[2024-12-25 18:38:31,228][00412] Starting process rollout_proc0
[2024-12-25 18:38:31,229][00412] Starting process rollout_proc1
[2024-12-25 18:38:31,230][00412] Starting process rollout_proc2
[2024-12-25 18:38:31,230][00412] Starting process rollout_proc3
[2024-12-25 18:38:31,230][00412] Starting process rollout_proc4
[2024-12-25 18:38:31,230][00412] Starting process rollout_proc5
[2024-12-25 18:38:31,230][00412] Starting process rollout_proc6
[2024-12-25 18:38:31,230][00412] Starting process rollout_proc7
[2024-12-25 18:38:48,793][04609] Worker 0 uses CPU cores [0]
[2024-12-25 18:38:49,231][04611] Worker 2 uses CPU cores [0]
[2024-12-25 18:38:49,423][04610] Worker 1 uses CPU cores [1]
[2024-12-25 18:38:49,492][04595] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-25 18:38:49,496][04595] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-12-25 18:38:49,501][04608] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-25 18:38:49,503][04608] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-12-25 18:38:49,513][04614] Worker 5 uses CPU cores [1]
[2024-12-25 18:38:49,523][04612] Worker 3 uses CPU cores [1]
[2024-12-25 18:38:49,559][04608] Num visible devices: 1
[2024-12-25 18:38:49,571][04595] Num visible devices: 1
[2024-12-25 18:38:49,613][04595] Starting seed is not provided
[2024-12-25 18:38:49,614][04613] Worker 4 uses CPU cores [0]
[2024-12-25 18:38:49,614][04595] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-25 18:38:49,615][04595] Initializing actor-critic model on device cuda:0
[2024-12-25 18:38:49,617][04595] RunningMeanStd input shape: (3, 72, 128)
[2024-12-25 18:38:49,620][04595] RunningMeanStd input shape: (1,)
[2024-12-25 18:38:49,676][04616] Worker 7 uses CPU cores [1]
[2024-12-25 18:38:49,694][04595] ConvEncoder: input_channels=3
[2024-12-25 18:38:49,726][04615] Worker 6 uses CPU cores [0]
[2024-12-25 18:38:49,999][04595] Conv encoder output size: 512
[2024-12-25 18:38:49,999][04595] Policy head output size: 512
[2024-12-25 18:38:50,052][04595] Created Actor Critic model with architecture:
[2024-12-25 18:38:50,052][04595] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2024-12-25 18:38:50,352][04595] Using optimizer
[2024-12-25 18:38:51,127][00412] Heartbeat connected on Batcher_0
[2024-12-25 18:38:51,133][00412] Heartbeat connected on InferenceWorker_p0-w0
[2024-12-25 18:38:51,141][00412] Heartbeat connected on RolloutWorker_w0
[2024-12-25 18:38:51,147][00412] Heartbeat connected on RolloutWorker_w1
[2024-12-25 18:38:51,151][00412] Heartbeat connected on RolloutWorker_w2
[2024-12-25 18:38:51,154][00412] Heartbeat connected on RolloutWorker_w3
[2024-12-25 18:38:51,157][00412] Heartbeat connected on RolloutWorker_w4
[2024-12-25 18:38:51,161][00412] Heartbeat connected on RolloutWorker_w5
[2024-12-25 18:38:51,165][00412] Heartbeat connected on RolloutWorker_w6
[2024-12-25 18:38:51,175][00412] Heartbeat connected on RolloutWorker_w7
[2024-12-25 18:38:55,202][04595] No checkpoints found
[2024-12-25 18:38:55,202][04595] Did not load from checkpoint, starting from scratch!
[2024-12-25 18:38:55,203][04595] Initialized policy 0 weights for model version 0
[2024-12-25 18:38:55,213][04595] LearnerWorker_p0 finished initialization!
[2024-12-25 18:38:55,213][04595] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-12-25 18:38:55,216][00412] Heartbeat connected on LearnerWorker_p0
[2024-12-25 18:38:55,333][00412] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-25 18:38:55,364][04608] RunningMeanStd input shape: (3, 72, 128)
[2024-12-25 18:38:55,366][04608] RunningMeanStd input shape: (1,)
[2024-12-25 18:38:55,386][04608] ConvEncoder: input_channels=3
[2024-12-25 18:38:55,555][04608] Conv encoder output size: 512
[2024-12-25 18:38:55,556][04608] Policy head output size: 512
[2024-12-25 18:38:55,639][00412] Inference worker 0-0 is ready!
[2024-12-25 18:38:55,641][00412] All inference workers are ready! Signal rollout workers to start!
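The module tree printed above is a fairly small network. Below is a minimal PyTorch sketch that reproduces the shapes reported in the log (observations of shape (3, 72, 128), a 512-dimensional conv encoder output, a GRU(512, 512) core, a scalar value head, and a 5-action policy head). The convolution kernel sizes, strides and channel counts are assumptions, since the RecursiveScriptModule printout hides them; treat this as an approximation of the logged architecture rather than Sample Factory's exact implementation.

```python
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Approximation of the ActorCriticSharedWeights module printed in the log.

    Conv kernel sizes/strides/channels are assumed; the log only shows three
    Conv2d+ELU blocks, a Linear+ELU projection to 512, a GRU(512, 512) core,
    a 1-dim critic head and a 5-dim action head.
    """

    def __init__(self, obs_shape=(3, 72, 128), num_actions=5, hidden=512):
        super().__init__()
        c, h, w = obs_shape
        # (conv_head): three Conv2d + ELU blocks, mirroring the printout.
        self.conv_head = nn.Sequential(
            nn.Conv2d(c, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():
            conv_out = self.conv_head(torch.zeros(1, c, h, w)).flatten(1).shape[1]
        # (mlp_layers): Linear + ELU projecting to the 512-dim encoder output.
        self.mlp = nn.Sequential(nn.Linear(conv_out, hidden), nn.ELU())
        # (core): GRU(512, 512), shared by actor and critic.
        self.core = nn.GRU(hidden, hidden)
        # (critic_linear) and (action_parameterization.distribution_linear).
        self.critic_linear = nn.Linear(hidden, 1)
        self.distribution_linear = nn.Linear(hidden, num_actions)

    def forward(self, obs, rnn_state=None):
        x = self.mlp(self.conv_head(obs).flatten(1))          # (B, 512)
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)   # sequence length 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state
```

A quick shape check: `logits, value, _ = ActorCriticSketch()(torch.zeros(4, 3, 72, 128))` gives logits of shape (4, 5) and values of shape (4, 1); the flattened conv size (2304 with the assumed kernels) is computed at construction time rather than hard-coded.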
[2024-12-25 18:38:55,867][04610] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-25 18:38:55,876][04612] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-25 18:38:55,901][04614] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-25 18:38:55,888][04616] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-25 18:38:55,996][04611] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-25 18:38:56,002][04609] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-25 18:38:56,005][04615] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-25 18:38:56,001][04613] Doom resolution: 160x120, resize resolution: (128, 72)
[2024-12-25 18:38:57,494][04613] Decorrelating experience for 0 frames...
[2024-12-25 18:38:57,492][04615] Decorrelating experience for 0 frames...
[2024-12-25 18:38:57,959][04616] Decorrelating experience for 0 frames...
[2024-12-25 18:38:57,975][04612] Decorrelating experience for 0 frames...
[2024-12-25 18:38:57,993][04614] Decorrelating experience for 0 frames...
[2024-12-25 18:38:58,002][04610] Decorrelating experience for 0 frames...
[2024-12-25 18:38:58,356][04613] Decorrelating experience for 32 frames...
[2024-12-25 18:38:58,362][04609] Decorrelating experience for 0 frames...
[2024-12-25 18:38:58,972][04616] Decorrelating experience for 32 frames...
[2024-12-25 18:38:58,997][04612] Decorrelating experience for 32 frames...
[2024-12-25 18:38:59,033][04611] Decorrelating experience for 0 frames...
[2024-12-25 18:38:59,373][04609] Decorrelating experience for 32 frames...
[2024-12-25 18:38:59,730][04610] Decorrelating experience for 32 frames...
[2024-12-25 18:39:00,283][04614] Decorrelating experience for 32 frames...
[2024-12-25 18:39:00,333][00412] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-12-25 18:39:00,430][04611] Decorrelating experience for 32 frames...
[2024-12-25 18:39:00,460][04613] Decorrelating experience for 64 frames...
[2024-12-25 18:39:00,542][04612] Decorrelating experience for 64 frames...
[2024-12-25 18:39:01,233][04609] Decorrelating experience for 64 frames...
[2024-12-25 18:39:01,287][04615] Decorrelating experience for 32 frames...
[2024-12-25 18:39:01,420][04616] Decorrelating experience for 64 frames...
[2024-12-25 18:39:01,581][04614] Decorrelating experience for 64 frames...
[2024-12-25 18:39:01,899][04613] Decorrelating experience for 96 frames...
[2024-12-25 18:39:02,770][04610] Decorrelating experience for 64 frames...
[2024-12-25 18:39:02,806][04611] Decorrelating experience for 64 frames...
[2024-12-25 18:39:03,132][04615] Decorrelating experience for 64 frames...
[2024-12-25 18:39:03,185][04616] Decorrelating experience for 96 frames...
[2024-12-25 18:39:03,225][04612] Decorrelating experience for 96 frames...
[2024-12-25 18:39:03,383][04614] Decorrelating experience for 96 frames...
[2024-12-25 18:39:03,883][04610] Decorrelating experience for 96 frames...
[2024-12-25 18:39:04,638][04609] Decorrelating experience for 96 frames...
[2024-12-25 18:39:04,793][04611] Decorrelating experience for 96 frames...
[2024-12-25 18:39:05,109][04615] Decorrelating experience for 96 frames...
[2024-12-25 18:39:05,333][00412] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 33.8. Samples: 338.
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-12-25 18:39:05,337][00412] Avg episode reward: [(0, '1.120')] [2024-12-25 18:39:08,050][04595] Signal inference workers to stop experience collection... [2024-12-25 18:39:08,084][04608] InferenceWorker_p0-w0: stopping experience collection [2024-12-25 18:39:10,334][00412] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 160.0. Samples: 2400. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-12-25 18:39:10,336][00412] Avg episode reward: [(0, '2.131')] [2024-12-25 18:39:11,490][04595] Signal inference workers to resume experience collection... [2024-12-25 18:39:11,490][04608] InferenceWorker_p0-w0: resuming experience collection [2024-12-25 18:39:15,333][00412] Fps is (10 sec: 2048.0, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 20480. Throughput: 0: 205.2. Samples: 4104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-25 18:39:15,336][00412] Avg episode reward: [(0, '3.451')] [2024-12-25 18:39:20,334][00412] Fps is (10 sec: 3686.4, 60 sec: 1474.5, 300 sec: 1474.5). Total num frames: 36864. Throughput: 0: 400.0. Samples: 10000. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:39:20,344][00412] Avg episode reward: [(0, '3.893')] [2024-12-25 18:39:20,891][04608] Updated weights for policy 0, policy_version 10 (0.0228) [2024-12-25 18:39:25,340][00412] Fps is (10 sec: 3274.6, 60 sec: 1774.5, 300 sec: 1774.5). Total num frames: 53248. Throughput: 0: 406.8. Samples: 12208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:39:25,347][00412] Avg episode reward: [(0, '4.401')] [2024-12-25 18:39:30,333][00412] Fps is (10 sec: 2867.3, 60 sec: 1872.5, 300 sec: 1872.5). Total num frames: 65536. Throughput: 0: 451.4. Samples: 15798. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:39:30,341][00412] Avg episode reward: [(0, '4.528')] [2024-12-25 18:39:34,136][04608] Updated weights for policy 0, policy_version 20 (0.0038) [2024-12-25 18:39:35,333][00412] Fps is (10 sec: 3279.0, 60 sec: 2150.4, 300 sec: 2150.4). Total num frames: 86016. Throughput: 0: 537.3. Samples: 21492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:39:35,339][00412] Avg episode reward: [(0, '4.483')] [2024-12-25 18:39:40,333][00412] Fps is (10 sec: 3686.3, 60 sec: 2275.5, 300 sec: 2275.5). Total num frames: 102400. Throughput: 0: 544.1. Samples: 24484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:39:40,342][00412] Avg episode reward: [(0, '4.392')] [2024-12-25 18:39:40,344][04595] Saving new best policy, reward=4.392! [2024-12-25 18:39:45,333][00412] Fps is (10 sec: 2867.2, 60 sec: 2293.8, 300 sec: 2293.8). Total num frames: 114688. Throughput: 0: 636.6. Samples: 28646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:39:45,338][00412] Avg episode reward: [(0, '4.463')] [2024-12-25 18:39:45,348][04595] Saving new best policy, reward=4.463! [2024-12-25 18:39:47,470][04608] Updated weights for policy 0, policy_version 30 (0.0029) [2024-12-25 18:39:50,333][00412] Fps is (10 sec: 2867.3, 60 sec: 2383.1, 300 sec: 2383.1). Total num frames: 131072. Throughput: 0: 734.2. Samples: 33376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:39:50,340][00412] Avg episode reward: [(0, '4.537')] [2024-12-25 18:39:50,344][04595] Saving new best policy, reward=4.537! [2024-12-25 18:39:55,333][00412] Fps is (10 sec: 3686.4, 60 sec: 2525.9, 300 sec: 2525.9). Total num frames: 151552. Throughput: 0: 753.9. Samples: 36326. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-25 18:39:55,336][00412] Avg episode reward: [(0, '4.377')] [2024-12-25 18:39:58,611][04608] Updated weights for policy 0, policy_version 40 (0.0014) [2024-12-25 18:40:00,333][00412] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2520.6). Total num frames: 163840. Throughput: 0: 830.0. Samples: 41452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:40:00,338][00412] Avg episode reward: [(0, '4.398')] [2024-12-25 18:40:05,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3003.7, 300 sec: 2574.6). Total num frames: 180224. Throughput: 0: 787.9. Samples: 45454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:40:05,339][00412] Avg episode reward: [(0, '4.464')] [2024-12-25 18:40:10,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2676.1). Total num frames: 200704. Throughput: 0: 806.1. Samples: 48478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:40:10,339][00412] Avg episode reward: [(0, '4.506')] [2024-12-25 18:40:11,030][04608] Updated weights for policy 0, policy_version 50 (0.0033) [2024-12-25 18:40:15,334][00412] Fps is (10 sec: 3686.0, 60 sec: 3276.7, 300 sec: 2713.6). Total num frames: 217088. Throughput: 0: 858.5. Samples: 54430. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:40:15,338][00412] Avg episode reward: [(0, '4.368')] [2024-12-25 18:40:20,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 2698.5). Total num frames: 229376. Throughput: 0: 810.5. Samples: 57966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:40:20,336][00412] Avg episode reward: [(0, '4.247')] [2024-12-25 18:40:24,307][04608] Updated weights for policy 0, policy_version 60 (0.0040) [2024-12-25 18:40:25,333][00412] Fps is (10 sec: 2867.5, 60 sec: 3208.9, 300 sec: 2730.7). Total num frames: 245760. Throughput: 0: 797.9. Samples: 60388. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:40:25,337][00412] Avg episode reward: [(0, '4.545')] [2024-12-25 18:40:25,383][04595] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000061_249856.pth... [2024-12-25 18:40:25,502][04595] Saving new best policy, reward=4.545! [2024-12-25 18:40:30,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2802.5). Total num frames: 266240. Throughput: 0: 834.5. Samples: 66200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:40:30,338][00412] Avg episode reward: [(0, '4.734')] [2024-12-25 18:40:30,341][04595] Saving new best policy, reward=4.734! [2024-12-25 18:40:35,333][00412] Fps is (10 sec: 3276.7, 60 sec: 3208.5, 300 sec: 2785.3). Total num frames: 278528. Throughput: 0: 823.6. Samples: 70438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:40:35,338][00412] Avg episode reward: [(0, '4.576')] [2024-12-25 18:40:37,639][04608] Updated weights for policy 0, policy_version 70 (0.0021) [2024-12-25 18:40:40,333][00412] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 2808.7). Total num frames: 294912. Throughput: 0: 798.1. Samples: 72242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:40:40,338][00412] Avg episode reward: [(0, '4.469')] [2024-12-25 18:40:45,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2867.2). Total num frames: 315392. Throughput: 0: 813.7. Samples: 78068. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:40:45,341][00412] Avg episode reward: [(0, '4.373')] [2024-12-25 18:40:48,131][04608] Updated weights for policy 0, policy_version 80 (0.0026) [2024-12-25 18:40:50,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 2885.0). Total num frames: 331776. Throughput: 0: 842.7. Samples: 83374. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:40:50,339][00412] Avg episode reward: [(0, '4.466')] [2024-12-25 18:40:55,334][00412] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 2867.2). Total num frames: 344064. Throughput: 0: 813.7. Samples: 85094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:40:55,338][00412] Avg episode reward: [(0, '4.497')] [2024-12-25 18:41:00,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3276.8, 300 sec: 2883.6). Total num frames: 360448. Throughput: 0: 787.6. Samples: 89870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:41:00,336][00412] Avg episode reward: [(0, '4.589')] [2024-12-25 18:41:01,636][04608] Updated weights for policy 0, policy_version 90 (0.0022) [2024-12-25 18:41:05,333][00412] Fps is (10 sec: 3686.6, 60 sec: 3345.1, 300 sec: 2930.2). Total num frames: 380928. Throughput: 0: 840.2. Samples: 95776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:41:05,335][00412] Avg episode reward: [(0, '4.604')] [2024-12-25 18:41:10,336][00412] Fps is (10 sec: 3275.9, 60 sec: 3208.4, 300 sec: 2912.7). Total num frames: 393216. Throughput: 0: 834.4. Samples: 97938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:41:10,338][00412] Avg episode reward: [(0, '4.659')] [2024-12-25 18:41:14,750][04608] Updated weights for policy 0, policy_version 100 (0.0041) [2024-12-25 18:41:15,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 2925.7). Total num frames: 409600. Throughput: 0: 790.4. Samples: 101770. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:41:15,336][00412] Avg episode reward: [(0, '4.699')] [2024-12-25 18:41:20,338][00412] Fps is (10 sec: 3685.6, 60 sec: 3344.8, 300 sec: 2966.0). Total num frames: 430080. Throughput: 0: 828.7. Samples: 107732. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:41:20,342][00412] Avg episode reward: [(0, '4.494')] [2024-12-25 18:41:25,334][00412] Fps is (10 sec: 3686.2, 60 sec: 3345.0, 300 sec: 2976.4). Total num frames: 446464. Throughput: 0: 853.7. Samples: 110660. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:41:25,340][00412] Avg episode reward: [(0, '4.437')] [2024-12-25 18:41:26,896][04608] Updated weights for policy 0, policy_version 110 (0.0026) [2024-12-25 18:41:30,337][00412] Fps is (10 sec: 2458.0, 60 sec: 3140.1, 300 sec: 2933.2). Total num frames: 454656. Throughput: 0: 802.7. Samples: 114194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:41:30,340][00412] Avg episode reward: [(0, '4.467')] [2024-12-25 18:41:35,333][00412] Fps is (10 sec: 2867.3, 60 sec: 3276.8, 300 sec: 2969.6). Total num frames: 475136. Throughput: 0: 802.5. Samples: 119486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:41:35,337][00412] Avg episode reward: [(0, '4.538')] [2024-12-25 18:41:38,591][04608] Updated weights for policy 0, policy_version 120 (0.0026) [2024-12-25 18:41:40,333][00412] Fps is (10 sec: 4097.3, 60 sec: 3345.1, 300 sec: 3003.7). Total num frames: 495616. Throughput: 0: 831.3. Samples: 122500. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:41:40,338][00412] Avg episode reward: [(0, '4.602')] [2024-12-25 18:41:45,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 2987.7). Total num frames: 507904. Throughput: 0: 827.2. Samples: 127092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:41:45,343][00412] Avg episode reward: [(0, '4.491')] [2024-12-25 18:41:50,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 2995.9). Total num frames: 524288. Throughput: 0: 792.4. Samples: 131436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:41:50,341][00412] Avg episode reward: [(0, '4.637')] [2024-12-25 18:41:52,049][04608] Updated weights for policy 0, policy_version 130 (0.0024) [2024-12-25 18:41:55,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3003.7). Total num frames: 540672. Throughput: 0: 809.0. Samples: 134340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:41:55,335][00412] Avg episode reward: [(0, '4.781')] [2024-12-25 18:41:55,355][04595] Saving new best policy, reward=4.781! [2024-12-25 18:42:00,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3011.1). Total num frames: 557056. Throughput: 0: 842.4. Samples: 139678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:42:00,341][00412] Avg episode reward: [(0, '4.776')] [2024-12-25 18:42:05,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 2996.5). Total num frames: 569344. Throughput: 0: 787.9. Samples: 143184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:42:05,336][00412] Avg episode reward: [(0, '4.703')] [2024-12-25 18:42:05,660][04608] Updated weights for policy 0, policy_version 140 (0.0018) [2024-12-25 18:42:10,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.9, 300 sec: 3024.7). Total num frames: 589824. Throughput: 0: 784.8. Samples: 145974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:42:10,338][00412] Avg episode reward: [(0, '4.619')] [2024-12-25 18:42:15,333][00412] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3051.5). Total num frames: 610304. Throughput: 0: 839.0. Samples: 151948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:42:15,335][00412] Avg episode reward: [(0, '4.429')] [2024-12-25 18:42:16,306][04608] Updated weights for policy 0, policy_version 150 (0.0022) [2024-12-25 18:42:20,335][00412] Fps is (10 sec: 3276.0, 60 sec: 3208.7, 300 sec: 3037.0). Total num frames: 622592. Throughput: 0: 811.3. Samples: 155996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:42:20,343][00412] Avg episode reward: [(0, '4.582')] [2024-12-25 18:42:25,333][00412] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3042.7). Total num frames: 638976. Throughput: 0: 788.2. Samples: 157968. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:42:25,336][00412] Avg episode reward: [(0, '4.799')] [2024-12-25 18:42:25,350][04595] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000156_638976.pth... [2024-12-25 18:42:25,473][04595] Saving new best policy, reward=4.799! [2024-12-25 18:42:29,530][04608] Updated weights for policy 0, policy_version 160 (0.0023) [2024-12-25 18:42:30,333][00412] Fps is (10 sec: 3277.6, 60 sec: 3345.2, 300 sec: 3048.2). Total num frames: 655360. Throughput: 0: 810.6. Samples: 163568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:42:30,336][00412] Avg episode reward: [(0, '4.907')] [2024-12-25 18:42:30,342][04595] Saving new best policy, reward=4.907! 
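The recurring "Fps is (10 sec: …, 60 sec: …, 300 sec: …)" lines report throughput in environment frames per second over three sliding windows, and they can be reproduced from the "Total num frames" counter. Taking the report at 18:39:15 near the start of the run (frame counter 0 at 18:38:55 and at 18:39:05, then 20480 at 18:39:15): the 10-second window gives 20480 / 10 = 2048.0, and the longer windows, clipped to the roughly 20 seconds elapsed so far, give 20480 / 20 = 1024.0, matching the logged values. The window-clipping behaviour is an inference from these numbers, not something stated in the log.

```python
# Reproducing the 18:39:15 FPS report from "Total num frames" values in this log.
# Keys are seconds since the 18:38:55 report (frame counter started at 0 there).
total_frames = {0: 0, 10: 0, 20: 20480}

fps_10s = (total_frames[20] - total_frames[10]) / 10  # 2048.0 -> "10 sec: 2048.0"
fps_run = total_frames[20] / 20                       # 1024.0 -> "60 sec" / "300 sec", clipped to run length
print(fps_10s, fps_run)
```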
[2024-12-25 18:42:35,333][00412] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3053.4). Total num frames: 671744. Throughput: 0: 821.6. Samples: 168408. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:42:35,336][00412] Avg episode reward: [(0, '4.846')] [2024-12-25 18:42:40,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3040.1). Total num frames: 684032. Throughput: 0: 795.7. Samples: 170148. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:42:40,343][00412] Avg episode reward: [(0, '4.676')] [2024-12-25 18:42:42,979][04608] Updated weights for policy 0, policy_version 170 (0.0018) [2024-12-25 18:42:45,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3063.1). Total num frames: 704512. Throughput: 0: 794.2. Samples: 175418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:42:45,335][00412] Avg episode reward: [(0, '4.689')] [2024-12-25 18:42:50,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3067.6). Total num frames: 720896. Throughput: 0: 846.7. Samples: 181286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:42:50,337][00412] Avg episode reward: [(0, '4.662')] [2024-12-25 18:42:55,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3054.9). Total num frames: 733184. Throughput: 0: 823.0. Samples: 183010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:42:55,341][00412] Avg episode reward: [(0, '4.658')] [2024-12-25 18:42:55,758][04608] Updated weights for policy 0, policy_version 180 (0.0020) [2024-12-25 18:43:00,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3059.5). Total num frames: 749568. Throughput: 0: 780.0. Samples: 187046. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:43:00,341][00412] Avg episode reward: [(0, '4.739')] [2024-12-25 18:43:05,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3080.2). Total num frames: 770048. Throughput: 0: 819.9. Samples: 192890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:43:05,335][00412] Avg episode reward: [(0, '4.922')] [2024-12-25 18:43:05,346][04595] Saving new best policy, reward=4.922! [2024-12-25 18:43:06,891][04608] Updated weights for policy 0, policy_version 190 (0.0021) [2024-12-25 18:43:10,334][00412] Fps is (10 sec: 3276.6, 60 sec: 3208.5, 300 sec: 3068.0). Total num frames: 782336. Throughput: 0: 833.8. Samples: 195490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:43:10,343][00412] Avg episode reward: [(0, '4.870')] [2024-12-25 18:43:15,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3072.0). Total num frames: 798720. Throughput: 0: 786.8. Samples: 198972. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:43:15,338][00412] Avg episode reward: [(0, '4.761')] [2024-12-25 18:43:20,333][00412] Fps is (10 sec: 3277.0, 60 sec: 3208.7, 300 sec: 3075.9). Total num frames: 815104. Throughput: 0: 800.9. Samples: 204450. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:43:20,335][00412] Avg episode reward: [(0, '4.960')] [2024-12-25 18:43:20,340][04595] Saving new best policy, reward=4.960! [2024-12-25 18:43:20,738][04608] Updated weights for policy 0, policy_version 200 (0.0032) [2024-12-25 18:43:25,336][00412] Fps is (10 sec: 3685.5, 60 sec: 3276.7, 300 sec: 3094.7). Total num frames: 835584. Throughput: 0: 827.2. Samples: 207372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:43:25,338][00412] Avg episode reward: [(0, '5.155')] [2024-12-25 18:43:25,354][04595] Saving new best policy, reward=5.155! 
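Two other patterns are visible in this stream: the inference worker logs "Updated weights for policy 0, policy_version N" at N = 10, 20, 30, …, i.e. it appears to pull fresh parameters from the learner every 10 policy versions, and the "Policy #0 lag" figures in the FPS lines stay in the 0–2 range, so rollouts are collected with weights at most a couple of versions behind the learner. Combined with the checkpoint arithmetic noted further down (4096 frames per policy version), that works out to a weight refresh roughly every 40k frames. The per-version frame count is read off the checkpoint filenames; the rest of this reading is an assumption, not something the log states directly.

```python
# Cadence implied by the entries above (an interpretation of the log, not a Sample Factory API):
versions_per_weight_sync = 10   # "Updated weights ... policy_version 10, 20, 30, ..."
frames_per_version = 4096       # from checkpoint filenames, e.g. 249856 / 61
print(versions_per_weight_sync * frames_per_version)   # 40960 frames between weight syncs
```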
[2024-12-25 18:43:30,333][00412] Fps is (10 sec: 2867.1, 60 sec: 3140.2, 300 sec: 3068.3). Total num frames: 843776. Throughput: 0: 801.0. Samples: 211462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:43:30,338][00412] Avg episode reward: [(0, '4.803')] [2024-12-25 18:43:34,140][04608] Updated weights for policy 0, policy_version 210 (0.0025) [2024-12-25 18:43:35,333][00412] Fps is (10 sec: 2867.9, 60 sec: 3208.5, 300 sec: 3086.6). Total num frames: 864256. Throughput: 0: 772.7. Samples: 216058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:43:35,340][00412] Avg episode reward: [(0, '4.914')] [2024-12-25 18:43:40,333][00412] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3090.0). Total num frames: 880640. Throughput: 0: 798.8. Samples: 218954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:43:40,335][00412] Avg episode reward: [(0, '5.376')] [2024-12-25 18:43:40,342][04595] Saving new best policy, reward=5.376! [2024-12-25 18:43:45,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3093.2). Total num frames: 897024. Throughput: 0: 825.8. Samples: 224208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:43:45,335][00412] Avg episode reward: [(0, '5.690')] [2024-12-25 18:43:45,349][04595] Saving new best policy, reward=5.690! [2024-12-25 18:43:46,423][04608] Updated weights for policy 0, policy_version 220 (0.0022) [2024-12-25 18:43:50,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3082.4). Total num frames: 909312. Throughput: 0: 776.0. Samples: 227812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:43:50,340][00412] Avg episode reward: [(0, '5.656')] [2024-12-25 18:43:55,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3151.8). Total num frames: 929792. Throughput: 0: 783.4. Samples: 230744. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:43:55,339][00412] Avg episode reward: [(0, '5.518')] [2024-12-25 18:43:58,314][04608] Updated weights for policy 0, policy_version 230 (0.0032) [2024-12-25 18:44:00,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 946176. Throughput: 0: 833.3. Samples: 236470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:44:00,342][00412] Avg episode reward: [(0, '5.524')] [2024-12-25 18:44:05,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3249.0). Total num frames: 958464. Throughput: 0: 794.7. Samples: 240210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:44:05,338][00412] Avg episode reward: [(0, '5.618')] [2024-12-25 18:44:10,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3235.1). Total num frames: 974848. Throughput: 0: 780.0. Samples: 242468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:44:10,335][00412] Avg episode reward: [(0, '5.610')] [2024-12-25 18:44:11,577][04608] Updated weights for policy 0, policy_version 240 (0.0016) [2024-12-25 18:44:15,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 995328. Throughput: 0: 819.8. Samples: 248352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:44:15,340][00412] Avg episode reward: [(0, '5.471')] [2024-12-25 18:44:20,337][00412] Fps is (10 sec: 3275.6, 60 sec: 3208.3, 300 sec: 3235.2). Total num frames: 1007616. Throughput: 0: 819.3. Samples: 252930. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:44:20,339][00412] Avg episode reward: [(0, '5.487')] [2024-12-25 18:44:24,983][04608] Updated weights for policy 0, policy_version 250 (0.0023) [2024-12-25 18:44:25,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.4, 300 sec: 3249.0). Total num frames: 1024000. Throughput: 0: 794.4. Samples: 254700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:44:25,336][00412] Avg episode reward: [(0, '5.712')] [2024-12-25 18:44:25,350][04595] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000250_1024000.pth... [2024-12-25 18:44:25,476][04595] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000061_249856.pth [2024-12-25 18:44:25,493][04595] Saving new best policy, reward=5.712! [2024-12-25 18:44:30,333][00412] Fps is (10 sec: 3277.9, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 1040384. Throughput: 0: 793.2. Samples: 259902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:44:30,339][00412] Avg episode reward: [(0, '5.679')] [2024-12-25 18:44:35,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3249.0). Total num frames: 1060864. Throughput: 0: 835.3. Samples: 265400. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:44:35,340][00412] Avg episode reward: [(0, '5.592')] [2024-12-25 18:44:36,949][04608] Updated weights for policy 0, policy_version 260 (0.0018) [2024-12-25 18:44:40,336][00412] Fps is (10 sec: 2866.5, 60 sec: 3140.1, 300 sec: 3235.1). Total num frames: 1069056. Throughput: 0: 807.8. Samples: 267098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:44:40,338][00412] Avg episode reward: [(0, '5.638')] [2024-12-25 18:44:45,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 1089536. Throughput: 0: 784.8. Samples: 271784. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:44:45,338][00412] Avg episode reward: [(0, '5.933')] [2024-12-25 18:44:45,345][04595] Saving new best policy, reward=5.933! [2024-12-25 18:44:48,897][04608] Updated weights for policy 0, policy_version 270 (0.0031) [2024-12-25 18:44:50,333][00412] Fps is (10 sec: 4097.1, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 1110016. Throughput: 0: 831.1. Samples: 277610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:44:50,340][00412] Avg episode reward: [(0, '6.232')] [2024-12-25 18:44:50,346][04595] Saving new best policy, reward=6.232! [2024-12-25 18:44:55,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 1122304. Throughput: 0: 830.5. Samples: 279842. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-25 18:44:55,340][00412] Avg episode reward: [(0, '6.362')] [2024-12-25 18:44:55,348][04595] Saving new best policy, reward=6.362! [2024-12-25 18:45:00,333][00412] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3249.0). Total num frames: 1138688. Throughput: 0: 780.6. Samples: 283478. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:45:00,336][00412] Avg episode reward: [(0, '6.315')] [2024-12-25 18:45:02,345][04608] Updated weights for policy 0, policy_version 280 (0.0024) [2024-12-25 18:45:05,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 1155072. Throughput: 0: 810.6. Samples: 289404. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:45:05,339][00412] Avg episode reward: [(0, '6.697')] [2024-12-25 18:45:05,348][04595] Saving new best policy, reward=6.697! 
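The checkpoint filenames written by the learner encode both the policy version and the cumulative environment frame count, and in this run the two stay in a fixed ratio: checkpoint_000000061_249856.pth is version 61 at 61 × 4096 = 249856 frames, checkpoint_000000156_638976.pth is 156 × 4096, and checkpoint_000000250_1024000.pth is 250 × 4096, so each policy version corresponds to 4096 collected frames (presumably the learner's batch per update, though that interpretation is an assumption). Each "Saving" entry is also paired with a "Removing" entry for an older file, so only the most recent checkpoints are kept on disk. A small parsing sketch for these filenames (parse_checkpoint_name is a hypothetical helper, not part of Sample Factory):

```python
import re

def parse_checkpoint_name(filename: str):
    """Extract (policy_version, env_frames) from names like
    'checkpoint_000000250_1024000.pth', as seen in the log above."""
    m = re.match(r"checkpoint_(\d+)_(\d+)\.pth$", filename)
    if m is None:
        raise ValueError(f"unrecognised checkpoint name: {filename}")
    return int(m.group(1)), int(m.group(2))

for name in ["checkpoint_000000061_249856.pth",
             "checkpoint_000000156_638976.pth",
             "checkpoint_000000250_1024000.pth"]:
    version, frames = parse_checkpoint_name(name)
    print(name, version, frames, frames // version)   # last column is 4096 each time
```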
[2024-12-25 18:45:10,333][00412] Fps is (10 sec: 3276.9, 60 sec: 3276.8, 300 sec: 3235.2). Total num frames: 1171456. Throughput: 0: 834.4. Samples: 292248. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-25 18:45:10,339][00412] Avg episode reward: [(0, '6.802')] [2024-12-25 18:45:10,344][04595] Saving new best policy, reward=6.802! [2024-12-25 18:45:15,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3235.1). Total num frames: 1183744. Throughput: 0: 801.4. Samples: 295964. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-25 18:45:15,339][00412] Avg episode reward: [(0, '7.014')] [2024-12-25 18:45:15,347][04595] Saving new best policy, reward=7.014! [2024-12-25 18:45:15,853][04608] Updated weights for policy 0, policy_version 290 (0.0020) [2024-12-25 18:45:20,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3277.0, 300 sec: 3249.0). Total num frames: 1204224. Throughput: 0: 795.0. Samples: 301174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:45:20,337][00412] Avg episode reward: [(0, '7.060')] [2024-12-25 18:45:20,339][04595] Saving new best policy, reward=7.060! [2024-12-25 18:45:25,333][00412] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3249.0). Total num frames: 1224704. Throughput: 0: 822.0. Samples: 304088. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:45:25,337][00412] Avg episode reward: [(0, '7.299')] [2024-12-25 18:45:25,354][04595] Saving new best policy, reward=7.299! [2024-12-25 18:45:26,972][04608] Updated weights for policy 0, policy_version 300 (0.0022) [2024-12-25 18:45:30,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 1232896. Throughput: 0: 818.8. Samples: 308628. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:45:30,335][00412] Avg episode reward: [(0, '7.449')] [2024-12-25 18:45:30,338][04595] Saving new best policy, reward=7.449! [2024-12-25 18:45:35,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3235.1). Total num frames: 1249280. Throughput: 0: 783.5. Samples: 312868. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:45:35,336][00412] Avg episode reward: [(0, '7.326')] [2024-12-25 18:45:39,873][04608] Updated weights for policy 0, policy_version 310 (0.0024) [2024-12-25 18:45:40,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3345.2, 300 sec: 3235.1). Total num frames: 1269760. Throughput: 0: 800.3. Samples: 315854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:45:40,340][00412] Avg episode reward: [(0, '7.928')] [2024-12-25 18:45:40,344][04595] Saving new best policy, reward=7.928! [2024-12-25 18:45:45,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 1286144. Throughput: 0: 842.8. Samples: 321402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:45:45,335][00412] Avg episode reward: [(0, '7.707')] [2024-12-25 18:45:50,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3235.2). Total num frames: 1298432. Throughput: 0: 788.5. Samples: 324886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:45:50,341][00412] Avg episode reward: [(0, '7.945')] [2024-12-25 18:45:50,347][04595] Saving new best policy, reward=7.945! [2024-12-25 18:45:53,308][04608] Updated weights for policy 0, policy_version 320 (0.0038) [2024-12-25 18:45:55,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 1314816. Throughput: 0: 785.6. Samples: 327602. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:45:55,336][00412] Avg episode reward: [(0, '8.043')] [2024-12-25 18:45:55,352][04595] Saving new best policy, reward=8.043! [2024-12-25 18:46:00,337][00412] Fps is (10 sec: 3685.1, 60 sec: 3276.6, 300 sec: 3235.1). Total num frames: 1335296. Throughput: 0: 830.4. Samples: 333336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:46:00,339][00412] Avg episode reward: [(0, '9.104')] [2024-12-25 18:46:00,342][04595] Saving new best policy, reward=9.104! [2024-12-25 18:46:05,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3235.2). Total num frames: 1347584. Throughput: 0: 802.0. Samples: 337264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:46:05,342][00412] Avg episode reward: [(0, '9.248')] [2024-12-25 18:46:05,356][04595] Saving new best policy, reward=9.248! [2024-12-25 18:46:06,886][04608] Updated weights for policy 0, policy_version 330 (0.0019) [2024-12-25 18:46:10,333][00412] Fps is (10 sec: 2868.2, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 1363968. Throughput: 0: 780.8. Samples: 339222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:46:10,336][00412] Avg episode reward: [(0, '9.248')] [2024-12-25 18:46:15,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1380352. Throughput: 0: 809.2. Samples: 345042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:46:15,341][00412] Avg episode reward: [(0, '8.653')] [2024-12-25 18:46:17,664][04608] Updated weights for policy 0, policy_version 340 (0.0013) [2024-12-25 18:46:20,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 1396736. Throughput: 0: 820.4. Samples: 349784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:46:20,339][00412] Avg episode reward: [(0, '8.574')] [2024-12-25 18:46:25,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3235.2). Total num frames: 1409024. Throughput: 0: 792.0. Samples: 351496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:46:25,340][00412] Avg episode reward: [(0, '8.619')] [2024-12-25 18:46:25,350][04595] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000344_1409024.pth... [2024-12-25 18:46:25,469][04595] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000156_638976.pth [2024-12-25 18:46:30,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 1429504. Throughput: 0: 781.6. Samples: 356574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:46:30,335][00412] Avg episode reward: [(0, '8.916')] [2024-12-25 18:46:31,192][04608] Updated weights for policy 0, policy_version 350 (0.0031) [2024-12-25 18:46:35,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1445888. Throughput: 0: 829.7. Samples: 362222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:46:35,339][00412] Avg episode reward: [(0, '9.598')] [2024-12-25 18:46:35,357][04595] Saving new best policy, reward=9.598! [2024-12-25 18:46:40,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 1458176. Throughput: 0: 805.3. Samples: 363840. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:46:40,339][00412] Avg episode reward: [(0, '9.288')] [2024-12-25 18:46:44,748][04608] Updated weights for policy 0, policy_version 360 (0.0019) [2024-12-25 18:46:45,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3221.3). 
Total num frames: 1474560. Throughput: 0: 773.5. Samples: 368142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:46:45,339][00412] Avg episode reward: [(0, '8.816')] [2024-12-25 18:46:50,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 1495040. Throughput: 0: 816.4. Samples: 374004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:46:50,336][00412] Avg episode reward: [(0, '9.355')] [2024-12-25 18:46:55,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 1507328. Throughput: 0: 828.4. Samples: 376502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:46:55,336][00412] Avg episode reward: [(0, '9.236')] [2024-12-25 18:46:57,788][04608] Updated weights for policy 0, policy_version 370 (0.0033) [2024-12-25 18:47:00,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3072.2, 300 sec: 3221.3). Total num frames: 1519616. Throughput: 0: 777.5. Samples: 380030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:47:00,342][00412] Avg episode reward: [(0, '10.054')] [2024-12-25 18:47:00,345][04595] Saving new best policy, reward=10.054! [2024-12-25 18:47:05,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 1540096. Throughput: 0: 794.4. Samples: 385532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:47:05,341][00412] Avg episode reward: [(0, '10.800')] [2024-12-25 18:47:05,351][04595] Saving new best policy, reward=10.800! [2024-12-25 18:47:08,997][04608] Updated weights for policy 0, policy_version 380 (0.0028) [2024-12-25 18:47:10,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1556480. Throughput: 0: 818.5. Samples: 388330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:47:10,338][00412] Avg episode reward: [(0, '9.911')] [2024-12-25 18:47:15,334][00412] Fps is (10 sec: 2867.1, 60 sec: 3140.2, 300 sec: 3207.4). Total num frames: 1568768. Throughput: 0: 796.5. Samples: 392416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:47:15,336][00412] Avg episode reward: [(0, '10.270')] [2024-12-25 18:47:20,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 1589248. Throughput: 0: 778.6. Samples: 397258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:47:20,336][00412] Avg episode reward: [(0, '11.089')] [2024-12-25 18:47:20,342][04595] Saving new best policy, reward=11.089! [2024-12-25 18:47:22,461][04608] Updated weights for policy 0, policy_version 390 (0.0047) [2024-12-25 18:47:25,333][00412] Fps is (10 sec: 3686.6, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1605632. Throughput: 0: 804.8. Samples: 400056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:47:25,340][00412] Avg episode reward: [(0, '11.346')] [2024-12-25 18:47:25,353][04595] Saving new best policy, reward=11.346! [2024-12-25 18:47:30,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 1617920. Throughput: 0: 819.2. Samples: 405006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:47:30,337][00412] Avg episode reward: [(0, '11.757')] [2024-12-25 18:47:30,343][04595] Saving new best policy, reward=11.757! [2024-12-25 18:47:35,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 1634304. Throughput: 0: 772.5. Samples: 408768. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:47:35,335][00412] Avg episode reward: [(0, '11.720')] [2024-12-25 18:47:36,123][04608] Updated weights for policy 0, policy_version 400 (0.0018) [2024-12-25 18:47:40,333][00412] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1654784. Throughput: 0: 782.0. Samples: 411694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:47:40,340][00412] Avg episode reward: [(0, '10.778')] [2024-12-25 18:47:45,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1671168. Throughput: 0: 836.0. Samples: 417652. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:47:45,337][00412] Avg episode reward: [(0, '11.186')] [2024-12-25 18:47:48,357][04608] Updated weights for policy 0, policy_version 410 (0.0014) [2024-12-25 18:47:50,334][00412] Fps is (10 sec: 2867.1, 60 sec: 3140.2, 300 sec: 3221.3). Total num frames: 1683456. Throughput: 0: 792.8. Samples: 421208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:47:50,343][00412] Avg episode reward: [(0, '11.793')] [2024-12-25 18:47:50,345][04595] Saving new best policy, reward=11.793! [2024-12-25 18:47:55,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 1699840. Throughput: 0: 781.4. Samples: 423492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:47:55,339][00412] Avg episode reward: [(0, '12.252')] [2024-12-25 18:47:55,347][04595] Saving new best policy, reward=12.252! [2024-12-25 18:47:59,953][04608] Updated weights for policy 0, policy_version 420 (0.0032) [2024-12-25 18:48:00,333][00412] Fps is (10 sec: 3686.6, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1720320. Throughput: 0: 820.7. Samples: 429348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:48:00,341][00412] Avg episode reward: [(0, '13.655')] [2024-12-25 18:48:00,344][04595] Saving new best policy, reward=13.655! [2024-12-25 18:48:05,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 1732608. Throughput: 0: 808.0. Samples: 433620. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:48:05,339][00412] Avg episode reward: [(0, '13.454')] [2024-12-25 18:48:10,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 1744896. Throughput: 0: 786.4. Samples: 435446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:48:10,338][00412] Avg episode reward: [(0, '12.572')] [2024-12-25 18:48:13,505][04608] Updated weights for policy 0, policy_version 430 (0.0029) [2024-12-25 18:48:15,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1765376. Throughput: 0: 799.2. Samples: 440968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:48:15,336][00412] Avg episode reward: [(0, '13.117')] [2024-12-25 18:48:20,336][00412] Fps is (10 sec: 3685.2, 60 sec: 3208.4, 300 sec: 3207.4). Total num frames: 1781760. Throughput: 0: 834.7. Samples: 446334. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:48:20,339][00412] Avg episode reward: [(0, '13.028')] [2024-12-25 18:48:25,335][00412] Fps is (10 sec: 2866.6, 60 sec: 3140.2, 300 sec: 3221.2). Total num frames: 1794048. Throughput: 0: 807.2. Samples: 448018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:48:25,338][00412] Avg episode reward: [(0, '12.238')] [2024-12-25 18:48:25,355][04595] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000438_1794048.pth... 
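If you want to inspect one of these periodic checkpoints outside the training loop, the .pth files are ordinarily plain torch.save artifacts, so a framework-agnostic peek looks like the sketch below. The path is copied from the "Saving" entry above; the contents of the saved dictionary are whatever the learner stored, so no specific keys are assumed here.

```python
import torch

# Path taken from the log entry above; adjust if inspecting a different file.
ckpt_path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000438_1794048.pth"

ckpt = torch.load(ckpt_path, map_location="cpu")   # load on CPU, no GPU required
print(type(ckpt).__name__)
if isinstance(ckpt, dict):
    for key, value in ckpt.items():                # list what was stored, without assuming keys
        print(f"  {key}: {type(value).__name__}")
```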
[2024-12-25 18:48:25,528][04595] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000250_1024000.pth [2024-12-25 18:48:27,279][04608] Updated weights for policy 0, policy_version 440 (0.0023) [2024-12-25 18:48:30,333][00412] Fps is (10 sec: 3277.8, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1814528. Throughput: 0: 775.5. Samples: 452550. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:48:30,338][00412] Avg episode reward: [(0, '13.114')] [2024-12-25 18:48:35,333][00412] Fps is (10 sec: 3687.2, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1830912. Throughput: 0: 824.2. Samples: 458298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:48:35,336][00412] Avg episode reward: [(0, '13.928')] [2024-12-25 18:48:35,347][04595] Saving new best policy, reward=13.928! [2024-12-25 18:48:38,985][04608] Updated weights for policy 0, policy_version 450 (0.0019) [2024-12-25 18:48:40,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 1843200. Throughput: 0: 822.5. Samples: 460504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:48:40,341][00412] Avg episode reward: [(0, '14.236')] [2024-12-25 18:48:40,345][04595] Saving new best policy, reward=14.236! [2024-12-25 18:48:45,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 1859584. Throughput: 0: 774.0. Samples: 464176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:48:45,338][00412] Avg episode reward: [(0, '13.929')] [2024-12-25 18:48:50,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 1880064. Throughput: 0: 808.4. Samples: 469996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:48:50,346][00412] Avg episode reward: [(0, '14.949')] [2024-12-25 18:48:50,349][04595] Saving new best policy, reward=14.949! [2024-12-25 18:48:51,430][04608] Updated weights for policy 0, policy_version 460 (0.0028) [2024-12-25 18:48:55,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 1892352. Throughput: 0: 829.7. Samples: 472782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:48:55,338][00412] Avg episode reward: [(0, '14.487')] [2024-12-25 18:49:00,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3207.4). Total num frames: 1904640. Throughput: 0: 789.5. Samples: 476496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:49:00,336][00412] Avg episode reward: [(0, '14.011')] [2024-12-25 18:49:05,071][04608] Updated weights for policy 0, policy_version 470 (0.0033) [2024-12-25 18:49:05,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 1925120. Throughput: 0: 780.6. Samples: 481460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:49:05,336][00412] Avg episode reward: [(0, '14.416')] [2024-12-25 18:49:10,333][00412] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 1945600. Throughput: 0: 807.0. Samples: 484330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:49:10,336][00412] Avg episode reward: [(0, '14.330')] [2024-12-25 18:49:15,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 1957888. Throughput: 0: 810.9. Samples: 489040. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:49:15,340][00412] Avg episode reward: [(0, '14.497')] [2024-12-25 18:49:18,562][04608] Updated weights for policy 0, policy_version 480 (0.0037) [2024-12-25 18:49:20,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3140.4, 300 sec: 3207.4). Total num frames: 1970176. Throughput: 0: 773.0. Samples: 493082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:49:20,339][00412] Avg episode reward: [(0, '14.582')] [2024-12-25 18:49:25,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.9, 300 sec: 3221.3). Total num frames: 1990656. Throughput: 0: 789.1. Samples: 496014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:49:25,340][00412] Avg episode reward: [(0, '16.305')] [2024-12-25 18:49:25,346][04595] Saving new best policy, reward=16.305! [2024-12-25 18:49:29,764][04608] Updated weights for policy 0, policy_version 490 (0.0025) [2024-12-25 18:49:30,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 2007040. Throughput: 0: 831.8. Samples: 501608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:49:30,339][00412] Avg episode reward: [(0, '16.001')] [2024-12-25 18:49:35,338][00412] Fps is (10 sec: 2865.9, 60 sec: 3140.0, 300 sec: 3221.2). Total num frames: 2019328. Throughput: 0: 780.0. Samples: 505100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:49:35,341][00412] Avg episode reward: [(0, '15.999')] [2024-12-25 18:49:40,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 2035712. Throughput: 0: 772.8. Samples: 507556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:49:40,340][00412] Avg episode reward: [(0, '16.382')] [2024-12-25 18:49:40,342][04595] Saving new best policy, reward=16.382! [2024-12-25 18:49:42,814][04608] Updated weights for policy 0, policy_version 500 (0.0021) [2024-12-25 18:49:45,333][00412] Fps is (10 sec: 3688.1, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 2056192. Throughput: 0: 818.7. Samples: 513336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:49:45,336][00412] Avg episode reward: [(0, '16.604')] [2024-12-25 18:49:45,347][04595] Saving new best policy, reward=16.604! [2024-12-25 18:49:50,335][00412] Fps is (10 sec: 3276.1, 60 sec: 3140.2, 300 sec: 3207.4). Total num frames: 2068480. Throughput: 0: 802.3. Samples: 517566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:49:50,338][00412] Avg episode reward: [(0, '16.994')] [2024-12-25 18:49:50,341][04595] Saving new best policy, reward=16.994! [2024-12-25 18:49:55,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 2080768. Throughput: 0: 776.4. Samples: 519268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:49:55,335][00412] Avg episode reward: [(0, '17.195')] [2024-12-25 18:49:55,347][04595] Saving new best policy, reward=17.195! [2024-12-25 18:49:56,627][04608] Updated weights for policy 0, policy_version 510 (0.0017) [2024-12-25 18:50:00,333][00412] Fps is (10 sec: 3277.5, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 2101248. Throughput: 0: 795.7. Samples: 524848. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:50:00,339][00412] Avg episode reward: [(0, '19.213')] [2024-12-25 18:50:00,344][04595] Saving new best policy, reward=19.213! [2024-12-25 18:50:05,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 2117632. Throughput: 0: 818.4. Samples: 529910. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-25 18:50:05,338][00412] Avg episode reward: [(0, '18.915')] [2024-12-25 18:50:10,175][04608] Updated weights for policy 0, policy_version 520 (0.0053) [2024-12-25 18:50:10,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3207.4). Total num frames: 2129920. Throughput: 0: 791.0. Samples: 531608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:50:10,336][00412] Avg episode reward: [(0, '18.804')] [2024-12-25 18:50:15,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 2146304. Throughput: 0: 773.9. Samples: 536434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:50:15,336][00412] Avg episode reward: [(0, '18.273')] [2024-12-25 18:50:20,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 2166784. Throughput: 0: 824.5. Samples: 542198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:50:20,335][00412] Avg episode reward: [(0, '16.756')] [2024-12-25 18:50:20,853][04608] Updated weights for policy 0, policy_version 530 (0.0020) [2024-12-25 18:50:25,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 2179072. Throughput: 0: 816.6. Samples: 544304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:50:25,341][00412] Avg episode reward: [(0, '16.483')] [2024-12-25 18:50:25,355][04595] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000532_2179072.pth... [2024-12-25 18:50:25,528][04595] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000344_1409024.pth [2024-12-25 18:50:30,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 2195456. Throughput: 0: 771.1. Samples: 548034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:50:30,338][00412] Avg episode reward: [(0, '16.113')] [2024-12-25 18:50:34,379][04608] Updated weights for policy 0, policy_version 540 (0.0026) [2024-12-25 18:50:35,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.8, 300 sec: 3193.5). Total num frames: 2211840. Throughput: 0: 802.6. Samples: 553680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:50:35,341][00412] Avg episode reward: [(0, '16.804')] [2024-12-25 18:50:40,334][00412] Fps is (10 sec: 3276.6, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 2228224. Throughput: 0: 829.7. Samples: 556606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:50:40,340][00412] Avg episode reward: [(0, '17.483')] [2024-12-25 18:50:45,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3193.5). Total num frames: 2240512. Throughput: 0: 785.3. Samples: 560188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:50:45,336][00412] Avg episode reward: [(0, '18.373')] [2024-12-25 18:50:47,876][04608] Updated weights for policy 0, policy_version 550 (0.0046) [2024-12-25 18:50:50,333][00412] Fps is (10 sec: 3277.0, 60 sec: 3208.7, 300 sec: 3207.4). Total num frames: 2260992. Throughput: 0: 790.0. Samples: 565460. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:50:50,337][00412] Avg episode reward: [(0, '18.775')] [2024-12-25 18:50:55,333][00412] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 2281472. Throughput: 0: 817.2. Samples: 568380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:50:55,336][00412] Avg episode reward: [(0, '19.048')] [2024-12-25 18:51:00,336][00412] Fps is (10 sec: 2866.5, 60 sec: 3140.1, 300 sec: 3193.5). 
Total num frames: 2289664. Throughput: 0: 810.0. Samples: 572884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:51:00,342][00412] Avg episode reward: [(0, '19.662')] [2024-12-25 18:51:00,346][04595] Saving new best policy, reward=19.662! [2024-12-25 18:51:00,734][04608] Updated weights for policy 0, policy_version 560 (0.0027) [2024-12-25 18:51:05,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 2306048. Throughput: 0: 775.4. Samples: 577092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:51:05,335][00412] Avg episode reward: [(0, '19.127')] [2024-12-25 18:51:10,333][00412] Fps is (10 sec: 3687.3, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 2326528. Throughput: 0: 792.5. Samples: 579966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:51:10,340][00412] Avg episode reward: [(0, '20.113')] [2024-12-25 18:51:10,344][04595] Saving new best policy, reward=20.113! [2024-12-25 18:51:12,130][04608] Updated weights for policy 0, policy_version 570 (0.0020) [2024-12-25 18:51:15,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 2338816. Throughput: 0: 827.2. Samples: 585256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:51:15,338][00412] Avg episode reward: [(0, '20.227')] [2024-12-25 18:51:15,425][04595] Saving new best policy, reward=20.227! [2024-12-25 18:51:20,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3193.5). Total num frames: 2351104. Throughput: 0: 778.0. Samples: 588688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:51:20,339][00412] Avg episode reward: [(0, '20.112')] [2024-12-25 18:51:25,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 2371584. Throughput: 0: 773.3. Samples: 591406. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-25 18:51:25,336][00412] Avg episode reward: [(0, '20.654')] [2024-12-25 18:51:25,346][04595] Saving new best policy, reward=20.654! [2024-12-25 18:51:25,937][04608] Updated weights for policy 0, policy_version 580 (0.0027) [2024-12-25 18:51:30,333][00412] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 2392064. Throughput: 0: 820.4. Samples: 597108. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-25 18:51:30,344][00412] Avg episode reward: [(0, '21.117')] [2024-12-25 18:51:30,348][04595] Saving new best policy, reward=21.117! [2024-12-25 18:51:35,333][00412] Fps is (10 sec: 2867.1, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 2400256. Throughput: 0: 791.0. Samples: 601056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:51:35,336][00412] Avg episode reward: [(0, '20.462')] [2024-12-25 18:51:39,730][04608] Updated weights for policy 0, policy_version 590 (0.0031) [2024-12-25 18:51:40,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 2416640. Throughput: 0: 765.5. Samples: 602828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:51:40,340][00412] Avg episode reward: [(0, '19.825')] [2024-12-25 18:51:45,333][00412] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 2437120. Throughput: 0: 794.5. Samples: 608634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-25 18:51:45,337][00412] Avg episode reward: [(0, '18.487')] [2024-12-25 18:51:50,334][00412] Fps is (10 sec: 3685.9, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 2453504. Throughput: 0: 813.1. Samples: 613684. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-25 18:51:50,337][00412] Avg episode reward: [(0, '19.374')] [2024-12-25 18:51:52,185][04608] Updated weights for policy 0, policy_version 600 (0.0015) [2024-12-25 18:51:55,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3003.7, 300 sec: 3193.5). Total num frames: 2461696. Throughput: 0: 786.4. Samples: 615356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-25 18:51:55,343][00412] Avg episode reward: [(0, '20.112')] [2024-12-25 18:52:00,333][00412] Fps is (10 sec: 2867.6, 60 sec: 3208.7, 300 sec: 3193.5). Total num frames: 2482176. Throughput: 0: 779.4. Samples: 620330. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2024-12-25 18:52:00,335][00412] Avg episode reward: [(0, '18.807')] [2024-12-25 18:52:03,647][04608] Updated weights for policy 0, policy_version 610 (0.0019) [2024-12-25 18:52:05,337][00412] Fps is (10 sec: 4094.6, 60 sec: 3276.6, 300 sec: 3207.3). Total num frames: 2502656. Throughput: 0: 832.1. Samples: 626134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:52:05,340][00412] Avg episode reward: [(0, '20.021')] [2024-12-25 18:52:10,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 2514944. Throughput: 0: 811.5. Samples: 627922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:52:10,335][00412] Avg episode reward: [(0, '19.990')] [2024-12-25 18:52:15,333][00412] Fps is (10 sec: 2868.2, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 2531328. Throughput: 0: 773.0. Samples: 631892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:52:15,337][00412] Avg episode reward: [(0, '19.105')] [2024-12-25 18:52:17,352][04608] Updated weights for policy 0, policy_version 620 (0.0031) [2024-12-25 18:52:20,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 2547712. Throughput: 0: 812.8. Samples: 637632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:52:20,336][00412] Avg episode reward: [(0, '18.221')] [2024-12-25 18:52:25,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 2564096. Throughput: 0: 834.7. Samples: 640390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:52:25,341][00412] Avg episode reward: [(0, '17.942')] [2024-12-25 18:52:25,350][04595] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000626_2564096.pth... [2024-12-25 18:52:25,522][04595] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000438_1794048.pth [2024-12-25 18:52:30,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3072.0, 300 sec: 3193.5). Total num frames: 2576384. Throughput: 0: 781.2. Samples: 643786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:52:30,336][00412] Avg episode reward: [(0, '18.929')] [2024-12-25 18:52:30,983][04608] Updated weights for policy 0, policy_version 630 (0.0032) [2024-12-25 18:52:35,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 2596864. Throughput: 0: 789.8. Samples: 649226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:52:35,336][00412] Avg episode reward: [(0, '19.425')] [2024-12-25 18:52:40,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 2613248. Throughput: 0: 814.3. Samples: 651998. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:52:40,338][00412] Avg episode reward: [(0, '19.702')] [2024-12-25 18:52:43,031][04608] Updated weights for policy 0, policy_version 640 (0.0032) [2024-12-25 18:52:45,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 2625536. Throughput: 0: 797.1. Samples: 656198. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:52:45,340][00412] Avg episode reward: [(0, '20.990')] [2024-12-25 18:52:50,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 2641920. Throughput: 0: 773.2. Samples: 660924. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:52:50,337][00412] Avg episode reward: [(0, '21.440')] [2024-12-25 18:52:50,341][04595] Saving new best policy, reward=21.440! [2024-12-25 18:52:55,152][04608] Updated weights for policy 0, policy_version 650 (0.0023) [2024-12-25 18:52:55,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3193.5). Total num frames: 2662400. Throughput: 0: 795.6. Samples: 663726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:52:55,337][00412] Avg episode reward: [(0, '19.670')] [2024-12-25 18:53:00,333][00412] Fps is (10 sec: 3276.7, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 2674688. Throughput: 0: 819.3. Samples: 668760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:53:00,336][00412] Avg episode reward: [(0, '18.032')] [2024-12-25 18:53:05,333][00412] Fps is (10 sec: 2867.1, 60 sec: 3140.4, 300 sec: 3207.4). Total num frames: 2691072. Throughput: 0: 774.8. Samples: 672496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:53:05,336][00412] Avg episode reward: [(0, '18.336')] [2024-12-25 18:53:08,713][04608] Updated weights for policy 0, policy_version 660 (0.0030) [2024-12-25 18:53:10,333][00412] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 2707456. Throughput: 0: 774.6. Samples: 675246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:53:10,338][00412] Avg episode reward: [(0, '18.845')] [2024-12-25 18:53:15,333][00412] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 2723840. Throughput: 0: 827.4. Samples: 681018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:53:15,339][00412] Avg episode reward: [(0, '18.331')] [2024-12-25 18:53:20,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 2736128. Throughput: 0: 784.5. Samples: 684528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:53:20,336][00412] Avg episode reward: [(0, '18.110')] [2024-12-25 18:53:22,531][04608] Updated weights for policy 0, policy_version 670 (0.0028) [2024-12-25 18:53:25,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3179.6). Total num frames: 2752512. Throughput: 0: 772.6. Samples: 686764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:53:25,336][00412] Avg episode reward: [(0, '18.635')] [2024-12-25 18:53:30,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 2772992. Throughput: 0: 807.2. Samples: 692520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:53:30,335][00412] Avg episode reward: [(0, '20.957')] [2024-12-25 18:53:34,196][04608] Updated weights for policy 0, policy_version 680 (0.0017) [2024-12-25 18:53:35,334][00412] Fps is (10 sec: 3276.7, 60 sec: 3140.2, 300 sec: 3193.5). Total num frames: 2785280. Throughput: 0: 801.1. Samples: 696972. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:53:35,337][00412] Avg episode reward: [(0, '20.645')] [2024-12-25 18:53:40,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3179.6). Total num frames: 2797568. Throughput: 0: 776.5. Samples: 698668. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:53:40,336][00412] Avg episode reward: [(0, '20.756')] [2024-12-25 18:53:45,333][00412] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 2818048. Throughput: 0: 783.4. Samples: 704014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:53:45,342][00412] Avg episode reward: [(0, '19.345')] [2024-12-25 18:53:46,869][04608] Updated weights for policy 0, policy_version 690 (0.0023) [2024-12-25 18:53:50,336][00412] Fps is (10 sec: 3685.5, 60 sec: 3208.4, 300 sec: 3193.5). Total num frames: 2834432. Throughput: 0: 822.1. Samples: 709494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:53:50,338][00412] Avg episode reward: [(0, '19.626')] [2024-12-25 18:53:55,334][00412] Fps is (10 sec: 2867.1, 60 sec: 3072.0, 300 sec: 3193.5). Total num frames: 2846720. Throughput: 0: 798.1. Samples: 711160. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:53:55,336][00412] Avg episode reward: [(0, '20.764')] [2024-12-25 18:54:00,211][04608] Updated weights for policy 0, policy_version 700 (0.0015) [2024-12-25 18:54:00,334][00412] Fps is (10 sec: 3277.5, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 2867200. Throughput: 0: 771.4. Samples: 715732. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-25 18:54:00,339][00412] Avg episode reward: [(0, '19.846')] [2024-12-25 18:54:05,333][00412] Fps is (10 sec: 3686.6, 60 sec: 3208.5, 300 sec: 3179.6). Total num frames: 2883584. Throughput: 0: 822.5. Samples: 721542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:54:05,335][00412] Avg episode reward: [(0, '19.172')] [2024-12-25 18:54:10,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.2, 300 sec: 3179.6). Total num frames: 2895872. Throughput: 0: 823.9. Samples: 723842. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:54:10,339][00412] Avg episode reward: [(0, '19.259')] [2024-12-25 18:54:13,896][04608] Updated weights for policy 0, policy_version 710 (0.0030) [2024-12-25 18:54:15,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 2912256. Throughput: 0: 774.6. Samples: 727376. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:54:15,340][00412] Avg episode reward: [(0, '21.153')] [2024-12-25 18:54:20,333][00412] Fps is (10 sec: 3686.5, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 2932736. Throughput: 0: 804.5. Samples: 733174. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:54:20,336][00412] Avg episode reward: [(0, '22.251')] [2024-12-25 18:54:20,343][04595] Saving new best policy, reward=22.251! [2024-12-25 18:54:24,822][04608] Updated weights for policy 0, policy_version 720 (0.0021) [2024-12-25 18:54:25,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 2949120. Throughput: 0: 830.4. Samples: 736038. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:54:25,338][00412] Avg episode reward: [(0, '21.742')] [2024-12-25 18:54:25,350][04595] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000720_2949120.pth... 
[2024-12-25 18:54:25,575][04595] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000532_2179072.pth [2024-12-25 18:54:30,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3179.7). Total num frames: 2957312. Throughput: 0: 797.2. Samples: 739886. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:54:30,341][00412] Avg episode reward: [(0, '21.808')] [2024-12-25 18:54:35,333][00412] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 2977792. Throughput: 0: 784.4. Samples: 744792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:54:35,335][00412] Avg episode reward: [(0, '21.733')] [2024-12-25 18:54:38,152][04608] Updated weights for policy 0, policy_version 730 (0.0025) [2024-12-25 18:54:40,333][00412] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3193.5). Total num frames: 2998272. Throughput: 0: 812.0. Samples: 747700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:54:40,341][00412] Avg episode reward: [(0, '20.389')] [2024-12-25 18:54:45,333][00412] Fps is (10 sec: 3276.9, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 3010560. Throughput: 0: 818.4. Samples: 752558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:54:45,339][00412] Avg episode reward: [(0, '20.317')] [2024-12-25 18:54:50,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3140.4, 300 sec: 3193.5). Total num frames: 3022848. Throughput: 0: 777.3. Samples: 756520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:54:50,341][00412] Avg episode reward: [(0, '20.951')] [2024-12-25 18:54:51,439][04608] Updated weights for policy 0, policy_version 740 (0.0034) [2024-12-25 18:54:55,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 3043328. Throughput: 0: 790.7. Samples: 759422. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:54:55,335][00412] Avg episode reward: [(0, '20.446')] [2024-12-25 18:55:00,335][00412] Fps is (10 sec: 3685.7, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 3059712. Throughput: 0: 840.1. Samples: 765184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-25 18:55:00,341][00412] Avg episode reward: [(0, '21.059')] [2024-12-25 18:55:04,501][04608] Updated weights for policy 0, policy_version 750 (0.0040) [2024-12-25 18:55:05,338][00412] Fps is (10 sec: 2865.9, 60 sec: 3140.0, 300 sec: 3193.4). Total num frames: 3072000. Throughput: 0: 789.4. Samples: 768700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:55:05,345][00412] Avg episode reward: [(0, '19.912')] [2024-12-25 18:55:10,333][00412] Fps is (10 sec: 3277.4, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 3092480. Throughput: 0: 782.7. Samples: 771258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:55:10,335][00412] Avg episode reward: [(0, '20.784')] [2024-12-25 18:55:15,333][00412] Fps is (10 sec: 3688.1, 60 sec: 3276.8, 300 sec: 3193.5). Total num frames: 3108864. Throughput: 0: 824.0. Samples: 776966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:55:15,341][00412] Avg episode reward: [(0, '21.905')] [2024-12-25 18:55:15,508][04608] Updated weights for policy 0, policy_version 760 (0.0025) [2024-12-25 18:55:20,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 3121152. Throughput: 0: 810.1. Samples: 781244. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:55:20,340][00412] Avg episode reward: [(0, '21.782')] [2024-12-25 18:55:25,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3193.5). Total num frames: 3137536. Throughput: 0: 785.4. Samples: 783042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-25 18:55:25,338][00412] Avg episode reward: [(0, '21.269')] [2024-12-25 18:55:28,854][04608] Updated weights for policy 0, policy_version 770 (0.0025) [2024-12-25 18:55:30,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3207.4). Total num frames: 3158016. Throughput: 0: 807.9. Samples: 788914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:55:30,339][00412] Avg episode reward: [(0, '21.012')] [2024-12-25 18:55:35,333][00412] Fps is (10 sec: 3686.3, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 3174400. Throughput: 0: 836.7. Samples: 794172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:55:35,336][00412] Avg episode reward: [(0, '21.103')] [2024-12-25 18:55:40,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 3186688. Throughput: 0: 810.0. Samples: 795872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:55:40,335][00412] Avg episode reward: [(0, '20.482')] [2024-12-25 18:55:42,441][04608] Updated weights for policy 0, policy_version 780 (0.0042) [2024-12-25 18:55:45,333][00412] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 3193.5). Total num frames: 3203072. Throughput: 0: 787.1. Samples: 800600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:55:45,336][00412] Avg episode reward: [(0, '18.804')] [2024-12-25 18:55:50,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3193.5). Total num frames: 3223552. Throughput: 0: 838.8. Samples: 806440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:55:50,338][00412] Avg episode reward: [(0, '18.844')] [2024-12-25 18:55:54,725][04608] Updated weights for policy 0, policy_version 790 (0.0027) [2024-12-25 18:55:55,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 3235840. Throughput: 0: 826.7. Samples: 808460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:55:55,339][00412] Avg episode reward: [(0, '19.561')] [2024-12-25 18:56:00,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3207.4). Total num frames: 3252224. Throughput: 0: 786.5. Samples: 812360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:56:00,339][00412] Avg episode reward: [(0, '19.074')] [2024-12-25 18:56:05,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3345.3, 300 sec: 3207.4). Total num frames: 3272704. Throughput: 0: 821.5. Samples: 818212. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:56:05,336][00412] Avg episode reward: [(0, '19.413')] [2024-12-25 18:56:06,323][04608] Updated weights for policy 0, policy_version 800 (0.0015) [2024-12-25 18:56:10,333][00412] Fps is (10 sec: 3276.7, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 3284992. Throughput: 0: 845.6. Samples: 821094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:56:10,336][00412] Avg episode reward: [(0, '19.067')] [2024-12-25 18:56:15,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 3297280. Throughput: 0: 791.2. Samples: 824516. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2024-12-25 18:56:15,335][00412] Avg episode reward: [(0, '19.485')] [2024-12-25 18:56:19,715][04608] Updated weights for policy 0, policy_version 810 (0.0028) [2024-12-25 18:56:20,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 3317760. Throughput: 0: 793.6. Samples: 829882. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:56:20,336][00412] Avg episode reward: [(0, '19.888')] [2024-12-25 18:56:25,334][00412] Fps is (10 sec: 4095.8, 60 sec: 3345.0, 300 sec: 3207.4). Total num frames: 3338240. Throughput: 0: 821.1. Samples: 832822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:56:25,339][00412] Avg episode reward: [(0, '20.539')] [2024-12-25 18:56:25,348][04595] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000815_3338240.pth... [2024-12-25 18:56:25,557][04595] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000626_2564096.pth [2024-12-25 18:56:30,337][00412] Fps is (10 sec: 2866.2, 60 sec: 3140.1, 300 sec: 3207.3). Total num frames: 3346432. Throughput: 0: 812.2. Samples: 837154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:56:30,339][00412] Avg episode reward: [(0, '21.114')] [2024-12-25 18:56:33,175][04608] Updated weights for policy 0, policy_version 820 (0.0030) [2024-12-25 18:56:35,333][00412] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 3366912. Throughput: 0: 782.8. Samples: 841668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:56:35,336][00412] Avg episode reward: [(0, '22.826')] [2024-12-25 18:56:35,349][04595] Saving new best policy, reward=22.826! [2024-12-25 18:56:40,333][00412] Fps is (10 sec: 3687.7, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 3383296. Throughput: 0: 802.2. Samples: 844558. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:56:40,339][00412] Avg episode reward: [(0, '22.648')] [2024-12-25 18:56:44,981][04608] Updated weights for policy 0, policy_version 830 (0.0025) [2024-12-25 18:56:45,334][00412] Fps is (10 sec: 3276.5, 60 sec: 3276.7, 300 sec: 3207.4). Total num frames: 3399680. Throughput: 0: 829.1. Samples: 849670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:56:45,337][00412] Avg episode reward: [(0, '22.383')] [2024-12-25 18:56:50,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3221.3). Total num frames: 3411968. Throughput: 0: 778.7. Samples: 853252. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:56:50,335][00412] Avg episode reward: [(0, '23.119')] [2024-12-25 18:56:50,343][04595] Saving new best policy, reward=23.119! [2024-12-25 18:56:55,333][00412] Fps is (10 sec: 3277.2, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 3432448. Throughput: 0: 778.4. Samples: 856120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:56:55,340][00412] Avg episode reward: [(0, '21.533')] [2024-12-25 18:56:57,292][04608] Updated weights for policy 0, policy_version 840 (0.0017) [2024-12-25 18:57:00,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 3448832. Throughput: 0: 831.5. Samples: 861934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:57:00,336][00412] Avg episode reward: [(0, '20.566')] [2024-12-25 18:57:05,334][00412] Fps is (10 sec: 2866.9, 60 sec: 3140.2, 300 sec: 3207.4). Total num frames: 3461120. Throughput: 0: 797.0. Samples: 865748. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:57:05,341][00412] Avg episode reward: [(0, '19.697')] [2024-12-25 18:57:10,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3207.4). Total num frames: 3477504. Throughput: 0: 778.5. Samples: 867852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:57:10,342][00412] Avg episode reward: [(0, '20.812')] [2024-12-25 18:57:11,000][04608] Updated weights for policy 0, policy_version 850 (0.0031) [2024-12-25 18:57:15,333][00412] Fps is (10 sec: 3686.8, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 3497984. Throughput: 0: 810.1. Samples: 873606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:57:15,340][00412] Avg episode reward: [(0, '20.022')] [2024-12-25 18:57:20,335][00412] Fps is (10 sec: 3276.2, 60 sec: 3208.4, 300 sec: 3207.4). Total num frames: 3510272. Throughput: 0: 815.3. Samples: 878358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:57:20,337][00412] Avg episode reward: [(0, '18.676')] [2024-12-25 18:57:24,352][04608] Updated weights for policy 0, policy_version 860 (0.0021) [2024-12-25 18:57:25,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3072.0, 300 sec: 3207.4). Total num frames: 3522560. Throughput: 0: 789.6. Samples: 880092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:57:25,335][00412] Avg episode reward: [(0, '19.787')] [2024-12-25 18:57:30,333][00412] Fps is (10 sec: 3277.4, 60 sec: 3277.0, 300 sec: 3207.4). Total num frames: 3543040. Throughput: 0: 792.9. Samples: 885348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:57:30,336][00412] Avg episode reward: [(0, '19.287')] [2024-12-25 18:57:35,290][04608] Updated weights for policy 0, policy_version 870 (0.0027) [2024-12-25 18:57:35,338][00412] Fps is (10 sec: 4094.1, 60 sec: 3276.5, 300 sec: 3221.2). Total num frames: 3563520. Throughput: 0: 839.7. Samples: 891042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:57:35,341][00412] Avg episode reward: [(0, '20.340')] [2024-12-25 18:57:40,336][00412] Fps is (10 sec: 2866.3, 60 sec: 3140.1, 300 sec: 3207.3). Total num frames: 3571712. Throughput: 0: 813.7. Samples: 892738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:57:40,340][00412] Avg episode reward: [(0, '19.970')] [2024-12-25 18:57:45,333][00412] Fps is (10 sec: 2458.7, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 3588096. Throughput: 0: 778.8. Samples: 896982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:57:45,335][00412] Avg episode reward: [(0, '20.729')] [2024-12-25 18:57:48,456][04608] Updated weights for policy 0, policy_version 880 (0.0023) [2024-12-25 18:57:50,333][00412] Fps is (10 sec: 3687.6, 60 sec: 3276.8, 300 sec: 3207.4). Total num frames: 3608576. Throughput: 0: 824.2. Samples: 902838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:57:50,340][00412] Avg episode reward: [(0, '22.252')] [2024-12-25 18:57:55,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 3624960. Throughput: 0: 833.1. Samples: 905340. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:57:55,340][00412] Avg episode reward: [(0, '22.195')] [2024-12-25 18:58:00,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3140.3, 300 sec: 3207.4). Total num frames: 3637248. Throughput: 0: 783.5. Samples: 908862. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:58:00,336][00412] Avg episode reward: [(0, '22.232')] [2024-12-25 18:58:01,672][04608] Updated weights for policy 0, policy_version 890 (0.0014) [2024-12-25 18:58:05,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.9, 300 sec: 3221.3). Total num frames: 3657728. Throughput: 0: 809.3. Samples: 914774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:58:05,339][00412] Avg episode reward: [(0, '22.287')] [2024-12-25 18:58:10,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 3674112. Throughput: 0: 836.4. Samples: 917728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:58:10,341][00412] Avg episode reward: [(0, '22.804')] [2024-12-25 18:58:14,650][04608] Updated weights for policy 0, policy_version 900 (0.0037) [2024-12-25 18:58:15,338][00412] Fps is (10 sec: 2865.9, 60 sec: 3140.0, 300 sec: 3221.2). Total num frames: 3686400. Throughput: 0: 806.9. Samples: 921664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:58:15,344][00412] Avg episode reward: [(0, '22.128')] [2024-12-25 18:58:20,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3208.6, 300 sec: 3221.3). Total num frames: 3702784. Throughput: 0: 788.3. Samples: 926514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:58:20,342][00412] Avg episode reward: [(0, '21.572')] [2024-12-25 18:58:25,333][00412] Fps is (10 sec: 3688.0, 60 sec: 3345.1, 300 sec: 3221.3). Total num frames: 3723264. Throughput: 0: 815.7. Samples: 929442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:58:25,336][00412] Avg episode reward: [(0, '21.323')] [2024-12-25 18:58:25,352][04595] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000909_3723264.pth... [2024-12-25 18:58:25,476][04595] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000720_2949120.pth [2024-12-25 18:58:25,699][04608] Updated weights for policy 0, policy_version 910 (0.0020) [2024-12-25 18:58:30,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 3735552. Throughput: 0: 830.0. Samples: 934332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:58:30,336][00412] Avg episode reward: [(0, '20.735')] [2024-12-25 18:58:35,333][00412] Fps is (10 sec: 2867.3, 60 sec: 3140.5, 300 sec: 3235.1). Total num frames: 3751936. Throughput: 0: 788.7. Samples: 938328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:58:35,336][00412] Avg episode reward: [(0, '20.802')] [2024-12-25 18:58:39,142][04608] Updated weights for policy 0, policy_version 920 (0.0020) [2024-12-25 18:58:40,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3345.3, 300 sec: 3235.1). Total num frames: 3772416. Throughput: 0: 798.4. Samples: 941266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:58:40,336][00412] Avg episode reward: [(0, '21.012')] [2024-12-25 18:58:45,333][00412] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3235.2). Total num frames: 3788800. Throughput: 0: 846.1. Samples: 946938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:58:45,342][00412] Avg episode reward: [(0, '20.598')] [2024-12-25 18:58:50,334][00412] Fps is (10 sec: 2866.9, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 3801088. Throughput: 0: 793.9. Samples: 950500. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:58:50,342][00412] Avg episode reward: [(0, '20.609')] [2024-12-25 18:58:52,473][04608] Updated weights for policy 0, policy_version 930 (0.0032) [2024-12-25 18:58:55,333][00412] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3221.3). Total num frames: 3817472. Throughput: 0: 782.8. Samples: 952954. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:58:55,335][00412] Avg episode reward: [(0, '20.958')] [2024-12-25 18:59:00,334][00412] Fps is (10 sec: 3686.6, 60 sec: 3345.0, 300 sec: 3235.1). Total num frames: 3837952. Throughput: 0: 827.5. Samples: 958896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2024-12-25 18:59:00,336][00412] Avg episode reward: [(0, '22.105')] [2024-12-25 18:59:04,249][04608] Updated weights for policy 0, policy_version 940 (0.0016) [2024-12-25 18:59:05,334][00412] Fps is (10 sec: 3276.6, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 3850240. Throughput: 0: 817.5. Samples: 963304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:59:05,337][00412] Avg episode reward: [(0, '20.976')] [2024-12-25 18:59:10,333][00412] Fps is (10 sec: 2867.3, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 3866624. Throughput: 0: 793.1. Samples: 965130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:59:10,339][00412] Avg episode reward: [(0, '21.309')] [2024-12-25 18:59:15,333][00412] Fps is (10 sec: 3686.6, 60 sec: 3345.3, 300 sec: 3235.1). Total num frames: 3887104. Throughput: 0: 808.0. Samples: 970692. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:59:15,338][00412] Avg episode reward: [(0, '21.703')] [2024-12-25 18:59:16,329][04608] Updated weights for policy 0, policy_version 950 (0.0018) [2024-12-25 18:59:20,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3221.3). Total num frames: 3899392. Throughput: 0: 837.9. Samples: 976032. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2024-12-25 18:59:20,341][00412] Avg episode reward: [(0, '21.333')] [2024-12-25 18:59:25,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3235.1). Total num frames: 3911680. Throughput: 0: 811.3. Samples: 977776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2024-12-25 18:59:25,339][00412] Avg episode reward: [(0, '21.563')] [2024-12-25 18:59:29,684][04608] Updated weights for policy 0, policy_version 960 (0.0016) [2024-12-25 18:59:30,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 3932160. Throughput: 0: 790.0. Samples: 982488. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2024-12-25 18:59:30,339][00412] Avg episode reward: [(0, '21.581')] [2024-12-25 18:59:35,333][00412] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3235.1). Total num frames: 3952640. Throughput: 0: 842.3. Samples: 988402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:59:35,338][00412] Avg episode reward: [(0, '21.931')] [2024-12-25 18:59:40,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3235.1). Total num frames: 3964928. Throughput: 0: 837.0. Samples: 990620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2024-12-25 18:59:40,335][00412] Avg episode reward: [(0, '22.244')] [2024-12-25 18:59:43,296][04608] Updated weights for policy 0, policy_version 970 (0.0027) [2024-12-25 18:59:45,333][00412] Fps is (10 sec: 2457.6, 60 sec: 3140.3, 300 sec: 3235.1). Total num frames: 3977216. Throughput: 0: 785.5. Samples: 994242. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2024-12-25 18:59:45,335][00412] Avg episode reward: [(0, '21.391')] [2024-12-25 18:59:50,333][00412] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3235.1). Total num frames: 3997696. Throughput: 0: 817.3. Samples: 1000084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2024-12-25 18:59:50,337][00412] Avg episode reward: [(0, '20.982')] [2024-12-25 18:59:51,735][04595] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-25 18:59:51,737][00412] Component Batcher_0 stopped! [2024-12-25 18:59:51,737][04595] Stopping Batcher_0... [2024-12-25 18:59:51,749][04595] Loop batcher_evt_loop terminating... [2024-12-25 18:59:51,797][04608] Weights refcount: 2 0 [2024-12-25 18:59:51,803][00412] Component InferenceWorker_p0-w0 stopped! [2024-12-25 18:59:51,808][04608] Stopping InferenceWorker_p0-w0... [2024-12-25 18:59:51,808][04608] Loop inference_proc0-0_evt_loop terminating... [2024-12-25 18:59:51,868][04595] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000815_3338240.pth [2024-12-25 18:59:51,894][04595] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-25 18:59:52,085][00412] Component LearnerWorker_p0 stopped! [2024-12-25 18:59:52,085][04595] Stopping LearnerWorker_p0... [2024-12-25 18:59:52,095][04595] Loop learner_proc0_evt_loop terminating... [2024-12-25 18:59:52,165][00412] Component RolloutWorker_w7 stopped! [2024-12-25 18:59:52,165][04616] Stopping RolloutWorker_w7... [2024-12-25 18:59:52,170][00412] Component RolloutWorker_w0 stopped! [2024-12-25 18:59:52,174][04609] Stopping RolloutWorker_w0... [2024-12-25 18:59:52,171][04616] Loop rollout_proc7_evt_loop terminating... [2024-12-25 18:59:52,181][04611] Stopping RolloutWorker_w2... [2024-12-25 18:59:52,181][00412] Component RolloutWorker_w2 stopped! [2024-12-25 18:59:52,177][04609] Loop rollout_proc0_evt_loop terminating... [2024-12-25 18:59:52,192][04611] Loop rollout_proc2_evt_loop terminating... [2024-12-25 18:59:52,212][04610] Stopping RolloutWorker_w1... [2024-12-25 18:59:52,213][04615] Stopping RolloutWorker_w6... [2024-12-25 18:59:52,212][00412] Component RolloutWorker_w6 stopped! [2024-12-25 18:59:52,213][04610] Loop rollout_proc1_evt_loop terminating... [2024-12-25 18:59:52,214][04615] Loop rollout_proc6_evt_loop terminating... [2024-12-25 18:59:52,217][00412] Component RolloutWorker_w1 stopped! [2024-12-25 18:59:52,227][04613] Stopping RolloutWorker_w4... [2024-12-25 18:59:52,227][00412] Component RolloutWorker_w4 stopped! [2024-12-25 18:59:52,232][04612] Stopping RolloutWorker_w3... [2024-12-25 18:59:52,232][00412] Component RolloutWorker_w3 stopped! [2024-12-25 18:59:52,230][04613] Loop rollout_proc4_evt_loop terminating... [2024-12-25 18:59:52,246][04612] Loop rollout_proc3_evt_loop terminating... [2024-12-25 18:59:52,249][04614] Stopping RolloutWorker_w5... [2024-12-25 18:59:52,249][00412] Component RolloutWorker_w5 stopped! [2024-12-25 18:59:52,254][04614] Loop rollout_proc5_evt_loop terminating... [2024-12-25 18:59:52,251][00412] Waiting for process learner_proc0 to stop... [2024-12-25 18:59:54,222][00412] Waiting for process inference_proc0-0 to join... [2024-12-25 18:59:54,233][00412] Waiting for process rollout_proc0 to join... [2024-12-25 18:59:57,278][00412] Waiting for process rollout_proc1 to join... [2024-12-25 18:59:57,567][00412] Waiting for process rollout_proc2 to join... 
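The shutdown above closes a run that collected just over 4M environment frames. Along the way the learner wrote milestone checkpoints named checkpoint_<policy_version>_<env_frames>.pth, pruned the oldest one after each save (for example, saving checkpoint_000000720_2949120.pth was followed by removing checkpoint_000000532_2179072.pth), and separately announced a new best policy whenever the average episode reward improved. Below is a minimal sketch of that keep-the-latest-N rotation, assuming a plain directory of .pth files; it illustrates the pattern visible in the log, not Sample Factory's actual implementation.

```python
import re
from pathlib import Path

# Milestone files are named checkpoint_<policy_version>_<env_frames>.pth in the log above.
CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth")


def milestone_checkpoints(ckpt_dir: Path) -> list[Path]:
    """Return milestone checkpoints in this directory, oldest first by policy version."""
    found = []
    for path in ckpt_dir.glob("checkpoint_*.pth"):
        match = CKPT_RE.fullmatch(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


def save_and_rotate(ckpt_dir: Path, policy_version: int, env_frames: int, keep_latest: int = 2) -> Path:
    """Write a new milestone checkpoint, then drop the oldest so only `keep_latest` remain."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    new_ckpt = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    new_ckpt.touch()  # stand-in for torch.save(model_state, new_ckpt)
    for old in milestone_checkpoints(ckpt_dir)[:-keep_latest]:
        old.unlink()  # mirrors the "Removing .../checkpoint_..." lines in the log
    return new_ckpt


# Example (using a scratch directory rather than the real train_dir):
# save_and_rotate(Path("/tmp/checkpoint_p0"), policy_version=720, env_frames=2949120)
```

For overall throughput, the runner summary that follows reports Collected {0: 4005888} frames over a 1286.4 s main loop, i.e. 4005888 / 1286.4 ≈ 3114 FPS, consistent with the logged 3113.9.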
[2024-12-25 18:59:57,571][00412] Waiting for process rollout_proc3 to join... [2024-12-25 18:59:57,575][00412] Waiting for process rollout_proc4 to join... [2024-12-25 18:59:57,579][00412] Waiting for process rollout_proc5 to join... [2024-12-25 18:59:57,586][00412] Waiting for process rollout_proc6 to join... [2024-12-25 18:59:57,592][00412] Waiting for process rollout_proc7 to join... [2024-12-25 18:59:57,595][00412] Batcher 0 profile tree view: batching: 27.7766, releasing_batches: 0.0380 [2024-12-25 18:59:57,597][00412] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 438.9701 update_model: 12.1155 weight_update: 0.0015 one_step: 0.0046 handle_policy_step: 747.7715 deserialize: 19.6973, stack: 4.2133, obs_to_device_normalize: 154.5381, forward: 391.2906, send_messages: 38.1381 prepare_outputs: 103.8489 to_cpu: 57.1289 [2024-12-25 18:59:57,599][00412] Learner 0 profile tree view: misc: 0.0053, prepare_batch: 13.7092 train: 75.5362 epoch_init: 0.0064, minibatch_init: 0.0083, losses_postprocess: 0.6127, kl_divergence: 0.6701, after_optimizer: 34.3104 calculate_losses: 26.5633 losses_init: 0.0050, forward_head: 1.3985, bptt_initial: 17.4039, tail: 1.2639, advantages_returns: 0.3061, losses: 3.6183 bptt: 2.1562 bptt_forward_core: 2.0640 update: 12.5927 clip: 1.0282 [2024-12-25 18:59:57,603][00412] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.5015, enqueue_policy_requests: 122.8559, env_step: 970.5052, overhead: 21.0328, complete_rollouts: 8.2408 save_policy_outputs: 25.5540 split_output_tensors: 9.9918 [2024-12-25 18:59:57,605][00412] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.4408, enqueue_policy_requests: 125.1689, env_step: 969.8041, overhead: 20.1446, complete_rollouts: 8.1740 save_policy_outputs: 25.2221 split_output_tensors: 9.7138 [2024-12-25 18:59:57,606][00412] Loop Runner_EvtLoop terminating... [2024-12-25 18:59:57,608][00412] Runner profile tree view: main_loop: 1286.4388 [2024-12-25 18:59:57,608][00412] Collected {0: 4005888}, FPS: 3113.9 [2024-12-25 18:59:58,044][00412] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-12-25 18:59:58,047][00412] Overriding arg 'num_workers' with value 1 passed from command line [2024-12-25 18:59:58,050][00412] Adding new argument 'no_render'=True that is not in the saved config file! [2024-12-25 18:59:58,053][00412] Adding new argument 'save_video'=True that is not in the saved config file! [2024-12-25 18:59:58,056][00412] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-12-25 18:59:58,057][00412] Adding new argument 'video_name'=None that is not in the saved config file! [2024-12-25 18:59:58,059][00412] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2024-12-25 18:59:58,061][00412] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-12-25 18:59:58,062][00412] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2024-12-25 18:59:58,063][00412] Adding new argument 'hf_repository'=None that is not in the saved config file! [2024-12-25 18:59:58,064][00412] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-12-25 18:59:58,065][00412] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-12-25 18:59:58,066][00412] Adding new argument 'train_script'=None that is not in the saved config file! 
[2024-12-25 18:59:58,067][00412] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-12-25 18:59:58,068][00412] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-25 18:59:58,107][00412] Doom resolution: 160x120, resize resolution: (128, 72) [2024-12-25 18:59:58,112][00412] RunningMeanStd input shape: (3, 72, 128) [2024-12-25 18:59:58,115][00412] RunningMeanStd input shape: (1,) [2024-12-25 18:59:58,136][00412] ConvEncoder: input_channels=3 [2024-12-25 18:59:58,275][00412] Conv encoder output size: 512 [2024-12-25 18:59:58,276][00412] Policy head output size: 512 [2024-12-25 18:59:58,457][00412] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-25 18:59:59,311][00412] Num frames 100... [2024-12-25 18:59:59,449][00412] Num frames 200... [2024-12-25 18:59:59,587][00412] Num frames 300... [2024-12-25 18:59:59,745][00412] Num frames 400... [2024-12-25 18:59:59,877][00412] Num frames 500... [2024-12-25 19:00:00,005][00412] Num frames 600... [2024-12-25 19:00:00,137][00412] Num frames 700... [2024-12-25 19:00:00,237][00412] Avg episode rewards: #0: 14.360, true rewards: #0: 7.360 [2024-12-25 19:00:00,239][00412] Avg episode reward: 14.360, avg true_objective: 7.360 [2024-12-25 19:00:00,324][00412] Num frames 800... [2024-12-25 19:00:00,456][00412] Num frames 900... [2024-12-25 19:00:00,581][00412] Num frames 1000... [2024-12-25 19:00:00,710][00412] Num frames 1100... [2024-12-25 19:00:00,849][00412] Num frames 1200... [2024-12-25 19:00:00,981][00412] Num frames 1300... [2024-12-25 19:00:01,114][00412] Num frames 1400... [2024-12-25 19:00:01,250][00412] Num frames 1500... [2024-12-25 19:00:01,381][00412] Num frames 1600... [2024-12-25 19:00:01,512][00412] Num frames 1700... [2024-12-25 19:00:01,642][00412] Num frames 1800... [2024-12-25 19:00:01,791][00412] Num frames 1900... [2024-12-25 19:00:01,920][00412] Num frames 2000... [2024-12-25 19:00:02,046][00412] Num frames 2100... [2024-12-25 19:00:02,183][00412] Num frames 2200... [2024-12-25 19:00:02,313][00412] Num frames 2300... [2024-12-25 19:00:02,439][00412] Num frames 2400... [2024-12-25 19:00:02,566][00412] Num frames 2500... [2024-12-25 19:00:02,696][00412] Num frames 2600... [2024-12-25 19:00:02,868][00412] Avg episode rewards: #0: 30.920, true rewards: #0: 13.420 [2024-12-25 19:00:02,870][00412] Avg episode reward: 30.920, avg true_objective: 13.420 [2024-12-25 19:00:02,896][00412] Num frames 2700... [2024-12-25 19:00:03,024][00412] Num frames 2800... [2024-12-25 19:00:03,169][00412] Num frames 2900... [2024-12-25 19:00:03,296][00412] Num frames 3000... [2024-12-25 19:00:03,422][00412] Num frames 3100... [2024-12-25 19:00:03,553][00412] Num frames 3200... [2024-12-25 19:00:03,682][00412] Num frames 3300... [2024-12-25 19:00:03,819][00412] Num frames 3400... [2024-12-25 19:00:03,945][00412] Avg episode rewards: #0: 27.506, true rewards: #0: 11.507 [2024-12-25 19:00:03,946][00412] Avg episode reward: 27.506, avg true_objective: 11.507 [2024-12-25 19:00:04,015][00412] Num frames 3500... [2024-12-25 19:00:04,154][00412] Num frames 3600... [2024-12-25 19:00:04,278][00412] Num frames 3700... [2024-12-25 19:00:04,405][00412] Num frames 3800... [2024-12-25 19:00:04,539][00412] Num frames 3900... [2024-12-25 19:00:04,665][00412] Num frames 4000... [2024-12-25 19:00:04,798][00412] Num frames 4100... 
[2024-12-25 19:00:04,937][00412] Avg episode rewards: #0: 23.890, true rewards: #0: 10.390 [2024-12-25 19:00:04,939][00412] Avg episode reward: 23.890, avg true_objective: 10.390 [2024-12-25 19:00:05,001][00412] Num frames 4200... [2024-12-25 19:00:05,137][00412] Num frames 4300... [2024-12-25 19:00:05,264][00412] Num frames 4400... [2024-12-25 19:00:05,396][00412] Num frames 4500... [2024-12-25 19:00:05,525][00412] Num frames 4600... [2024-12-25 19:00:05,627][00412] Avg episode rewards: #0: 20.672, true rewards: #0: 9.272 [2024-12-25 19:00:05,629][00412] Avg episode reward: 20.672, avg true_objective: 9.272 [2024-12-25 19:00:05,719][00412] Num frames 4700... [2024-12-25 19:00:05,849][00412] Num frames 4800... [2024-12-25 19:00:05,985][00412] Num frames 4900... [2024-12-25 19:00:06,156][00412] Avg episode rewards: #0: 18.313, true rewards: #0: 8.313 [2024-12-25 19:00:06,158][00412] Avg episode reward: 18.313, avg true_objective: 8.313 [2024-12-25 19:00:06,178][00412] Num frames 5000... [2024-12-25 19:00:06,302][00412] Num frames 5100... [2024-12-25 19:00:06,430][00412] Num frames 5200... [2024-12-25 19:00:06,558][00412] Num frames 5300... [2024-12-25 19:00:06,687][00412] Num frames 5400... [2024-12-25 19:00:06,813][00412] Num frames 5500... [2024-12-25 19:00:07,001][00412] Avg episode rewards: #0: 17.426, true rewards: #0: 7.997 [2024-12-25 19:00:07,004][00412] Avg episode reward: 17.426, avg true_objective: 7.997 [2024-12-25 19:00:07,009][00412] Num frames 5600... [2024-12-25 19:00:07,149][00412] Num frames 5700... [2024-12-25 19:00:07,277][00412] Num frames 5800... [2024-12-25 19:00:07,455][00412] Num frames 5900... [2024-12-25 19:00:07,639][00412] Num frames 6000... [2024-12-25 19:00:07,817][00412] Num frames 6100... [2024-12-25 19:00:07,998][00412] Num frames 6200... [2024-12-25 19:00:08,171][00412] Num frames 6300... [2024-12-25 19:00:08,233][00412] Avg episode rewards: #0: 16.752, true rewards: #0: 7.877 [2024-12-25 19:00:08,235][00412] Avg episode reward: 16.752, avg true_objective: 7.877 [2024-12-25 19:00:08,402][00412] Num frames 6400... [2024-12-25 19:00:08,577][00412] Num frames 6500... [2024-12-25 19:00:08,756][00412] Num frames 6600... [2024-12-25 19:00:08,942][00412] Num frames 6700... [2024-12-25 19:00:09,145][00412] Num frames 6800... [2024-12-25 19:00:09,335][00412] Num frames 6900... [2024-12-25 19:00:09,479][00412] Avg episode rewards: #0: 16.494, true rewards: #0: 7.717 [2024-12-25 19:00:09,482][00412] Avg episode reward: 16.494, avg true_objective: 7.717 [2024-12-25 19:00:09,584][00412] Num frames 7000... [2024-12-25 19:00:09,775][00412] Num frames 7100... [2024-12-25 19:00:09,927][00412] Num frames 7200... [2024-12-25 19:00:10,070][00412] Num frames 7300... [2024-12-25 19:00:10,206][00412] Num frames 7400... [2024-12-25 19:00:10,332][00412] Num frames 7500... [2024-12-25 19:00:10,459][00412] Num frames 7600... [2024-12-25 19:00:10,585][00412] Num frames 7700... [2024-12-25 19:00:10,713][00412] Num frames 7800... [2024-12-25 19:00:10,836][00412] Num frames 7900... [2024-12-25 19:00:10,959][00412] Num frames 8000... [2024-12-25 19:00:11,099][00412] Num frames 8100... [2024-12-25 19:00:11,238][00412] Num frames 8200... [2024-12-25 19:00:11,370][00412] Num frames 8300... [2024-12-25 19:00:11,533][00412] Avg episode rewards: #0: 18.085, true rewards: #0: 8.385 [2024-12-25 19:00:11,535][00412] Avg episode reward: 18.085, avg true_objective: 8.385 [2024-12-25 19:01:05,893][00412] Replay video saved to /content/train_dir/default_experiment/replay.mp4! 
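In the evaluation pass above, each "Avg episode rewards" line is, by all appearances, the running mean over the episodes completed so far, and "true rewards" tracks the same statistic for the raw environment objective. Under that assumption the individual episode returns can be recovered by differencing the running sums; a small sketch (the helper name is made up for illustration):

```python
def per_episode_rewards(running_avgs):
    """Recover per-episode returns from a list of running averages.

    Assumes the k-th value is the mean return over the first k episodes, which is
    how the "Avg episode rewards: #0: ..." lines above appear to be computed.
    """
    episodes, prev_total = [], 0.0
    for k, avg in enumerate(running_avgs, start=1):
        total = avg * k
        episodes.append(round(total - prev_total, 3))
        prev_total = total
    return episodes


# First two running averages from the evaluation above: 14.360, then 30.920.
print(per_episode_rewards([14.360, 30.920]))  # -> [14.36, 47.48], i.e. episode 2 scored ~47.5
```

Applied to all ten averages this gives per-episode returns ranging from roughly 6.5 to 47.5, which is why the running mean moves around so much from one episode to the next.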
[2024-12-25 19:14:54,335][00412] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2024-12-25 19:14:54,337][00412] Overriding arg 'num_workers' with value 1 passed from command line [2024-12-25 19:14:54,338][00412] Adding new argument 'no_render'=True that is not in the saved config file! [2024-12-25 19:14:54,341][00412] Adding new argument 'save_video'=True that is not in the saved config file! [2024-12-25 19:14:54,342][00412] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2024-12-25 19:14:54,345][00412] Adding new argument 'video_name'=None that is not in the saved config file! [2024-12-25 19:14:54,346][00412] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2024-12-25 19:14:54,348][00412] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2024-12-25 19:14:54,351][00412] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2024-12-25 19:14:54,352][00412] Adding new argument 'hf_repository'='ZhaoxiZheng/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2024-12-25 19:14:54,354][00412] Adding new argument 'policy_index'=0 that is not in the saved config file! [2024-12-25 19:14:54,355][00412] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2024-12-25 19:14:54,356][00412] Adding new argument 'train_script'=None that is not in the saved config file! [2024-12-25 19:14:54,357][00412] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2024-12-25 19:14:54,358][00412] Using frameskip 1 and render_action_repeat=4 for evaluation [2024-12-25 19:14:54,383][00412] RunningMeanStd input shape: (3, 72, 128) [2024-12-25 19:14:54,385][00412] RunningMeanStd input shape: (1,) [2024-12-25 19:14:54,397][00412] ConvEncoder: input_channels=3 [2024-12-25 19:14:54,434][00412] Conv encoder output size: 512 [2024-12-25 19:14:54,435][00412] Policy head output size: 512 [2024-12-25 19:14:54,454][00412] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2024-12-25 19:14:54,853][00412] Num frames 100... [2024-12-25 19:14:54,975][00412] Num frames 200... [2024-12-25 19:14:55,097][00412] Num frames 300... [2024-12-25 19:14:55,249][00412] Num frames 400... [2024-12-25 19:14:55,367][00412] Num frames 500... [2024-12-25 19:14:55,483][00412] Num frames 600... [2024-12-25 19:14:55,606][00412] Num frames 700... [2024-12-25 19:14:55,727][00412] Num frames 800... [2024-12-25 19:14:55,846][00412] Num frames 900... [2024-12-25 19:14:55,958][00412] Avg episode rewards: #0: 19.470, true rewards: #0: 9.470 [2024-12-25 19:14:55,960][00412] Avg episode reward: 19.470, avg true_objective: 9.470 [2024-12-25 19:14:56,024][00412] Num frames 1000... [2024-12-25 19:14:56,149][00412] Num frames 1100... [2024-12-25 19:14:56,278][00412] Num frames 1200... [2024-12-25 19:14:56,402][00412] Num frames 1300... [2024-12-25 19:14:56,522][00412] Num frames 1400... [2024-12-25 19:14:56,644][00412] Num frames 1500... [2024-12-25 19:14:56,761][00412] Num frames 1600... [2024-12-25 19:14:56,881][00412] Num frames 1700... [2024-12-25 19:14:57,003][00412] Num frames 1800... [2024-12-25 19:14:57,126][00412] Num frames 1900... [2024-12-25 19:14:57,253][00412] Num frames 2000... [2024-12-25 19:14:57,379][00412] Num frames 2100... [2024-12-25 19:14:57,497][00412] Num frames 2200... 
[2024-12-25 19:14:57,617][00412] Num frames 2300... [2024-12-25 19:14:57,772][00412] Avg episode rewards: #0: 26.405, true rewards: #0: 11.905 [2024-12-25 19:14:57,773][00412] Avg episode reward: 26.405, avg true_objective: 11.905 [2024-12-25 19:14:57,798][00412] Num frames 2400... [2024-12-25 19:14:57,915][00412] Num frames 2500... [2024-12-25 19:14:58,031][00412] Num frames 2600... [2024-12-25 19:14:58,159][00412] Num frames 2700... [2024-12-25 19:14:58,293][00412] Num frames 2800... [2024-12-25 19:14:58,410][00412] Num frames 2900... [2024-12-25 19:14:58,529][00412] Num frames 3000... [2024-12-25 19:14:58,645][00412] Num frames 3100... [2024-12-25 19:14:58,761][00412] Num frames 3200... [2024-12-25 19:14:58,879][00412] Num frames 3300... [2024-12-25 19:14:59,000][00412] Num frames 3400... [2024-12-25 19:14:59,126][00412] Num frames 3500... [2024-12-25 19:14:59,247][00412] Num frames 3600... [2024-12-25 19:14:59,377][00412] Num frames 3700... [2024-12-25 19:14:59,496][00412] Num frames 3800... [2024-12-25 19:14:59,616][00412] Num frames 3900... [2024-12-25 19:14:59,724][00412] Avg episode rewards: #0: 31.147, true rewards: #0: 13.147 [2024-12-25 19:14:59,726][00412] Avg episode reward: 31.147, avg true_objective: 13.147 [2024-12-25 19:14:59,793][00412] Num frames 4000... [2024-12-25 19:14:59,914][00412] Num frames 4100... [2024-12-25 19:15:00,032][00412] Num frames 4200... [2024-12-25 19:15:00,156][00412] Num frames 4300... [2024-12-25 19:15:00,285][00412] Num frames 4400... [2024-12-25 19:15:00,415][00412] Num frames 4500... [2024-12-25 19:15:00,494][00412] Avg episode rewards: #0: 25.800, true rewards: #0: 11.300 [2024-12-25 19:15:00,496][00412] Avg episode reward: 25.800, avg true_objective: 11.300 [2024-12-25 19:15:00,592][00412] Num frames 4600... [2024-12-25 19:15:00,714][00412] Num frames 4700... [2024-12-25 19:15:00,832][00412] Num frames 4800... [2024-12-25 19:15:00,950][00412] Num frames 4900... [2024-12-25 19:15:01,096][00412] Avg episode rewards: #0: 22.336, true rewards: #0: 9.936 [2024-12-25 19:15:01,098][00412] Avg episode reward: 22.336, avg true_objective: 9.936 [2024-12-25 19:15:01,149][00412] Num frames 5000... [2024-12-25 19:15:01,268][00412] Num frames 5100... [2024-12-25 19:15:01,398][00412] Num frames 5200... [2024-12-25 19:15:01,514][00412] Num frames 5300... [2024-12-25 19:15:01,633][00412] Num frames 5400... [2024-12-25 19:15:01,753][00412] Num frames 5500... [2024-12-25 19:15:01,870][00412] Num frames 5600... [2024-12-25 19:15:01,988][00412] Num frames 5700... [2024-12-25 19:15:02,115][00412] Num frames 5800... [2024-12-25 19:15:02,235][00412] Num frames 5900... [2024-12-25 19:15:02,359][00412] Num frames 6000... [2024-12-25 19:15:02,490][00412] Num frames 6100... [2024-12-25 19:15:02,608][00412] Num frames 6200... [2024-12-25 19:15:02,726][00412] Num frames 6300... [2024-12-25 19:15:02,844][00412] Num frames 6400... [2024-12-25 19:15:02,967][00412] Num frames 6500... [2024-12-25 19:15:03,138][00412] Num frames 6600... [2024-12-25 19:15:03,315][00412] Num frames 6700... [2024-12-25 19:15:03,421][00412] Avg episode rewards: #0: 25.880, true rewards: #0: 11.213 [2024-12-25 19:15:03,424][00412] Avg episode reward: 25.880, avg true_objective: 11.213 [2024-12-25 19:15:03,541][00412] Num frames 6800... [2024-12-25 19:15:03,701][00412] Num frames 6900... [2024-12-25 19:15:03,864][00412] Num frames 7000... [2024-12-25 19:15:04,024][00412] Num frames 7100... [2024-12-25 19:15:04,200][00412] Num frames 7200... [2024-12-25 19:15:04,368][00412] Num frames 7300... 
[2024-12-25 19:15:04,545][00412] Num frames 7400... [2024-12-25 19:15:04,714][00412] Num frames 7500... [2024-12-25 19:15:04,929][00412] Avg episode rewards: #0: 24.703, true rewards: #0: 10.846 [2024-12-25 19:15:04,931][00412] Avg episode reward: 24.703, avg true_objective: 10.846 [2024-12-25 19:15:04,946][00412] Num frames 7600... [2024-12-25 19:15:05,132][00412] Num frames 7700... [2024-12-25 19:15:05,307][00412] Num frames 7800... [2024-12-25 19:15:05,495][00412] Num frames 7900... [2024-12-25 19:15:05,633][00412] Num frames 8000... [2024-12-25 19:15:05,751][00412] Num frames 8100... [2024-12-25 19:15:05,871][00412] Num frames 8200... [2024-12-25 19:15:05,989][00412] Num frames 8300... [2024-12-25 19:15:06,116][00412] Num frames 8400... [2024-12-25 19:15:06,233][00412] Num frames 8500... [2024-12-25 19:15:06,353][00412] Num frames 8600... [2024-12-25 19:15:06,428][00412] Avg episode rewards: #0: 23.895, true rewards: #0: 10.770 [2024-12-25 19:15:06,429][00412] Avg episode reward: 23.895, avg true_objective: 10.770 [2024-12-25 19:15:06,537][00412] Num frames 8700... [2024-12-25 19:15:06,656][00412] Num frames 8800... [2024-12-25 19:15:06,774][00412] Num frames 8900... [2024-12-25 19:15:06,893][00412] Num frames 9000... [2024-12-25 19:15:07,011][00412] Num frames 9100... [2024-12-25 19:15:07,138][00412] Num frames 9200... [2024-12-25 19:15:07,257][00412] Num frames 9300... [2024-12-25 19:15:07,378][00412] Num frames 9400... [2024-12-25 19:15:07,498][00412] Num frames 9500... [2024-12-25 19:15:07,623][00412] Num frames 9600... [2024-12-25 19:15:07,744][00412] Num frames 9700... [2024-12-25 19:15:07,817][00412] Avg episode rewards: #0: 24.461, true rewards: #0: 10.794 [2024-12-25 19:15:07,819][00412] Avg episode reward: 24.461, avg true_objective: 10.794 [2024-12-25 19:15:07,918][00412] Num frames 9800... [2024-12-25 19:15:08,037][00412] Num frames 9900... [2024-12-25 19:15:08,163][00412] Num frames 10000... [2024-12-25 19:15:08,289][00412] Num frames 10100... [2024-12-25 19:15:08,418][00412] Num frames 10200... [2024-12-25 19:15:08,535][00412] Num frames 10300... [2024-12-25 19:15:08,659][00412] Num frames 10400... [2024-12-25 19:15:08,778][00412] Num frames 10500... [2024-12-25 19:15:08,894][00412] Num frames 10600... [2024-12-25 19:15:09,012][00412] Num frames 10700... [2024-12-25 19:15:09,137][00412] Num frames 10800... [2024-12-25 19:15:09,198][00412] Avg episode rewards: #0: 24.204, true rewards: #0: 10.804 [2024-12-25 19:15:09,200][00412] Avg episode reward: 24.204, avg true_objective: 10.804 [2024-12-25 19:16:17,318][00412] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
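This final block is the push-to-Hub evaluation: the saved training config is reloaded, evaluation-only arguments are layered on top (all of them listed in the "Adding new argument" lines above), checkpoint_000000978_4005888.pth is restored, ten episodes are rolled out, and the replay video and checkpoint are uploaded to ZhaoxiZheng/rl_course_vizdoom_health_gathering_supreme. A hedged sketch of how this step is typically driven from the Deep RL course notebook follows; the flag names come from the log, sample_factory.enjoy.enjoy is the Sample Factory 2.x evaluation entry point, and parse_vizdoom_cfg stands in for the notebook helper that registers the ViZDoom environments, so treat the exact call as an assumption rather than the command that produced this log.

```python
from sample_factory.enjoy import enjoy  # evaluation entry point in Sample Factory 2.x

# parse_vizdoom_cfg is the course-notebook helper that registers the ViZDoom envs and
# parses these flags into a Sample Factory config; it is assumed here, not defined.
eval_args = [
    "--env=doom_health_gathering_supreme",  # env name assumed from the repository name
    "--num_workers=1",                      # "Overriding arg 'num_workers' with value 1"
    "--no_render",
    "--save_video",
    "--max_num_episodes=10",
    "--max_num_frames=100000",
    "--push_to_hub",
    "--hf_repository=ZhaoxiZheng/rl_course_vizdoom_health_gathering_supreme",
]
cfg = parse_vizdoom_cfg(argv=eval_args, evaluation=True)
status = enjoy(cfg)  # rolls out 10 episodes, writes replay.mp4, then uploads to the Hub
```

Both evaluation passes load the same checkpoint, so the gap between the reported averages (18.085 earlier versus 24.204 here) reflects episode-to-episode variance rather than a different policy, which is consistent with eval_deterministic being False.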