[2025-02-01 07:53:12,335][00813] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-01 07:53:12,337][00813] Rollout worker 0 uses device cpu
[2025-02-01 07:53:12,340][00813] Rollout worker 1 uses device cpu
[2025-02-01 07:53:12,342][00813] Rollout worker 2 uses device cpu
[2025-02-01 07:53:12,343][00813] Rollout worker 3 uses device cpu
[2025-02-01 07:53:12,345][00813] Rollout worker 4 uses device cpu
[2025-02-01 07:53:12,346][00813] Rollout worker 5 uses device cpu
[2025-02-01 07:53:12,347][00813] Rollout worker 6 uses device cpu
[2025-02-01 07:53:12,348][00813] Rollout worker 7 uses device cpu
[2025-02-01 07:53:12,505][00813] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-01 07:53:12,506][00813] InferenceWorker_p0-w0: min num requests: 2
[2025-02-01 07:53:12,545][00813] Starting all processes...
[2025-02-01 07:53:12,546][00813] Starting process learner_proc0
[2025-02-01 07:53:12,602][00813] Starting all processes...
[2025-02-01 07:53:12,617][00813] Starting process inference_proc0-0
[2025-02-01 07:53:12,618][00813] Starting process rollout_proc0
[2025-02-01 07:53:12,620][00813] Starting process rollout_proc1
[2025-02-01 07:53:12,620][00813] Starting process rollout_proc2
[2025-02-01 07:53:12,620][00813] Starting process rollout_proc3
[2025-02-01 07:53:12,621][00813] Starting process rollout_proc4
[2025-02-01 07:53:12,621][00813] Starting process rollout_proc5
[2025-02-01 07:53:12,622][00813] Starting process rollout_proc6
[2025-02-01 07:53:12,622][00813] Starting process rollout_proc7
[2025-02-01 07:53:30,227][04645] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-01 07:53:30,237][04645] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-01 07:53:30,426][04645] Num visible devices: 1
[2025-02-01 07:53:30,484][04645] Starting seed is not provided
[2025-02-01 07:53:30,485][04645] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-01 07:53:30,486][04645] Initializing actor-critic model on device cuda:0
[2025-02-01 07:53:30,487][04645] RunningMeanStd input shape: (3, 72, 128)
[2025-02-01 07:53:30,490][04645] RunningMeanStd input shape: (1,)
[2025-02-01 07:53:30,573][04645] ConvEncoder: input_channels=3
[2025-02-01 07:53:30,883][04663] Worker 4 uses CPU cores [0]
[2025-02-01 07:53:31,034][04660] Worker 1 uses CPU cores [1]
[2025-02-01 07:53:31,497][04664] Worker 5 uses CPU cores [1]
[2025-02-01 07:53:31,540][04659] Worker 0 uses CPU cores [0]
[2025-02-01 07:53:31,587][04658] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-01 07:53:31,590][04658] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-01 07:53:31,697][04665] Worker 6 uses CPU cores [0]
[2025-02-01 07:53:31,738][04658] Num visible devices: 1
[2025-02-01 07:53:31,757][04662] Worker 3 uses CPU cores [1]
[2025-02-01 07:53:31,939][04661] Worker 2 uses CPU cores [0]
[2025-02-01 07:53:31,968][04645] Conv encoder output size: 512
[2025-02-01 07:53:31,970][04645] Policy head output size: 512
[2025-02-01 07:53:32,013][04666] Worker 7 uses CPU cores [1]
[2025-02-01 07:53:32,096][04645] Created Actor Critic model with architecture:
ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-02-01 07:53:32,430][04645] Using optimizer <class 'torch.optim.adam.Adam'>
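
A minimal PyTorch sketch of the ActorCriticSharedWeights model printed above, paired with the Adam optimizer from the last log line. Shapes that appear in the log are used as-is: (3, 72, 128) observations, a 512-unit encoder output and policy head, GRU(512, 512), one value output, and five action logits. The conv channel counts and kernels (32/64/128, kernels 8/4/3, strides 4/2/2) are an assumption based on Sample Factory's default "convnet_simple" spec, since the log prints only the layer types.

import torch
import torch.nn as nn

class SketchActorCritic(nn.Module):
    """Rough stand-in for the logged architecture; not the library's own class."""

    def __init__(self, num_actions: int = 5):
        super().__init__()
        # conv_head: three Conv2d/ELU pairs; channel sizes assumed (convnet_simple default)
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # a 72x128 input leaves a 3x6 spatial map with 128 channels -> 2304 features
        self.mlp_layers = nn.Sequential(nn.Linear(128 * 3 * 6, 512), nn.ELU())
        self.core = nn.GRU(512, 512)                             # GRU(512, 512) from the log
        self.critic_linear = nn.Linear(512, 1)                   # value head
        self.distribution_linear = nn.Linear(512, num_actions)   # 5 action logits

    def forward(self, obs, rnn_state):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))      # (B, 512)
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)      # single-step rollout
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

model = SketchActorCritic()
logits, value, h = model(torch.zeros(1, 3, 72, 128), torch.zeros(1, 1, 512))
print(logits.shape, value.shape)   # torch.Size([1, 5]) torch.Size([1, 1])
optimizer = torch.optim.Adam(model.parameters())
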
[2025-02-01 07:53:32,501][00813] Heartbeat connected on Batcher_0
[2025-02-01 07:53:32,507][00813] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-01 07:53:32,514][00813] Heartbeat connected on RolloutWorker_w0
[2025-02-01 07:53:32,520][00813] Heartbeat connected on RolloutWorker_w1
[2025-02-01 07:53:32,524][00813] Heartbeat connected on RolloutWorker_w2
[2025-02-01 07:53:32,528][00813] Heartbeat connected on RolloutWorker_w3
[2025-02-01 07:53:32,532][00813] Heartbeat connected on RolloutWorker_w4
[2025-02-01 07:53:32,537][00813] Heartbeat connected on RolloutWorker_w5
[2025-02-01 07:53:32,541][00813] Heartbeat connected on RolloutWorker_w6
[2025-02-01 07:53:32,545][00813] Heartbeat connected on RolloutWorker_w7
[2025-02-01 07:53:36,659][04645] No checkpoints found
[2025-02-01 07:53:36,659][04645] Did not load from checkpoint, starting from scratch!
[2025-02-01 07:53:36,660][04645] Initialized policy 0 weights for model version 0
[2025-02-01 07:53:36,662][04645] LearnerWorker_p0 finished initialization!
[2025-02-01 07:53:36,663][00813] Heartbeat connected on LearnerWorker_p0
[2025-02-01 07:53:36,663][04645] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-01 07:53:36,821][04658] RunningMeanStd input shape: (3, 72, 128)
[2025-02-01 07:53:36,822][04658] RunningMeanStd input shape: (1,)
[2025-02-01 07:53:36,833][04658] ConvEncoder: input_channels=3
[2025-02-01 07:53:36,933][04658] Conv encoder output size: 512
[2025-02-01 07:53:36,934][04658] Policy head output size: 512
[2025-02-01 07:53:36,968][00813] Inference worker 0-0 is ready!
[2025-02-01 07:53:36,969][00813] All inference workers are ready! Signal rollout workers to start!
[2025-02-01 07:53:37,054][04660] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-01 07:53:37,063][04659] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-01 07:53:37,067][04664] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-01 07:53:37,069][04662] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-01 07:53:37,077][04663] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-01 07:53:37,139][04666] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-01 07:53:37,160][04665] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-01 07:53:37,212][04661] Doom resolution: 160x120, resize resolution: (128, 72)
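
These eight lines tie back to the (3, 72, 128) RunningMeanStd shape above: each worker renders Doom at 160x120 and downsamples to the (128, 72) width-height tuple before frames reach the encoder. A hedged sketch of that preprocessing, assuming an OpenCV-style resize (the library's actual wrapper may differ):

import cv2
import numpy as np

frame = np.zeros((120, 160, 3), dtype=np.uint8)   # HWC frame as rendered at 160x120
small = cv2.resize(frame, (128, 72))              # cv2.resize takes (width, height)
obs = np.transpose(small, (2, 0, 1))              # CHW layout for the conv encoder
print(obs.shape)                                  # (3, 72, 128)
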
[2025-02-01 07:53:37,459][00813] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-01 07:53:38,279][04663] Decorrelating experience for 0 frames...
[2025-02-01 07:53:38,280][04659] Decorrelating experience for 0 frames...
[2025-02-01 07:53:38,281][04662] Decorrelating experience for 0 frames...
[2025-02-01 07:53:38,279][04664] Decorrelating experience for 0 frames...
[2025-02-01 07:53:38,282][04666] Decorrelating experience for 0 frames...
[2025-02-01 07:53:38,675][04659] Decorrelating experience for 32 frames...
[2025-02-01 07:53:39,110][04659] Decorrelating experience for 64 frames...
[2025-02-01 07:53:39,540][04666] Decorrelating experience for 32 frames...
[2025-02-01 07:53:39,542][04662] Decorrelating experience for 32 frames...
[2025-02-01 07:53:39,546][04664] Decorrelating experience for 32 frames...
[2025-02-01 07:53:39,561][04660] Decorrelating experience for 0 frames...
[2025-02-01 07:53:40,681][04663] Decorrelating experience for 32 frames...
[2025-02-01 07:53:40,766][04660] Decorrelating experience for 32 frames...
[2025-02-01 07:53:40,797][04659] Decorrelating experience for 96 frames...
[2025-02-01 07:53:40,810][04661] Decorrelating experience for 0 frames...
[2025-02-01 07:53:40,848][04662] Decorrelating experience for 64 frames...
[2025-02-01 07:53:40,850][04664] Decorrelating experience for 64 frames...
[2025-02-01 07:53:41,494][04660] Decorrelating experience for 64 frames...
[2025-02-01 07:53:41,831][04663] Decorrelating experience for 64 frames...
[2025-02-01 07:53:41,838][04665] Decorrelating experience for 0 frames...
[2025-02-01 07:53:42,459][00813] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-01 07:53:42,588][04666] Decorrelating experience for 64 frames...
[2025-02-01 07:53:43,854][04666] Decorrelating experience for 96 frames...
[2025-02-01 07:53:43,972][04661] Decorrelating experience for 32 frames...
[2025-02-01 07:53:43,987][04665] Decorrelating experience for 32 frames...
[2025-02-01 07:53:44,165][04663] Decorrelating experience for 96 frames...
[2025-02-01 07:53:46,600][04660] Decorrelating experience for 96 frames...
[2025-02-01 07:53:47,459][00813] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 112.4. Samples: 1124. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-01 07:53:47,461][00813] Avg episode reward: [(0, '2.977')]
[2025-02-01 07:53:48,591][04665] Decorrelating experience for 64 frames...
[2025-02-01 07:53:48,594][04661] Decorrelating experience for 64 frames...
[2025-02-01 07:53:50,505][04645] Signal inference workers to stop experience collection...
[2025-02-01 07:53:50,517][04658] InferenceWorker_p0-w0: stopping experience collection
[2025-02-01 07:53:50,616][04662] Decorrelating experience for 96 frames...
[2025-02-01 07:53:51,111][04645] Signal inference workers to resume experience collection...
[2025-02-01 07:53:51,112][04658] InferenceWorker_p0-w0: resuming experience collection
[2025-02-01 07:53:51,123][04664] Decorrelating experience for 96 frames...
[2025-02-01 07:53:52,201][04665] Decorrelating experience for 96 frames...
[2025-02-01 07:53:52,204][04661] Decorrelating experience for 96 frames...
[2025-02-01 07:53:52,459][00813] Fps is (10 sec: 819.2, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 8192. Throughput: 0: 193.6. Samples: 2904. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2025-02-01 07:53:52,463][00813] Avg episode reward: [(0, '3.376')]
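
The FPS fields can be checked against the frame counter: this report lands 15 s after collection started, so the 10-sec window gives 8192 frames / 10 s = 819.2 FPS, while the "60 sec" average, truncated to the 15 s elapsed so far, gives 8192 / 15 ≈ 546.1 FPS.
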
[2025-02-01 07:53:57,459][00813] Fps is (10 sec: 3276.8, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 32768. Throughput: 0: 306.2. Samples: 6124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 07:53:57,465][00813] Avg episode reward: [(0, '3.894')]
[2025-02-01 07:53:59,229][04658] Updated weights for policy 0, policy_version 10 (0.0013)
[2025-02-01 07:54:02,459][00813] Fps is (10 sec: 4095.9, 60 sec: 1966.1, 300 sec: 1966.1). Total num frames: 49152. Throughput: 0: 480.9. Samples: 12022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 07:54:02,464][00813] Avg episode reward: [(0, '4.241')]
[2025-02-01 07:54:07,459][00813] Fps is (10 sec: 2867.2, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 61440. Throughput: 0: 534.1. Samples: 16024. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 07:54:07,466][00813] Avg episode reward: [(0, '4.479')]
[2025-02-01 07:54:11,292][04658] Updated weights for policy 0, policy_version 20 (0.0019)
[2025-02-01 07:54:12,460][00813] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 86016. Throughput: 0: 558.2. Samples: 19538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 07:54:12,467][00813] Avg episode reward: [(0, '4.496')]
[2025-02-01 07:54:17,459][00813] Fps is (10 sec: 4505.6, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 106496. Throughput: 0: 665.7. Samples: 26626. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 07:54:17,465][00813] Avg episode reward: [(0, '4.496')]
[2025-02-01 07:54:17,473][04645] Saving new best policy, reward=4.496!
[2025-02-01 07:54:22,404][04658] Updated weights for policy 0, policy_version 30 (0.0013)
[2025-02-01 07:54:22,463][00813] Fps is (10 sec: 3685.3, 60 sec: 2730.5, 300 sec: 2730.5). Total num frames: 122880. Throughput: 0: 690.8. Samples: 31088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 07:54:22,471][00813] Avg episode reward: [(0, '4.274')]
[2025-02-01 07:54:27,459][00813] Fps is (10 sec: 3686.3, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 143360. Throughput: 0: 754.2. Samples: 33938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 07:54:27,465][00813] Avg episode reward: [(0, '4.311')]
[2025-02-01 07:54:31,486][04658] Updated weights for policy 0, policy_version 40 (0.0014)
[2025-02-01 07:54:32,459][00813] Fps is (10 sec: 4507.1, 60 sec: 3053.4, 300 sec: 3053.4). Total num frames: 167936. Throughput: 0: 882.8. Samples: 40852. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-01 07:54:32,464][00813] Avg episode reward: [(0, '4.452')]
[2025-02-01 07:54:37,460][00813] Fps is (10 sec: 4095.8, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 184320. Throughput: 0: 965.2. Samples: 46338. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 07:54:37,466][00813] Avg episode reward: [(0, '4.474')]
[2025-02-01 07:54:42,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3087.8). Total num frames: 200704. Throughput: 0: 939.9. Samples: 48420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 07:54:42,465][00813] Avg episode reward: [(0, '4.396')]
[2025-02-01 07:54:43,180][04658] Updated weights for policy 0, policy_version 50 (0.0027)
[2025-02-01 07:54:47,460][00813] Fps is (10 sec: 3686.3, 60 sec: 3686.3, 300 sec: 3159.7). Total num frames: 221184. Throughput: 0: 955.5. Samples: 55020. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 07:54:47,465][00813] Avg episode reward: [(0, '4.427')]
[2025-02-01 07:54:52,465][00813] Fps is (10 sec: 4093.6, 60 sec: 3890.8, 300 sec: 3221.9). Total num frames: 241664. Throughput: 0: 1007.7. Samples: 61374. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-01 07:54:52,469][00813] Avg episode reward: [(0, '4.473')]
[2025-02-01 07:54:53,120][04658] Updated weights for policy 0, policy_version 60 (0.0029)
[2025-02-01 07:54:57,460][00813] Fps is (10 sec: 3686.6, 60 sec: 3754.6, 300 sec: 3225.6). Total num frames: 258048. Throughput: 0: 975.0. Samples: 63412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-01 07:54:57,462][00813] Avg episode reward: [(0, '4.458')]
[2025-02-01 07:55:02,459][00813] Fps is (10 sec: 3688.5, 60 sec: 3823.0, 300 sec: 3276.8). Total num frames: 278528. Throughput: 0: 942.2. Samples: 69026. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-01 07:55:02,461][00813] Avg episode reward: [(0, '4.409')]
[2025-02-01 07:55:03,889][04658] Updated weights for policy 0, policy_version 70 (0.0022)
[2025-02-01 07:55:07,460][00813] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3367.8). Total num frames: 303104. Throughput: 0: 999.3. Samples: 76052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-01 07:55:07,465][00813] Avg episode reward: [(0, '4.391')]
[2025-02-01 07:55:07,471][04645] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth...
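
These .pth files are ordinary torch-serialized checkpoints, so a minimal way to inspect one outside the trainer is a plain torch.load; the path below is copied from the log line above, and the dict's key names are left to inspection rather than assumed:

import torch

ckpt = torch.load(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth",
    map_location="cpu",
)
print(sorted(ckpt.keys()))  # inspect what the library stored; don't assume key names
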
[2025-02-01 07:55:12,462][00813] Fps is (10 sec: 3685.4, 60 sec: 3822.8, 300 sec: 3319.8). Total num frames: 315392. Throughput: 0: 987.7. Samples: 78386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:55:12,472][00813] Avg episode reward: [(0, '4.408')]
[2025-02-01 07:55:15,847][04658] Updated weights for policy 0, policy_version 80 (0.0034)
[2025-02-01 07:55:17,459][00813] Fps is (10 sec: 2867.3, 60 sec: 3754.7, 300 sec: 3317.8). Total num frames: 331776. Throughput: 0: 929.2. Samples: 82664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:55:17,462][00813] Avg episode reward: [(0, '4.564')]
[2025-02-01 07:55:17,478][04645] Saving new best policy, reward=4.564!
[2025-02-01 07:55:22,459][00813] Fps is (10 sec: 4097.2, 60 sec: 3891.4, 300 sec: 3393.8). Total num frames: 356352. Throughput: 0: 959.2. Samples: 89502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 07:55:22,462][00813] Avg episode reward: [(0, '4.439')]
[2025-02-01 07:55:24,890][04658] Updated weights for policy 0, policy_version 90 (0.0019)
[2025-02-01 07:55:27,462][00813] Fps is (10 sec: 4094.9, 60 sec: 3822.8, 300 sec: 3388.4). Total num frames: 372736. Throughput: 0: 987.4. Samples: 92854. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:55:27,464][00813] Avg episode reward: [(0, '4.387')]
[2025-02-01 07:55:32,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3383.7). Total num frames: 389120. Throughput: 0: 934.9. Samples: 97088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:55:32,462][00813] Avg episode reward: [(0, '4.394')]
[2025-02-01 07:55:36,502][04658] Updated weights for policy 0, policy_version 100 (0.0014)
[2025-02-01 07:55:37,459][00813] Fps is (10 sec: 4097.1, 60 sec: 3823.0, 300 sec: 3447.5). Total num frames: 413696. Throughput: 0: 937.0. Samples: 103534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 07:55:37,462][00813] Avg episode reward: [(0, '4.623')]
[2025-02-01 07:55:37,467][04645] Saving new best policy, reward=4.623!
[2025-02-01 07:55:42,459][00813] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3473.4). Total num frames: 434176. Throughput: 0: 964.4. Samples: 106808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 07:55:42,464][00813] Avg episode reward: [(0, '4.623')]
[2025-02-01 07:55:47,326][04658] Updated weights for policy 0, policy_version 110 (0.0018)
[2025-02-01 07:55:47,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3465.8). Total num frames: 450560. Throughput: 0: 958.5. Samples: 112160. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 07:55:47,465][00813] Avg episode reward: [(0, '4.648')]
[2025-02-01 07:55:47,471][04645] Saving new best policy, reward=4.648!
[2025-02-01 07:55:52,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3755.0, 300 sec: 3458.8). Total num frames: 466944. Throughput: 0: 922.7. Samples: 117572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 07:55:52,464][00813] Avg episode reward: [(0, '4.567')]
[2025-02-01 07:55:57,124][04658] Updated weights for policy 0, policy_version 120 (0.0032)
[2025-02-01 07:55:57,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3510.9). Total num frames: 491520. Throughput: 0: 949.7. Samples: 121118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-01 07:55:57,463][00813] Avg episode reward: [(0, '4.650')]
[2025-02-01 07:55:57,469][04645] Saving new best policy, reward=4.650!
[2025-02-01 07:56:02,459][00813] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3502.8). Total num frames: 507904. Throughput: 0: 993.1. Samples: 127352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:56:02,463][00813] Avg episode reward: [(0, '4.700')]
[2025-02-01 07:56:02,470][04645] Saving new best policy, reward=4.700!
[2025-02-01 07:56:07,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3495.3). Total num frames: 524288. Throughput: 0: 941.6. Samples: 131872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 07:56:07,464][00813] Avg episode reward: [(0, '4.553')]
[2025-02-01 07:56:08,544][04658] Updated weights for policy 0, policy_version 130 (0.0030)
[2025-02-01 07:56:12,459][00813] Fps is (10 sec: 4096.1, 60 sec: 3891.4, 300 sec: 3541.1). Total num frames: 548864. Throughput: 0: 942.1. Samples: 135248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 07:56:12,461][00813] Avg episode reward: [(0, '4.558')]
[2025-02-01 07:56:17,460][00813] Fps is (10 sec: 4505.0, 60 sec: 3959.4, 300 sec: 3558.4). Total num frames: 569344. Throughput: 0: 1007.4. Samples: 142424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-02-01 07:56:17,469][00813] Avg episode reward: [(0, '4.424')]
[2025-02-01 07:56:17,912][04658] Updated weights for policy 0, policy_version 140 (0.0016)
[2025-02-01 07:56:22,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3549.9). Total num frames: 585728. Throughput: 0: 963.8. Samples: 146906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 07:56:22,463][00813] Avg episode reward: [(0, '4.363')]
[2025-02-01 07:56:27,459][00813] Fps is (10 sec: 3686.9, 60 sec: 3891.4, 300 sec: 3565.9). Total num frames: 606208. Throughput: 0: 954.9. Samples: 149780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 07:56:27,465][00813] Avg episode reward: [(0, '4.249')]
[2025-02-01 07:56:28,882][04658] Updated weights for policy 0, policy_version 150 (0.0017)
[2025-02-01 07:56:32,459][00813] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3604.5). Total num frames: 630784. Throughput: 0: 989.2. Samples: 156672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 07:56:32,462][00813] Avg episode reward: [(0, '4.264')]
[2025-02-01 07:56:37,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3572.6). Total num frames: 643072. Throughput: 0: 984.9. Samples: 161892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:56:37,461][00813] Avg episode reward: [(0, '4.440')]
[2025-02-01 07:56:40,706][04658] Updated weights for policy 0, policy_version 160 (0.0022)
[2025-02-01 07:56:42,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3586.8). Total num frames: 663552. Throughput: 0: 950.7. Samples: 163898. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 07:56:42,462][00813] Avg episode reward: [(0, '4.439')]
[2025-02-01 07:56:47,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3600.2). Total num frames: 684032. Throughput: 0: 961.1. Samples: 170600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 07:56:47,461][00813] Avg episode reward: [(0, '4.543')]
[2025-02-01 07:56:49,329][04658] Updated weights for policy 0, policy_version 170 (0.0016)
[2025-02-01 07:56:52,463][00813] Fps is (10 sec: 4094.4, 60 sec: 3959.2, 300 sec: 3612.8). Total num frames: 704512. Throughput: 0: 1005.9. Samples: 177142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 07:56:52,468][00813] Avg episode reward: [(0, '4.418')]
[2025-02-01 07:56:57,460][00813] Fps is (10 sec: 3686.1, 60 sec: 3822.9, 300 sec: 3604.5). Total num frames: 720896. Throughput: 0: 978.1. Samples: 179264. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 07:56:57,462][00813] Avg episode reward: [(0, '4.422')]
[2025-02-01 07:57:00,630][04658] Updated weights for policy 0, policy_version 180 (0.0019)
[2025-02-01 07:57:02,459][00813] Fps is (10 sec: 3687.9, 60 sec: 3891.2, 300 sec: 3616.5). Total num frames: 741376. Throughput: 0: 950.0. Samples: 185172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 07:57:02,465][00813] Avg episode reward: [(0, '4.452')]
[2025-02-01 07:57:07,459][00813] Fps is (10 sec: 4506.0, 60 sec: 4027.7, 300 sec: 3647.4). Total num frames: 765952. Throughput: 0: 1006.3. Samples: 192190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:57:07,463][00813] Avg episode reward: [(0, '4.579')]
[2025-02-01 07:57:07,472][04645] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000187_765952.pth...
[2025-02-01 07:57:10,830][04658] Updated weights for policy 0, policy_version 190 (0.0025)
[2025-02-01 07:57:12,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3638.8). Total num frames: 782336. Throughput: 0: 995.4. Samples: 194572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:57:12,466][00813] Avg episode reward: [(0, '4.658')]
[2025-02-01 07:57:17,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3630.5). Total num frames: 798720. Throughput: 0: 947.2. Samples: 199296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:57:17,464][00813] Avg episode reward: [(0, '4.600')]
[2025-02-01 07:57:21,198][04658] Updated weights for policy 0, policy_version 200 (0.0018)
[2025-02-01 07:57:22,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3659.1). Total num frames: 823296. Throughput: 0: 989.1. Samples: 206402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:57:22,466][00813] Avg episode reward: [(0, '4.375')]
[2025-02-01 07:57:27,461][00813] Fps is (10 sec: 4504.6, 60 sec: 3959.3, 300 sec: 3668.6). Total num frames: 843776. Throughput: 0: 1022.8. Samples: 209926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 07:57:27,464][00813] Avg episode reward: [(0, '4.287')]
[2025-02-01 07:57:32,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3642.8). Total num frames: 856064. Throughput: 0: 967.7. Samples: 214146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:57:32,461][00813] Avg episode reward: [(0, '4.256')]
[2025-02-01 07:57:32,694][04658] Updated weights for policy 0, policy_version 210 (0.0013)
[2025-02-01 07:57:37,459][00813] Fps is (10 sec: 3687.2, 60 sec: 3959.5, 300 sec: 3669.3). Total num frames: 880640. Throughput: 0: 964.3. Samples: 220530. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-01 07:57:37,461][00813] Avg episode reward: [(0, '4.463')]
[2025-02-01 07:57:41,590][04658] Updated weights for policy 0, policy_version 220 (0.0022)
[2025-02-01 07:57:42,459][00813] Fps is (10 sec: 4505.5, 60 sec: 3959.4, 300 sec: 3678.0). Total num frames: 901120. Throughput: 0: 994.0. Samples: 223994. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 07:57:42,462][00813] Avg episode reward: [(0, '4.606')]
[2025-02-01 07:57:47,459][00813] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3670.0). Total num frames: 917504. Throughput: 0: 976.0. Samples: 229090. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 07:57:47,462][00813] Avg episode reward: [(0, '4.620')]
[2025-02-01 07:57:52,459][00813] Fps is (10 sec: 3686.5, 60 sec: 3891.5, 300 sec: 3678.4). Total num frames: 937984. Throughput: 0: 947.2. Samples: 234812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 07:57:52,464][00813] Avg episode reward: [(0, '4.685')]
[2025-02-01 07:57:53,024][04658] Updated weights for policy 0, policy_version 230 (0.0022)
[2025-02-01 07:57:57,459][00813] Fps is (10 sec: 4505.7, 60 sec: 4027.8, 300 sec: 3702.2). Total num frames: 962560. Throughput: 0: 970.4. Samples: 238238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:57:57,462][00813] Avg episode reward: [(0, '4.763')]
[2025-02-01 07:57:57,468][04645] Saving new best policy, reward=4.763!
[2025-02-01 07:58:02,460][00813] Fps is (10 sec: 4095.7, 60 sec: 3959.4, 300 sec: 3694.1). Total num frames: 978944. Throughput: 0: 1000.9. Samples: 244336. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 07:58:02,462][00813] Avg episode reward: [(0, '4.717')]
[2025-02-01 07:58:03,603][04658] Updated weights for policy 0, policy_version 240 (0.0016)
[2025-02-01 07:58:07,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3686.4). Total num frames: 995328. Throughput: 0: 947.5. Samples: 249038. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-01 07:58:07,462][00813] Avg episode reward: [(0, '4.582')]
[2025-02-01 07:58:12,459][00813] Fps is (10 sec: 3686.7, 60 sec: 3891.2, 300 sec: 3693.8). Total num frames: 1015808. Throughput: 0: 947.9. Samples: 252578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 07:58:12,461][00813] Avg episode reward: [(0, '4.764')]
[2025-02-01 07:58:13,579][04658] Updated weights for policy 0, policy_version 250 (0.0035)
[2025-02-01 07:58:17,462][00813] Fps is (10 sec: 4094.8, 60 sec: 3959.3, 300 sec: 3701.0). Total num frames: 1036288. Throughput: 0: 1004.1. Samples: 259332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-01 07:58:17,469][00813] Avg episode reward: [(0, '4.853')]
[2025-02-01 07:58:17,476][04645] Saving new best policy, reward=4.853!
[2025-02-01 07:58:22,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3693.6). Total num frames: 1052672. Throughput: 0: 955.9. Samples: 263544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 07:58:22,462][00813] Avg episode reward: [(0, '4.604')]
[2025-02-01 07:58:25,057][04658] Updated weights for policy 0, policy_version 260 (0.0040)
[2025-02-01 07:58:27,459][00813] Fps is (10 sec: 3687.5, 60 sec: 3823.1, 300 sec: 3700.5). Total num frames: 1073152. Throughput: 0: 945.7. Samples: 266552. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-01 07:58:27,462][00813] Avg episode reward: [(0, '4.651')]
[2025-02-01 07:58:32,459][00813] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3721.1). Total num frames: 1097728. Throughput: 0: 989.0. Samples: 273596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 07:58:32,462][00813] Avg episode reward: [(0, '4.787')]
[2025-02-01 07:58:34,353][04658] Updated weights for policy 0, policy_version 270 (0.0026)
[2025-02-01 07:58:37,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1110016. Throughput: 0: 974.0. Samples: 278644. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 07:58:37,469][00813] Avg episode reward: [(0, '4.783')]
[2025-02-01 07:58:42,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1130496. Throughput: 0: 945.9. Samples: 280802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 07:58:42,463][00813] Avg episode reward: [(0, '4.804')]
[2025-02-01 07:58:45,846][04658] Updated weights for policy 0, policy_version 280 (0.0022)
[2025-02-01 07:58:47,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1150976. Throughput: 0: 960.1. Samples: 287540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:58:47,462][00813] Avg episode reward: [(0, '4.951')]
[2025-02-01 07:58:47,467][04645] Saving new best policy, reward=4.951!
[2025-02-01 07:58:52,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1171456. Throughput: 0: 991.2. Samples: 293640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 07:58:52,463][00813] Avg episode reward: [(0, '5.039')]
[2025-02-01 07:58:52,464][04645] Saving new best policy, reward=5.039!
[2025-02-01 07:58:57,227][04658] Updated weights for policy 0, policy_version 290 (0.0013)
[2025-02-01 07:58:57,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 1187840. Throughput: 0: 958.2. Samples: 295696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 07:58:57,462][00813] Avg episode reward: [(0, '4.892')]
[2025-02-01 07:59:02,460][00813] Fps is (10 sec: 3686.0, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 1208320. Throughput: 0: 940.7. Samples: 301664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:59:02,468][00813] Avg episode reward: [(0, '5.090')]
[2025-02-01 07:59:02,471][04645] Saving new best policy, reward=5.090!
[2025-02-01 07:59:06,329][04658] Updated weights for policy 0, policy_version 300 (0.0016)
[2025-02-01 07:59:07,459][00813] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1232896. Throughput: 0: 1000.0. Samples: 308546. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 07:59:07,461][00813] Avg episode reward: [(0, '4.973')]
[2025-02-01 07:59:07,485][04645] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000301_1232896.pth...
[2025-02-01 07:59:07,668][04645] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth
[2025-02-01 07:59:12,459][00813] Fps is (10 sec: 3686.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1245184. Throughput: 0: 980.6. Samples: 310678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:59:12,467][00813] Avg episode reward: [(0, '5.090')]
[2025-02-01 07:59:17,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3873.9). Total num frames: 1265664. Throughput: 0: 934.2. Samples: 315634. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-01 07:59:17,465][00813] Avg episode reward: [(0, '5.105')]
[2025-02-01 07:59:17,475][04645] Saving new best policy, reward=5.105!
[2025-02-01 07:59:18,069][04658] Updated weights for policy 0, policy_version 310 (0.0023)
[2025-02-01 07:59:22,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1286144. Throughput: 0: 975.3. Samples: 322534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 07:59:22,461][00813] Avg episode reward: [(0, '5.365')]
[2025-02-01 07:59:22,528][04645] Saving new best policy, reward=5.365!
[2025-02-01 07:59:27,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1306624. Throughput: 0: 997.2. Samples: 325674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:59:27,465][00813] Avg episode reward: [(0, '5.344')]
[2025-02-01 07:59:28,912][04658] Updated weights for policy 0, policy_version 320 (0.0016)
[2025-02-01 07:59:32,459][00813] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 1318912. Throughput: 0: 940.8. Samples: 329878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 07:59:32,462][00813] Avg episode reward: [(0, '5.271')]
[2025-02-01 07:59:37,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 1343488. Throughput: 0: 955.9. Samples: 336654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:59:37,466][00813] Avg episode reward: [(0, '5.134')]
[2025-02-01 07:59:38,599][04658] Updated weights for policy 0, policy_version 330 (0.0018)
[2025-02-01 07:59:42,462][00813] Fps is (10 sec: 4913.9, 60 sec: 3959.3, 300 sec: 3887.7). Total num frames: 1368064. Throughput: 0: 988.2. Samples: 340168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 07:59:42,468][00813] Avg episode reward: [(0, '5.095')]
[2025-02-01 07:59:47,466][00813] Fps is (10 sec: 3683.7, 60 sec: 3822.5, 300 sec: 3859.9). Total num frames: 1380352. Throughput: 0: 962.1. Samples: 344966. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-01 07:59:47,469][00813] Avg episode reward: [(0, '4.950')]
[2025-02-01 07:59:50,333][04658] Updated weights for policy 0, policy_version 340 (0.0024)
[2025-02-01 07:59:52,459][00813] Fps is (10 sec: 3277.7, 60 sec: 3822.9, 300 sec: 3873.9). Total num frames: 1400832. Throughput: 0: 937.6. Samples: 350740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 07:59:52,463][00813] Avg episode reward: [(0, '4.905')]
[2025-02-01 07:59:57,459][00813] Fps is (10 sec: 4508.8, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1425408. Throughput: 0: 966.8. Samples: 354182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 07:59:57,462][00813] Avg episode reward: [(0, '4.864')]
[2025-02-01 07:59:59,370][04658] Updated weights for policy 0, policy_version 350 (0.0016)
[2025-02-01 08:00:02,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 1441792. Throughput: 0: 989.3. Samples: 360152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:00:02,461][00813] Avg episode reward: [(0, '4.845')]
[2025-02-01 08:00:07,459][00813] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3873.9). Total num frames: 1458176. Throughput: 0: 936.8. Samples: 364690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:00:07,462][00813] Avg episode reward: [(0, '4.971')]
[2025-02-01 08:00:11,139][04658] Updated weights for policy 0, policy_version 360 (0.0038)
[2025-02-01 08:00:12,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 1478656. Throughput: 0: 939.4. Samples: 367948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 08:00:12,465][00813] Avg episode reward: [(0, '4.999')]
[2025-02-01 08:00:17,465][00813] Fps is (10 sec: 4093.6, 60 sec: 3890.8, 300 sec: 3873.8). Total num frames: 1499136. Throughput: 0: 991.8. Samples: 374514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:00:17,467][00813] Avg episode reward: [(0, '4.936')]
[2025-02-01 08:00:22,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 1511424. Throughput: 0: 931.9. Samples: 378590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:00:22,463][00813] Avg episode reward: [(0, '4.964')]
[2025-02-01 08:00:23,227][04658] Updated weights for policy 0, policy_version 370 (0.0015)
[2025-02-01 08:00:27,459][00813] Fps is (10 sec: 3278.7, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 1531904. Throughput: 0: 918.2. Samples: 381484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:00:27,461][00813] Avg episode reward: [(0, '5.157')]
[2025-02-01 08:00:32,322][04658] Updated weights for policy 0, policy_version 380 (0.0023)
[2025-02-01 08:00:32,459][00813] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1556480. Throughput: 0: 963.0. Samples: 388292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:00:32,463][00813] Avg episode reward: [(0, '4.962')]
[2025-02-01 08:00:37,460][00813] Fps is (10 sec: 3686.1, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 1568768. Throughput: 0: 944.3. Samples: 393234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:00:37,465][00813] Avg episode reward: [(0, '4.679')]
[2025-02-01 08:00:42,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3686.6, 300 sec: 3860.0). Total num frames: 1589248. Throughput: 0: 911.4. Samples: 395196. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:00:42,466][00813] Avg episode reward: [(0, '4.556')]
[2025-02-01 08:00:44,458][04658] Updated weights for policy 0, policy_version 390 (0.0018)
[2025-02-01 08:00:47,459][00813] Fps is (10 sec: 4096.2, 60 sec: 3823.4, 300 sec: 3873.8). Total num frames: 1609728. Throughput: 0: 922.6. Samples: 401670. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-01 08:00:47,466][00813] Avg episode reward: [(0, '4.503')]
[2025-02-01 08:00:52,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3846.1). Total num frames: 1626112. Throughput: 0: 952.8. Samples: 407566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:00:52,467][00813] Avg episode reward: [(0, '4.904')]
[2025-02-01 08:00:56,024][04658] Updated weights for policy 0, policy_version 400 (0.0013)
[2025-02-01 08:00:57,459][00813] Fps is (10 sec: 2867.3, 60 sec: 3549.9, 300 sec: 3832.2). Total num frames: 1638400. Throughput: 0: 923.5. Samples: 409504. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-01 08:00:57,462][00813] Avg episode reward: [(0, '4.859')]
[2025-02-01 08:01:02,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 1662976. Throughput: 0: 903.3. Samples: 415158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:01:02,465][00813] Avg episode reward: [(0, '5.096')]
[2025-02-01 08:01:05,647][04658] Updated weights for policy 0, policy_version 410 (0.0017)
[2025-02-01 08:01:07,461][00813] Fps is (10 sec: 4505.0, 60 sec: 3754.6, 300 sec: 3846.1). Total num frames: 1683456. Throughput: 0: 963.6. Samples: 421952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:01:07,463][00813] Avg episode reward: [(0, '5.060')]
[2025-02-01 08:01:07,471][04645] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000411_1683456.pth...
[2025-02-01 08:01:07,627][04645] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000187_765952.pth
[2025-02-01 08:01:12,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 1699840. Throughput: 0: 944.8. Samples: 423998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-01 08:01:12,461][00813] Avg episode reward: [(0, '5.249')]
[2025-02-01 08:01:17,459][00813] Fps is (10 sec: 3277.2, 60 sec: 3618.5, 300 sec: 3832.2). Total num frames: 1716224. Throughput: 0: 898.5. Samples: 428726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:01:17,463][00813] Avg episode reward: [(0, '5.034')]
[2025-02-01 08:01:17,648][04658] Updated weights for policy 0, policy_version 420 (0.0031)
[2025-02-01 08:01:22,460][00813] Fps is (10 sec: 4095.6, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1740800. Throughput: 0: 948.2. Samples: 435904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 08:01:22,466][00813] Avg episode reward: [(0, '4.876')]
[2025-02-01 08:01:27,180][04658] Updated weights for policy 0, policy_version 430 (0.0016)
[2025-02-01 08:01:27,459][00813] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1761280. Throughput: 0: 980.0. Samples: 439294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:01:27,464][00813] Avg episode reward: [(0, '4.835')]
[2025-02-01 08:01:32,459][00813] Fps is (10 sec: 3686.8, 60 sec: 3686.4, 300 sec: 3846.1). Total num frames: 1777664. Throughput: 0: 932.9. Samples: 443650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:01:32,464][00813] Avg episode reward: [(0, '4.857')]
[2025-02-01 08:01:37,410][04658] Updated weights for policy 0, policy_version 440 (0.0013)
[2025-02-01 08:01:37,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 1802240. Throughput: 0: 955.9. Samples: 450582. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 08:01:37,466][00813] Avg episode reward: [(0, '4.874')]
[2025-02-01 08:01:42,459][00813] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1822720. Throughput: 0: 990.0. Samples: 454052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:01:42,464][00813] Avg episode reward: [(0, '4.826')]
[2025-02-01 08:01:47,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1835008. Throughput: 0: 970.4. Samples: 458824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:01:47,462][00813] Avg episode reward: [(0, '4.783')]
[2025-02-01 08:01:49,153][04658] Updated weights for policy 0, policy_version 450 (0.0015)
[2025-02-01 08:01:52,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1855488. Throughput: 0: 942.6. Samples: 464368. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:01:52,465][00813] Avg episode reward: [(0, '4.819')]
[2025-02-01 08:01:57,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1875968. Throughput: 0: 970.8. Samples: 467682. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-01 08:01:57,467][00813] Avg episode reward: [(0, '4.784')]
[2025-02-01 08:01:58,444][04658] Updated weights for policy 0, policy_version 460 (0.0020)
[2025-02-01 08:02:02,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1892352. Throughput: 0: 991.3. Samples: 473336. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 08:02:02,466][00813] Avg episode reward: [(0, '4.762')]
[2025-02-01 08:02:07,460][00813] Fps is (10 sec: 3276.5, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1908736. Throughput: 0: 934.4. Samples: 477950. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 08:02:07,467][00813] Avg episode reward: [(0, '4.784')]
[2025-02-01 08:02:10,327][04658] Updated weights for policy 0, policy_version 470 (0.0028)
[2025-02-01 08:02:12,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1933312. Throughput: 0: 933.7. Samples: 481310. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 08:02:12,467][00813] Avg episode reward: [(0, '4.849')]
[2025-02-01 08:02:17,465][00813] Fps is (10 sec: 4094.0, 60 sec: 3890.8, 300 sec: 3818.2). Total num frames: 1949696. Throughput: 0: 981.3. Samples: 487814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:02:17,474][00813] Avg episode reward: [(0, '4.888')]
[2025-02-01 08:02:22,247][04658] Updated weights for policy 0, policy_version 480 (0.0024)
[2025-02-01 08:02:22,459][00813] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1966080. Throughput: 0: 916.5. Samples: 491826. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 08:02:22,466][00813] Avg episode reward: [(0, '5.039')]
[2025-02-01 08:02:27,462][00813] Fps is (10 sec: 3687.7, 60 sec: 3754.5, 300 sec: 3832.2). Total num frames: 1986560. Throughput: 0: 904.8. Samples: 494768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:02:27,467][00813] Avg episode reward: [(0, '5.055')]
[2025-02-01 08:02:31,780][04658] Updated weights for policy 0, policy_version 490 (0.0022)
[2025-02-01 08:02:32,459][00813] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2007040. Throughput: 0: 946.0. Samples: 501392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 08:02:32,464][00813] Avg episode reward: [(0, '5.254')]
[2025-02-01 08:02:37,460][00813] Fps is (10 sec: 3687.1, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 2023424. Throughput: 0: 930.6. Samples: 506246. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 08:02:37,465][00813] Avg episode reward: [(0, '5.129')]
[2025-02-01 08:02:42,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3804.4). Total num frames: 2039808. Throughput: 0: 903.9. Samples: 508358. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-01 08:02:42,470][00813] Avg episode reward: [(0, '5.019')]
[2025-02-01 08:02:43,676][04658] Updated weights for policy 0, policy_version 500 (0.0014)
[2025-02-01 08:02:47,459][00813] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2064384. Throughput: 0: 930.2. Samples: 515196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 08:02:47,462][00813] Avg episode reward: [(0, '5.024')]
[2025-02-01 08:02:52,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2080768. Throughput: 0: 962.7. Samples: 521270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:02:52,464][00813] Avg episode reward: [(0, '5.281')]
[2025-02-01 08:02:54,091][04658] Updated weights for policy 0, policy_version 510 (0.0018)
[2025-02-01 08:02:57,461][00813] Fps is (10 sec: 3276.2, 60 sec: 3686.3, 300 sec: 3790.5). Total num frames: 2097152. Throughput: 0: 935.4. Samples: 523406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:02:57,463][00813] Avg episode reward: [(0, '5.424')]
[2025-02-01 08:02:57,475][04645] Saving new best policy, reward=5.424!
[2025-02-01 08:03:02,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2121728. Throughput: 0: 923.1. Samples: 529348. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-01 08:03:02,463][00813] Avg episode reward: [(0, '5.367')]
[2025-02-01 08:03:04,017][04658] Updated weights for policy 0, policy_version 520 (0.0019)
[2025-02-01 08:03:07,460][00813] Fps is (10 sec: 4505.9, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2142208. Throughput: 0: 992.9. Samples: 536506. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 08:03:07,467][00813] Avg episode reward: [(0, '5.652')]
[2025-02-01 08:03:07,479][04645] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000523_2142208.pth...
[2025-02-01 08:03:07,663][04645] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000301_1232896.pth
[2025-02-01 08:03:07,681][04645] Saving new best policy, reward=5.652!
[2025-02-01 08:03:12,461][00813] Fps is (10 sec: 3685.7, 60 sec: 3754.6, 300 sec: 3804.4). Total num frames: 2158592. Throughput: 0: 975.6. Samples: 538670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:03:12,463][00813] Avg episode reward: [(0, '5.566')]
[2025-02-01 08:03:15,512][04658] Updated weights for policy 0, policy_version 530 (0.0030)
[2025-02-01 08:03:17,459][00813] Fps is (10 sec: 3686.8, 60 sec: 3823.3, 300 sec: 3818.3). Total num frames: 2179072. Throughput: 0: 941.9. Samples: 543778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:03:17,461][00813] Avg episode reward: [(0, '5.444')]
[2025-02-01 08:03:22,459][00813] Fps is (10 sec: 4506.5, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 2203648. Throughput: 0: 994.4. Samples: 550994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:03:22,462][00813] Avg episode reward: [(0, '5.346')]
[2025-02-01 08:03:24,164][04658] Updated weights for policy 0, policy_version 540 (0.0016)
[2025-02-01 08:03:27,461][00813] Fps is (10 sec: 4095.3, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2220032. Throughput: 0: 1022.4. Samples: 554368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:03:27,463][00813] Avg episode reward: [(0, '5.328')]
[2025-02-01 08:03:32,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2236416. Throughput: 0: 963.7. Samples: 558564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:03:32,462][00813] Avg episode reward: [(0, '5.620')]
[2025-02-01 08:03:36,058][04658] Updated weights for policy 0, policy_version 550 (0.0015)
[2025-02-01 08:03:37,459][00813] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2256896. Throughput: 0: 970.8. Samples: 564958. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-01 08:03:37,461][00813] Avg episode reward: [(0, '5.869')]
[2025-02-01 08:03:37,476][04645] Saving new best policy, reward=5.869!
[2025-02-01 08:03:42,462][00813] Fps is (10 sec: 4094.8, 60 sec: 3959.3, 300 sec: 3818.3). Total num frames: 2277376. Throughput: 0: 1000.0. Samples: 568408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-01 08:03:42,464][00813] Avg episode reward: [(0, '5.957')]
[2025-02-01 08:03:42,512][04645] Saving new best policy, reward=5.957!
[2025-02-01 08:03:47,123][04658] Updated weights for policy 0, policy_version 560 (0.0025)
[2025-02-01 08:03:47,465][00813] Fps is (10 sec: 3684.3, 60 sec: 3822.6, 300 sec: 3804.3). Total num frames: 2293760. Throughput: 0: 974.9. Samples: 573224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:03:47,469][00813] Avg episode reward: [(0, '6.228')]
[2025-02-01 08:03:47,488][04645] Saving new best policy, reward=6.228!
[2025-02-01 08:03:52,459][00813] Fps is (10 sec: 3687.5, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2314240. Throughput: 0: 943.4. Samples: 578956. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-01 08:03:52,464][00813] Avg episode reward: [(0, '6.513')]
[2025-02-01 08:03:52,472][04645] Saving new best policy, reward=6.513!
[2025-02-01 08:03:56,388][04658] Updated weights for policy 0, policy_version 570 (0.0023)
[2025-02-01 08:03:57,459][00813] Fps is (10 sec: 4508.2, 60 sec: 4027.8, 300 sec: 3832.2). Total num frames: 2338816. Throughput: 0: 973.2. Samples: 582462. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:03:57,463][00813] Avg episode reward: [(0, '6.153')]
[2025-02-01 08:04:02,461][00813] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3804.4). Total num frames: 2355200. Throughput: 0: 998.1. Samples: 588694. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-01 08:04:02,467][00813] Avg episode reward: [(0, '6.175')]
[2025-02-01 08:04:07,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 2371584. Throughput: 0: 944.4. Samples: 593494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:04:07,466][00813] Avg episode reward: [(0, '6.494')]
[2025-02-01 08:04:07,792][04658] Updated weights for policy 0, policy_version 580 (0.0027)
[2025-02-01 08:04:12,459][00813] Fps is (10 sec: 4096.5, 60 sec: 3959.6, 300 sec: 3832.2). Total num frames: 2396160. Throughput: 0: 950.5. Samples: 597140. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:04:12,461][00813] Avg episode reward: [(0, '6.964')]
[2025-02-01 08:04:12,463][04645] Saving new best policy, reward=6.964!
[2025-02-01 08:04:16,599][04658] Updated weights for policy 0, policy_version 590 (0.0020)
[2025-02-01 08:04:17,463][00813] Fps is (10 sec: 4503.9, 60 sec: 3959.2, 300 sec: 3832.1). Total num frames: 2416640. Throughput: 0: 1014.4. Samples: 604218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:04:17,465][00813] Avg episode reward: [(0, '7.239')]
[2025-02-01 08:04:17,479][04645] Saving new best policy, reward=7.239!
[2025-02-01 08:04:22,459][00813] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2433024. Throughput: 0: 967.5. Samples: 608496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:04:22,464][00813] Avg episode reward: [(0, '7.517')]
[2025-02-01 08:04:22,475][04645] Saving new best policy, reward=7.517!
[2025-02-01 08:04:27,459][00813] Fps is (10 sec: 3687.8, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 2453504. Throughput: 0: 955.1. Samples: 611384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:04:27,461][00813] Avg episode reward: [(0, '7.542')]
[2025-02-01 08:04:27,472][04645] Saving new best policy, reward=7.542!
[2025-02-01 08:04:28,057][04658] Updated weights for policy 0, policy_version 600 (0.0014)
[2025-02-01 08:04:32,459][00813] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3846.1). Total num frames: 2478080. Throughput: 0: 1005.6. Samples: 618468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:04:32,461][00813] Avg episode reward: [(0, '7.716')]
[2025-02-01 08:04:32,463][04645] Saving new best policy, reward=7.716!
[2025-02-01 08:04:37,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 2494464. Throughput: 0: 999.1. Samples: 623916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:04:37,461][00813] Avg episode reward: [(0, '8.002')]
[2025-02-01 08:04:37,470][04645] Saving new best policy, reward=8.002!
[2025-02-01 08:04:38,826][04658] Updated weights for policy 0, policy_version 610 (0.0015)
[2025-02-01 08:04:42,460][00813] Fps is (10 sec: 3276.6, 60 sec: 3891.4, 300 sec: 3832.3). Total num frames: 2510848. Throughput: 0: 969.3. Samples: 626082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:04:42,466][00813] Avg episode reward: [(0, '8.528')]
[2025-02-01 08:04:42,469][04645] Saving new best policy, reward=8.528!
[2025-02-01 08:04:47,459][00813] Fps is (10 sec: 4096.0, 60 sec: 4028.1, 300 sec: 3846.1). Total num frames: 2535424. Throughput: 0: 985.9. Samples: 633056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:04:47,466][00813] Avg episode reward: [(0, '8.976')]
[2025-02-01 08:04:47,485][04645] Saving new best policy, reward=8.976!
[2025-02-01 08:04:48,089][04658] Updated weights for policy 0, policy_version 620 (0.0019)
[2025-02-01 08:04:52,459][00813] Fps is (10 sec: 4505.8, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 2555904. Throughput: 0: 1020.7. Samples: 639426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:04:52,462][00813] Avg episode reward: [(0, '8.076')]
[2025-02-01 08:04:57,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2568192. Throughput: 0: 986.8. Samples: 641544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:04:57,467][00813] Avg episode reward: [(0, '7.264')]
[2025-02-01 08:04:59,463][04658] Updated weights for policy 0, policy_version 630 (0.0016)
[2025-02-01 08:05:02,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3846.1). Total num frames: 2592768. Throughput: 0: 965.5. Samples: 647660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:05:02,466][00813] Avg episode reward: [(0, '7.093')]
[2025-02-01 08:05:07,459][00813] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3860.0). Total num frames: 2617344. Throughput: 0: 1031.0. Samples: 654892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:05:07,462][00813] Avg episode reward: [(0, '7.297')]
[2025-02-01 08:05:07,482][04645] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000639_2617344.pth...
[2025-02-01 08:05:07,661][04645] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000411_1683456.pth
[2025-02-01 08:05:08,146][04658] Updated weights for policy 0, policy_version 640 (0.0021)
[2025-02-01 08:05:12,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.2). Total num frames: 2633728. Throughput: 0: 1019.1. Samples: 657244. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:05:12,463][00813] Avg episode reward: [(0, '7.780')]
[2025-02-01 08:05:17,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3891.4, 300 sec: 3860.0). Total num frames: 2650112. Throughput: 0: 974.0. Samples: 662298. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:05:17,462][00813] Avg episode reward: [(0, '8.317')]
[2025-02-01 08:05:19,280][04658] Updated weights for policy 0, policy_version 650 (0.0031)
[2025-02-01 08:05:22,460][00813] Fps is (10 sec: 4095.7, 60 sec: 4027.7, 300 sec: 3873.8). Total num frames: 2674688. Throughput: 0: 1010.3. Samples: 669382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:05:22,462][00813] Avg episode reward: [(0, '9.335')]
[2025-02-01 08:05:22,466][04645] Saving new best policy, reward=9.335!
[2025-02-01 08:05:27,459][00813] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3860.0). Total num frames: 2695168. Throughput: 0: 1038.4. Samples: 672808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:05:27,470][00813] Avg episode reward: [(0, '9.665')]
[2025-02-01 08:05:27,483][04645] Saving new best policy, reward=9.665!
[2025-02-01 08:05:30,055][04658] Updated weights for policy 0, policy_version 660 (0.0025)
[2025-02-01 08:05:32,459][00813] Fps is (10 sec: 3686.7, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 2711552. Throughput: 0: 979.2. Samples: 677120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:05:32,461][00813] Avg episode reward: [(0, '9.821')]
[2025-02-01 08:05:32,465][04645] Saving new best policy, reward=9.821!
[2025-02-01 08:05:37,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2732032. Throughput: 0: 986.4. Samples: 683816. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-01 08:05:37,461][00813] Avg episode reward: [(0, '9.658')]
[2025-02-01 08:05:39,410][04658] Updated weights for policy 0, policy_version 670 (0.0026)
[2025-02-01 08:05:42,459][00813] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 2756608. Throughput: 0: 1019.5. Samples: 687422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:05:42,461][00813] Avg episode reward: [(0, '9.861')]
[2025-02-01 08:05:42,465][04645] Saving new best policy, reward=9.861!
[2025-02-01 08:05:47,466][00813] Fps is (10 sec: 3683.8, 60 sec: 3890.8, 300 sec: 3873.8). Total num frames: 2768896. Throughput: 0: 997.0. Samples: 692532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:05:47,473][00813] Avg episode reward: [(0, '9.723')]
[2025-02-01 08:05:50,799][04658] Updated weights for policy 0, policy_version 680 (0.0020)
[2025-02-01 08:05:52,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 2789376. Throughput: 0: 963.2. Samples: 698236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:05:52,466][00813] Avg episode reward: [(0, '9.286')]
[2025-02-01 08:05:57,459][00813] Fps is (10 sec: 4508.7, 60 sec: 4096.0, 300 sec: 3901.6). Total num frames: 2813952. Throughput: 0: 992.6. Samples: 701910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-01 08:05:57,462][00813] Avg episode reward: [(0, '8.404')]
[2025-02-01 08:05:59,405][04658] Updated weights for policy 0, policy_version 690 (0.0013)
[2025-02-01 08:06:02,459][00813] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2834432. Throughput: 0: 1021.6. Samples: 708268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:06:02,462][00813] Avg episode reward: [(0, '8.296')]
[2025-02-01 08:06:07,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 2850816. Throughput: 0: 973.6. Samples: 713194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:06:07,466][00813] Avg episode reward: [(0, '8.496')]
[2025-02-01 08:06:10,529][04658] Updated weights for policy 0, policy_version 700 (0.0021)
[2025-02-01 08:06:12,459][00813] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 2875392. Throughput: 0: 978.7. Samples: 716848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 08:06:12,468][00813] Avg episode reward: [(0, '9.418')]
[2025-02-01 08:06:17,461][00813] Fps is (10 sec: 4504.8, 60 sec: 4095.9, 300 sec: 3915.5). Total num frames: 2895872. Throughput: 0: 1044.6. Samples: 724130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:06:17,465][00813] Avg episode reward: [(0, '10.534')]
[2025-02-01 08:06:17,479][04645] Saving new best policy, reward=10.534!
[2025-02-01 08:06:21,101][04658] Updated weights for policy 0, policy_version 710 (0.0025)
[2025-02-01 08:06:22,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2908160. Throughput: 0: 988.3. Samples: 728288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 08:06:22,463][00813] Avg episode reward: [(0, '10.777')]
[2025-02-01 08:06:22,534][04645] Saving new best policy, reward=10.777!
[2025-02-01 08:06:27,459][00813] Fps is (10 sec: 3686.9, 60 sec: 3959.4, 300 sec: 3915.5). Total num frames: 2932736. Throughput: 0: 977.5. Samples: 731412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:06:27,468][00813] Avg episode reward: [(0, '10.459')]
[2025-02-01 08:06:30,393][04658] Updated weights for policy 0, policy_version 720 (0.0032)
[2025-02-01 08:06:32,459][00813] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 2957312. Throughput: 0: 1024.4. Samples: 738622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 08:06:32,466][00813] Avg episode reward: [(0, '10.908')]
[2025-02-01 08:06:32,468][04645] Saving new best policy, reward=10.908!
[2025-02-01 08:06:37,460][00813] Fps is (10 sec: 4095.8, 60 sec: 4027.7, 300 sec: 3901.6). Total num frames: 2973696. Throughput: 0: 1017.3. Samples: 744016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-01 08:06:37,462][00813] Avg episode reward: [(0, '10.526')]
[2025-02-01 08:06:41,694][04658] Updated weights for policy 0, policy_version 730 (0.0035)
[2025-02-01 08:06:42,459][00813] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2990080. Throughput: 0: 983.2. Samples: 746152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:06:42,463][00813] Avg episode reward: [(0, '12.370')]
[2025-02-01 08:06:42,525][04645] Saving new best policy, reward=12.370!
[2025-02-01 08:06:47,459][00813] Fps is (10 sec: 4096.3, 60 sec: 4096.5, 300 sec: 3929.4). Total num frames: 3014656. Throughput: 0: 999.7. Samples: 753256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:06:47,463][00813] Avg episode reward: [(0, '12.512')]
[2025-02-01 08:06:47,469][04645] Saving new best policy, reward=12.512!
[2025-02-01 08:06:50,750][04658] Updated weights for policy 0, policy_version 740 (0.0014)
[2025-02-01 08:06:52,459][00813] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3929.4). Total num frames: 3035136. Throughput: 0: 1026.0. Samples: 759364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:06:52,463][00813] Avg episode reward: [(0, '12.556')]
[2025-02-01 08:06:52,470][04645] Saving new best policy, reward=12.556!
[2025-02-01 08:06:57,459][00813] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3047424. Throughput: 0: 990.0. Samples: 761400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 08:06:57,468][00813] Avg episode reward: [(0, '11.570')]
[2025-02-01 08:07:01,828][04658] Updated weights for policy 0, policy_version 750 (0.0021)
[2025-02-01 08:07:02,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3072000. Throughput: 0: 964.4. Samples: 767526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:07:02,466][00813] Avg episode reward: [(0, '11.912')]
[2025-02-01 08:07:07,459][00813] Fps is (10 sec: 4915.3, 60 sec: 4096.0, 300 sec: 3943.3). Total num frames: 3096576. Throughput: 0: 1034.6. Samples: 774846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:07:07,464][00813] Avg episode reward: [(0, '12.159')]
[2025-02-01 08:07:07,481][04645] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000756_3096576.pth...
[2025-02-01 08:07:07,649][04645] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000523_2142208.pth
[2025-02-01 08:07:12,408][04658] Updated weights for policy 0, policy_version 760 (0.0024)
[2025-02-01 08:07:12,459][00813] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 3112960. Throughput: 0: 1014.9. Samples: 777082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 08:07:12,461][00813] Avg episode reward: [(0, '12.273')]
[2025-02-01 08:07:17,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3891.3, 300 sec: 3943.3). Total num frames: 3129344. Throughput: 0: 967.0. Samples: 782138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:07:17,466][00813] Avg episode reward: [(0, '13.600')]
[2025-02-01 08:07:17,477][04645] Saving new best policy, reward=13.600!
[2025-02-01 08:07:21,882][04658] Updated weights for policy 0, policy_version 770 (0.0026)
[2025-02-01 08:07:22,459][00813] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 3153920. Throughput: 0: 1004.2. Samples: 789206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:07:22,462][00813] Avg episode reward: [(0, '13.484')]
[2025-02-01 08:07:27,459][00813] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 3957.2). Total num frames: 3174400. Throughput: 0: 1032.4. Samples: 792610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:07:27,462][00813] Avg episode reward: [(0, '13.438')]
[2025-02-01 08:07:32,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3190784. Throughput: 0: 971.0. Samples: 796952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-02-01 08:07:32,461][00813] Avg episode reward: [(0, '13.489')]
[2025-02-01 08:07:33,283][04658] Updated weights for policy 0, policy_version 780 (0.0016)
[2025-02-01 08:07:37,459][00813] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3984.9). Total num frames: 3215360. Throughput: 0: 990.1. Samples: 803920. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:07:37,466][00813] Avg episode reward: [(0, '13.251')]
[2025-02-01 08:07:41,545][04658] Updated weights for policy 0, policy_version 790 (0.0012)
[2025-02-01 08:07:42,465][00813] Fps is (10 sec: 4503.0, 60 sec: 4095.6, 300 sec: 3971.0). Total num frames: 3235840. Throughput: 0: 1026.0. Samples: 807574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:07:42,467][00813] Avg episode reward: [(0, '13.773')]
[2025-02-01 08:07:42,474][04645] Saving new best policy, reward=13.773!
[2025-02-01 08:07:47,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3252224. Throughput: 0: 1006.6. Samples: 812824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:07:47,467][00813] Avg episode reward: [(0, '14.986')]
[2025-02-01 08:07:47,479][04645] Saving new best policy, reward=14.986!
[2025-02-01 08:07:52,459][00813] Fps is (10 sec: 3688.5, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3272704. Throughput: 0: 974.5. Samples: 818698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:07:52,463][00813] Avg episode reward: [(0, '14.858')]
[2025-02-01 08:07:52,880][04658] Updated weights for policy 0, policy_version 800 (0.0019)
[2025-02-01 08:07:57,459][00813] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3984.9). Total num frames: 3297280. Throughput: 0: 1005.4. Samples: 822326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:07:57,465][00813] Avg episode reward: [(0, '16.155')]
[2025-02-01 08:07:57,471][04645] Saving new best policy, reward=16.155!
[2025-02-01 08:08:02,459][00813] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3313664. Throughput: 0: 1031.7. Samples: 828566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:08:02,465][00813] Avg episode reward: [(0, '16.522')]
[2025-02-01 08:08:02,469][04645] Saving new best policy, reward=16.522!
[2025-02-01 08:08:03,176][04658] Updated weights for policy 0, policy_version 810 (0.0027)
[2025-02-01 08:08:07,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3971.1). Total num frames: 3330048. Throughput: 0: 975.2. Samples: 833088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:08:07,463][00813] Avg episode reward: [(0, '17.163')]
[2025-02-01 08:08:07,468][04645] Saving new best policy, reward=17.163!
[2025-02-01 08:08:12,459][00813] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3354624. Throughput: 0: 978.1. Samples: 836626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 08:08:12,466][00813] Avg episode reward: [(0, '17.103')]
[2025-02-01 08:08:13,335][04658] Updated weights for policy 0, policy_version 820 (0.0017)
[2025-02-01 08:08:17,460][00813] Fps is (10 sec: 4505.2, 60 sec: 4095.9, 300 sec: 3971.0). Total num frames: 3375104. Throughput: 0: 1025.8. Samples: 843112. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:08:17,462][00813] Avg episode reward: [(0, '16.323')]
[2025-02-01 08:08:22,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3387392. Throughput: 0: 966.2. Samples: 847400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:08:22,467][00813] Avg episode reward: [(0, '16.150')]
[2025-02-01 08:08:25,332][04658] Updated weights for policy 0, policy_version 830 (0.0018)
[2025-02-01 08:08:27,459][00813] Fps is (10 sec: 3277.1, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3407872. Throughput: 0: 945.0. Samples: 850092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:08:27,462][00813] Avg episode reward: [(0, '15.309')]
[2025-02-01 08:08:32,459][00813] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 3432448. Throughput: 0: 984.1. Samples: 857108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:08:32,461][00813] Avg episode reward: [(0, '16.451')]
[2025-02-01 08:08:34,420][04658] Updated weights for policy 0, policy_version 840 (0.0021)
[2025-02-01 08:08:37,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3971.1). Total num frames: 3448832. Throughput: 0: 975.6. Samples: 862598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:08:37,463][00813] Avg episode reward: [(0, '16.027')]
[2025-02-01 08:08:42,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3823.3, 300 sec: 3971.1). Total num frames: 3465216. Throughput: 0: 943.2. Samples: 864768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:08:42,464][00813] Avg episode reward: [(0, '16.412')]
[2025-02-01 08:08:45,803][04658] Updated weights for policy 0, policy_version 850 (0.0037)
[2025-02-01 08:08:47,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3485696. Throughput: 0: 946.2. Samples: 871144. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:08:47,467][00813] Avg episode reward: [(0, '17.725')]
[2025-02-01 08:08:47,478][04645] Saving new best policy, reward=17.725!
[2025-02-01 08:08:52,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3506176. Throughput: 0: 983.1. Samples: 877326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:08:52,468][00813] Avg episode reward: [(0, '17.717')]
[2025-02-01 08:08:57,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3943.3). Total num frames: 3518464. Throughput: 0: 948.0. Samples: 879288. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-01 08:08:57,464][00813] Avg episode reward: [(0, '17.877')]
[2025-02-01 08:08:57,474][04645] Saving new best policy, reward=17.877!
[2025-02-01 08:08:57,797][04658] Updated weights for policy 0, policy_version 860 (0.0018)
[2025-02-01 08:09:02,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3957.2). Total num frames: 3538944. Throughput: 0: 917.1. Samples: 884380. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:09:02,461][00813] Avg episode reward: [(0, '18.114')]
[2025-02-01 08:09:02,467][04645] Saving new best policy, reward=18.114!
[2025-02-01 08:09:07,368][04658] Updated weights for policy 0, policy_version 870 (0.0020)
[2025-02-01 08:09:07,459][00813] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3563520. Throughput: 0: 966.8. Samples: 890904. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-01 08:09:07,462][00813] Avg episode reward: [(0, '16.961')]
[2025-02-01 08:09:07,475][04645] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000870_3563520.pth...
[2025-02-01 08:09:07,604][04645] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000639_2617344.pth
[2025-02-01 08:09:12,461][00813] Fps is (10 sec: 3685.8, 60 sec: 3686.3, 300 sec: 3929.4). Total num frames: 3575808. Throughput: 0: 965.4. Samples: 893536. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-01 08:09:12,472][00813] Avg episode reward: [(0, '16.485')]
[2025-02-01 08:09:17,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3943.3). Total num frames: 3596288. Throughput: 0: 907.6. Samples: 897948. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-01 08:09:17,461][00813] Avg episode reward: [(0, '15.016')]
[2025-02-01 08:09:19,038][04658] Updated weights for policy 0, policy_version 880 (0.0023)
[2025-02-01 08:09:22,459][00813] Fps is (10 sec: 4096.7, 60 sec: 3822.9, 300 sec: 3943.3). Total num frames: 3616768. Throughput: 0: 939.7. Samples: 904886. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-01 08:09:22,463][00813] Avg episode reward: [(0, '12.802')]
[2025-02-01 08:09:27,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 3637248. Throughput: 0: 968.3. Samples: 908340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:09:27,477][00813] Avg episode reward: [(0, '12.436')]
[2025-02-01 08:09:29,381][04658] Updated weights for policy 0, policy_version 890 (0.0029)
[2025-02-01 08:09:32,460][00813] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3929.4). Total num frames: 3653632. Throughput: 0: 927.6. Samples: 912888. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-01 08:09:32,463][00813] Avg episode reward: [(0, '13.602')]
[2025-02-01 08:09:37,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3943.3). Total num frames: 3674112. Throughput: 0: 931.6. Samples: 919248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:09:37,465][00813] Avg episode reward: [(0, '14.698')]
[2025-02-01 08:09:39,387][04658] Updated weights for policy 0, policy_version 900 (0.0026)
[2025-02-01 08:09:42,459][00813] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 3698688. Throughput: 0: 966.4. Samples: 922774. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-01 08:09:42,461][00813] Avg episode reward: [(0, '16.542')]
[2025-02-01 08:09:47,461][00813] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3915.5). Total num frames: 3710976. Throughput: 0: 974.8. Samples: 928246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-01 08:09:47,467][00813] Avg episode reward: [(0, '17.809')]
[2025-02-01 08:09:51,211][04658] Updated weights for policy 0, policy_version 910 (0.0017)
[2025-02-01 08:09:52,460][00813] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3943.3). Total num frames: 3731456. Throughput: 0: 942.8. Samples: 933332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:09:52,465][00813] Avg episode reward: [(0, '18.019')]
[2025-02-01 08:09:57,459][00813] Fps is (10 sec: 4096.9, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 3751936. Throughput: 0: 957.5. Samples: 936620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:09:57,461][00813] Avg episode reward: [(0, '18.548')]
[2025-02-01 08:09:57,471][04645] Saving new best policy, reward=18.548!
[2025-02-01 08:10:00,856][04658] Updated weights for policy 0, policy_version 920 (0.0019)
[2025-02-01 08:10:02,460][00813] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3772416. Throughput: 0: 996.1. Samples: 942772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:10:02,467][00813] Avg episode reward: [(0, '17.339')]
[2025-02-01 08:10:07,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3901.6). Total num frames: 3784704. Throughput: 0: 932.7. Samples: 946858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:10:07,463][00813] Avg episode reward: [(0, '17.397')]
[2025-02-01 08:10:12,447][04658] Updated weights for policy 0, policy_version 930 (0.0025)
[2025-02-01 08:10:12,459][00813] Fps is (10 sec: 3686.7, 60 sec: 3891.3, 300 sec: 3929.4). Total num frames: 3809280. Throughput: 0: 930.4. Samples: 950210. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:10:12,461][00813] Avg episode reward: [(0, '18.785')]
[2025-02-01 08:10:12,467][04645] Saving new best policy, reward=18.785!
[2025-02-01 08:10:17,459][00813] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3829760. Throughput: 0: 969.8. Samples: 956530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:10:17,466][00813] Avg episode reward: [(0, '19.986')]
[2025-02-01 08:10:17,474][04645] Saving new best policy, reward=19.986!
[2025-02-01 08:10:22,460][00813] Fps is (10 sec: 3276.4, 60 sec: 3754.6, 300 sec: 3887.7). Total num frames: 3842048. Throughput: 0: 920.9. Samples: 960688. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-01 08:10:22,466][00813] Avg episode reward: [(0, '19.533')]
[2025-02-01 08:10:25,060][04658] Updated weights for policy 0, policy_version 940 (0.0025)
[2025-02-01 08:10:27,459][00813] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3887.7). Total num frames: 3858432. Throughput: 0: 893.0. Samples: 962960. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-01 08:10:27,462][00813] Avg episode reward: [(0, '20.216')]
[2025-02-01 08:10:27,470][04645] Saving new best policy, reward=20.216!
[2025-02-01 08:10:32,459][00813] Fps is (10 sec: 3686.8, 60 sec: 3754.7, 300 sec: 3887.7). Total num frames: 3878912. Throughput: 0: 911.6. Samples: 969266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:10:32,465][00813] Avg episode reward: [(0, '20.572')]
[2025-02-01 08:10:32,470][04645] Saving new best policy, reward=20.572!
[2025-02-01 08:10:35,182][04658] Updated weights for policy 0, policy_version 950 (0.0017)
[2025-02-01 08:10:37,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 3895296. Throughput: 0: 915.2. Samples: 974518. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:10:37,465][00813] Avg episode reward: [(0, '19.200')]
[2025-02-01 08:10:42,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3873.9). Total num frames: 3911680. Throughput: 0: 886.1. Samples: 976494. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-01 08:10:42,467][00813] Avg episode reward: [(0, '18.908')]
[2025-02-01 08:10:46,846][04658] Updated weights for policy 0, policy_version 960 (0.0013)
[2025-02-01 08:10:47,459][00813] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3873.8). Total num frames: 3932160. Throughput: 0: 886.1. Samples: 982644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:10:47,467][00813] Avg episode reward: [(0, '18.707')]
[2025-02-01 08:10:52,459][00813] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 3952640. Throughput: 0: 937.5. Samples: 989046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:10:52,465][00813] Avg episode reward: [(0, '19.056')]
[2025-02-01 08:10:57,466][00813] Fps is (10 sec: 3274.6, 60 sec: 3549.5, 300 sec: 3832.1). Total num frames: 3964928. Throughput: 0: 904.9. Samples: 990938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-01 08:10:57,470][00813] Avg episode reward: [(0, '19.254')]
[2025-02-01 08:10:59,199][04658] Updated weights for policy 0, policy_version 970 (0.0018)
[2025-02-01 08:11:02,459][00813] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3846.1). Total num frames: 3985408. Throughput: 0: 872.7. Samples: 995802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-02-01 08:11:02,462][00813] Avg episode reward: [(0, '19.211')]
[2025-02-01 08:11:06,579][04645] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-01 08:11:06,584][00813] Component Batcher_0 stopped!
[2025-02-01 08:11:06,581][04645] Stopping Batcher_0...
[2025-02-01 08:11:06,591][04645] Loop batcher_evt_loop terminating...
[2025-02-01 08:11:06,668][04658] Weights refcount: 2 0
[2025-02-01 08:11:06,675][04658] Stopping InferenceWorker_p0-w0...
[2025-02-01 08:11:06,675][00813] Component InferenceWorker_p0-w0 stopped!
[2025-02-01 08:11:06,680][04658] Loop inference_proc0-0_evt_loop terminating...
[2025-02-01 08:11:06,724][04645] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000756_3096576.pth
[2025-02-01 08:11:06,751][04645] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-01 08:11:06,966][04645] Stopping LearnerWorker_p0...
[2025-02-01 08:11:06,967][04645] Loop learner_proc0_evt_loop terminating...
[2025-02-01 08:11:06,966][00813] Component LearnerWorker_p0 stopped!
[2025-02-01 08:11:07,057][04664] Stopping RolloutWorker_w5...
[2025-02-01 08:11:07,057][00813] Component RolloutWorker_w4 stopped!
[2025-02-01 08:11:07,061][00813] Component RolloutWorker_w5 stopped!
[2025-02-01 08:11:07,064][04663] Stopping RolloutWorker_w4...
[2025-02-01 08:11:07,067][04664] Loop rollout_proc5_evt_loop terminating...
[2025-02-01 08:11:07,064][04663] Loop rollout_proc4_evt_loop terminating...
[2025-02-01 08:11:07,084][00813] Component RolloutWorker_w2 stopped!
[2025-02-01 08:11:07,087][04661] Stopping RolloutWorker_w2...
[2025-02-01 08:11:07,087][04661] Loop rollout_proc2_evt_loop terminating...
[2025-02-01 08:11:07,095][00813] Component RolloutWorker_w6 stopped!
[2025-02-01 08:11:07,098][04665] Stopping RolloutWorker_w6...
[2025-02-01 08:11:07,098][04665] Loop rollout_proc6_evt_loop terminating...
[2025-02-01 08:11:07,104][04660] Stopping RolloutWorker_w1...
[2025-02-01 08:11:07,104][00813] Component RolloutWorker_w1 stopped!
[2025-02-01 08:11:07,105][04660] Loop rollout_proc1_evt_loop terminating...
[2025-02-01 08:11:07,121][04662] Stopping RolloutWorker_w3...
[2025-02-01 08:11:07,121][00813] Component RolloutWorker_w3 stopped!
[2025-02-01 08:11:07,122][04662] Loop rollout_proc3_evt_loop terminating...
[2025-02-01 08:11:07,138][00813] Component RolloutWorker_w7 stopped!
[2025-02-01 08:11:07,138][04666] Stopping RolloutWorker_w7...
[2025-02-01 08:11:07,143][04666] Loop rollout_proc7_evt_loop terminating...
[2025-02-01 08:11:07,166][04659] Stopping RolloutWorker_w0...
[2025-02-01 08:11:07,166][00813] Component RolloutWorker_w0 stopped!
[2025-02-01 08:11:07,168][00813] Waiting for process learner_proc0 to stop...
[2025-02-01 08:11:07,166][04659] Loop rollout_proc0_evt_loop terminating...
[2025-02-01 08:11:08,906][00813] Waiting for process inference_proc0-0 to join...
[2025-02-01 08:11:08,914][00813] Waiting for process rollout_proc0 to join...
[2025-02-01 08:11:11,954][00813] Waiting for process rollout_proc1 to join...
[2025-02-01 08:11:12,195][00813] Waiting for process rollout_proc2 to join...
[2025-02-01 08:11:12,199][00813] Waiting for process rollout_proc3 to join...
[2025-02-01 08:11:12,206][00813] Waiting for process rollout_proc4 to join...
[2025-02-01 08:11:12,209][00813] Waiting for process rollout_proc5 to join...
[2025-02-01 08:11:12,214][00813] Waiting for process rollout_proc6 to join...
[2025-02-01 08:11:12,218][00813] Waiting for process rollout_proc7 to join...
[2025-02-01 08:11:12,224][00813] Batcher 0 profile tree view:
batching: 26.3893, releasing_batches: 0.0312
[2025-02-01 08:11:12,227][00813] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 414.1147
update_model: 8.6605
  weight_update: 0.0027
one_step: 0.0026
  handle_policy_step: 589.4426
    deserialize: 14.3608, stack: 3.1484, obs_to_device_normalize: 124.3984, forward: 304.2026, send_messages: 28.7004
    prepare_outputs: 89.0484
      to_cpu: 53.7122
[2025-02-01 08:11:12,230][00813] Learner 0 profile tree view:
misc: 0.0040, prepare_batch: 13.7621
train: 72.7585
  epoch_init: 0.0047, minibatch_init: 0.0096, losses_postprocess: 0.6173, kl_divergence: 0.6034, after_optimizer: 32.7752
  calculate_losses: 25.9012
    losses_init: 0.0035, forward_head: 1.4834, bptt_initial: 17.0182, tail: 1.2067, advantages_returns: 0.2991, losses: 3.4767
    bptt: 2.1560
      bptt_forward_core: 2.0794
  update: 12.1929
    clip: 0.8955
[2025-02-01 08:11:12,232][00813] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3411, enqueue_policy_requests: 108.7590, env_step: 819.6869, overhead: 12.1735, complete_rollouts: 7.3382
save_policy_outputs: 19.4107
  split_output_tensors: 7.4442
[2025-02-01 08:11:12,234][00813] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.2329, enqueue_policy_requests: 110.2416, env_step: 816.0737, overhead: 12.7219, complete_rollouts: 7.3473
save_policy_outputs: 18.9501
  split_output_tensors: 7.2621
[2025-02-01 08:11:12,237][00813] Loop Runner_EvtLoop terminating...
[2025-02-01 08:11:12,238][00813] Runner profile tree view:
main_loop: 1079.6936
[2025-02-01 08:11:12,240][00813] Collected {0: 4005888}, FPS: 3710.2
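
The headline figure is simply the total collected frames divided by the runner's main-loop wall time reported just above:

# 4005888 frames over 1079.6936 s of main_loop time (both values from the log above).
print(4005888 / 1079.6936)  # -> about 3710.2, matching the reported overall FPS
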
[2025-02-01 08:11:20,971][00813] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-01 08:11:20,973][00813] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-01 08:11:20,975][00813] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-01 08:11:20,977][00813] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-01 08:11:20,979][00813] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-01 08:11:20,980][00813] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-01 08:11:20,982][00813] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-02-01 08:11:20,983][00813] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-01 08:11:20,984][00813] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-02-01 08:11:20,986][00813] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-02-01 08:11:20,987][00813] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-01 08:11:20,988][00813] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-01 08:11:20,989][00813] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-01 08:11:20,990][00813] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-01 08:11:20,991][00813] Using frameskip 1 and render_action_repeat=4 for evaluation
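
This block is the evaluation ("enjoy") pass: the saved training config is reloaded and overridden with video-recording arguments. A minimal sketch of how such a run is typically launched from the Deep RL course notebook, mirroring the overrides logged above; parse_vizdoom_cfg is the notebook's own config helper (an assumption here, not shown in this log), and the env name is inferred from the repository name used later:

from sample_factory.enjoy import enjoy

cfg = parse_vizdoom_cfg(  # notebook helper wrapping Sample Factory's arg parser (assumption)
    argv=["--env=doom_health_gathering_supreme",
          "--num_workers=1", "--no_render", "--save_video",
          "--max_num_episodes=10"],
    evaluation=True,
)
status = enjoy(cfg)  # rolls out 10 episodes and writes replay.mp4 under train_dir
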
[2025-02-01 08:11:21,008][00813] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-01 08:11:21,010][00813] RunningMeanStd input shape: (3, 72, 128)
[2025-02-01 08:11:21,012][00813] RunningMeanStd input shape: (1,)
[2025-02-01 08:11:21,029][00813] ConvEncoder: input_channels=3
[2025-02-01 08:11:21,136][00813] Conv encoder output size: 512
[2025-02-01 08:11:21,138][00813] Policy head output size: 512
[2025-02-01 08:11:21,311][00813] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-01 08:11:22,074][00813] Num frames 100...
[2025-02-01 08:11:22,211][00813] Num frames 200...
[2025-02-01 08:11:22,340][00813] Num frames 300...
[2025-02-01 08:11:22,470][00813] Num frames 400...
[2025-02-01 08:11:22,613][00813] Num frames 500...
[2025-02-01 08:11:22,785][00813] Num frames 600...
[2025-02-01 08:11:22,953][00813] Num frames 700...
[2025-02-01 08:11:23,123][00813] Avg episode rewards: #0: 13.680, true rewards: #0: 7.680
[2025-02-01 08:11:23,125][00813] Avg episode reward: 13.680, avg true_objective: 7.680
[2025-02-01 08:11:23,184][00813] Num frames 800...
[2025-02-01 08:11:23,357][00813] Num frames 900...
[2025-02-01 08:11:23,530][00813] Num frames 1000...
[2025-02-01 08:11:23,697][00813] Num frames 1100...
[2025-02-01 08:11:23,888][00813] Num frames 1200...
[2025-02-01 08:11:24,066][00813] Num frames 1300...
[2025-02-01 08:11:24,145][00813] Avg episode rewards: #0: 10.560, true rewards: #0: 6.560
[2025-02-01 08:11:24,147][00813] Avg episode reward: 10.560, avg true_objective: 6.560
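
"Avg episode rewards" is a running mean over the episodes finished so far, while "true rewards" appears to be the raw, unshaped objective. The first episode scored 7.680 (true); the average after two episodes is 6.560, which pins down the second episode's own score:

ep1_true = 7.680
avg_after_2 = 6.560
ep2_true = 2 * avg_after_2 - ep1_true   # -> 5.440
assert abs((ep1_true + ep2_true) / 2 - avg_after_2) < 1e-9
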
[2025-02-01 08:11:24,308][00813] Num frames 1400...
[2025-02-01 08:11:24,499][00813] Num frames 1500...
[2025-02-01 08:11:24,678][00813] Num frames 1600...
[2025-02-01 08:11:24,859][00813] Num frames 1700...
[2025-02-01 08:11:25,049][00813] Num frames 1800...
[2025-02-01 08:11:25,213][00813] Num frames 1900...
[2025-02-01 08:11:25,347][00813] Num frames 2000...
[2025-02-01 08:11:25,485][00813] Num frames 2100...
[2025-02-01 08:11:25,619][00813] Num frames 2200...
[2025-02-01 08:11:25,748][00813] Num frames 2300...
[2025-02-01 08:11:25,880][00813] Num frames 2400...
[2025-02-01 08:11:26,007][00813] Num frames 2500...
[2025-02-01 08:11:26,141][00813] Num frames 2600...
[2025-02-01 08:11:26,219][00813] Avg episode rewards: #0: 17.717, true rewards: #0: 8.717
[2025-02-01 08:11:26,221][00813] Avg episode reward: 17.717, avg true_objective: 8.717
[2025-02-01 08:11:26,333][00813] Num frames 2700...
[2025-02-01 08:11:26,469][00813] Num frames 2800...
[2025-02-01 08:11:26,600][00813] Num frames 2900...
[2025-02-01 08:11:26,731][00813] Num frames 3000...
[2025-02-01 08:11:26,870][00813] Avg episode rewards: #0: 14.658, true rewards: #0: 7.657
[2025-02-01 08:11:26,872][00813] Avg episode reward: 14.658, avg true_objective: 7.657
[2025-02-01 08:11:26,925][00813] Num frames 3100...
[2025-02-01 08:11:27,057][00813] Num frames 3200...
[2025-02-01 08:11:27,188][00813] Num frames 3300...
[2025-02-01 08:11:27,329][00813] Num frames 3400...
[2025-02-01 08:11:27,466][00813] Num frames 3500...
[2025-02-01 08:11:27,597][00813] Num frames 3600...
[2025-02-01 08:11:27,725][00813] Num frames 3700...
[2025-02-01 08:11:27,856][00813] Num frames 3800...
[2025-02-01 08:11:27,986][00813] Num frames 3900...
[2025-02-01 08:11:28,115][00813] Num frames 4000...
[2025-02-01 08:11:28,205][00813] Avg episode rewards: #0: 15.254, true rewards: #0: 8.054
[2025-02-01 08:11:28,207][00813] Avg episode reward: 15.254, avg true_objective: 8.054
[2025-02-01 08:11:28,311][00813] Num frames 4100...
[2025-02-01 08:11:28,446][00813] Num frames 4200...
[2025-02-01 08:11:28,576][00813] Num frames 4300...
[2025-02-01 08:11:28,701][00813] Num frames 4400...
[2025-02-01 08:11:28,826][00813] Num frames 4500...
[2025-02-01 08:11:28,972][00813] Avg episode rewards: #0: 14.452, true rewards: #0: 7.618
[2025-02-01 08:11:28,973][00813] Avg episode reward: 14.452, avg true_objective: 7.618
[2025-02-01 08:11:29,013][00813] Num frames 4600...
[2025-02-01 08:11:29,139][00813] Num frames 4700...
[2025-02-01 08:11:29,274][00813] Num frames 4800...
[2025-02-01 08:11:29,399][00813] Num frames 4900...
[2025-02-01 08:11:29,541][00813] Num frames 5000...
[2025-02-01 08:11:29,670][00813] Num frames 5100...
[2025-02-01 08:11:29,799][00813] Num frames 5200...
[2025-02-01 08:11:29,928][00813] Num frames 5300...
[2025-02-01 08:11:30,056][00813] Num frames 5400...
[2025-02-01 08:11:30,186][00813] Num frames 5500...
[2025-02-01 08:11:30,322][00813] Num frames 5600...
[2025-02-01 08:11:30,458][00813] Num frames 5700...
[2025-02-01 08:11:30,544][00813] Avg episode rewards: #0: 15.604, true rewards: #0: 8.176
[2025-02-01 08:11:30,545][00813] Avg episode reward: 15.604, avg true_objective: 8.176
[2025-02-01 08:11:30,646][00813] Num frames 5800...
[2025-02-01 08:11:30,777][00813] Num frames 5900...
[2025-02-01 08:11:30,905][00813] Num frames 6000...
[2025-02-01 08:11:31,031][00813] Num frames 6100...
[2025-02-01 08:11:31,175][00813] Avg episode rewards: #0: 14.339, true rewards: #0: 7.714
[2025-02-01 08:11:31,177][00813] Avg episode reward: 14.339, avg true_objective: 7.714
[2025-02-01 08:11:31,220][00813] Num frames 6200...
[2025-02-01 08:11:31,357][00813] Num frames 6300...
[2025-02-01 08:11:31,495][00813] Num frames 6400...
[2025-02-01 08:11:31,625][00813] Num frames 6500...
[2025-02-01 08:11:31,752][00813] Num frames 6600...
[2025-02-01 08:11:31,879][00813] Num frames 6700...
[2025-02-01 08:11:32,008][00813] Num frames 6800...
[2025-02-01 08:11:32,135][00813] Num frames 6900...
[2025-02-01 08:11:32,246][00813] Avg episode rewards: #0: 14.493, true rewards: #0: 7.716
[2025-02-01 08:11:32,248][00813] Avg episode reward: 14.493, avg true_objective: 7.716
[2025-02-01 08:11:32,321][00813] Num frames 7000...
[2025-02-01 08:11:32,464][00813] Num frames 7100...
[2025-02-01 08:11:32,598][00813] Num frames 7200...
[2025-02-01 08:11:32,727][00813] Num frames 7300...
[2025-02-01 08:11:32,858][00813] Avg episode rewards: #0: 13.660, true rewards: #0: 7.360
[2025-02-01 08:11:32,860][00813] Avg episode reward: 13.660, avg true_objective: 7.360
[2025-02-01 08:12:17,846][00813] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-02-01 08:15:13,629][00813] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-01 08:15:13,631][00813] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-01 08:15:13,633][00813] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-01 08:15:13,635][00813] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-01 08:15:13,637][00813] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-01 08:15:13,638][00813] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-01 08:15:13,640][00813] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-02-01 08:15:13,641][00813] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-01 08:15:13,642][00813] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-02-01 08:15:13,643][00813] Adding new argument 'hf_repository'='rootchina/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-02-01 08:15:13,644][00813] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-01 08:15:13,645][00813] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-01 08:15:13,646][00813] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-01 08:15:13,647][00813] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-01 08:15:13,648][00813] Using frameskip 1 and render_action_repeat=4 for evaluation
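
The second pass repeats the evaluation with push_to_hub=True and a target repository, so that the replay video, model card, and latest checkpoint can be uploaded once the ten episodes finish. A sketch mirroring the logged overrides, with the same caveats as the earlier enjoy example:

from sample_factory.enjoy import enjoy

cfg = parse_vizdoom_cfg(  # notebook helper, as above (assumption)
    argv=["--env=doom_health_gathering_supreme",
          "--num_workers=1", "--no_render", "--save_video",
          "--max_num_episodes=10", "--max_num_frames=100000",
          "--push_to_hub",
          "--hf_repository=rootchina/rl_course_vizdoom_health_gathering_supreme"],
    evaluation=True,
)
status = enjoy(cfg)

Per the Sample Factory docs, the same upload can also be done standalone with its Hub CLI: python -m sample_factory.huggingface.push_to_hub -r <hf_username>/<hf_repo_name> -d <experiment_dir>.
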
[2025-02-01 08:15:13,657][00813] RunningMeanStd input shape: (3, 72, 128)
[2025-02-01 08:15:13,665][00813] RunningMeanStd input shape: (1,)
[2025-02-01 08:15:13,678][00813] ConvEncoder: input_channels=3
[2025-02-01 08:15:13,712][00813] Conv encoder output size: 512
[2025-02-01 08:15:13,714][00813] Policy head output size: 512
[2025-02-01 08:15:13,733][00813] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-01 08:15:14,157][00813] Num frames 100...
[2025-02-01 08:15:14,293][00813] Num frames 200...
[2025-02-01 08:15:14,423][00813] Num frames 300...
[2025-02-01 08:15:14,576][00813] Num frames 400...
[2025-02-01 08:15:14,702][00813] Num frames 500...
[2025-02-01 08:15:14,830][00813] Num frames 600...
[2025-02-01 08:15:14,958][00813] Num frames 700...
[2025-02-01 08:15:15,082][00813] Num frames 800...
[2025-02-01 08:15:15,209][00813] Num frames 900...
[2025-02-01 08:15:15,345][00813] Num frames 1000...
[2025-02-01 08:15:15,486][00813] Num frames 1100...
[2025-02-01 08:15:15,644][00813] Avg episode rewards: #0: 25.840, true rewards: #0: 11.840
[2025-02-01 08:15:15,646][00813] Avg episode reward: 25.840, avg true_objective: 11.840
[2025-02-01 08:15:15,668][00813] Num frames 1200...
[2025-02-01 08:15:15,791][00813] Num frames 1300...
[2025-02-01 08:15:15,936][00813] Num frames 1400...
[2025-02-01 08:15:16,064][00813] Num frames 1500...
[2025-02-01 08:15:16,191][00813] Num frames 1600...
[2025-02-01 08:15:16,327][00813] Num frames 1700...
[2025-02-01 08:15:16,458][00813] Num frames 1800...
[2025-02-01 08:15:16,621][00813] Num frames 1900...
[2025-02-01 08:15:16,685][00813] Avg episode rewards: #0: 20.025, true rewards: #0: 9.525
[2025-02-01 08:15:16,686][00813] Avg episode reward: 20.025, avg true_objective: 9.525
[2025-02-01 08:15:16,805][00813] Num frames 2000...
[2025-02-01 08:15:16,932][00813] Num frames 2100...
[2025-02-01 08:15:17,058][00813] Num frames 2200...
[2025-02-01 08:15:17,190][00813] Num frames 2300...
[2025-02-01 08:15:17,326][00813] Num frames 2400...
[2025-02-01 08:15:17,461][00813] Num frames 2500...
[2025-02-01 08:15:17,590][00813] Num frames 2600...
[2025-02-01 08:15:17,719][00813] Num frames 2700...
[2025-02-01 08:15:17,771][00813] Avg episode rewards: #0: 19.000, true rewards: #0: 9.000
[2025-02-01 08:15:17,772][00813] Avg episode reward: 19.000, avg true_objective: 9.000
[2025-02-01 08:15:17,899][00813] Num frames 2800...
[2025-02-01 08:15:18,022][00813] Num frames 2900...
[2025-02-01 08:15:18,149][00813] Num frames 3000...
[2025-02-01 08:15:18,277][00813] Num frames 3100...
[2025-02-01 08:15:18,419][00813] Num frames 3200...
[2025-02-01 08:15:18,531][00813] Avg episode rewards: #0: 16.350, true rewards: #0: 8.100
[2025-02-01 08:15:18,534][00813] Avg episode reward: 16.350, avg true_objective: 8.100
[2025-02-01 08:15:18,612][00813] Num frames 3300...
[2025-02-01 08:15:18,741][00813] Num frames 3400...
[2025-02-01 08:15:18,868][00813] Num frames 3500...
[2025-02-01 08:15:18,993][00813] Num frames 3600...
[2025-02-01 08:15:19,117][00813] Num frames 3700...
[2025-02-01 08:15:19,245][00813] Num frames 3800...
[2025-02-01 08:15:19,409][00813] Num frames 3900...
[2025-02-01 08:15:19,590][00813] Num frames 4000...
[2025-02-01 08:15:19,739][00813] Avg episode rewards: #0: 16.504, true rewards: #0: 8.104
[2025-02-01 08:15:19,741][00813] Avg episode reward: 16.504, avg true_objective: 8.104
[2025-02-01 08:15:19,824][00813] Num frames 4100...
[2025-02-01 08:15:19,990][00813] Num frames 4200...
[2025-02-01 08:15:20,162][00813] Num frames 4300...
[2025-02-01 08:15:20,328][00813] Num frames 4400...
[2025-02-01 08:15:20,514][00813] Num frames 4500...
[2025-02-01 08:15:20,680][00813] Num frames 4600...
[2025-02-01 08:15:20,867][00813] Num frames 4700...
[2025-02-01 08:15:21,048][00813] Num frames 4800...
[2025-02-01 08:15:21,137][00813] Avg episode rewards: #0: 15.867, true rewards: #0: 8.033
[2025-02-01 08:15:21,139][00813] Avg episode reward: 15.867, avg true_objective: 8.033
[2025-02-01 08:15:21,287][00813] Num frames 4900...
[2025-02-01 08:15:21,472][00813] Num frames 5000...
[2025-02-01 08:15:21,670][00813] Num frames 5100...
[2025-02-01 08:15:21,840][00813] Num frames 5200...
[2025-02-01 08:15:21,966][00813] Num frames 5300...
[2025-02-01 08:15:22,092][00813] Num frames 5400...
[2025-02-01 08:15:22,223][00813] Num frames 5500...
[2025-02-01 08:15:22,350][00813] Num frames 5600...
[2025-02-01 08:15:22,485][00813] Num frames 5700...
[2025-02-01 08:15:22,620][00813] Num frames 5800...
[2025-02-01 08:15:22,751][00813] Num frames 5900...
[2025-02-01 08:15:22,876][00813] Num frames 6000...
[2025-02-01 08:15:22,937][00813] Avg episode rewards: #0: 17.149, true rewards: #0: 8.577
[2025-02-01 08:15:22,938][00813] Avg episode reward: 17.149, avg true_objective: 8.577
[2025-02-01 08:15:23,057][00813] Num frames 6100...
[2025-02-01 08:15:23,184][00813] Num frames 6200...
[2025-02-01 08:15:23,309][00813] Num frames 6300...
[2025-02-01 08:15:23,442][00813] Num frames 6400...
[2025-02-01 08:15:23,584][00813] Num frames 6500...
[2025-02-01 08:15:23,711][00813] Num frames 6600...
[2025-02-01 08:15:23,844][00813] Num frames 6700...
[2025-02-01 08:15:23,970][00813] Num frames 6800...
[2025-02-01 08:15:24,097][00813] Num frames 6900...
[2025-02-01 08:15:24,224][00813] Num frames 7000...
[2025-02-01 08:15:24,354][00813] Num frames 7100...
[2025-02-01 08:15:24,490][00813] Num frames 7200...
[2025-02-01 08:15:24,627][00813] Num frames 7300...
[2025-02-01 08:15:24,760][00813] Num frames 7400...
[2025-02-01 08:15:24,888][00813] Num frames 7500...
[2025-02-01 08:15:25,018][00813] Num frames 7600...
[2025-02-01 08:15:25,146][00813] Num frames 7700...
[2025-02-01 08:15:25,277][00813] Num frames 7800...
[2025-02-01 08:15:25,407][00813] Num frames 7900...
[2025-02-01 08:15:25,588][00813] Avg episode rewards: #0: 20.610, true rewards: #0: 9.985
[2025-02-01 08:15:25,589][00813] Avg episode reward: 20.610, avg true_objective: 9.985
[2025-02-01 08:15:25,612][00813] Num frames 8000...
[2025-02-01 08:15:25,745][00813] Num frames 8100...
[2025-02-01 08:15:25,870][00813] Num frames 8200...
[2025-02-01 08:15:25,995][00813] Num frames 8300...
[2025-02-01 08:15:26,121][00813] Num frames 8400...
[2025-02-01 08:15:26,278][00813] Num frames 8500...
[2025-02-01 08:15:26,461][00813] Num frames 8600...
[2025-02-01 08:15:26,643][00813] Num frames 8700...
[2025-02-01 08:15:26,800][00813] Avg episode rewards: #0: 19.729, true rewards: #0: 9.729
[2025-02-01 08:15:26,802][00813] Avg episode reward: 19.729, avg true_objective: 9.729
[2025-02-01 08:15:26,870][00813] Num frames 8800...
[2025-02-01 08:15:26,998][00813] Num frames 8900...
[2025-02-01 08:15:27,126][00813] Num frames 9000...
[2025-02-01 08:15:27,254][00813] Num frames 9100...
[2025-02-01 08:15:27,387][00813] Num frames 9200...
[2025-02-01 08:15:27,523][00813] Num frames 9300...
[2025-02-01 08:15:27,651][00813] Num frames 9400...
[2025-02-01 08:15:27,791][00813] Num frames 9500...
[2025-02-01 08:15:27,920][00813] Num frames 9600...
[2025-02-01 08:15:28,050][00813] Num frames 9700...
[2025-02-01 08:15:28,181][00813] Num frames 9800...
[2025-02-01 08:15:28,313][00813] Num frames 9900...
[2025-02-01 08:15:28,446][00813] Num frames 10000...
[2025-02-01 08:15:28,552][00813] Avg episode rewards: #0: 20.536, true rewards: #0: 10.036
[2025-02-01 08:15:28,553][00813] Avg episode reward: 20.536, avg true_objective: 10.036
[2025-02-01 08:16:29,036][00813] Replay video saved to /content/train_dir/default_experiment/replay.mp4!