diff --git "a/sf_log.txt" "b/sf_log.txt"
new file mode 100644
--- /dev/null
+++ "b/sf_log.txt"
@@ -0,0 +1,36161 @@
+[2024-11-07 12:11:25,065][118435] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json...
+[2024-11-07 12:11:25,068][118435] Rollout worker 0 uses device cpu
+[2024-11-07 12:11:25,069][118435] Rollout worker 1 uses device cpu
+[2024-11-07 12:11:25,070][118435] Rollout worker 2 uses device cpu
+[2024-11-07 12:11:25,071][118435] Rollout worker 3 uses device cpu
+[2024-11-07 12:11:25,072][118435] Rollout worker 4 uses device cpu
+[2024-11-07 12:11:25,073][118435] Rollout worker 5 uses device cpu
+[2024-11-07 12:11:25,073][118435] Rollout worker 6 uses device cpu
+[2024-11-07 12:11:25,074][118435] Rollout worker 7 uses device cpu
+[2024-11-07 12:11:25,221][118435] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2024-11-07 12:11:25,222][118435] InferenceWorker_p0-w0: min num requests: 2
+[2024-11-07 12:11:25,257][118435] Starting all processes...
+[2024-11-07 12:11:25,258][118435] Starting process learner_proc0
+[2024-11-07 12:11:25,372][118435] Starting all processes...
+[2024-11-07 12:11:25,465][118435] Starting process inference_proc0-0 +[2024-11-07 12:11:25,466][118435] Starting process rollout_proc0 +[2024-11-07 12:11:25,467][118435] Starting process rollout_proc1 +[2024-11-07 12:11:25,467][118435] Starting process rollout_proc2 +[2024-11-07 12:11:25,470][118435] Starting process rollout_proc3 +[2024-11-07 12:11:25,475][118435] Starting process rollout_proc4 +[2024-11-07 12:11:25,475][118435] Starting process rollout_proc5 +[2024-11-07 12:11:25,476][118435] Starting process rollout_proc6 +[2024-11-07 12:11:25,477][118435] Starting process rollout_proc7 +[2024-11-07 12:11:32,755][118900] Worker 3 uses CPU cores [3] +[2024-11-07 12:11:32,895][118881] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:11:32,896][118881] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 12:11:33,210][118881] Num visible devices: 1 +[2024-11-07 12:11:33,257][118881] Starting seed is not provided +[2024-11-07 12:11:33,257][118881] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:11:33,258][118881] Initializing actor-critic model on device cuda:0 +[2024-11-07 12:11:33,258][118881] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 12:11:33,260][118881] RunningMeanStd input shape: (1,) +[2024-11-07 12:11:33,371][118881] ConvEncoder: input_channels=3 +[2024-11-07 12:11:33,723][118881] Conv encoder output size: 512 +[2024-11-07 12:11:33,724][118881] Policy head output size: 512 +[2024-11-07 12:11:33,957][118904] Worker 6 uses CPU cores [6] +[2024-11-07 12:11:34,139][118901] Worker 2 uses CPU cores [2] +[2024-11-07 12:11:34,178][118899] Worker 1 uses CPU cores [1] +[2024-11-07 12:11:34,283][118903] Worker 5 uses CPU cores [5] +[2024-11-07 12:11:34,454][118897] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:11:34,455][118897] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 
12:11:34,489][118897] Num visible devices: 1 +[2024-11-07 12:11:34,503][118902] Worker 4 uses CPU cores [4] +[2024-11-07 12:11:34,563][118911] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 12:11:34,612][118898] Worker 0 uses CPU cores [0] +[2024-11-07 12:11:34,638][118881] Created Actor Critic model with architecture: +[2024-11-07 12:11:34,638][118881] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 12:11:36,780][118881] Using optimizer +[2024-11-07 12:11:43,557][118435] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 118435], exiting... +[2024-11-07 12:11:43,559][118899] Stopping RolloutWorker_w1... +[2024-11-07 12:11:43,559][118898] Stopping RolloutWorker_w0... 
+[2024-11-07 12:11:43,560][118911] Stopping RolloutWorker_w7... +[2024-11-07 12:11:43,560][118898] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 12:11:43,560][118899] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 12:11:43,560][118903] Stopping RolloutWorker_w5... +[2024-11-07 12:11:43,560][118897] Stopping InferenceWorker_p0-w0... +[2024-11-07 12:11:43,561][118903] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 12:11:43,561][118897] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 12:11:43,561][118901] Stopping RolloutWorker_w2... +[2024-11-07 12:11:43,561][118900] Stopping RolloutWorker_w3... +[2024-11-07 12:11:43,561][118901] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 12:11:43,562][118900] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 12:11:43,560][118435] Runner profile tree view: +main_loop: 18.3035 +[2024-11-07 12:11:43,569][118904] Stopping RolloutWorker_w6... +[2024-11-07 12:11:43,569][118902] Stopping RolloutWorker_w4... +[2024-11-07 12:11:43,569][118904] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 12:11:43,570][118902] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 12:11:43,569][118435] Collected {}, FPS: 0.0 +[2024-11-07 12:11:43,572][118911] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 12:11:43,604][118881] Stopping Batcher_0... +[2024-11-07 12:11:43,604][118881] Loop batcher_evt_loop terminating... +[2024-11-07 12:11:43,805][118881] No checkpoints found +[2024-11-07 12:11:43,805][118881] Did not load from checkpoint, starting from scratch! +[2024-11-07 12:11:43,829][118881] Initialized policy 0 weights for model version 0 +[2024-11-07 12:11:43,887][118881] LearnerWorker_p0 finished initialization! +[2024-11-07 12:11:43,890][118881] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2024-11-07 12:11:44,125][118881] Stopping LearnerWorker_p0... 
+[2024-11-07 12:11:44,126][118881] Loop learner_proc0_evt_loop terminating... +[2024-11-07 12:16:30,455][118435] Environment doom_basic already registered, overwriting... +[2024-11-07 12:16:30,458][118435] Environment doom_two_colors_easy already registered, overwriting... +[2024-11-07 12:16:30,459][118435] Environment doom_two_colors_hard already registered, overwriting... +[2024-11-07 12:16:30,461][118435] Environment doom_dm already registered, overwriting... +[2024-11-07 12:16:30,462][118435] Environment doom_dwango5 already registered, overwriting... +[2024-11-07 12:16:30,465][118435] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-11-07 12:16:30,467][118435] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-11-07 12:16:30,468][118435] Environment doom_my_way_home already registered, overwriting... +[2024-11-07 12:16:30,471][118435] Environment doom_deadly_corridor already registered, overwriting... +[2024-11-07 12:16:30,473][118435] Environment doom_defend_the_center already registered, overwriting... +[2024-11-07 12:16:30,474][118435] Environment doom_defend_the_line already registered, overwriting... +[2024-11-07 12:16:30,476][118435] Environment doom_health_gathering already registered, overwriting... +[2024-11-07 12:16:30,477][118435] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-11-07 12:16:30,478][118435] Environment doom_battle already registered, overwriting... +[2024-11-07 12:16:30,481][118435] Environment doom_battle2 already registered, overwriting... +[2024-11-07 12:16:30,482][118435] Environment doom_duel_bots already registered, overwriting... +[2024-11-07 12:16:30,485][118435] Environment doom_deathmatch_bots already registered, overwriting... +[2024-11-07 12:16:30,486][118435] Environment doom_duel already registered, overwriting... 
+[2024-11-07 12:16:30,488][118435] Environment doom_deathmatch_full already registered, overwriting... +[2024-11-07 12:16:30,489][118435] Environment doom_benchmark already registered, overwriting... +[2024-11-07 12:16:30,490][118435] register_encoder_factory: +[2024-11-07 12:16:30,506][118435] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 12:16:30,519][118435] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! +[2024-11-07 12:16:30,521][118435] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... +[2024-11-07 12:16:30,523][118435] Weights and Biases integration disabled +[2024-11-07 12:16:30,527][118435] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-11-07 12:16:35,987][118435] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/root/hfRL/ml/LunarLander-v2/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True 
+normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=4000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 
'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2024-11-07 12:16:35,989][118435] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-07 12:16:35,990][118435] Rollout worker 0 uses device cpu +[2024-11-07 12:16:35,991][118435] Rollout worker 1 uses device cpu +[2024-11-07 12:16:35,993][118435] Rollout worker 2 uses device cpu +[2024-11-07 12:16:35,994][118435] Rollout worker 3 uses device cpu +[2024-11-07 12:16:35,995][118435] Rollout worker 4 uses device cpu +[2024-11-07 12:16:35,996][118435] Rollout worker 5 uses device cpu +[2024-11-07 12:16:35,997][118435] Rollout worker 6 uses device cpu +[2024-11-07 12:16:35,999][118435] Rollout worker 7 uses device cpu +[2024-11-07 12:16:36,166][118435] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:16:36,168][118435] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 12:16:36,203][118435] Starting all processes... +[2024-11-07 12:16:36,204][118435] Starting process learner_proc0 +[2024-11-07 12:16:36,252][118435] Starting all processes... 
+[2024-11-07 12:16:36,257][118435] Starting process inference_proc0-0 +[2024-11-07 12:16:36,258][118435] Starting process rollout_proc0 +[2024-11-07 12:16:36,259][118435] Starting process rollout_proc1 +[2024-11-07 12:16:36,259][118435] Starting process rollout_proc2 +[2024-11-07 12:16:36,260][118435] Starting process rollout_proc3 +[2024-11-07 12:16:36,264][118435] Starting process rollout_proc4 +[2024-11-07 12:16:36,265][118435] Starting process rollout_proc5 +[2024-11-07 12:16:36,266][118435] Starting process rollout_proc6 +[2024-11-07 12:16:36,266][118435] Starting process rollout_proc7 +[2024-11-07 12:16:42,024][121082] Worker 6 uses CPU cores [6] +[2024-11-07 12:16:42,134][121076] Worker 0 uses CPU cores [0] +[2024-11-07 12:16:42,334][121080] Worker 5 uses CPU cores [5] +[2024-11-07 12:16:42,498][121062] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:16:42,498][121062] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 12:16:42,545][121075] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:16:42,546][121075] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 12:16:42,646][121075] Num visible devices: 1 +[2024-11-07 12:16:42,647][121062] Num visible devices: 1 +[2024-11-07 12:16:42,665][121077] Worker 1 uses CPU cores [1] +[2024-11-07 12:16:42,675][121062] Starting seed is not provided +[2024-11-07 12:16:42,675][121062] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:16:42,675][121062] Initializing actor-critic model on device cuda:0 +[2024-11-07 12:16:42,676][121062] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 12:16:42,679][121062] RunningMeanStd input shape: (1,) +[2024-11-07 12:16:42,703][121062] ConvEncoder: input_channels=3 +[2024-11-07 12:16:42,715][121081] Worker 3 uses CPU cores [3] +[2024-11-07 12:16:43,044][121078] Worker 2 uses CPU cores [2] +[2024-11-07 
12:16:43,051][121079] Worker 4 uses CPU cores [4] +[2024-11-07 12:16:43,067][121083] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 12:16:43,335][121062] Conv encoder output size: 512 +[2024-11-07 12:16:43,335][121062] Policy head output size: 512 +[2024-11-07 12:16:43,380][121062] Created Actor Critic model with architecture: +[2024-11-07 12:16:43,380][121062] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 12:16:44,388][121062] Using optimizer +[2024-11-07 12:16:47,810][121062] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... 
+[2024-11-07 12:16:47,858][121062] Loading model from checkpoint +[2024-11-07 12:16:47,860][121062] Loaded experiment state at self.train_step=0, self.env_steps=0 +[2024-11-07 12:16:47,861][121062] Initialized policy 0 weights for model version 0 +[2024-11-07 12:16:47,868][121062] LearnerWorker_p0 finished initialization! +[2024-11-07 12:16:47,868][121062] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:16:48,092][121075] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 12:16:48,094][121075] RunningMeanStd input shape: (1,) +[2024-11-07 12:16:48,106][121075] ConvEncoder: input_channels=3 +[2024-11-07 12:16:48,225][121075] Conv encoder output size: 512 +[2024-11-07 12:16:48,225][121075] Policy head output size: 512 +[2024-11-07 12:16:48,281][118435] Inference worker 0-0 is ready! +[2024-11-07 12:16:48,282][118435] All inference workers are ready! Signal rollout workers to start! +[2024-11-07 12:16:48,365][121081] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:16:48,371][121080] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:16:48,371][121079] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:16:48,376][121077] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:16:48,378][121076] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:16:48,381][121082] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:16:48,388][121083] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:16:48,396][121078] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:16:50,528][118435] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:16:54,744][121081] Decorrelating experience for 0 frames... +[2024-11-07 12:16:54,744][121079] Decorrelating experience for 0 frames... 
+[2024-11-07 12:16:55,091][121081] Decorrelating experience for 32 frames... +[2024-11-07 12:16:55,528][118435] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:16:56,054][118435] Heartbeat connected on Batcher_0 +[2024-11-07 12:16:56,058][118435] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 12:16:56,198][118435] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 12:17:00,528][118435] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:17:06,999][118435] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:17:09,503][118435] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 118435], exiting... +[2024-11-07 12:17:09,506][118435] Runner profile tree view: +main_loop: 33.3031 +[2024-11-07 12:17:09,506][121062] Stopping Batcher_0... +[2024-11-07 12:17:09,509][121062] Loop batcher_evt_loop terminating... +[2024-11-07 12:17:09,508][118435] Collected {0: 0}, FPS: 0.0 +[2024-11-07 12:17:09,510][121062] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2024-11-07 12:17:09,569][121075] Weights refcount: 2 0 +[2024-11-07 12:17:09,572][121075] Stopping InferenceWorker_p0-w0... +[2024-11-07 12:17:09,572][121075] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 12:17:09,626][121062] Stopping LearnerWorker_p0... +[2024-11-07 12:17:09,627][121062] Loop learner_proc0_evt_loop terminating... +[2024-11-07 12:17:09,637][121083] Decorrelating experience for 0 frames... 
+[2024-11-07 12:17:10,364][121082] Another process currently holds the lock /tmp/sf2_root/doom_003.lockfile, attempt: 1 +[2024-11-07 12:17:16,520][121079] Another process currently holds the lock /tmp/sf2_root/doom_003.lockfile, attempt: 1 +[2024-11-07 12:17:16,979][121081] Another process currently holds the lock /tmp/sf2_root/doom_003.lockfile, attempt: 1 +[2024-11-07 12:17:17,874][121083] Decorrelating experience for 32 frames... +[2024-11-07 12:17:18,292][121083] Decorrelating experience for 64 frames... +[2024-11-07 12:17:18,658][121083] Decorrelating experience for 96 frames... +[2024-11-07 12:17:19,206][121083] Stopping RolloutWorker_w7... +[2024-11-07 12:17:19,207][121083] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 12:17:26,095][121082] Decorrelating experience for 0 frames... +[2024-11-07 12:17:26,430][121079] Decorrelating experience for 32 frames... +[2024-11-07 12:17:26,739][121081] Decorrelating experience for 64 frames... +[2024-11-07 12:17:26,811][121079] Decorrelating experience for 64 frames... +[2024-11-07 12:17:27,157][121081] Decorrelating experience for 96 frames... +[2024-11-07 12:17:27,213][121079] Decorrelating experience for 96 frames... +[2024-11-07 12:17:27,249][121081] Stopping RolloutWorker_w3... +[2024-11-07 12:17:27,249][121081] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 12:17:27,328][121079] Stopping RolloutWorker_w4... +[2024-11-07 12:17:27,328][121079] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 12:17:34,653][121082] Decorrelating experience for 32 frames... +[2024-11-07 12:17:35,009][121082] Decorrelating experience for 64 frames... +[2024-11-07 12:17:35,369][121082] Decorrelating experience for 96 frames... +[2024-11-07 12:17:35,594][121082] Stopping RolloutWorker_w6... +[2024-11-07 12:17:35,595][121082] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 12:17:50,263][122819] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
+[2024-11-07 12:17:50,264][122819] Rollout worker 0 uses device cpu +[2024-11-07 12:17:50,266][122819] Rollout worker 1 uses device cpu +[2024-11-07 12:17:50,266][122819] Rollout worker 2 uses device cpu +[2024-11-07 12:17:50,267][122819] Rollout worker 3 uses device cpu +[2024-11-07 12:17:50,268][122819] Rollout worker 4 uses device cpu +[2024-11-07 12:17:50,268][122819] Rollout worker 5 uses device cpu +[2024-11-07 12:17:50,269][122819] Rollout worker 6 uses device cpu +[2024-11-07 12:17:50,270][122819] Rollout worker 7 uses device cpu +[2024-11-07 12:17:50,323][122819] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:17:50,324][122819] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 12:17:50,355][122819] Starting all processes... +[2024-11-07 12:17:50,356][122819] Starting process learner_proc0 +[2024-11-07 12:17:50,483][122819] Starting all processes... +[2024-11-07 12:17:50,821][122819] Starting process inference_proc0-0 +[2024-11-07 12:17:50,822][122819] Starting process rollout_proc0 +[2024-11-07 12:17:50,822][122819] Starting process rollout_proc1 +[2024-11-07 12:17:50,823][122819] Starting process rollout_proc2 +[2024-11-07 12:17:50,823][122819] Starting process rollout_proc3 +[2024-11-07 12:17:50,824][122819] Starting process rollout_proc4 +[2024-11-07 12:17:50,824][122819] Starting process rollout_proc5 +[2024-11-07 12:17:50,828][122819] Starting process rollout_proc6 +[2024-11-07 12:17:50,829][122819] Starting process rollout_proc7 +[2024-11-07 12:17:55,453][122943] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:17:55,454][122943] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 12:17:55,543][122944] Worker 1 uses CPU cores [1] +[2024-11-07 12:17:55,604][122929] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:17:55,604][122929] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 
+[2024-11-07 12:17:55,716][122943] Num visible devices: 1 +[2024-11-07 12:17:55,735][122929] Num visible devices: 1 +[2024-11-07 12:17:55,780][122929] Starting seed is not provided +[2024-11-07 12:17:55,780][122929] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:17:55,781][122929] Initializing actor-critic model on device cuda:0 +[2024-11-07 12:17:55,781][122929] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 12:17:55,784][122929] RunningMeanStd input shape: (1,) +[2024-11-07 12:17:55,829][122929] ConvEncoder: input_channels=3 +[2024-11-07 12:17:55,890][122947] Worker 4 uses CPU cores [4] +[2024-11-07 12:17:55,970][122945] Worker 2 uses CPU cores [2] +[2024-11-07 12:17:55,995][122942] Worker 0 uses CPU cores [0] +[2024-11-07 12:17:56,101][122948] Worker 5 uses CPU cores [5] +[2024-11-07 12:17:56,137][122929] Conv encoder output size: 512 +[2024-11-07 12:17:56,137][122929] Policy head output size: 512 +[2024-11-07 12:17:56,170][122946] Worker 3 uses CPU cores [3] +[2024-11-07 12:17:56,174][122929] Created Actor Critic model with architecture: +[2024-11-07 12:17:56,175][122929] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): 
RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 12:17:56,275][122956] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 12:17:56,391][122949] Worker 6 uses CPU cores [6] +[2024-11-07 12:17:56,939][122929] Using optimizer +[2024-11-07 12:17:59,196][122929] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2024-11-07 12:17:59,227][122929] Loading model from checkpoint +[2024-11-07 12:17:59,228][122929] Loaded experiment state at self.train_step=0, self.env_steps=0 +[2024-11-07 12:17:59,229][122929] Initialized policy 0 weights for model version 0 +[2024-11-07 12:17:59,236][122929] LearnerWorker_p0 finished initialization! +[2024-11-07 12:17:59,239][122929] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:17:59,395][122943] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 12:17:59,396][122943] RunningMeanStd input shape: (1,) +[2024-11-07 12:17:59,408][122943] ConvEncoder: input_channels=3 +[2024-11-07 12:17:59,539][122943] Conv encoder output size: 512 +[2024-11-07 12:17:59,539][122943] Policy head output size: 512 +[2024-11-07 12:17:59,585][122819] Inference worker 0-0 is ready! +[2024-11-07 12:17:59,586][122819] All inference workers are ready! Signal rollout workers to start! 
+[2024-11-07 12:17:59,725][122942] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:17:59,728][122947] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:17:59,738][122949] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:17:59,742][122946] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:17:59,750][122944] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:17:59,779][122945] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:17:59,787][122956] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:17:59,820][122948] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:18:00,172][122947] Decorrelating experience for 0 frames... +[2024-11-07 12:18:00,200][122942] Decorrelating experience for 0 frames... +[2024-11-07 12:18:00,223][122944] Decorrelating experience for 0 frames... +[2024-11-07 12:18:00,223][122949] Decorrelating experience for 0 frames... +[2024-11-07 12:18:00,228][122956] Decorrelating experience for 0 frames... +[2024-11-07 12:18:00,277][122948] Decorrelating experience for 0 frames... +[2024-11-07 12:18:00,565][122947] Decorrelating experience for 32 frames... +[2024-11-07 12:18:00,584][122949] Decorrelating experience for 32 frames... +[2024-11-07 12:18:00,590][122956] Decorrelating experience for 32 frames... +[2024-11-07 12:18:00,628][122944] Decorrelating experience for 32 frames... +[2024-11-07 12:18:01,122][122942] Decorrelating experience for 32 frames... +[2024-11-07 12:18:01,147][122945] Decorrelating experience for 0 frames... +[2024-11-07 12:18:01,268][122947] Decorrelating experience for 64 frames... +[2024-11-07 12:18:01,299][122948] Decorrelating experience for 32 frames... +[2024-11-07 12:18:01,303][122956] Decorrelating experience for 64 frames... +[2024-11-07 12:18:01,352][122944] Decorrelating experience for 64 frames... 
+[2024-11-07 12:18:01,608][122946] Decorrelating experience for 0 frames... +[2024-11-07 12:18:01,625][122945] Decorrelating experience for 32 frames... +[2024-11-07 12:18:01,944][122947] Decorrelating experience for 96 frames... +[2024-11-07 12:18:01,995][122956] Decorrelating experience for 96 frames... +[2024-11-07 12:18:01,995][122944] Decorrelating experience for 96 frames... +[2024-11-07 12:18:02,015][122942] Decorrelating experience for 64 frames... +[2024-11-07 12:18:02,102][122948] Decorrelating experience for 64 frames... +[2024-11-07 12:18:02,162][122946] Decorrelating experience for 32 frames... +[2024-11-07 12:18:02,232][122945] Decorrelating experience for 64 frames... +[2024-11-07 12:18:03,024][122949] Decorrelating experience for 64 frames... +[2024-11-07 12:18:03,097][122948] Decorrelating experience for 96 frames... +[2024-11-07 12:18:03,204][122946] Decorrelating experience for 64 frames... +[2024-11-07 12:18:03,217][122942] Decorrelating experience for 96 frames... +[2024-11-07 12:18:03,454][122949] Decorrelating experience for 96 frames... +[2024-11-07 12:18:03,715][122946] Decorrelating experience for 96 frames... +[2024-11-07 12:18:03,740][122945] Decorrelating experience for 96 frames... +[2024-11-07 12:18:04,155][122819] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:18:09,155][122819] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:18:10,315][122819] Heartbeat connected on Batcher_0 +[2024-11-07 12:18:10,318][122819] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 12:18:10,329][122819] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 12:18:10,333][122819] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 12:18:10,336][122819] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 12:18:10,340][122819] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 12:18:10,344][122819] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 12:18:10,348][122819] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 12:18:10,351][122819] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 12:18:10,355][122819] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 12:18:11,719][122819] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 12:18:15,019][122819] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 9.0. Samples: 98. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:18:15,021][122819] Avg episode reward: [(0, '1.371')] +[2024-11-07 12:18:15,670][122929] Signal inference workers to stop experience collection... +[2024-11-07 12:18:15,689][122943] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 12:18:19,155][122819] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 155.2. Samples: 2328. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:18:19,158][122819] Avg episode reward: [(0, '2.097')] +[2024-11-07 12:18:24,155][122819] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 116.4. Samples: 2328. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:18:24,156][122819] Avg episode reward: [(0, '2.097')] +[2024-11-07 12:18:29,155][122819] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 93.1. Samples: 2328. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:18:29,157][122819] Avg episode reward: [(0, '2.097')] +[2024-11-07 12:18:29,624][122929] Signal inference workers to resume experience collection... +[2024-11-07 12:18:29,624][122943] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 12:18:34,155][122819] Fps is (10 sec: 3686.4, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 36864. Throughput: 0: 260.8. Samples: 7824. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 12:18:34,157][122819] Avg episode reward: [(0, '3.975')] +[2024-11-07 12:18:34,234][122943] Updated weights for policy 0, policy_version 10 (0.0033) +[2024-11-07 12:18:39,156][122819] Fps is (10 sec: 7782.4, 60 sec: 2223.6, 300 sec: 2223.6). Total num frames: 77824. Throughput: 0: 567.8. Samples: 19874. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:18:39,159][122819] Avg episode reward: [(0, '4.551')] +[2024-11-07 12:18:39,407][122943] Updated weights for policy 0, policy_version 20 (0.0031) +[2024-11-07 12:18:44,155][122819] Fps is (10 sec: 7372.6, 60 sec: 2764.8, 300 sec: 2764.8). Total num frames: 110592. Throughput: 0: 625.4. Samples: 25018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:18:44,158][122819] Avg episode reward: [(0, '4.375')] +[2024-11-07 12:18:44,171][122929] Saving new best policy, reward=4.375! +[2024-11-07 12:18:45,758][122943] Updated weights for policy 0, policy_version 30 (0.0032) +[2024-11-07 12:18:49,155][122819] Fps is (10 sec: 5324.9, 60 sec: 2912.7, 300 sec: 2912.7). Total num frames: 131072. Throughput: 0: 752.9. Samples: 33882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:18:49,157][122819] Avg episode reward: [(0, '4.381')] +[2024-11-07 12:18:49,310][122929] Saving new best policy, reward=4.381! +[2024-11-07 12:18:54,155][122819] Fps is (10 sec: 4915.3, 60 sec: 3194.9, 300 sec: 3194.9). Total num frames: 159744. Throughput: 0: 896.5. Samples: 40344. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 12:18:54,158][122819] Avg episode reward: [(0, '4.426')] +[2024-11-07 12:18:54,169][122929] Saving new best policy, reward=4.426! +[2024-11-07 12:18:54,743][122943] Updated weights for policy 0, policy_version 40 (0.0042) +[2024-11-07 12:18:59,155][122819] Fps is (10 sec: 5734.4, 60 sec: 3425.8, 300 sec: 3425.8). Total num frames: 188416. Throughput: 0: 1008.9. Samples: 44626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:18:59,156][122819] Avg episode reward: [(0, '4.479')] +[2024-11-07 12:18:59,158][122929] Saving new best policy, reward=4.479! +[2024-11-07 12:19:01,057][122943] Updated weights for policy 0, policy_version 50 (0.0042) +[2024-11-07 12:19:04,155][122819] Fps is (10 sec: 6144.1, 60 sec: 3686.4, 300 sec: 3686.4). Total num frames: 221184. Throughput: 0: 1147.8. Samples: 53978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:19:04,158][122819] Avg episode reward: [(0, '4.507')] +[2024-11-07 12:19:04,306][122929] Saving new best policy, reward=4.507! +[2024-11-07 12:19:07,031][122943] Updated weights for policy 0, policy_version 60 (0.0024) +[2024-11-07 12:19:09,155][122819] Fps is (10 sec: 7372.8, 60 sec: 4369.1, 300 sec: 4033.0). Total num frames: 262144. Throughput: 0: 1398.8. Samples: 65272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:19:09,157][122819] Avg episode reward: [(0, '4.468')] +[2024-11-07 12:19:12,838][122943] Updated weights for policy 0, policy_version 70 (0.0029) +[2024-11-07 12:19:14,156][122819] Fps is (10 sec: 6962.9, 60 sec: 4917.7, 300 sec: 4154.5). Total num frames: 290816. Throughput: 0: 1510.1. Samples: 70284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:19:14,161][122819] Avg episode reward: [(0, '4.555')] +[2024-11-07 12:19:14,197][122929] Saving new best policy, reward=4.555! +[2024-11-07 12:19:19,162][122819] Fps is (10 sec: 6139.5, 60 sec: 5392.4, 300 sec: 4314.1). Total num frames: 323584. 
Throughput: 0: 1582.6. Samples: 79050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:19:19,167][122819] Avg episode reward: [(0, '4.434')] +[2024-11-07 12:19:19,851][122943] Updated weights for policy 0, policy_version 80 (0.0039) +[2024-11-07 12:19:24,157][122819] Fps is (10 sec: 5324.3, 60 sec: 5734.3, 300 sec: 4300.7). Total num frames: 344064. Throughput: 0: 1477.7. Samples: 86374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:19:24,159][122819] Avg episode reward: [(0, '4.234')] +[2024-11-07 12:19:27,482][122943] Updated weights for policy 0, policy_version 90 (0.0041) +[2024-11-07 12:19:29,155][122819] Fps is (10 sec: 5328.6, 60 sec: 6280.5, 300 sec: 4433.3). Total num frames: 376832. Throughput: 0: 1474.2. Samples: 91356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 12:19:29,159][122819] Avg episode reward: [(0, '4.254')] +[2024-11-07 12:19:33,232][122943] Updated weights for policy 0, policy_version 100 (0.0032) +[2024-11-07 12:19:34,155][122819] Fps is (10 sec: 6964.1, 60 sec: 6280.5, 300 sec: 4596.6). Total num frames: 413696. Throughput: 0: 1512.4. Samples: 101942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:19:34,160][122819] Avg episode reward: [(0, '4.592')] +[2024-11-07 12:19:34,170][122929] Saving new best policy, reward=4.592! +[2024-11-07 12:19:39,050][122943] Updated weights for policy 0, policy_version 110 (0.0031) +[2024-11-07 12:19:39,155][122819] Fps is (10 sec: 7372.8, 60 sec: 6212.3, 300 sec: 4742.8). Total num frames: 450560. Throughput: 0: 1604.2. Samples: 112532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:19:39,157][122819] Avg episode reward: [(0, '4.377')] +[2024-11-07 12:19:44,155][122819] Fps is (10 sec: 7372.8, 60 sec: 6280.6, 300 sec: 4874.3). Total num frames: 487424. Throughput: 0: 1638.9. Samples: 118378. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:19:44,157][122819] Avg episode reward: [(0, '4.348')] +[2024-11-07 12:19:44,180][122929] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000119_487424.pth... +[2024-11-07 12:19:44,635][122943] Updated weights for policy 0, policy_version 120 (0.0033) +[2024-11-07 12:19:49,156][122819] Fps is (10 sec: 7371.8, 60 sec: 6553.5, 300 sec: 4993.2). Total num frames: 524288. Throughput: 0: 1660.6. Samples: 128706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:19:49,161][122819] Avg episode reward: [(0, '4.624')] +[2024-11-07 12:19:49,165][122929] Saving new best policy, reward=4.624! +[2024-11-07 12:19:50,445][122943] Updated weights for policy 0, policy_version 130 (0.0034) +[2024-11-07 12:19:54,155][122819] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 5064.2). Total num frames: 557056. Throughput: 0: 1648.7. Samples: 139464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:19:54,158][122819] Avg episode reward: [(0, '4.426')] +[2024-11-07 12:19:57,840][122943] Updated weights for policy 0, policy_version 140 (0.0025) +[2024-11-07 12:19:59,155][122819] Fps is (10 sec: 5735.1, 60 sec: 6553.6, 300 sec: 5057.7). Total num frames: 581632. Throughput: 0: 1598.8. Samples: 142230. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:19:59,157][122819] Avg episode reward: [(0, '4.516')] +[2024-11-07 12:20:04,058][122943] Updated weights for policy 0, policy_version 150 (0.0033) +[2024-11-07 12:20:04,155][122819] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 5120.0). Total num frames: 614400. Throughput: 0: 1620.6. Samples: 151964. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 12:20:04,156][122819] Avg episode reward: [(0, '4.474')] +[2024-11-07 12:20:09,155][122819] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 5210.1). Total num frames: 651264. Throughput: 0: 1697.2. Samples: 162746. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 12:20:09,157][122819] Avg episode reward: [(0, '4.298')] +[2024-11-07 12:20:09,614][122943] Updated weights for policy 0, policy_version 160 (0.0028) +[2024-11-07 12:20:14,155][122819] Fps is (10 sec: 7372.6, 60 sec: 6621.9, 300 sec: 5293.3). Total num frames: 688128. Throughput: 0: 1718.5. Samples: 168688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:20:14,158][122819] Avg episode reward: [(0, '4.335')] +[2024-11-07 12:20:15,086][122943] Updated weights for policy 0, policy_version 170 (0.0025) +[2024-11-07 12:20:19,155][122819] Fps is (10 sec: 6143.9, 60 sec: 6486.1, 300 sec: 5279.3). Total num frames: 712704. Throughput: 0: 1665.2. Samples: 176876. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 12:20:19,157][122819] Avg episode reward: [(0, '4.277')] +[2024-11-07 12:20:22,352][122943] Updated weights for policy 0, policy_version 180 (0.0037) +[2024-11-07 12:20:24,156][122819] Fps is (10 sec: 6143.6, 60 sec: 6758.5, 300 sec: 5354.0). Total num frames: 749568. Throughput: 0: 1663.3. Samples: 187384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:20:24,158][122819] Avg episode reward: [(0, '4.517')] +[2024-11-07 12:20:28,302][122943] Updated weights for policy 0, policy_version 190 (0.0031) +[2024-11-07 12:20:30,637][122819] Fps is (10 sec: 6064.3, 60 sec: 6595.4, 300 sec: 5340.8). Total num frames: 782336. Throughput: 0: 1593.3. Samples: 192438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:20:30,640][122819] Avg episode reward: [(0, '4.553')] +[2024-11-07 12:20:34,155][122819] Fps is (10 sec: 5325.2, 60 sec: 6485.3, 300 sec: 5352.1). Total num frames: 802816. Throughput: 0: 1576.1. Samples: 199630. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:20:34,157][122819] Avg episode reward: [(0, '4.420')] +[2024-11-07 12:20:36,012][122943] Updated weights for policy 0, policy_version 200 (0.0034) +[2024-11-07 12:20:39,155][122819] Fps is (10 sec: 7213.3, 60 sec: 6553.6, 300 sec: 5443.7). Total num frames: 843776. Throughput: 0: 1581.4. Samples: 210628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:20:39,156][122819] Avg episode reward: [(0, '4.603')] +[2024-11-07 12:20:41,201][122943] Updated weights for policy 0, policy_version 210 (0.0021) +[2024-11-07 12:20:44,155][122819] Fps is (10 sec: 7782.5, 60 sec: 6553.6, 300 sec: 5504.0). Total num frames: 880640. Throughput: 0: 1651.3. Samples: 216538. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 12:20:44,157][122819] Avg episode reward: [(0, '4.202')] +[2024-11-07 12:20:46,573][122943] Updated weights for policy 0, policy_version 220 (0.0028) +[2024-11-07 12:20:49,155][122819] Fps is (10 sec: 7372.7, 60 sec: 6553.7, 300 sec: 5560.6). Total num frames: 917504. Throughput: 0: 1685.6. Samples: 227816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:20:49,160][122819] Avg episode reward: [(0, '4.434')] +[2024-11-07 12:20:51,948][122943] Updated weights for policy 0, policy_version 230 (0.0027) +[2024-11-07 12:20:54,155][122819] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 5613.9). Total num frames: 954368. Throughput: 0: 1697.3. Samples: 239124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:20:54,157][122819] Avg episode reward: [(0, '4.418')] +[2024-11-07 12:20:57,740][122943] Updated weights for policy 0, policy_version 240 (0.0040) +[2024-11-07 12:20:59,155][122819] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 5664.2). Total num frames: 991232. Throughput: 0: 1682.4. Samples: 244396. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:20:59,162][122819] Avg episode reward: [(0, '4.365')] +[2024-11-07 12:21:04,507][122819] Fps is (10 sec: 5935.2, 60 sec: 6651.1, 300 sec: 5632.4). Total num frames: 1015808. Throughput: 0: 1695.8. Samples: 253782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:21:04,509][122819] Avg episode reward: [(0, '4.325')] +[2024-11-07 12:21:05,592][122943] Updated weights for policy 0, policy_version 250 (0.0034) +[2024-11-07 12:21:09,155][122819] Fps is (10 sec: 4915.2, 60 sec: 6485.3, 300 sec: 5623.7). Total num frames: 1040384. Throughput: 0: 1629.6. Samples: 260714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 12:21:09,165][122819] Avg episode reward: [(0, '4.297')] +[2024-11-07 12:21:12,581][122943] Updated weights for policy 0, policy_version 260 (0.0046) +[2024-11-07 12:21:14,156][122819] Fps is (10 sec: 5943.0, 60 sec: 6417.0, 300 sec: 5648.1). Total num frames: 1073152. Throughput: 0: 1674.0. Samples: 265290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 12:21:14,158][122819] Avg episode reward: [(0, '4.538')] +[2024-11-07 12:21:18,394][122943] Updated weights for policy 0, policy_version 270 (0.0029) +[2024-11-07 12:21:19,155][122819] Fps is (10 sec: 6963.2, 60 sec: 6621.8, 300 sec: 5692.4). Total num frames: 1110016. Throughput: 0: 1693.4. Samples: 275834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:21:19,157][122819] Avg episode reward: [(0, '4.423')] +[2024-11-07 12:21:24,116][122943] Updated weights for policy 0, policy_version 280 (0.0046) +[2024-11-07 12:21:24,155][122819] Fps is (10 sec: 7373.4, 60 sec: 6621.9, 300 sec: 5734.4). Total num frames: 1146880. Throughput: 0: 1682.9. Samples: 286360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:21:24,156][122819] Avg episode reward: [(0, '4.495')] +[2024-11-07 12:21:29,155][122819] Fps is (10 sec: 6963.3, 60 sec: 6789.6, 300 sec: 5754.4). Total num frames: 1179648. 
Throughput: 0: 1668.0. Samples: 291598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:21:29,157][122819] Avg episode reward: [(0, '4.812')] +[2024-11-07 12:21:29,161][122929] Saving new best policy, reward=4.812! +[2024-11-07 12:21:29,921][122943] Updated weights for policy 0, policy_version 290 (0.0049) +[2024-11-07 12:21:34,155][122819] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 5773.4). Total num frames: 1212416. Throughput: 0: 1644.4. Samples: 301814. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:21:34,157][122819] Avg episode reward: [(0, '4.543')] +[2024-11-07 12:21:36,087][122943] Updated weights for policy 0, policy_version 300 (0.0029) +[2024-11-07 12:21:39,155][122819] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 5810.6). Total num frames: 1249280. Throughput: 0: 1628.2. Samples: 312392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:21:39,157][122819] Avg episode reward: [(0, '4.347')] +[2024-11-07 12:21:41,723][122943] Updated weights for policy 0, policy_version 310 (0.0064) +[2024-11-07 12:21:44,155][122819] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 5846.1). Total num frames: 1286144. Throughput: 0: 1634.3. Samples: 317938. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 12:21:44,157][122819] Avg episode reward: [(0, '4.511')] +[2024-11-07 12:21:44,170][122929] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000314_1286144.pth... +[2024-11-07 12:21:44,269][122929] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth +[2024-11-07 12:21:47,402][122943] Updated weights for policy 0, policy_version 320 (0.0031) +[2024-11-07 12:21:49,155][122819] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 5880.0). Total num frames: 1323008. Throughput: 0: 1685.0. Samples: 329016. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 12:21:49,157][122819] Avg episode reward: [(0, '4.405')] +[2024-11-07 12:21:53,010][122943] Updated weights for policy 0, policy_version 330 (0.0025) +[2024-11-07 12:21:54,155][122819] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 5894.7). Total num frames: 1355776. Throughput: 0: 1755.7. Samples: 339720. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 12:21:54,158][122819] Avg episode reward: [(0, '4.536')] +[2024-11-07 12:21:58,782][122943] Updated weights for policy 0, policy_version 340 (0.0044) +[2024-11-07 12:21:59,155][122819] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 5926.1). Total num frames: 1392640. Throughput: 0: 1772.8. Samples: 345064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:21:59,158][122819] Avg episode reward: [(0, '4.511')] +[2024-11-07 12:22:04,156][122819] Fps is (10 sec: 6552.9, 60 sec: 6798.1, 300 sec: 5922.1). Total num frames: 1421312. Throughput: 0: 1739.9. Samples: 354130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:22:04,159][122819] Avg episode reward: [(0, '4.302')] +[2024-11-07 12:22:05,663][122943] Updated weights for policy 0, policy_version 350 (0.0031) +[2024-11-07 12:22:10,540][122819] Fps is (10 sec: 5036.8, 60 sec: 6672.6, 300 sec: 5885.0). Total num frames: 1449984. Throughput: 0: 1665.0. Samples: 363592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:22:10,542][122819] Avg episode reward: [(0, '4.394')] +[2024-11-07 12:22:13,964][122943] Updated weights for policy 0, policy_version 360 (0.0034) +[2024-11-07 12:22:14,156][122819] Fps is (10 sec: 5325.2, 60 sec: 6690.2, 300 sec: 5898.2). Total num frames: 1474560. Throughput: 0: 1644.6. Samples: 365604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:22:14,157][122819] Avg episode reward: [(0, '4.341')] +[2024-11-07 12:22:19,155][122819] Fps is (10 sec: 7131.8, 60 sec: 6690.2, 300 sec: 5927.2). Total num frames: 1511424. 
Throughput: 0: 1649.2. Samples: 376026. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:22:19,158][122819] Avg episode reward: [(0, '4.387')] +[2024-11-07 12:22:19,429][122943] Updated weights for policy 0, policy_version 370 (0.0025) +[2024-11-07 12:22:24,156][122819] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 5954.9). Total num frames: 1548288. Throughput: 0: 1672.0. Samples: 387634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:22:24,157][122819] Avg episode reward: [(0, '4.164')] +[2024-11-07 12:22:24,864][122943] Updated weights for policy 0, policy_version 380 (0.0028) +[2024-11-07 12:22:29,156][122819] Fps is (10 sec: 7372.4, 60 sec: 6758.3, 300 sec: 5981.7). Total num frames: 1585152. Throughput: 0: 1672.2. Samples: 393190. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:22:29,160][122819] Avg episode reward: [(0, '4.529')] +[2024-11-07 12:22:30,336][122943] Updated weights for policy 0, policy_version 390 (0.0025) +[2024-11-07 12:22:34,155][122819] Fps is (10 sec: 7373.2, 60 sec: 6826.7, 300 sec: 6007.5). Total num frames: 1622016. Throughput: 0: 1665.7. Samples: 403970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:22:34,157][122819] Avg episode reward: [(0, '4.375')] +[2024-11-07 12:22:36,419][122943] Updated weights for policy 0, policy_version 400 (0.0031) +[2024-11-07 12:22:39,155][122819] Fps is (10 sec: 6963.6, 60 sec: 6758.4, 300 sec: 6017.4). Total num frames: 1654784. Throughput: 0: 1656.8. Samples: 414276. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 12:22:39,157][122819] Avg episode reward: [(0, '4.374')] +[2024-11-07 12:22:42,402][122943] Updated weights for policy 0, policy_version 410 (0.0031) +[2024-11-07 12:22:44,446][122819] Fps is (10 sec: 5970.1, 60 sec: 6589.9, 300 sec: 6006.1). Total num frames: 1683456. Throughput: 0: 1639.3. Samples: 419310. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:22:44,448][122819] Avg episode reward: [(0, '4.303')] +[2024-11-07 12:22:49,155][122819] Fps is (10 sec: 5734.3, 60 sec: 6485.3, 300 sec: 6007.5). Total num frames: 1712128. Throughput: 0: 1616.8. Samples: 426886. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 12:22:49,157][122819] Avg episode reward: [(0, '4.298')] +[2024-11-07 12:22:49,921][122943] Updated weights for policy 0, policy_version 420 (0.0029) +[2024-11-07 12:22:54,155][122819] Fps is (10 sec: 6750.0, 60 sec: 6553.6, 300 sec: 6031.0). Total num frames: 1748992. Throughput: 0: 1696.2. Samples: 437572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:22:54,157][122819] Avg episode reward: [(0, '4.409')] +[2024-11-07 12:22:55,504][122943] Updated weights for policy 0, policy_version 430 (0.0040) +[2024-11-07 12:22:59,155][122819] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6053.7). Total num frames: 1785856. Throughput: 0: 1718.7. Samples: 442944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 12:22:59,157][122819] Avg episode reward: [(0, '4.417')] +[2024-11-07 12:23:00,954][122943] Updated weights for policy 0, policy_version 440 (0.0028) +[2024-11-07 12:23:04,155][122819] Fps is (10 sec: 6963.2, 60 sec: 6622.0, 300 sec: 6164.8). Total num frames: 1818624. Throughput: 0: 1725.8. Samples: 453688. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 12:23:04,157][122819] Avg episode reward: [(0, '4.456')] +[2024-11-07 12:23:06,963][122943] Updated weights for policy 0, policy_version 450 (0.0028) +[2024-11-07 12:23:09,155][122819] Fps is (10 sec: 7373.1, 60 sec: 6988.0, 300 sec: 6322.2). Total num frames: 1859584. Throughput: 0: 1710.8. Samples: 464618. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 12:23:09,159][122819] Avg episode reward: [(0, '4.311')] +[2024-11-07 12:23:12,643][122943] Updated weights for policy 0, policy_version 460 (0.0027) +[2024-11-07 12:23:14,155][122819] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6414.8). Total num frames: 1892352. Throughput: 0: 1705.7. Samples: 469946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:23:14,158][122819] Avg episode reward: [(0, '4.664')] +[2024-11-07 12:23:19,055][122819] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 122819], exiting... +[2024-11-07 12:23:19,060][122929] Stopping Batcher_0... +[2024-11-07 12:23:19,061][122929] Loop batcher_evt_loop terminating... +[2024-11-07 12:23:19,060][122819] Runner profile tree view: +main_loop: 328.7046 +[2024-11-07 12:23:19,063][122929] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000468_1916928.pth... +[2024-11-07 12:23:19,062][122819] Collected {0: 1916928}, FPS: 5831.8 +[2024-11-07 12:23:19,095][122943] Weights refcount: 2 0 +[2024-11-07 12:23:19,099][122943] Stopping InferenceWorker_p0-w0... +[2024-11-07 12:23:19,100][122943] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 12:23:19,237][122929] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000119_487424.pth +[2024-11-07 12:23:19,246][122929] Stopping LearnerWorker_p0... +[2024-11-07 12:23:19,246][122929] Loop learner_proc0_evt_loop terminating... +[2024-11-07 12:23:19,533][122942] Stopping RolloutWorker_w0... +[2024-11-07 12:23:19,536][122942] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 12:23:19,569][122947] Stopping RolloutWorker_w4... +[2024-11-07 12:23:19,570][122947] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 12:23:19,566][122948] Stopping RolloutWorker_w5... +[2024-11-07 12:23:19,572][122948] Loop rollout_proc5_evt_loop terminating... 
+[2024-11-07 12:23:19,599][122946] Stopping RolloutWorker_w3... +[2024-11-07 12:23:19,601][122946] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 12:23:19,626][122944] Stopping RolloutWorker_w1... +[2024-11-07 12:23:19,626][122944] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 12:23:19,702][122949] Stopping RolloutWorker_w6... +[2024-11-07 12:23:19,704][122949] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 12:23:19,735][122956] Stopping RolloutWorker_w7... +[2024-11-07 12:23:19,737][122956] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 12:23:19,826][122945] Stopping RolloutWorker_w2... +[2024-11-07 12:23:19,827][122945] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 12:26:37,256][125367] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-07 12:26:37,258][125367] Rollout worker 0 uses device cpu +[2024-11-07 12:26:37,261][125367] Rollout worker 1 uses device cpu +[2024-11-07 12:26:37,262][125367] Rollout worker 2 uses device cpu +[2024-11-07 12:26:37,263][125367] Rollout worker 3 uses device cpu +[2024-11-07 12:26:37,264][125367] Rollout worker 4 uses device cpu +[2024-11-07 12:26:37,265][125367] Rollout worker 5 uses device cpu +[2024-11-07 12:26:37,265][125367] Rollout worker 6 uses device cpu +[2024-11-07 12:26:37,266][125367] Rollout worker 7 uses device cpu +[2024-11-07 12:26:37,339][125367] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:26:37,341][125367] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 12:26:37,388][125367] Starting all processes... +[2024-11-07 12:26:37,390][125367] Starting process learner_proc0 +[2024-11-07 12:26:37,515][125367] Starting all processes... 
+[2024-11-07 12:26:37,560][125367] Starting process inference_proc0-0 +[2024-11-07 12:26:37,561][125367] Starting process rollout_proc0 +[2024-11-07 12:26:37,562][125367] Starting process rollout_proc1 +[2024-11-07 12:26:37,562][125367] Starting process rollout_proc2 +[2024-11-07 12:26:37,568][125367] Starting process rollout_proc3 +[2024-11-07 12:26:37,573][125367] Starting process rollout_proc4 +[2024-11-07 12:26:37,581][125367] Starting process rollout_proc5 +[2024-11-07 12:26:37,586][125367] Starting process rollout_proc6 +[2024-11-07 12:26:37,588][125367] Starting process rollout_proc7 +[2024-11-07 12:26:46,473][125885] Worker 5 uses CPU cores [5] +[2024-11-07 12:26:46,763][125890] Worker 3 uses CPU cores [3] +[2024-11-07 12:26:46,971][125892] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 12:26:47,013][125882] Worker 0 uses CPU cores [0] +[2024-11-07 12:26:47,264][125868] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:26:47,264][125868] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 12:26:47,428][125881] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:26:47,428][125881] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 12:26:47,449][125868] Num visible devices: 1 +[2024-11-07 12:26:47,454][125881] Num visible devices: 1 +[2024-11-07 12:26:47,487][125868] Starting seed is not provided +[2024-11-07 12:26:47,488][125868] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:26:47,488][125868] Initializing actor-critic model on device cuda:0 +[2024-11-07 12:26:47,489][125868] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 12:26:47,493][125868] RunningMeanStd input shape: (1,) +[2024-11-07 12:26:47,525][125868] ConvEncoder: input_channels=3 +[2024-11-07 12:26:47,606][125887] Worker 4 uses CPU cores [4] +[2024-11-07 12:26:47,620][125891] Worker 6 uses CPU cores [6] 
+[2024-11-07 12:26:47,657][125884] Worker 2 uses CPU cores [2] +[2024-11-07 12:26:47,713][125883] Worker 1 uses CPU cores [1] +[2024-11-07 12:26:47,781][125868] Conv encoder output size: 512 +[2024-11-07 12:26:47,782][125868] Policy head output size: 512 +[2024-11-07 12:26:47,826][125868] Created Actor Critic model with architecture: +[2024-11-07 12:26:47,826][125868] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 12:26:49,316][125868] Using optimizer +[2024-11-07 12:26:55,921][125868] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000468_1916928.pth... 
+[2024-11-07 12:26:56,020][125868] Loading model from checkpoint +[2024-11-07 12:26:56,022][125868] Loaded experiment state at self.train_step=468, self.env_steps=1916928 +[2024-11-07 12:26:56,023][125868] Initialized policy 0 weights for model version 468 +[2024-11-07 12:26:56,029][125868] LearnerWorker_p0 finished initialization! +[2024-11-07 12:26:56,030][125868] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:26:56,062][125367] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 1916928. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:26:56,304][125881] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 12:26:56,305][125881] RunningMeanStd input shape: (1,) +[2024-11-07 12:26:56,324][125881] ConvEncoder: input_channels=3 +[2024-11-07 12:26:56,453][125881] Conv encoder output size: 512 +[2024-11-07 12:26:56,453][125881] Policy head output size: 512 +[2024-11-07 12:26:56,506][125367] Inference worker 0-0 is ready! +[2024-11-07 12:26:56,507][125367] All inference workers are ready! Signal rollout workers to start! 
+[2024-11-07 12:26:56,586][125890] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:26:56,588][125887] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:26:56,594][125884] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:26:56,609][125891] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:26:56,617][125883] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:26:56,624][125882] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:26:56,651][125885] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:26:56,658][125892] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:26:57,329][125367] Heartbeat connected on Batcher_0 +[2024-11-07 12:26:57,332][125367] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 12:26:57,370][125367] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 12:26:59,179][125884] Decorrelating experience for 0 frames... +[2024-11-07 12:26:59,179][125892] Decorrelating experience for 0 frames... +[2024-11-07 12:26:59,179][125891] Decorrelating experience for 0 frames... +[2024-11-07 12:26:59,179][125890] Decorrelating experience for 0 frames... +[2024-11-07 12:26:59,650][125891] Decorrelating experience for 32 frames... +[2024-11-07 12:26:59,654][125892] Decorrelating experience for 32 frames... +[2024-11-07 12:26:59,656][125890] Decorrelating experience for 32 frames... +[2024-11-07 12:26:59,674][125887] Decorrelating experience for 0 frames... +[2024-11-07 12:26:59,679][125885] Decorrelating experience for 0 frames... +[2024-11-07 12:27:00,122][125887] Decorrelating experience for 32 frames... +[2024-11-07 12:27:00,169][125882] Decorrelating experience for 0 frames... +[2024-11-07 12:27:00,172][125885] Decorrelating experience for 32 frames... +[2024-11-07 12:27:00,399][125890] Decorrelating experience for 64 frames... 
+[2024-11-07 12:27:00,739][125882] Decorrelating experience for 32 frames... +[2024-11-07 12:27:00,752][125884] Decorrelating experience for 32 frames... +[2024-11-07 12:27:00,767][125892] Decorrelating experience for 64 frames... +[2024-11-07 12:27:01,033][125887] Decorrelating experience for 64 frames... +[2024-11-07 12:27:01,046][125891] Decorrelating experience for 64 frames... +[2024-11-07 12:27:01,062][125367] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1916928. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:27:01,180][125885] Decorrelating experience for 64 frames... +[2024-11-07 12:27:01,483][125883] Decorrelating experience for 0 frames... +[2024-11-07 12:27:01,578][125890] Decorrelating experience for 96 frames... +[2024-11-07 12:27:01,608][125892] Decorrelating experience for 96 frames... +[2024-11-07 12:27:01,639][125887] Decorrelating experience for 96 frames... +[2024-11-07 12:27:01,724][125367] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 12:27:01,812][125367] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 12:27:01,819][125884] Decorrelating experience for 64 frames... +[2024-11-07 12:27:01,822][125367] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 12:27:01,937][125882] Decorrelating experience for 64 frames... +[2024-11-07 12:27:01,946][125891] Decorrelating experience for 96 frames... +[2024-11-07 12:27:02,111][125367] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 12:27:02,205][125883] Decorrelating experience for 32 frames... +[2024-11-07 12:27:02,425][125885] Decorrelating experience for 96 frames... +[2024-11-07 12:27:02,612][125367] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 12:27:02,723][125882] Decorrelating experience for 96 frames... +[2024-11-07 12:27:02,806][125367] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 12:27:02,967][125884] Decorrelating experience for 96 frames... 
+[2024-11-07 12:27:02,972][125883] Decorrelating experience for 64 frames... +[2024-11-07 12:27:03,060][125367] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 12:27:03,582][125883] Decorrelating experience for 96 frames... +[2024-11-07 12:27:03,729][125367] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 12:27:05,475][125868] Signal inference workers to stop experience collection... +[2024-11-07 12:27:05,493][125881] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 12:27:06,062][125367] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1916928. Throughput: 0: 287.2. Samples: 2872. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:27:06,064][125367] Avg episode reward: [(0, '2.347')] +[2024-11-07 12:27:11,062][125367] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 1916928. Throughput: 0: 191.5. Samples: 2872. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:27:11,063][125367] Avg episode reward: [(0, '2.347')] +[2024-11-07 12:27:15,761][125868] Signal inference workers to resume experience collection... +[2024-11-07 12:27:15,762][125881] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 12:27:16,062][125367] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 1921024. Throughput: 0: 143.6. Samples: 2872. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-11-07 12:27:16,065][125367] Avg episode reward: [(0, '2.347')] +[2024-11-07 12:27:21,062][125367] Fps is (10 sec: 3276.6, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 1949696. Throughput: 0: 348.9. Samples: 8722. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) +[2024-11-07 12:27:21,065][125367] Avg episode reward: [(0, '4.010')] +[2024-11-07 12:27:22,265][125881] Updated weights for policy 0, policy_version 478 (0.0035) +[2024-11-07 12:27:26,061][125367] Fps is (10 sec: 5734.7, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 1978368. Throughput: 0: 437.4. Samples: 13122. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 12:27:26,064][125367] Avg episode reward: [(0, '4.390')] +[2024-11-07 12:27:28,815][125881] Updated weights for policy 0, policy_version 488 (0.0035) +[2024-11-07 12:27:31,062][125367] Fps is (10 sec: 6144.3, 60 sec: 2691.7, 300 sec: 2691.7). Total num frames: 2011136. Throughput: 0: 635.8. Samples: 22252. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 12:27:31,064][125367] Avg episode reward: [(0, '4.244')] +[2024-11-07 12:27:35,979][125881] Updated weights for policy 0, policy_version 498 (0.0044) +[2024-11-07 12:27:36,062][125367] Fps is (10 sec: 6143.8, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 2039808. Throughput: 0: 767.1. Samples: 30686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:27:36,065][125367] Avg episode reward: [(0, '4.582')] +[2024-11-07 12:27:41,062][125367] Fps is (10 sec: 5324.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2064384. Throughput: 0: 776.3. Samples: 34932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:27:41,066][125367] Avg episode reward: [(0, '4.401')] +[2024-11-07 12:27:44,344][125881] Updated weights for policy 0, policy_version 508 (0.0059) +[2024-11-07 12:27:46,062][125367] Fps is (10 sec: 4505.7, 60 sec: 3358.7, 300 sec: 3358.7). Total num frames: 2084864. Throughput: 0: 927.3. Samples: 41728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:27:46,064][125367] Avg episode reward: [(0, '4.441')] +[2024-11-07 12:27:51,064][125367] Fps is (10 sec: 3276.0, 60 sec: 3276.7, 300 sec: 3276.7). Total num frames: 2097152. Throughput: 0: 951.9. Samples: 45712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 12:27:51,069][125367] Avg episode reward: [(0, '4.615')] +[2024-11-07 12:27:56,062][125367] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3345.1). Total num frames: 2117632. Throughput: 0: 1015.4. Samples: 48564. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 12:27:56,069][125367] Avg episode reward: [(0, '4.553')] +[2024-11-07 12:27:56,853][125881] Updated weights for policy 0, policy_version 518 (0.0057) +[2024-11-07 12:28:01,062][125367] Fps is (10 sec: 3687.3, 60 sec: 3618.1, 300 sec: 3339.8). Total num frames: 2134016. Throughput: 0: 1142.8. Samples: 54296. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2024-11-07 12:28:01,064][125367] Avg episode reward: [(0, '4.420')] +[2024-11-07 12:28:04,738][125367] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 125367], exiting... +[2024-11-07 12:28:04,741][125868] Stopping Batcher_0... +[2024-11-07 12:28:04,741][125367] Runner profile tree view: +main_loop: 87.3535 +[2024-11-07 12:28:04,744][125367] Collected {0: 2146304}, FPS: 2625.8 +[2024-11-07 12:28:04,742][125868] Loop batcher_evt_loop terminating... +[2024-11-07 12:28:04,801][125881] Weights refcount: 2 0 +[2024-11-07 12:28:04,812][125881] Stopping InferenceWorker_p0-w0... +[2024-11-07 12:28:04,812][125881] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 12:28:04,826][125868] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000525_2150400.pth... +[2024-11-07 12:28:05,126][125868] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000314_1286144.pth +[2024-11-07 12:28:05,129][125868] Stopping LearnerWorker_p0... +[2024-11-07 12:28:05,132][125868] Loop learner_proc0_evt_loop terminating... +[2024-11-07 12:28:05,856][125884] Stopping RolloutWorker_w2... +[2024-11-07 12:28:05,858][125884] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 12:28:05,873][125887] Stopping RolloutWorker_w4... +[2024-11-07 12:28:05,874][125887] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 12:28:05,913][125885] Stopping RolloutWorker_w5... +[2024-11-07 12:28:05,914][125885] Loop rollout_proc5_evt_loop terminating... 
+[2024-11-07 12:28:05,915][125883] Stopping RolloutWorker_w1... +[2024-11-07 12:28:05,916][125883] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 12:28:05,992][125890] Stopping RolloutWorker_w3... +[2024-11-07 12:28:05,992][125890] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 12:28:06,002][125891] Stopping RolloutWorker_w6... +[2024-11-07 12:28:06,009][125891] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 12:28:06,026][125892] Stopping RolloutWorker_w7... +[2024-11-07 12:28:06,026][125892] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 12:28:06,164][125882] Stopping RolloutWorker_w0... +[2024-11-07 12:28:06,165][125882] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 12:32:34,540][129156] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-07 12:32:34,542][129156] Rollout worker 0 uses device cpu +[2024-11-07 12:32:34,543][129156] Rollout worker 1 uses device cpu +[2024-11-07 12:32:34,544][129156] Rollout worker 2 uses device cpu +[2024-11-07 12:32:34,546][129156] Rollout worker 3 uses device cpu +[2024-11-07 12:32:34,547][129156] Rollout worker 4 uses device cpu +[2024-11-07 12:32:34,548][129156] Rollout worker 5 uses device cpu +[2024-11-07 12:32:34,551][129156] Rollout worker 6 uses device cpu +[2024-11-07 12:32:34,552][129156] Rollout worker 7 uses device cpu +[2024-11-07 12:32:34,629][129156] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:32:34,630][129156] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 12:32:34,670][129156] Starting all processes... +[2024-11-07 12:32:34,671][129156] Starting process learner_proc0 +[2024-11-07 12:32:34,759][129156] Starting all processes... 
+[2024-11-07 12:32:34,888][129156] Starting process inference_proc0-0 +[2024-11-07 12:32:34,889][129156] Starting process rollout_proc0 +[2024-11-07 12:32:34,890][129156] Starting process rollout_proc1 +[2024-11-07 12:32:34,892][129156] Starting process rollout_proc2 +[2024-11-07 12:32:34,903][129156] Starting process rollout_proc3 +[2024-11-07 12:32:34,905][129156] Starting process rollout_proc4 +[2024-11-07 12:32:34,906][129156] Starting process rollout_proc5 +[2024-11-07 12:32:34,907][129156] Starting process rollout_proc6 +[2024-11-07 12:32:34,914][129156] Starting process rollout_proc7 +[2024-11-07 12:32:42,998][129261] Worker 5 uses CPU cores [5] +[2024-11-07 12:32:43,007][129242] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:32:43,007][129242] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 12:32:43,292][129242] Num visible devices: 1 +[2024-11-07 12:32:43,328][129242] Starting seed is not provided +[2024-11-07 12:32:43,329][129242] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:32:43,329][129242] Initializing actor-critic model on device cuda:0 +[2024-11-07 12:32:43,330][129242] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 12:32:43,331][129242] RunningMeanStd input shape: (1,) +[2024-11-07 12:32:43,368][129242] ConvEncoder: input_channels=3 +[2024-11-07 12:32:43,403][129260] Worker 4 uses CPU cores [4] +[2024-11-07 12:32:43,740][129242] Conv encoder output size: 512 +[2024-11-07 12:32:43,742][129242] Policy head output size: 512 +[2024-11-07 12:32:43,792][129242] Created Actor Critic model with architecture: +[2024-11-07 12:32:43,793][129242] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): 
VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 12:32:43,838][129255] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:32:43,838][129255] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 12:32:43,878][129258] Worker 2 uses CPU cores [2] +[2024-11-07 12:32:43,890][129255] Num visible devices: 1 +[2024-11-07 12:32:44,070][129257] Worker 1 uses CPU cores [1] +[2024-11-07 12:32:44,178][129256] Worker 0 uses CPU cores [0] +[2024-11-07 12:32:44,278][129263] Worker 6 uses CPU cores [6] +[2024-11-07 12:32:44,298][129259] Worker 3 uses CPU cores [3] +[2024-11-07 12:32:44,564][129262] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 12:32:45,125][129242] Using optimizer +[2024-11-07 12:32:49,395][129242] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000525_2150400.pth... 
+[2024-11-07 12:32:49,492][129242] Loading model from checkpoint +[2024-11-07 12:32:49,494][129242] Loaded experiment state at self.train_step=525, self.env_steps=2150400 +[2024-11-07 12:32:49,495][129242] Initialized policy 0 weights for model version 525 +[2024-11-07 12:32:49,502][129242] LearnerWorker_p0 finished initialization! +[2024-11-07 12:32:49,503][129242] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:32:49,707][129156] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 2150400. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:32:49,753][129255] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 12:32:49,754][129255] RunningMeanStd input shape: (1,) +[2024-11-07 12:32:49,770][129255] ConvEncoder: input_channels=3 +[2024-11-07 12:32:49,905][129255] Conv encoder output size: 512 +[2024-11-07 12:32:49,906][129255] Policy head output size: 512 +[2024-11-07 12:32:49,958][129156] Inference worker 0-0 is ready! +[2024-11-07 12:32:49,959][129156] All inference workers are ready! Signal rollout workers to start! +[2024-11-07 12:32:50,049][129260] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:32:50,071][129257] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:32:50,078][129256] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:32:50,084][129259] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:32:50,086][129263] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:32:50,091][129258] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:32:50,117][129261] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:32:50,151][129262] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:32:50,703][129260] Decorrelating experience for 0 frames... +[2024-11-07 12:32:50,702][129259] Decorrelating experience for 0 frames... 
+[2024-11-07 12:32:50,710][129258] Decorrelating experience for 0 frames... +[2024-11-07 12:32:50,741][129263] Decorrelating experience for 0 frames... +[2024-11-07 12:32:50,741][129257] Decorrelating experience for 0 frames... +[2024-11-07 12:32:50,767][129256] Decorrelating experience for 0 frames... +[2024-11-07 12:32:51,143][129260] Decorrelating experience for 32 frames... +[2024-11-07 12:32:51,154][129259] Decorrelating experience for 32 frames... +[2024-11-07 12:32:51,192][129257] Decorrelating experience for 32 frames... +[2024-11-07 12:32:51,231][129263] Decorrelating experience for 32 frames... +[2024-11-07 12:32:51,242][129261] Decorrelating experience for 0 frames... +[2024-11-07 12:32:51,886][129256] Decorrelating experience for 32 frames... +[2024-11-07 12:32:51,942][129258] Decorrelating experience for 32 frames... +[2024-11-07 12:32:51,945][129261] Decorrelating experience for 32 frames... +[2024-11-07 12:32:52,175][129257] Decorrelating experience for 64 frames... +[2024-11-07 12:32:52,182][129259] Decorrelating experience for 64 frames... +[2024-11-07 12:32:52,265][129260] Decorrelating experience for 64 frames... +[2024-11-07 12:32:54,620][129156] Heartbeat connected on Batcher_0 +[2024-11-07 12:32:54,632][129156] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 12:32:54,678][129156] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 12:32:54,710][129156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2150400. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:32:55,085][129262] Decorrelating experience for 0 frames... +[2024-11-07 12:32:55,146][129259] Decorrelating experience for 96 frames... +[2024-11-07 12:32:55,158][129260] Decorrelating experience for 96 frames... +[2024-11-07 12:32:55,164][129258] Decorrelating experience for 64 frames... +[2024-11-07 12:32:55,171][129256] Decorrelating experience for 64 frames... 
+[2024-11-07 12:32:55,297][129156] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 12:32:55,303][129261] Decorrelating experience for 64 frames... +[2024-11-07 12:32:55,319][129263] Decorrelating experience for 64 frames... +[2024-11-07 12:32:55,337][129156] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 12:32:55,799][129258] Decorrelating experience for 96 frames... +[2024-11-07 12:32:55,896][129156] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 12:32:55,985][129257] Decorrelating experience for 96 frames... +[2024-11-07 12:32:56,004][129262] Decorrelating experience for 32 frames... +[2024-11-07 12:32:56,147][129156] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 12:32:56,171][129256] Decorrelating experience for 96 frames... +[2024-11-07 12:32:56,220][129261] Decorrelating experience for 96 frames... +[2024-11-07 12:32:56,258][129263] Decorrelating experience for 96 frames... +[2024-11-07 12:32:56,279][129156] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 12:32:56,300][129156] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 12:32:56,342][129156] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 12:32:56,557][129262] Decorrelating experience for 64 frames... +[2024-11-07 12:32:57,231][129262] Decorrelating experience for 96 frames... +[2024-11-07 12:32:57,381][129156] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 12:32:59,711][129156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2150400. Throughput: 0: 148.3. Samples: 1484. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:32:59,713][129156] Avg episode reward: [(0, '1.831')] +[2024-11-07 12:32:59,879][129242] Signal inference workers to stop experience collection... +[2024-11-07 12:32:59,913][129255] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 12:33:04,707][129156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2150400. Throughput: 0: 148.9. Samples: 2234. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:33:04,709][129156] Avg episode reward: [(0, '1.979')] +[2024-11-07 12:33:08,573][129242] Signal inference workers to resume experience collection... +[2024-11-07 12:33:08,574][129255] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 12:33:09,707][129156] Fps is (10 sec: 819.6, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 2158592. Throughput: 0: 111.7. Samples: 2234. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-11-07 12:33:09,712][129156] Avg episode reward: [(0, '2.843')] +[2024-11-07 12:33:14,707][129156] Fps is (10 sec: 3686.5, 60 sec: 1474.6, 300 sec: 1474.6). Total num frames: 2187264. Throughput: 0: 367.0. Samples: 9174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:33:14,709][129156] Avg episode reward: [(0, '3.927')] +[2024-11-07 12:33:15,426][129255] Updated weights for policy 0, policy_version 535 (0.0041) +[2024-11-07 12:33:19,707][129156] Fps is (10 sec: 4915.1, 60 sec: 1911.4, 300 sec: 1911.4). Total num frames: 2207744. Throughput: 0: 433.9. Samples: 13018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:33:19,710][129156] Avg episode reward: [(0, '4.468')] +[2024-11-07 12:33:24,035][129255] Updated weights for policy 0, policy_version 545 (0.0073) +[2024-11-07 12:33:24,710][129156] Fps is (10 sec: 4504.2, 60 sec: 2340.4, 300 sec: 2340.4). Total num frames: 2232320. Throughput: 0: 565.1. Samples: 19780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:33:24,714][129156] Avg episode reward: [(0, '4.618')] +[2024-11-07 12:33:29,707][129156] Fps is (10 sec: 4096.2, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 2248704. Throughput: 0: 630.4. Samples: 25214. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:33:29,708][129156] Avg episode reward: [(0, '4.466')] +[2024-11-07 12:33:34,306][129255] Updated weights for policy 0, policy_version 555 (0.0049) +[2024-11-07 12:33:34,707][129156] Fps is (10 sec: 4097.1, 60 sec: 2730.6, 300 sec: 2730.6). Total num frames: 2273280. Throughput: 0: 625.4. Samples: 28144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:33:34,712][129156] Avg episode reward: [(0, '4.361')] +[2024-11-07 12:33:39,707][129156] Fps is (10 sec: 5324.8, 60 sec: 3031.0, 300 sec: 3031.0). Total num frames: 2301952. Throughput: 0: 809.0. Samples: 36402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:33:39,711][129156] Avg episode reward: [(0, '4.402')] +[2024-11-07 12:33:41,514][129255] Updated weights for policy 0, policy_version 565 (0.0030) +[2024-11-07 12:33:44,707][129156] Fps is (10 sec: 5734.5, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2330624. Throughput: 0: 965.4. Samples: 44922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:33:44,709][129156] Avg episode reward: [(0, '4.465')] +[2024-11-07 12:33:48,703][129255] Updated weights for policy 0, policy_version 575 (0.0026) +[2024-11-07 12:33:49,707][129156] Fps is (10 sec: 5734.2, 60 sec: 3481.6, 300 sec: 3481.6). Total num frames: 2359296. Throughput: 0: 1048.7. Samples: 49424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 12:33:49,713][129156] Avg episode reward: [(0, '4.284')] +[2024-11-07 12:33:54,707][129156] Fps is (10 sec: 5734.4, 60 sec: 3959.7, 300 sec: 3654.9). Total num frames: 2387968. Throughput: 0: 1234.8. Samples: 57800. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 12:33:54,709][129156] Avg episode reward: [(0, '4.330')] +[2024-11-07 12:33:56,229][129255] Updated weights for policy 0, policy_version 585 (0.0046) +[2024-11-07 12:33:59,707][129156] Fps is (10 sec: 5734.5, 60 sec: 4437.7, 300 sec: 3803.4). Total num frames: 2416640. 
Throughput: 0: 1272.5. Samples: 66438. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 12:33:59,709][129156] Avg episode reward: [(0, '4.526')] +[2024-11-07 12:34:04,707][129156] Fps is (10 sec: 4505.7, 60 sec: 4710.4, 300 sec: 3768.3). Total num frames: 2433024. Throughput: 0: 1224.1. Samples: 68102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 12:34:04,709][129156] Avg episode reward: [(0, '4.584')] +[2024-11-07 12:34:05,375][129255] Updated weights for policy 0, policy_version 595 (0.0031) +[2024-11-07 12:34:09,712][129156] Fps is (10 sec: 3684.7, 60 sec: 4914.8, 300 sec: 3788.6). Total num frames: 2453504. Throughput: 0: 1234.4. Samples: 75332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:34:09,718][129156] Avg episode reward: [(0, '4.446')] +[2024-11-07 12:34:14,707][129156] Fps is (10 sec: 4095.9, 60 sec: 4778.6, 300 sec: 3806.9). Total num frames: 2473984. Throughput: 0: 1235.3. Samples: 80802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:34:14,718][129156] Avg episode reward: [(0, '4.383')] +[2024-11-07 12:34:15,898][129255] Updated weights for policy 0, policy_version 605 (0.0067) +[2024-11-07 12:34:19,708][129156] Fps is (10 sec: 3687.6, 60 sec: 4710.3, 300 sec: 3777.4). Total num frames: 2490368. Throughput: 0: 1233.0. Samples: 83632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 12:34:19,711][129156] Avg episode reward: [(0, '4.357')] +[2024-11-07 12:34:20,565][129156] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 129156], exiting... +[2024-11-07 12:34:20,586][129242] Stopping Batcher_0... +[2024-11-07 12:34:20,586][129242] Loop batcher_evt_loop terminating... +[2024-11-07 12:34:20,570][129156] Runner profile tree view: +main_loop: 105.9005 +[2024-11-07 12:34:20,590][129156] Collected {0: 2490368}, FPS: 3210.3 +[2024-11-07 12:34:20,692][129255] Weights refcount: 2 0 +[2024-11-07 12:34:20,731][129255] Stopping InferenceWorker_p0-w0... 
+[2024-11-07 12:34:20,732][129255] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 12:34:20,733][129242] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth... +[2024-11-07 12:34:20,954][129259] Stopping RolloutWorker_w3... +[2024-11-07 12:34:20,955][129259] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 12:34:21,021][129260] Stopping RolloutWorker_w4... +[2024-11-07 12:34:21,021][129260] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 12:34:21,091][129263] Stopping RolloutWorker_w6... +[2024-11-07 12:34:21,092][129263] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 12:34:21,253][129242] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000468_1916928.pth +[2024-11-07 12:34:21,242][129256] Stopping RolloutWorker_w0... +[2024-11-07 12:34:21,259][129256] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 12:34:21,281][129242] Stopping LearnerWorker_p0... +[2024-11-07 12:34:21,282][129242] Loop learner_proc0_evt_loop terminating... +[2024-11-07 12:34:21,299][129257] Stopping RolloutWorker_w1... +[2024-11-07 12:34:21,324][129257] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 12:34:21,363][129258] Stopping RolloutWorker_w2... +[2024-11-07 12:34:21,364][129258] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 12:34:21,385][129262] Stopping RolloutWorker_w7... +[2024-11-07 12:34:21,386][129262] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 12:34:21,515][129261] Stopping RolloutWorker_w5... +[2024-11-07 12:34:21,523][129261] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 12:36:09,909][129156] Environment doom_basic already registered, overwriting... +[2024-11-07 12:36:09,911][129156] Environment doom_two_colors_easy already registered, overwriting... +[2024-11-07 12:36:09,912][129156] Environment doom_two_colors_hard already registered, overwriting... 
+[2024-11-07 12:36:09,914][129156] Environment doom_dm already registered, overwriting... +[2024-11-07 12:36:09,915][129156] Environment doom_dwango5 already registered, overwriting... +[2024-11-07 12:36:09,916][129156] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-11-07 12:36:09,917][129156] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-11-07 12:36:09,919][129156] Environment doom_my_way_home already registered, overwriting... +[2024-11-07 12:36:09,920][129156] Environment doom_deadly_corridor already registered, overwriting... +[2024-11-07 12:36:09,921][129156] Environment doom_defend_the_center already registered, overwriting... +[2024-11-07 12:36:09,922][129156] Environment doom_defend_the_line already registered, overwriting... +[2024-11-07 12:36:09,922][129156] Environment doom_health_gathering already registered, overwriting... +[2024-11-07 12:36:09,928][129156] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-11-07 12:36:09,930][129156] Environment doom_battle already registered, overwriting... +[2024-11-07 12:36:09,931][129156] Environment doom_battle2 already registered, overwriting... +[2024-11-07 12:36:09,933][129156] Environment doom_duel_bots already registered, overwriting... +[2024-11-07 12:36:09,934][129156] Environment doom_deathmatch_bots already registered, overwriting... +[2024-11-07 12:36:09,936][129156] Environment doom_duel already registered, overwriting... +[2024-11-07 12:36:09,938][129156] Environment doom_deathmatch_full already registered, overwriting... +[2024-11-07 12:36:09,941][129156] Environment doom_benchmark already registered, overwriting... 
+[2024-11-07 12:36:09,943][129156] register_encoder_factory: +[2024-11-07 12:36:10,059][129156] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 12:36:10,066][129156] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! +[2024-11-07 12:36:10,067][129156] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... +[2024-11-07 12:36:10,069][129156] Weights and Biases integration disabled +[2024-11-07 12:36:10,073][129156] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-11-07 12:36:15,029][129156] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/root/hfRL/ml/LunarLander-v2/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 
+log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=4000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2024-11-07 12:36:15,031][129156] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
+[2024-11-07 12:36:15,035][129156] Rollout worker 0 uses device cpu +[2024-11-07 12:36:15,038][129156] Rollout worker 1 uses device cpu +[2024-11-07 12:36:15,039][129156] Rollout worker 2 uses device cpu +[2024-11-07 12:36:15,039][129156] Rollout worker 3 uses device cpu +[2024-11-07 12:36:15,040][129156] Rollout worker 4 uses device cpu +[2024-11-07 12:36:15,041][129156] Rollout worker 5 uses device cpu +[2024-11-07 12:36:15,044][129156] Rollout worker 6 uses device cpu +[2024-11-07 12:36:15,045][129156] Rollout worker 7 uses device cpu +[2024-11-07 12:36:15,119][129156] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:36:15,121][129156] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 12:36:15,157][129156] Starting all processes... +[2024-11-07 12:36:15,158][129156] Starting process learner_proc0 +[2024-11-07 12:36:15,197][129156] Starting all processes... +[2024-11-07 12:36:15,202][129156] Starting process inference_proc0-0 +[2024-11-07 12:36:15,202][129156] Starting process rollout_proc0 +[2024-11-07 12:36:15,204][129156] Starting process rollout_proc1 +[2024-11-07 12:36:15,207][129156] Starting process rollout_proc2 +[2024-11-07 12:36:15,208][129156] Starting process rollout_proc3 +[2024-11-07 12:36:15,208][129156] Starting process rollout_proc4 +[2024-11-07 12:36:15,209][129156] Starting process rollout_proc5 +[2024-11-07 12:36:15,215][129156] Starting process rollout_proc6 +[2024-11-07 12:36:15,226][129156] Starting process rollout_proc7 +[2024-11-07 12:36:20,758][129156] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 129156], exiting... +[2024-11-07 12:36:20,767][129156] Runner profile tree view: +main_loop: 5.6098 +[2024-11-07 12:36:20,769][129156] Collected {}, FPS: 0.0 +[2024-11-07 12:36:21,570][130420] Worker 5 uses CPU cores [5] +[2024-11-07 12:36:21,772][130420] Stopping RolloutWorker_w5... +[2024-11-07 12:36:21,773][130420] Loop rollout_proc5_evt_loop terminating... 
+[2024-11-07 12:36:21,981][130417] Worker 3 uses CPU cores [3] +[2024-11-07 12:36:22,131][130417] Stopping RolloutWorker_w3... +[2024-11-07 12:36:22,132][130417] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 12:36:22,630][130415] Worker 1 uses CPU cores [1] +[2024-11-07 12:36:22,821][130415] Stopping RolloutWorker_w1... +[2024-11-07 12:36:22,822][130415] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 12:36:23,284][130419] Worker 6 uses CPU cores [6] +[2024-11-07 12:36:23,402][130419] Stopping RolloutWorker_w6... +[2024-11-07 12:36:23,423][130419] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 12:36:23,485][130422] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 12:36:23,529][130422] Stopping RolloutWorker_w7... +[2024-11-07 12:36:23,529][130422] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 12:36:23,762][130414] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:36:23,764][130414] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 12:36:23,870][130416] Worker 2 uses CPU cores [2] +[2024-11-07 12:36:23,919][130418] Worker 4 uses CPU cores [4] +[2024-11-07 12:36:23,929][130416] Stopping RolloutWorker_w2... +[2024-11-07 12:36:23,929][130416] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 12:36:24,001][130418] Stopping RolloutWorker_w4... +[2024-11-07 12:36:24,001][130418] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 12:36:24,328][130414] Num visible devices: 1 +[2024-11-07 12:36:24,377][130414] Stopping InferenceWorker_p0-w0... +[2024-11-07 12:36:24,378][130414] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 12:36:24,462][130413] Worker 0 uses CPU cores [0] +[2024-11-07 12:36:24,641][130413] Stopping RolloutWorker_w0... +[2024-11-07 12:36:24,642][130413] Loop rollout_proc0_evt_loop terminating... 
+[2024-11-07 12:36:24,841][130400] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:36:24,841][130400] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 12:36:24,938][130400] Num visible devices: 1 +[2024-11-07 12:36:25,011][130400] Starting seed is not provided +[2024-11-07 12:36:25,011][130400] Stopping Batcher_0... +[2024-11-07 12:36:25,012][130400] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:36:25,012][130400] Loop batcher_evt_loop terminating... +[2024-11-07 12:36:25,013][130400] Initializing actor-critic model on device cuda:0 +[2024-11-07 12:36:25,014][130400] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 12:36:25,016][130400] RunningMeanStd input shape: (1,) +[2024-11-07 12:36:25,101][130400] ConvEncoder: input_channels=3 +[2024-11-07 12:36:25,691][130400] Conv encoder output size: 512 +[2024-11-07 12:36:25,692][130400] Policy head output size: 512 +[2024-11-07 12:36:25,718][130400] Created Actor Critic model with architecture: +[2024-11-07 12:36:25,719][130400] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): 
RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 12:36:26,916][130400] Using optimizer +[2024-11-07 12:36:28,167][130400] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth... +[2024-11-07 12:36:28,230][130400] Loading model from checkpoint +[2024-11-07 12:36:28,231][130400] Loaded experiment state at self.train_step=609, self.env_steps=2494464 +[2024-11-07 12:36:28,232][130400] Initialized policy 0 weights for model version 609 +[2024-11-07 12:36:28,238][130400] LearnerWorker_p0 finished initialization! +[2024-11-07 12:36:28,238][130400] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth... +[2024-11-07 12:36:28,304][130400] Stopping LearnerWorker_p0... +[2024-11-07 12:36:28,304][130400] Loop learner_proc0_evt_loop terminating... +[2024-11-07 12:36:49,466][129156] Environment doom_basic already registered, overwriting... +[2024-11-07 12:36:49,469][129156] Environment doom_two_colors_easy already registered, overwriting... +[2024-11-07 12:36:49,469][129156] Environment doom_two_colors_hard already registered, overwriting... +[2024-11-07 12:36:49,471][129156] Environment doom_dm already registered, overwriting... +[2024-11-07 12:36:49,472][129156] Environment doom_dwango5 already registered, overwriting... +[2024-11-07 12:36:49,473][129156] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-11-07 12:36:49,475][129156] Environment doom_defend_the_center_flat_actions already registered, overwriting... 
+[2024-11-07 12:36:49,477][129156] Environment doom_my_way_home already registered, overwriting... +[2024-11-07 12:36:49,480][129156] Environment doom_deadly_corridor already registered, overwriting... +[2024-11-07 12:36:49,482][129156] Environment doom_defend_the_center already registered, overwriting... +[2024-11-07 12:36:49,484][129156] Environment doom_defend_the_line already registered, overwriting... +[2024-11-07 12:36:49,485][129156] Environment doom_health_gathering already registered, overwriting... +[2024-11-07 12:36:49,486][129156] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-11-07 12:36:49,488][129156] Environment doom_battle already registered, overwriting... +[2024-11-07 12:36:49,489][129156] Environment doom_battle2 already registered, overwriting... +[2024-11-07 12:36:49,491][129156] Environment doom_duel_bots already registered, overwriting... +[2024-11-07 12:36:49,494][129156] Environment doom_deathmatch_bots already registered, overwriting... +[2024-11-07 12:36:49,495][129156] Environment doom_duel already registered, overwriting... +[2024-11-07 12:36:49,497][129156] Environment doom_deathmatch_full already registered, overwriting... +[2024-11-07 12:36:49,498][129156] Environment doom_benchmark already registered, overwriting... +[2024-11-07 12:36:49,499][129156] register_encoder_factory: +[2024-11-07 12:36:49,515][129156] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 12:36:49,521][129156] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! +[2024-11-07 12:36:49,522][129156] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... 
+[2024-11-07 12:36:49,523][129156] Weights and Biases integration disabled +[2024-11-07 12:36:49,527][129156] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-11-07 12:36:54,270][129156] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/root/hfRL/ml/LunarLander-v2/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=4000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] 
+encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2024-11-07 12:36:54,272][129156] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
+[2024-11-07 12:36:54,276][129156] Rollout worker 0 uses device cpu +[2024-11-07 12:36:54,278][129156] Rollout worker 1 uses device cpu +[2024-11-07 12:36:54,279][129156] Rollout worker 2 uses device cpu +[2024-11-07 12:36:54,280][129156] Rollout worker 3 uses device cpu +[2024-11-07 12:36:54,282][129156] Rollout worker 4 uses device cpu +[2024-11-07 12:36:54,282][129156] Rollout worker 5 uses device cpu +[2024-11-07 12:36:54,283][129156] Rollout worker 6 uses device cpu +[2024-11-07 12:36:54,285][129156] Rollout worker 7 uses device cpu +[2024-11-07 12:36:54,346][129156] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:36:54,348][129156] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 12:36:54,383][129156] Starting all processes... +[2024-11-07 12:36:54,386][129156] Starting process learner_proc0 +[2024-11-07 12:36:54,434][129156] Starting all processes... +[2024-11-07 12:36:54,440][129156] Starting process inference_proc0-0 +[2024-11-07 12:36:54,441][129156] Starting process rollout_proc0 +[2024-11-07 12:36:54,447][129156] Starting process rollout_proc1 +[2024-11-07 12:36:54,447][129156] Starting process rollout_proc2 +[2024-11-07 12:36:54,447][129156] Starting process rollout_proc3 +[2024-11-07 12:36:54,447][129156] Starting process rollout_proc4 +[2024-11-07 12:36:54,448][129156] Starting process rollout_proc5 +[2024-11-07 12:36:54,449][129156] Starting process rollout_proc6 +[2024-11-07 12:36:54,450][129156] Starting process rollout_proc7 +[2024-11-07 12:36:59,619][130707] Worker 6 uses CPU cores [6] +[2024-11-07 12:36:59,676][130699] Worker 1 uses CPU cores [1] +[2024-11-07 12:36:59,687][129156] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 129156], exiting... 
+[2024-11-07 12:36:59,688][129156] Runner profile tree view: +main_loop: 5.3055 +[2024-11-07 12:36:59,689][129156] Collected {}, FPS: 0.0 +[2024-11-07 12:36:59,702][130681] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:36:59,703][130681] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 12:36:59,769][130699] Stopping RolloutWorker_w1... +[2024-11-07 12:36:59,770][130699] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 12:36:59,819][130707] Stopping RolloutWorker_w6... +[2024-11-07 12:36:59,821][130707] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 12:36:59,828][130703] Worker 2 uses CPU cores [2] +[2024-11-07 12:36:59,932][130703] Stopping RolloutWorker_w2... +[2024-11-07 12:36:59,932][130703] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 12:36:59,967][130681] Num visible devices: 1 +[2024-11-07 12:37:00,021][130681] Starting seed is not provided +[2024-11-07 12:37:00,021][130681] Stopping Batcher_0... +[2024-11-07 12:37:00,022][130681] Loop batcher_evt_loop terminating... +[2024-11-07 12:37:00,022][130681] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:37:00,024][130681] Initializing actor-critic model on device cuda:0 +[2024-11-07 12:37:00,025][130681] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 12:37:00,030][130681] RunningMeanStd input shape: (1,) +[2024-11-07 12:37:00,138][130681] ConvEncoder: input_channels=3 +[2024-11-07 12:37:00,190][130697] Worker 0 uses CPU cores [0] +[2024-11-07 12:37:00,228][130701] Worker 4 uses CPU cores [4] +[2024-11-07 12:37:00,268][130697] Stopping RolloutWorker_w0... +[2024-11-07 12:37:00,268][130697] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 12:37:00,300][130700] Worker 3 uses CPU cores [3] +[2024-11-07 12:37:00,325][130701] Stopping RolloutWorker_w4... +[2024-11-07 12:37:00,329][130701] Loop rollout_proc4_evt_loop terminating... 
+[2024-11-07 12:37:00,356][130700] Stopping RolloutWorker_w3... +[2024-11-07 12:37:00,357][130700] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 12:37:00,511][130705] Worker 5 uses CPU cores [5] +[2024-11-07 12:37:00,589][130705] Stopping RolloutWorker_w5... +[2024-11-07 12:37:00,589][130705] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 12:37:00,637][130681] Conv encoder output size: 512 +[2024-11-07 12:37:00,638][130681] Policy head output size: 512 +[2024-11-07 12:37:00,720][130681] Created Actor Critic model with architecture: +[2024-11-07 12:37:00,720][130681] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 12:37:00,922][130698] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:37:00,922][130698] Set 
environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 12:37:00,971][130698] Num visible devices: 1 +[2024-11-07 12:37:01,009][130698] Stopping InferenceWorker_p0-w0... +[2024-11-07 12:37:01,010][130698] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 12:37:01,410][130708] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 12:37:01,434][130708] Stopping RolloutWorker_w7... +[2024-11-07 12:37:01,435][130708] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 12:37:02,039][130681] Using optimizer +[2024-11-07 12:37:03,034][130681] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth... +[2024-11-07 12:37:03,073][130681] Loading model from checkpoint +[2024-11-07 12:37:03,075][130681] Loaded experiment state at self.train_step=609, self.env_steps=2494464 +[2024-11-07 12:37:03,075][130681] Initialized policy 0 weights for model version 609 +[2024-11-07 12:37:03,081][130681] LearnerWorker_p0 finished initialization! +[2024-11-07 12:37:03,082][130681] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth... +[2024-11-07 12:37:03,140][130681] Stopping LearnerWorker_p0... +[2024-11-07 12:37:03,140][130681] Loop learner_proc0_evt_loop terminating... +[2024-11-07 12:37:09,549][129156] Environment doom_basic already registered, overwriting... +[2024-11-07 12:37:09,551][129156] Environment doom_two_colors_easy already registered, overwriting... +[2024-11-07 12:37:09,553][129156] Environment doom_two_colors_hard already registered, overwriting... +[2024-11-07 12:37:09,556][129156] Environment doom_dm already registered, overwriting... +[2024-11-07 12:37:09,557][129156] Environment doom_dwango5 already registered, overwriting... +[2024-11-07 12:37:09,559][129156] Environment doom_my_way_home_flat_actions already registered, overwriting... 
+[2024-11-07 12:37:09,560][129156] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-11-07 12:37:09,561][129156] Environment doom_my_way_home already registered, overwriting... +[2024-11-07 12:37:09,562][129156] Environment doom_deadly_corridor already registered, overwriting... +[2024-11-07 12:37:09,563][129156] Environment doom_defend_the_center already registered, overwriting... +[2024-11-07 12:37:09,564][129156] Environment doom_defend_the_line already registered, overwriting... +[2024-11-07 12:37:09,566][129156] Environment doom_health_gathering already registered, overwriting... +[2024-11-07 12:37:09,568][129156] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-11-07 12:37:09,568][129156] Environment doom_battle already registered, overwriting... +[2024-11-07 12:37:09,570][129156] Environment doom_battle2 already registered, overwriting... +[2024-11-07 12:37:09,572][129156] Environment doom_duel_bots already registered, overwriting... +[2024-11-07 12:37:09,573][129156] Environment doom_deathmatch_bots already registered, overwriting... +[2024-11-07 12:37:09,575][129156] Environment doom_duel already registered, overwriting... +[2024-11-07 12:37:09,577][129156] Environment doom_deathmatch_full already registered, overwriting... +[2024-11-07 12:37:09,581][129156] Environment doom_benchmark already registered, overwriting... +[2024-11-07 12:37:09,582][129156] register_encoder_factory: +[2024-11-07 12:37:09,599][129156] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 12:37:09,607][129156] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! +[2024-11-07 12:37:09,609][129156] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... 
+[2024-11-07 12:37:09,610][129156] Weights and Biases integration disabled +[2024-11-07 12:37:09,614][129156] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-11-07 12:37:11,933][129156] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/root/hfRL/ml/LunarLander-v2/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=4000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] 
+encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2024-11-07 12:37:11,934][129156] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
+[2024-11-07 12:37:11,937][129156] Rollout worker 0 uses device cpu +[2024-11-07 12:37:11,938][129156] Rollout worker 1 uses device cpu +[2024-11-07 12:37:11,939][129156] Rollout worker 2 uses device cpu +[2024-11-07 12:37:11,941][129156] Rollout worker 3 uses device cpu +[2024-11-07 12:37:11,942][129156] Rollout worker 4 uses device cpu +[2024-11-07 12:37:11,943][129156] Rollout worker 5 uses device cpu +[2024-11-07 12:37:11,945][129156] Rollout worker 6 uses device cpu +[2024-11-07 12:37:11,946][129156] Rollout worker 7 uses device cpu +[2024-11-07 12:37:12,028][129156] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:37:12,031][129156] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 12:37:12,068][129156] Starting all processes... +[2024-11-07 12:37:12,069][129156] Starting process learner_proc0 +[2024-11-07 12:37:12,118][129156] Starting all processes... +[2024-11-07 12:37:12,123][129156] Starting process inference_proc0-0 +[2024-11-07 12:37:12,124][129156] Starting process rollout_proc0 +[2024-11-07 12:37:12,124][129156] Starting process rollout_proc1 +[2024-11-07 12:37:12,125][129156] Starting process rollout_proc2 +[2024-11-07 12:37:12,126][129156] Starting process rollout_proc3 +[2024-11-07 12:37:12,133][129156] Starting process rollout_proc4 +[2024-11-07 12:37:12,137][129156] Starting process rollout_proc5 +[2024-11-07 12:37:12,138][129156] Starting process rollout_proc6 +[2024-11-07 12:37:12,139][129156] Starting process rollout_proc7 +[2024-11-07 12:37:16,596][130924] Worker 1 uses CPU cores [1] +[2024-11-07 12:37:16,645][130909] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:37:16,645][130909] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 12:37:16,761][130909] Num visible devices: 1 +[2024-11-07 12:37:16,826][130909] Starting seed is not provided +[2024-11-07 12:37:16,827][130909] Using GPUs [0] for process 0 (actually maps to GPUs [0]) 
+[2024-11-07 12:37:16,827][130909] Initializing actor-critic model on device cuda:0 +[2024-11-07 12:37:16,827][130909] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 12:37:16,828][130909] RunningMeanStd input shape: (1,) +[2024-11-07 12:37:16,855][130909] ConvEncoder: input_channels=3 +[2024-11-07 12:37:17,022][130926] Worker 3 uses CPU cores [3] +[2024-11-07 12:37:17,032][130929] Worker 5 uses CPU cores [5] +[2024-11-07 12:37:17,124][130909] Conv encoder output size: 512 +[2024-11-07 12:37:17,125][130909] Policy head output size: 512 +[2024-11-07 12:37:17,143][130909] Created Actor Critic model with architecture: +[2024-11-07 12:37:17,144][130909] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 12:37:17,494][130923] Worker 0 uses CPU cores [0] 
+[2024-11-07 12:37:17,505][130922] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:37:17,505][130922] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 12:37:17,506][130936] Worker 4 uses CPU cores [4] +[2024-11-07 12:37:17,541][130922] Num visible devices: 1 +[2024-11-07 12:37:17,576][130928] Worker 6 uses CPU cores [6] +[2024-11-07 12:37:17,590][130927] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 12:37:17,637][130925] Worker 2 uses CPU cores [2] +[2024-11-07 12:37:17,840][130909] Using optimizer +[2024-11-07 12:37:18,775][130909] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth... +[2024-11-07 12:37:18,816][130909] Loading model from checkpoint +[2024-11-07 12:37:18,819][130909] Loaded experiment state at self.train_step=609, self.env_steps=2494464 +[2024-11-07 12:37:18,819][130909] Initialized policy 0 weights for model version 609 +[2024-11-07 12:37:18,826][130909] LearnerWorker_p0 finished initialization! +[2024-11-07 12:37:18,828][130909] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:37:19,029][130922] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 12:37:19,030][130922] RunningMeanStd input shape: (1,) +[2024-11-07 12:37:19,046][130922] ConvEncoder: input_channels=3 +[2024-11-07 12:37:19,171][130922] Conv encoder output size: 512 +[2024-11-07 12:37:19,172][130922] Policy head output size: 512 +[2024-11-07 12:37:19,216][129156] Inference worker 0-0 is ready! +[2024-11-07 12:37:19,217][129156] All inference workers are ready! Signal rollout workers to start! 
+[2024-11-07 12:37:19,275][130929] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:37:19,278][130926] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:37:19,285][130936] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:37:19,288][130924] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:37:19,301][130928] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:37:19,303][130923] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:37:19,330][130927] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:37:19,348][130925] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:37:19,615][129156] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 2494464. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:37:21,706][130923] Decorrelating experience for 0 frames... +[2024-11-07 12:37:21,708][130929] Decorrelating experience for 0 frames... +[2024-11-07 12:37:21,709][130928] Decorrelating experience for 0 frames... +[2024-11-07 12:37:21,712][130925] Decorrelating experience for 0 frames... +[2024-11-07 12:37:21,718][130927] Decorrelating experience for 0 frames... +[2024-11-07 12:37:21,721][130926] Decorrelating experience for 0 frames... +[2024-11-07 12:37:22,045][130923] Decorrelating experience for 32 frames... +[2024-11-07 12:37:22,061][130927] Decorrelating experience for 32 frames... +[2024-11-07 12:37:22,062][130929] Decorrelating experience for 32 frames... +[2024-11-07 12:37:22,121][130936] Decorrelating experience for 0 frames... +[2024-11-07 12:37:22,140][130928] Decorrelating experience for 32 frames... +[2024-11-07 12:37:22,477][130936] Decorrelating experience for 32 frames... +[2024-11-07 12:37:22,491][130925] Decorrelating experience for 32 frames... +[2024-11-07 12:37:22,515][130924] Decorrelating experience for 0 frames... 
+[2024-11-07 12:37:22,659][130928] Decorrelating experience for 64 frames... +[2024-11-07 12:37:22,663][130929] Decorrelating experience for 64 frames... +[2024-11-07 12:37:23,009][130927] Decorrelating experience for 64 frames... +[2024-11-07 12:37:23,040][130924] Decorrelating experience for 32 frames... +[2024-11-07 12:37:23,063][130926] Decorrelating experience for 32 frames... +[2024-11-07 12:37:23,153][130936] Decorrelating experience for 64 frames... +[2024-11-07 12:37:23,247][130923] Decorrelating experience for 64 frames... +[2024-11-07 12:37:23,290][130929] Decorrelating experience for 96 frames... +[2024-11-07 12:37:23,343][130928] Decorrelating experience for 96 frames... +[2024-11-07 12:37:23,594][130925] Decorrelating experience for 64 frames... +[2024-11-07 12:37:23,652][130926] Decorrelating experience for 64 frames... +[2024-11-07 12:37:23,698][130924] Decorrelating experience for 64 frames... +[2024-11-07 12:37:23,709][130936] Decorrelating experience for 96 frames... +[2024-11-07 12:37:23,949][130927] Decorrelating experience for 96 frames... +[2024-11-07 12:37:24,191][130925] Decorrelating experience for 96 frames... +[2024-11-07 12:37:24,236][130926] Decorrelating experience for 96 frames... +[2024-11-07 12:37:24,271][130923] Decorrelating experience for 96 frames... +[2024-11-07 12:37:24,478][130924] Decorrelating experience for 96 frames... +[2024-11-07 12:37:24,615][129156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2494464. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:37:28,321][130909] Signal inference workers to stop experience collection... +[2024-11-07 12:37:28,331][130922] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 12:37:29,614][129156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2494464. Throughput: 0: 211.8. Samples: 2118. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:37:29,616][129156] Avg episode reward: [(0, '1.631')] +[2024-11-07 12:37:30,568][129156] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 129156], exiting... +[2024-11-07 12:37:30,571][130909] Stopping Batcher_0... +[2024-11-07 12:37:30,571][130909] Loop batcher_evt_loop terminating... +[2024-11-07 12:37:30,570][129156] Runner profile tree view: +main_loop: 18.5024 +[2024-11-07 12:37:30,574][129156] Collected {0: 2494464}, FPS: 0.0 +[2024-11-07 12:37:30,588][130922] Weights refcount: 2 0 +[2024-11-07 12:37:30,591][130922] Stopping InferenceWorker_p0-w0... +[2024-11-07 12:37:30,592][130922] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 12:37:30,719][130923] Stopping RolloutWorker_w0... +[2024-11-07 12:37:30,720][130923] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 12:37:30,740][130936] Stopping RolloutWorker_w4... +[2024-11-07 12:37:30,741][130936] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 12:37:30,745][130929] Stopping RolloutWorker_w5... +[2024-11-07 12:37:30,746][130929] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 12:37:30,747][130925] Stopping RolloutWorker_w2... +[2024-11-07 12:37:30,747][130925] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 12:37:30,778][130926] Stopping RolloutWorker_w3... +[2024-11-07 12:37:30,779][130926] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 12:37:30,782][130928] Stopping RolloutWorker_w6... +[2024-11-07 12:37:30,783][130928] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 12:37:30,814][130924] Stopping RolloutWorker_w1... +[2024-11-07 12:37:30,816][130924] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 12:37:30,851][130927] Stopping RolloutWorker_w7... +[2024-11-07 12:37:30,852][130927] Loop rollout_proc7_evt_loop terminating... 
+[2024-11-07 12:37:35,714][130909] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000611_2502656.pth... +[2024-11-07 12:37:35,783][130909] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000525_2150400.pth +[2024-11-07 12:37:35,785][130909] Stopping LearnerWorker_p0... +[2024-11-07 12:37:35,786][130909] Loop learner_proc0_evt_loop terminating... +[2024-11-07 12:40:27,910][129156] Environment doom_basic already registered, overwriting... +[2024-11-07 12:40:27,912][129156] Environment doom_two_colors_easy already registered, overwriting... +[2024-11-07 12:40:27,913][129156] Environment doom_two_colors_hard already registered, overwriting... +[2024-11-07 12:40:27,914][129156] Environment doom_dm already registered, overwriting... +[2024-11-07 12:40:27,916][129156] Environment doom_dwango5 already registered, overwriting... +[2024-11-07 12:40:27,917][129156] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-11-07 12:40:27,919][129156] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-11-07 12:40:27,921][129156] Environment doom_my_way_home already registered, overwriting... +[2024-11-07 12:40:27,922][129156] Environment doom_deadly_corridor already registered, overwriting... +[2024-11-07 12:40:27,923][129156] Environment doom_defend_the_center already registered, overwriting... +[2024-11-07 12:40:27,924][129156] Environment doom_defend_the_line already registered, overwriting... +[2024-11-07 12:40:27,926][129156] Environment doom_health_gathering already registered, overwriting... +[2024-11-07 12:40:27,927][129156] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-11-07 12:40:27,928][129156] Environment doom_battle already registered, overwriting... +[2024-11-07 12:40:27,929][129156] Environment doom_battle2 already registered, overwriting... 
+[2024-11-07 12:40:27,930][129156] Environment doom_duel_bots already registered, overwriting... +[2024-11-07 12:40:27,931][129156] Environment doom_deathmatch_bots already registered, overwriting... +[2024-11-07 12:40:27,932][129156] Environment doom_duel already registered, overwriting... +[2024-11-07 12:40:27,933][129156] Environment doom_deathmatch_full already registered, overwriting... +[2024-11-07 12:40:27,935][129156] Environment doom_benchmark already registered, overwriting... +[2024-11-07 12:40:27,937][129156] register_encoder_factory: +[2024-11-07 12:40:27,953][129156] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 12:40:27,960][129156] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! +[2024-11-07 12:40:27,962][129156] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... +[2024-11-07 12:40:27,963][129156] Weights and Biases integration disabled +[2024-11-07 12:40:27,966][129156] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-11-07 12:40:31,814][129156] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/root/hfRL/ml/LunarLander-v2/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False 
+vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=4000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None 
+res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2024-11-07 12:40:31,816][129156] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-07 12:40:31,818][129156] Rollout worker 0 uses device cpu +[2024-11-07 12:40:31,819][129156] Rollout worker 1 uses device cpu +[2024-11-07 12:40:31,820][129156] Rollout worker 2 uses device cpu +[2024-11-07 12:40:31,821][129156] Rollout worker 3 uses device cpu +[2024-11-07 12:40:31,822][129156] Rollout worker 4 uses device cpu +[2024-11-07 12:40:31,822][129156] Rollout worker 5 uses device cpu +[2024-11-07 12:40:31,823][129156] Rollout worker 6 uses device cpu +[2024-11-07 12:40:31,824][129156] Rollout worker 7 uses device cpu +[2024-11-07 12:40:31,875][129156] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:40:31,876][129156] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 12:40:31,909][129156] Starting all processes... +[2024-11-07 12:40:31,910][129156] Starting process learner_proc0 +[2024-11-07 12:40:31,959][129156] Starting all processes... 
+[2024-11-07 12:40:31,964][129156] Starting process inference_proc0-0 +[2024-11-07 12:40:31,965][129156] Starting process rollout_proc0 +[2024-11-07 12:40:31,965][129156] Starting process rollout_proc1 +[2024-11-07 12:40:31,966][129156] Starting process rollout_proc2 +[2024-11-07 12:40:31,966][129156] Starting process rollout_proc3 +[2024-11-07 12:40:31,967][129156] Starting process rollout_proc4 +[2024-11-07 12:40:31,967][129156] Starting process rollout_proc5 +[2024-11-07 12:40:31,968][129156] Starting process rollout_proc6 +[2024-11-07 12:40:31,970][129156] Starting process rollout_proc7 +[2024-11-07 12:40:36,276][132047] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:40:36,276][132047] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 12:40:36,304][132058] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 12:40:36,305][132049] Worker 1 uses CPU cores [1] +[2024-11-07 12:40:36,347][132048] Worker 0 uses CPU cores [0] +[2024-11-07 12:40:36,369][132047] Num visible devices: 1 +[2024-11-07 12:40:36,375][132053] Worker 5 uses CPU cores [5] +[2024-11-07 12:40:36,574][132031] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:40:36,574][132031] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 12:40:36,575][132051] Worker 2 uses CPU cores [2] +[2024-11-07 12:40:36,597][132031] Num visible devices: 1 +[2024-11-07 12:40:36,605][132052] Worker 4 uses CPU cores [4] +[2024-11-07 12:40:36,606][132055] Worker 6 uses CPU cores [6] +[2024-11-07 12:40:36,613][132031] Starting seed is not provided +[2024-11-07 12:40:36,614][132031] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:40:36,614][132031] Initializing actor-critic model on device cuda:0 +[2024-11-07 12:40:36,614][132031] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 12:40:36,615][132031] RunningMeanStd input shape: (1,) 
+[2024-11-07 12:40:36,638][132031] ConvEncoder: input_channels=3 +[2024-11-07 12:40:36,746][132031] Conv encoder output size: 512 +[2024-11-07 12:40:36,746][132031] Policy head output size: 512 +[2024-11-07 12:40:36,759][132031] Created Actor Critic model with architecture: +[2024-11-07 12:40:36,759][132031] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 12:40:36,784][132050] Worker 3 uses CPU cores [3] +[2024-11-07 12:40:37,258][132031] Using optimizer +[2024-11-07 12:40:38,132][132031] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000611_2502656.pth... 
+[2024-11-07 12:40:38,170][132031] Loading model from checkpoint +[2024-11-07 12:40:38,172][132031] Loaded experiment state at self.train_step=611, self.env_steps=2502656 +[2024-11-07 12:40:38,172][132031] Initialized policy 0 weights for model version 611 +[2024-11-07 12:40:38,178][132031] LearnerWorker_p0 finished initialization! +[2024-11-07 12:40:38,178][132031] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 12:40:38,330][132047] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 12:40:38,331][132047] RunningMeanStd input shape: (1,) +[2024-11-07 12:40:38,342][132047] ConvEncoder: input_channels=3 +[2024-11-07 12:40:38,444][132047] Conv encoder output size: 512 +[2024-11-07 12:40:38,444][132047] Policy head output size: 512 +[2024-11-07 12:40:38,489][129156] Inference worker 0-0 is ready! +[2024-11-07 12:40:38,490][129156] All inference workers are ready! Signal rollout workers to start! +[2024-11-07 12:40:38,554][132050] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:40:38,560][132048] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:40:38,562][132053] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:40:38,564][132051] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:40:38,567][132055] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:40:38,567][132049] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:40:38,600][132052] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:40:38,613][132058] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 12:40:39,173][132051] Decorrelating experience for 0 frames... +[2024-11-07 12:40:39,179][132048] Decorrelating experience for 0 frames... +[2024-11-07 12:40:39,179][132053] Decorrelating experience for 0 frames... +[2024-11-07 12:40:39,187][132050] Decorrelating experience for 0 frames... 
+[2024-11-07 12:40:39,188][132058] Decorrelating experience for 0 frames... +[2024-11-07 12:40:39,516][132051] Decorrelating experience for 32 frames... +[2024-11-07 12:40:39,521][132050] Decorrelating experience for 32 frames... +[2024-11-07 12:40:39,521][132049] Decorrelating experience for 0 frames... +[2024-11-07 12:40:39,562][132058] Decorrelating experience for 32 frames... +[2024-11-07 12:40:39,596][132052] Decorrelating experience for 0 frames... +[2024-11-07 12:40:39,652][132048] Decorrelating experience for 32 frames... +[2024-11-07 12:40:39,897][132053] Decorrelating experience for 32 frames... +[2024-11-07 12:40:39,949][132050] Decorrelating experience for 64 frames... +[2024-11-07 12:40:40,005][132051] Decorrelating experience for 64 frames... +[2024-11-07 12:40:40,033][132058] Decorrelating experience for 64 frames... +[2024-11-07 12:40:40,071][132049] Decorrelating experience for 32 frames... +[2024-11-07 12:40:40,297][132052] Decorrelating experience for 32 frames... +[2024-11-07 12:40:40,419][132050] Decorrelating experience for 96 frames... +[2024-11-07 12:40:40,420][132051] Decorrelating experience for 96 frames... +[2024-11-07 12:40:40,436][132048] Decorrelating experience for 64 frames... +[2024-11-07 12:40:40,540][132058] Decorrelating experience for 96 frames... +[2024-11-07 12:40:40,637][132049] Decorrelating experience for 64 frames... +[2024-11-07 12:40:40,855][132052] Decorrelating experience for 64 frames... +[2024-11-07 12:40:41,058][132055] Decorrelating experience for 0 frames... +[2024-11-07 12:40:41,163][132053] Decorrelating experience for 64 frames... +[2024-11-07 12:40:41,270][132048] Decorrelating experience for 96 frames... +[2024-11-07 12:40:41,699][132052] Decorrelating experience for 96 frames... +[2024-11-07 12:40:41,819][132049] Decorrelating experience for 96 frames... +[2024-11-07 12:40:41,992][132055] Decorrelating experience for 32 frames... +[2024-11-07 12:40:42,383][132053] Decorrelating experience for 96 frames... 
+[2024-11-07 12:40:42,749][132055] Decorrelating experience for 64 frames... +[2024-11-07 12:40:42,967][129156] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 2502656. Throughput: 0: nan. Samples: 104. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 12:40:42,972][129156] Avg episode reward: [(0, '1.194')] +[2024-11-07 12:40:43,443][132055] Decorrelating experience for 96 frames... +[2024-11-07 12:40:44,214][132031] Signal inference workers to stop experience collection... +[2024-11-07 12:40:44,231][132047] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 12:40:46,259][129156] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 129156], exiting... +[2024-11-07 12:40:46,260][132031] Stopping Batcher_0... +[2024-11-07 12:40:46,261][132031] Loop batcher_evt_loop terminating... +[2024-11-07 12:40:46,261][129156] Runner profile tree view: +main_loop: 14.3520 +[2024-11-07 12:40:46,262][129156] Collected {0: 2502656}, FPS: 0.0 +[2024-11-07 12:40:46,277][132047] Weights refcount: 2 0 +[2024-11-07 12:40:46,280][132047] Stopping InferenceWorker_p0-w0... +[2024-11-07 12:40:46,280][132047] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 12:40:46,344][132052] Stopping RolloutWorker_w4... +[2024-11-07 12:40:46,344][132052] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 12:40:46,411][132048] Stopping RolloutWorker_w0... +[2024-11-07 12:40:46,411][132048] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 12:40:46,419][132055] Stopping RolloutWorker_w6... +[2024-11-07 12:40:46,420][132055] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 12:40:46,445][132053] Stopping RolloutWorker_w5... +[2024-11-07 12:40:46,446][132053] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 12:40:46,461][132050] Stopping RolloutWorker_w3... +[2024-11-07 12:40:46,462][132050] Loop rollout_proc3_evt_loop terminating... 
+[2024-11-07 12:40:46,495][132058] Stopping RolloutWorker_w7... +[2024-11-07 12:40:46,497][132058] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 12:40:46,499][132051] Stopping RolloutWorker_w2... +[2024-11-07 12:40:46,500][132051] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 12:40:46,560][132049] Stopping RolloutWorker_w1... +[2024-11-07 12:40:46,561][132049] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 12:40:49,852][132031] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000613_2510848.pth... +[2024-11-07 12:40:49,919][132031] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000609_2494464.pth +[2024-11-07 12:40:49,924][132031] Stopping LearnerWorker_p0... +[2024-11-07 12:40:49,925][132031] Loop learner_proc0_evt_loop terminating... +[2024-11-07 12:42:00,143][129156] Environment doom_basic already registered, overwriting... +[2024-11-07 12:42:00,146][129156] Environment doom_two_colors_easy already registered, overwriting... +[2024-11-07 12:42:00,147][129156] Environment doom_two_colors_hard already registered, overwriting... +[2024-11-07 12:42:00,150][129156] Environment doom_dm already registered, overwriting... +[2024-11-07 12:42:00,151][129156] Environment doom_dwango5 already registered, overwriting... +[2024-11-07 12:42:00,152][129156] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-11-07 12:42:00,153][129156] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-11-07 12:42:00,155][129156] Environment doom_my_way_home already registered, overwriting... +[2024-11-07 12:42:00,156][129156] Environment doom_deadly_corridor already registered, overwriting... +[2024-11-07 12:42:00,158][129156] Environment doom_defend_the_center already registered, overwriting... +[2024-11-07 12:42:00,160][129156] Environment doom_defend_the_line already registered, overwriting... 
+[2024-11-07 12:42:00,161][129156] Environment doom_health_gathering already registered, overwriting... +[2024-11-07 12:42:00,164][129156] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-11-07 12:42:00,166][129156] Environment doom_battle already registered, overwriting... +[2024-11-07 12:42:00,167][129156] Environment doom_battle2 already registered, overwriting... +[2024-11-07 12:42:00,169][129156] Environment doom_duel_bots already registered, overwriting... +[2024-11-07 12:42:00,170][129156] Environment doom_deathmatch_bots already registered, overwriting... +[2024-11-07 12:42:00,172][129156] Environment doom_duel already registered, overwriting... +[2024-11-07 12:42:00,176][129156] Environment doom_deathmatch_full already registered, overwriting... +[2024-11-07 12:42:00,177][129156] Environment doom_benchmark already registered, overwriting... +[2024-11-07 12:42:00,178][129156] register_encoder_factory: +[2024-11-07 13:15:32,891][07338] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-07 13:15:32,915][07338] Rollout worker 0 uses device cpu +[2024-11-07 13:15:32,917][07338] Rollout worker 1 uses device cpu +[2024-11-07 13:15:32,919][07338] Rollout worker 2 uses device cpu +[2024-11-07 13:15:32,920][07338] Rollout worker 3 uses device cpu +[2024-11-07 13:15:32,922][07338] Rollout worker 4 uses device cpu +[2024-11-07 13:15:32,924][07338] Rollout worker 5 uses device cpu +[2024-11-07 13:15:32,926][07338] Rollout worker 6 uses device cpu +[2024-11-07 13:15:32,928][07338] Rollout worker 7 uses device cpu +[2024-11-07 13:15:33,385][07338] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:15:33,386][07338] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 13:15:33,425][07338] Starting all processes... +[2024-11-07 13:15:33,428][07338] Starting process learner_proc0 +[2024-11-07 13:15:33,598][07338] Starting all processes... 
+[2024-11-07 13:15:33,669][07338] Starting process inference_proc0-0 +[2024-11-07 13:15:33,670][07338] Starting process rollout_proc0 +[2024-11-07 13:15:33,671][07338] Starting process rollout_proc1 +[2024-11-07 13:15:33,672][07338] Starting process rollout_proc2 +[2024-11-07 13:15:33,672][07338] Starting process rollout_proc3 +[2024-11-07 13:15:33,673][07338] Starting process rollout_proc4 +[2024-11-07 13:15:33,673][07338] Starting process rollout_proc5 +[2024-11-07 13:15:33,679][07338] Starting process rollout_proc6 +[2024-11-07 13:15:33,680][07338] Starting process rollout_proc7 +[2024-11-07 13:15:43,393][07455] Worker 4 uses CPU cores [4] +[2024-11-07 13:15:43,672][07452] Worker 1 uses CPU cores [1] +[2024-11-07 13:15:44,020][07457] Worker 6 uses CPU cores [6] +[2024-11-07 13:15:44,077][07437] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:15:44,078][07437] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 13:15:44,428][07451] Worker 0 uses CPU cores [0] +[2024-11-07 13:15:44,461][07453] Worker 2 uses CPU cores [2] +[2024-11-07 13:15:44,493][07437] Num visible devices: 1 +[2024-11-07 13:15:44,495][07454] Worker 3 uses CPU cores [3] +[2024-11-07 13:15:44,501][07456] Worker 5 uses CPU cores [5] +[2024-11-07 13:15:44,510][07437] Starting seed is not provided +[2024-11-07 13:15:44,510][07437] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:15:44,510][07437] Initializing actor-critic model on device cuda:0 +[2024-11-07 13:15:44,512][07437] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 13:15:44,539][07437] RunningMeanStd input shape: (1,) +[2024-11-07 13:15:44,583][07437] ConvEncoder: input_channels=3 +[2024-11-07 13:15:44,637][07450] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:15:44,637][07450] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 13:15:44,794][07450] Num visible 
devices: 1 +[2024-11-07 13:15:44,972][07458] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 13:15:46,129][07437] Conv encoder output size: 512 +[2024-11-07 13:15:46,130][07437] Policy head output size: 512 +[2024-11-07 13:15:46,689][07437] Created Actor Critic model with architecture: +[2024-11-07 13:15:46,690][07437] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 13:15:49,286][07437] Using optimizer +[2024-11-07 13:15:53,386][07338] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 13:15:53,392][07338] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 13:15:53,396][07338] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 13:15:53,399][07338] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 13:15:53,405][07338] Heartbeat 
connected on RolloutWorker_w3 +[2024-11-07 13:15:53,410][07338] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 13:15:53,417][07338] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 13:15:53,420][07338] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 13:15:53,425][07338] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 13:15:54,978][07338] Heartbeat connected on Batcher_0 +[2024-11-07 13:15:58,478][07437] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000613_2510848.pth... +[2024-11-07 13:15:59,119][07437] Loading model from checkpoint +[2024-11-07 13:15:59,121][07437] Loaded experiment state at self.train_step=613, self.env_steps=2510848 +[2024-11-07 13:15:59,183][07437] Initialized policy 0 weights for model version 613 +[2024-11-07 13:15:59,194][07437] LearnerWorker_p0 finished initialization! +[2024-11-07 13:15:59,194][07437] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:15:59,195][07338] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 13:15:59,685][07450] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 13:15:59,689][07450] RunningMeanStd input shape: (1,) +[2024-11-07 13:15:59,750][07450] ConvEncoder: input_channels=3 +[2024-11-07 13:16:00,208][07450] Conv encoder output size: 512 +[2024-11-07 13:16:00,209][07450] Policy head output size: 512 +[2024-11-07 13:16:00,283][07338] Inference worker 0-0 is ready! +[2024-11-07 13:16:00,285][07338] All inference workers are ready! Signal rollout workers to start! 
+[2024-11-07 13:16:00,894][07456] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:16:00,900][07454] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:16:00,941][07452] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:16:00,964][07453] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:16:00,985][07451] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:16:00,993][07457] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:16:01,056][07455] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:16:01,412][07458] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:16:02,891][07338] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 2510848. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:16:05,134][07456] Decorrelating experience for 0 frames... +[2024-11-07 13:16:05,135][07453] Decorrelating experience for 0 frames... +[2024-11-07 13:16:05,134][07454] Decorrelating experience for 0 frames... +[2024-11-07 13:16:05,135][07455] Decorrelating experience for 0 frames... +[2024-11-07 13:16:05,136][07452] Decorrelating experience for 0 frames... +[2024-11-07 13:16:05,144][07457] Decorrelating experience for 0 frames... +[2024-11-07 13:16:05,134][07458] Decorrelating experience for 0 frames... +[2024-11-07 13:16:05,674][07451] Decorrelating experience for 0 frames... +[2024-11-07 13:16:05,848][07453] Decorrelating experience for 32 frames... +[2024-11-07 13:16:05,848][07455] Decorrelating experience for 32 frames... +[2024-11-07 13:16:05,876][07452] Decorrelating experience for 32 frames... +[2024-11-07 13:16:05,887][07456] Decorrelating experience for 32 frames... +[2024-11-07 13:16:05,897][07454] Decorrelating experience for 32 frames... +[2024-11-07 13:16:06,245][07457] Decorrelating experience for 32 frames... 
+[2024-11-07 13:16:06,261][07451] Decorrelating experience for 32 frames... +[2024-11-07 13:16:06,619][07453] Decorrelating experience for 64 frames... +[2024-11-07 13:16:06,624][07455] Decorrelating experience for 64 frames... +[2024-11-07 13:16:06,670][07452] Decorrelating experience for 64 frames... +[2024-11-07 13:16:06,834][07456] Decorrelating experience for 64 frames... +[2024-11-07 13:16:07,019][07457] Decorrelating experience for 64 frames... +[2024-11-07 13:16:07,115][07454] Decorrelating experience for 64 frames... +[2024-11-07 13:16:07,267][07455] Decorrelating experience for 96 frames... +[2024-11-07 13:16:07,786][07458] Decorrelating experience for 32 frames... +[2024-11-07 13:16:07,892][07338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2510848. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:16:08,164][07453] Decorrelating experience for 96 frames... +[2024-11-07 13:16:08,292][07454] Decorrelating experience for 96 frames... +[2024-11-07 13:16:08,644][07457] Decorrelating experience for 96 frames... +[2024-11-07 13:16:08,976][07452] Decorrelating experience for 96 frames... +[2024-11-07 13:16:08,978][07458] Decorrelating experience for 64 frames... +[2024-11-07 13:16:09,105][07451] Decorrelating experience for 64 frames... +[2024-11-07 13:16:09,656][07458] Decorrelating experience for 96 frames... +[2024-11-07 13:16:09,688][07456] Decorrelating experience for 96 frames... +[2024-11-07 13:16:09,713][07451] Decorrelating experience for 96 frames... +[2024-11-07 13:16:12,892][07338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2510848. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:16:17,891][07338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2510848. Throughput: 0: 5.5. Samples: 82. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:16:17,893][07338] Avg episode reward: [(0, '0.590')] +[2024-11-07 13:16:18,994][07437] Signal inference workers to stop experience collection... +[2024-11-07 13:16:19,022][07450] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 13:16:22,892][07338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2510848. Throughput: 0: 105.1. Samples: 2102. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:16:22,894][07338] Avg episode reward: [(0, '1.991')] +[2024-11-07 13:16:27,891][07338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2510848. Throughput: 0: 84.1. Samples: 2102. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:16:27,894][07338] Avg episode reward: [(0, '1.991')] +[2024-11-07 13:16:32,892][07338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2510848. Throughput: 0: 70.1. Samples: 2102. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:16:32,894][07338] Avg episode reward: [(0, '1.991')] +[2024-11-07 13:16:37,892][07338] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2510848. Throughput: 0: 60.1. Samples: 2102. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:16:37,893][07338] Avg episode reward: [(0, '1.991')] +[2024-11-07 13:16:42,510][07437] Signal inference workers to resume experience collection... +[2024-11-07 13:16:42,512][07450] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 13:16:42,892][07338] Fps is (10 sec: 409.6, 60 sec: 102.4, 300 sec: 102.4). Total num frames: 2514944. Throughput: 0: 52.5. Samples: 2102. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-11-07 13:16:42,895][07338] Avg episode reward: [(0, '1.991')] +[2024-11-07 13:16:47,891][07338] Fps is (10 sec: 3276.9, 60 sec: 728.2, 300 sec: 728.2). Total num frames: 2543616. Throughput: 0: 113.3. Samples: 5098. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 13:16:47,894][07338] Avg episode reward: [(0, '3.847')] +[2024-11-07 13:16:48,812][07450] Updated weights for policy 0, policy_version 623 (0.0047) +[2024-11-07 13:16:52,892][07338] Fps is (10 sec: 6144.1, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 2576384. Throughput: 0: 341.2. Samples: 15356. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 13:16:52,970][07338] Avg episode reward: [(0, '4.473')] +[2024-11-07 13:16:55,293][07450] Updated weights for policy 0, policy_version 633 (0.0034) +[2024-11-07 13:16:57,891][07338] Fps is (10 sec: 6553.6, 60 sec: 1787.3, 300 sec: 1787.3). Total num frames: 2609152. Throughput: 0: 539.4. Samples: 24272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:16:57,895][07338] Avg episode reward: [(0, '4.368')] +[2024-11-07 13:17:01,853][07450] Updated weights for policy 0, policy_version 643 (0.0027) +[2024-11-07 13:17:03,537][07338] Fps is (10 sec: 5771.7, 60 sec: 2093.8, 300 sec: 2093.8). Total num frames: 2637824. Throughput: 0: 638.7. Samples: 29236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 13:17:03,538][07338] Avg episode reward: [(0, '4.472')] +[2024-11-07 13:17:03,725][07338] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 7338], exiting... +[2024-11-07 13:17:03,727][07338] Runner profile tree view: +main_loop: 90.3018 +[2024-11-07 13:17:03,728][07338] Collected {0: 2637824}, FPS: 1406.1 +[2024-11-07 13:17:03,760][07437] Stopping Batcher_0... +[2024-11-07 13:17:03,762][07437] Loop batcher_evt_loop terminating... +[2024-11-07 13:17:03,824][07437] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000645_2641920.pth... +[2024-11-07 13:17:04,574][07458] Stopping RolloutWorker_w7... +[2024-11-07 13:17:04,575][07458] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 13:17:04,588][07457] Stopping RolloutWorker_w6... 
+[2024-11-07 13:17:04,588][07455] Stopping RolloutWorker_w4... +[2024-11-07 13:17:04,589][07455] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 13:17:04,589][07457] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 13:17:04,589][07452] Stopping RolloutWorker_w1... +[2024-11-07 13:17:04,590][07456] Stopping RolloutWorker_w5... +[2024-11-07 13:17:04,590][07452] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 13:17:04,590][07456] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 13:17:04,590][07451] Stopping RolloutWorker_w0... +[2024-11-07 13:17:04,591][07451] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 13:17:04,600][07453] Stopping RolloutWorker_w2... +[2024-11-07 13:17:04,600][07454] Stopping RolloutWorker_w3... +[2024-11-07 13:17:04,601][07454] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 13:17:04,601][07453] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 13:17:04,602][07437] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000611_2502656.pth +[2024-11-07 13:17:04,603][07437] Stopping LearnerWorker_p0... +[2024-11-07 13:17:04,604][07437] Loop learner_proc0_evt_loop terminating... +[2024-11-07 13:17:04,773][07450] Weights refcount: 2 0 +[2024-11-07 13:17:04,899][07450] Stopping InferenceWorker_p0-w0... +[2024-11-07 13:17:04,900][07450] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 13:18:34,922][08210] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
+[2024-11-07 13:18:34,924][08210] Rollout worker 0 uses device cpu +[2024-11-07 13:18:34,925][08210] Rollout worker 1 uses device cpu +[2024-11-07 13:18:34,925][08210] Rollout worker 2 uses device cpu +[2024-11-07 13:18:34,927][08210] Rollout worker 3 uses device cpu +[2024-11-07 13:18:34,929][08210] Rollout worker 4 uses device cpu +[2024-11-07 13:18:34,929][08210] Rollout worker 5 uses device cpu +[2024-11-07 13:18:34,931][08210] Rollout worker 6 uses device cpu +[2024-11-07 13:18:34,933][08210] Rollout worker 7 uses device cpu +[2024-11-07 13:18:35,019][08210] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:18:35,021][08210] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 13:18:35,053][08210] Starting all processes... +[2024-11-07 13:18:35,054][08210] Starting process learner_proc0 +[2024-11-07 13:18:35,234][08210] Starting all processes... +[2024-11-07 13:18:35,283][08210] Starting process inference_proc0-0 +[2024-11-07 13:18:35,284][08210] Starting process rollout_proc0 +[2024-11-07 13:18:35,284][08210] Starting process rollout_proc1 +[2024-11-07 13:18:35,285][08210] Starting process rollout_proc2 +[2024-11-07 13:18:35,285][08210] Starting process rollout_proc3 +[2024-11-07 13:18:35,289][08210] Starting process rollout_proc4 +[2024-11-07 13:18:35,294][08210] Starting process rollout_proc5 +[2024-11-07 13:18:35,295][08210] Starting process rollout_proc6 +[2024-11-07 13:18:35,300][08210] Starting process rollout_proc7 +[2024-11-07 13:18:40,907][08464] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:18:40,907][08464] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 13:18:40,908][08467] Worker 2 uses CPU cores [2] +[2024-11-07 13:18:40,909][08465] Worker 0 uses CPU cores [0] +[2024-11-07 13:18:40,912][08471] Worker 5 uses CPU cores [5] +[2024-11-07 13:18:41,023][08464] Num visible devices: 1 +[2024-11-07 13:18:41,259][08472] Worker 7 uses CPU cores [0, 
1, 2, 3, 4, 5, 6] +[2024-11-07 13:18:41,407][08468] Worker 3 uses CPU cores [3] +[2024-11-07 13:18:41,417][08466] Worker 1 uses CPU cores [1] +[2024-11-07 13:18:41,471][08451] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:18:41,471][08451] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 13:18:41,480][08470] Worker 6 uses CPU cores [6] +[2024-11-07 13:18:41,496][08451] Num visible devices: 1 +[2024-11-07 13:18:41,508][08451] Starting seed is not provided +[2024-11-07 13:18:41,509][08451] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:18:41,509][08451] Initializing actor-critic model on device cuda:0 +[2024-11-07 13:18:41,510][08451] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 13:18:41,511][08451] RunningMeanStd input shape: (1,) +[2024-11-07 13:18:41,529][08451] ConvEncoder: input_channels=3 +[2024-11-07 13:18:41,562][08469] Worker 4 uses CPU cores [4] +[2024-11-07 13:18:43,335][08451] Conv encoder output size: 512 +[2024-11-07 13:18:43,335][08451] Policy head output size: 512 +[2024-11-07 13:18:43,699][08451] Created Actor Critic model with architecture: +[2024-11-07 13:18:43,699][08451] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + 
(mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 13:18:45,204][08451] Using optimizer +[2024-11-07 13:18:53,245][08451] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000645_2641920.pth... +[2024-11-07 13:18:53,333][08451] Loading model from checkpoint +[2024-11-07 13:18:53,336][08451] Loaded experiment state at self.train_step=645, self.env_steps=2641920 +[2024-11-07 13:18:53,337][08451] Initialized policy 0 weights for model version 645 +[2024-11-07 13:18:53,346][08451] LearnerWorker_p0 finished initialization! +[2024-11-07 13:18:53,347][08451] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:18:53,607][08464] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 13:18:53,609][08464] RunningMeanStd input shape: (1,) +[2024-11-07 13:18:53,624][08464] ConvEncoder: input_channels=3 +[2024-11-07 13:18:53,742][08464] Conv encoder output size: 512 +[2024-11-07 13:18:53,742][08464] Policy head output size: 512 +[2024-11-07 13:18:53,788][08210] Inference worker 0-0 is ready! +[2024-11-07 13:18:53,790][08210] All inference workers are ready! Signal rollout workers to start! 
+[2024-11-07 13:18:53,863][08469] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:18:53,871][08465] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:18:53,870][08467] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:18:53,873][08468] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:18:53,887][08466] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:18:53,893][08470] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:18:53,939][08471] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:18:53,942][08472] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:18:54,932][08210] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 2641920. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:18:55,011][08210] Heartbeat connected on Batcher_0 +[2024-11-07 13:18:55,015][08210] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 13:18:55,070][08210] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 13:18:56,575][08468] Decorrelating experience for 0 frames... +[2024-11-07 13:18:56,576][08471] Decorrelating experience for 0 frames... +[2024-11-07 13:18:56,576][08470] Decorrelating experience for 0 frames... +[2024-11-07 13:18:56,581][08467] Decorrelating experience for 0 frames... +[2024-11-07 13:18:56,605][08469] Decorrelating experience for 0 frames... +[2024-11-07 13:18:56,950][08468] Decorrelating experience for 32 frames... +[2024-11-07 13:18:56,958][08472] Decorrelating experience for 0 frames... +[2024-11-07 13:18:56,975][08467] Decorrelating experience for 32 frames... +[2024-11-07 13:18:56,977][08466] Decorrelating experience for 0 frames... +[2024-11-07 13:18:57,011][08465] Decorrelating experience for 0 frames... +[2024-11-07 13:18:57,366][08469] Decorrelating experience for 32 frames... 
+[2024-11-07 13:18:57,369][08471] Decorrelating experience for 32 frames... +[2024-11-07 13:18:57,373][08472] Decorrelating experience for 32 frames... +[2024-11-07 13:18:57,440][08465] Decorrelating experience for 32 frames... +[2024-11-07 13:18:57,500][08467] Decorrelating experience for 64 frames... +[2024-11-07 13:18:57,526][08470] Decorrelating experience for 32 frames... +[2024-11-07 13:18:57,791][08466] Decorrelating experience for 32 frames... +[2024-11-07 13:18:57,879][08472] Decorrelating experience for 64 frames... +[2024-11-07 13:18:57,883][08471] Decorrelating experience for 64 frames... +[2024-11-07 13:18:57,938][08467] Decorrelating experience for 96 frames... +[2024-11-07 13:18:58,010][08210] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 13:18:58,163][08469] Decorrelating experience for 64 frames... +[2024-11-07 13:18:58,383][08470] Decorrelating experience for 64 frames... +[2024-11-07 13:18:58,385][08471] Decorrelating experience for 96 frames... +[2024-11-07 13:18:58,416][08466] Decorrelating experience for 64 frames... +[2024-11-07 13:18:58,472][08472] Decorrelating experience for 96 frames... +[2024-11-07 13:18:58,517][08210] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 13:18:58,602][08210] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 13:18:58,641][08468] Decorrelating experience for 64 frames... +[2024-11-07 13:18:58,688][08469] Decorrelating experience for 96 frames... +[2024-11-07 13:18:58,759][08210] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 13:18:58,861][08470] Decorrelating experience for 96 frames... +[2024-11-07 13:18:58,922][08210] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 13:18:59,109][08466] Decorrelating experience for 96 frames... +[2024-11-07 13:18:59,113][08468] Decorrelating experience for 96 frames... +[2024-11-07 13:18:59,114][08465] Decorrelating experience for 64 frames... 
+[2024-11-07 13:18:59,233][08210] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 13:18:59,237][08210] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 13:18:59,731][08465] Decorrelating experience for 96 frames... +[2024-11-07 13:18:59,816][08210] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 13:18:59,931][08210] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2641920. Throughput: 0: 138.0. Samples: 690. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:18:59,933][08210] Avg episode reward: [(0, '0.640')] +[2024-11-07 13:19:01,468][08451] Signal inference workers to stop experience collection... +[2024-11-07 13:19:01,481][08464] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 13:19:04,931][08210] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2641920. Throughput: 0: 286.0. Samples: 2860. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:19:04,933][08210] Avg episode reward: [(0, '1.968')] +[2024-11-07 13:19:09,603][08210] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 8210], exiting... +[2024-11-07 13:19:09,606][08451] Stopping Batcher_0... +[2024-11-07 13:19:09,607][08451] Loop batcher_evt_loop terminating... +[2024-11-07 13:19:09,606][08210] Runner profile tree view: +main_loop: 34.5527 +[2024-11-07 13:19:09,611][08210] Collected {0: 2641920}, FPS: 0.0 +[2024-11-07 13:19:09,650][08464] Weights refcount: 2 0 +[2024-11-07 13:19:09,653][08464] Stopping InferenceWorker_p0-w0... +[2024-11-07 13:19:09,653][08464] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 13:19:09,747][08466] Stopping RolloutWorker_w1... +[2024-11-07 13:19:09,747][08466] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 13:19:09,749][08471] Stopping RolloutWorker_w5... +[2024-11-07 13:19:09,750][08471] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 13:19:09,800][08469] Stopping RolloutWorker_w4... 
+[2024-11-07 13:19:09,801][08469] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 13:19:09,837][08470] Stopping RolloutWorker_w6... +[2024-11-07 13:19:09,842][08470] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 13:19:09,862][08472] Stopping RolloutWorker_w7... +[2024-11-07 13:19:09,862][08472] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 13:19:09,869][08467] Stopping RolloutWorker_w2... +[2024-11-07 13:19:09,870][08467] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 13:19:09,900][08468] Stopping RolloutWorker_w3... +[2024-11-07 13:19:09,901][08468] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 13:19:09,962][08465] Stopping RolloutWorker_w0... +[2024-11-07 13:19:09,963][08465] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 13:19:13,550][08451] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000647_2650112.pth... +[2024-11-07 13:19:13,648][08451] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000613_2510848.pth +[2024-11-07 13:19:13,653][08451] Stopping LearnerWorker_p0... +[2024-11-07 13:19:13,655][08451] Loop learner_proc0_evt_loop terminating... +[2024-11-07 13:23:18,807][09379] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
+[2024-11-07 13:23:18,808][09379] Rollout worker 0 uses device cpu +[2024-11-07 13:23:18,809][09379] Rollout worker 1 uses device cpu +[2024-11-07 13:23:18,811][09379] Rollout worker 2 uses device cpu +[2024-11-07 13:23:18,812][09379] Rollout worker 3 uses device cpu +[2024-11-07 13:23:18,813][09379] Rollout worker 4 uses device cpu +[2024-11-07 13:23:18,814][09379] Rollout worker 5 uses device cpu +[2024-11-07 13:23:18,815][09379] Rollout worker 6 uses device cpu +[2024-11-07 13:23:18,816][09379] Rollout worker 7 uses device cpu +[2024-11-07 13:23:18,871][09379] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:23:18,872][09379] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 13:23:18,907][09379] Starting all processes... +[2024-11-07 13:23:18,908][09379] Starting process learner_proc0 +[2024-11-07 13:23:19,009][09379] Starting all processes... +[2024-11-07 13:23:19,017][09379] Starting process inference_proc0-0 +[2024-11-07 13:23:19,018][09379] Starting process rollout_proc0 +[2024-11-07 13:23:19,019][09379] Starting process rollout_proc1 +[2024-11-07 13:23:19,021][09379] Starting process rollout_proc2 +[2024-11-07 13:23:19,022][09379] Starting process rollout_proc3 +[2024-11-07 13:23:19,024][09379] Starting process rollout_proc4 +[2024-11-07 13:23:19,027][09379] Starting process rollout_proc5 +[2024-11-07 13:23:19,027][09379] Starting process rollout_proc6 +[2024-11-07 13:23:19,028][09379] Starting process rollout_proc7 +[2024-11-07 13:23:25,787][09667] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:23:25,788][09667] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 13:23:25,968][09689] Worker 2 uses CPU cores [2] +[2024-11-07 13:23:26,036][09692] Worker 5 uses CPU cores [5] +[2024-11-07 13:23:26,074][09667] Num visible devices: 1 +[2024-11-07 13:23:26,179][09667] Starting seed is not provided +[2024-11-07 13:23:26,179][09667] Using GPUs [0] for process 0 
(actually maps to GPUs [0]) +[2024-11-07 13:23:26,183][09667] Initializing actor-critic model on device cuda:0 +[2024-11-07 13:23:26,184][09667] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 13:23:26,185][09667] RunningMeanStd input shape: (1,) +[2024-11-07 13:23:26,220][09667] ConvEncoder: input_channels=3 +[2024-11-07 13:23:26,424][09691] Worker 4 uses CPU cores [4] +[2024-11-07 13:23:26,608][09667] Conv encoder output size: 512 +[2024-11-07 13:23:26,609][09667] Policy head output size: 512 +[2024-11-07 13:23:26,646][09667] Created Actor Critic model with architecture: +[2024-11-07 13:23:26,646][09667] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 13:23:27,048][09688] Worker 1 uses CPU cores [1] +[2024-11-07 13:23:27,431][09680] Using GPUs [0] 
for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:23:27,432][09680] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 13:23:27,480][09680] Num visible devices: 1 +[2024-11-07 13:23:27,534][09690] Worker 3 uses CPU cores [3] +[2024-11-07 13:23:27,633][09694] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 13:23:27,678][09693] Worker 6 uses CPU cores [6] +[2024-11-07 13:23:27,704][09667] Using optimizer +[2024-11-07 13:23:27,848][09687] Worker 0 uses CPU cores [0] +[2024-11-07 13:23:28,869][09667] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000647_2650112.pth... +[2024-11-07 13:23:28,945][09667] Loading model from checkpoint +[2024-11-07 13:23:28,947][09667] Loaded experiment state at self.train_step=647, self.env_steps=2650112 +[2024-11-07 13:23:28,948][09667] Initialized policy 0 weights for model version 647 +[2024-11-07 13:23:28,957][09667] LearnerWorker_p0 finished initialization! +[2024-11-07 13:23:28,957][09667] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:23:29,118][09680] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 13:23:29,119][09680] RunningMeanStd input shape: (1,) +[2024-11-07 13:23:29,131][09680] ConvEncoder: input_channels=3 +[2024-11-07 13:23:29,243][09680] Conv encoder output size: 512 +[2024-11-07 13:23:29,243][09680] Policy head output size: 512 +[2024-11-07 13:23:29,288][09379] Inference worker 0-0 is ready! +[2024-11-07 13:23:29,290][09379] All inference workers are ready! Signal rollout workers to start! 
+[2024-11-07 13:23:29,358][09691] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:23:29,363][09689] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:23:29,370][09688] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:23:29,377][09690] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:23:29,384][09692] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:23:29,387][09693] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:23:29,417][09687] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:23:29,419][09694] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:23:29,902][09693] Decorrelating experience for 0 frames... +[2024-11-07 13:23:29,906][09688] Decorrelating experience for 0 frames... +[2024-11-07 13:23:29,907][09689] Decorrelating experience for 0 frames... +[2024-11-07 13:23:29,920][09691] Decorrelating experience for 0 frames... +[2024-11-07 13:23:30,219][09689] Decorrelating experience for 32 frames... +[2024-11-07 13:23:30,219][09692] Decorrelating experience for 0 frames... +[2024-11-07 13:23:30,268][09691] Decorrelating experience for 32 frames... +[2024-11-07 13:23:30,272][09690] Decorrelating experience for 0 frames... +[2024-11-07 13:23:30,558][09692] Decorrelating experience for 32 frames... +[2024-11-07 13:23:30,627][09690] Decorrelating experience for 32 frames... +[2024-11-07 13:23:30,668][09694] Decorrelating experience for 0 frames... +[2024-11-07 13:23:30,700][09693] Decorrelating experience for 32 frames... +[2024-11-07 13:23:30,735][09691] Decorrelating experience for 64 frames... +[2024-11-07 13:23:31,049][09687] Decorrelating experience for 0 frames... +[2024-11-07 13:23:31,224][09689] Decorrelating experience for 64 frames... +[2024-11-07 13:23:31,231][09692] Decorrelating experience for 64 frames... +[2024-11-07 13:23:31,256][09690] Decorrelating experience for 64 frames... 
+[2024-11-07 13:23:31,270][09688] Decorrelating experience for 32 frames... +[2024-11-07 13:23:31,347][09691] Decorrelating experience for 96 frames... +[2024-11-07 13:23:31,639][09694] Decorrelating experience for 32 frames... +[2024-11-07 13:23:31,693][09689] Decorrelating experience for 96 frames... +[2024-11-07 13:23:31,729][09690] Decorrelating experience for 96 frames... +[2024-11-07 13:23:31,788][09692] Decorrelating experience for 96 frames... +[2024-11-07 13:23:31,844][09693] Decorrelating experience for 64 frames... +[2024-11-07 13:23:31,889][09687] Decorrelating experience for 32 frames... +[2024-11-07 13:23:32,168][09694] Decorrelating experience for 64 frames... +[2024-11-07 13:23:32,323][09693] Decorrelating experience for 96 frames... +[2024-11-07 13:23:32,363][09688] Decorrelating experience for 64 frames... +[2024-11-07 13:23:32,842][09687] Decorrelating experience for 64 frames... +[2024-11-07 13:23:33,039][09694] Decorrelating experience for 96 frames... +[2024-11-07 13:23:33,344][09688] Decorrelating experience for 96 frames... +[2024-11-07 13:23:33,526][09687] Decorrelating experience for 96 frames... +[2024-11-07 13:23:33,808][09379] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 2650112. Throughput: 0: nan. Samples: 108. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:23:33,810][09379] Avg episode reward: [(0, '1.508')] +[2024-11-07 13:23:34,954][09667] Signal inference workers to stop experience collection... +[2024-11-07 13:23:34,966][09680] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 13:23:38,807][09379] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 2650112. Throughput: 0: 491.2. Samples: 2564. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:23:38,809][09379] Avg episode reward: [(0, '2.011')] +[2024-11-07 13:23:39,687][09379] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 13:23:39,692][09379] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 13:23:39,693][09379] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 13:23:39,695][09379] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 13:23:39,696][09379] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 13:23:39,699][09379] Heartbeat connected on Batcher_0 +[2024-11-07 13:23:39,701][09379] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 13:23:39,704][09379] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 13:23:39,711][09379] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 13:23:39,713][09379] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 13:23:43,239][09667] Signal inference workers to resume experience collection... +[2024-11-07 13:23:43,240][09680] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 13:23:43,722][09379] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 13:23:43,807][09379] Fps is (10 sec: 819.2, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 2658304. Throughput: 0: 245.6. Samples: 2564. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-11-07 13:23:43,812][09379] Avg episode reward: [(0, '2.815')] +[2024-11-07 13:23:48,807][09379] Fps is (10 sec: 3686.3, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 2686976. Throughput: 0: 516.8. Samples: 7860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 13:23:48,810][09379] Avg episode reward: [(0, '3.854')] +[2024-11-07 13:23:49,227][09680] Updated weights for policy 0, policy_version 657 (0.0145) +[2024-11-07 13:23:53,808][09379] Fps is (10 sec: 5734.4, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 2715648. Throughput: 0: 818.3. Samples: 16474. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 13:23:53,810][09379] Avg episode reward: [(0, '4.550')] +[2024-11-07 13:23:56,301][09680] Updated weights for policy 0, policy_version 667 (0.0025) +[2024-11-07 13:23:58,807][09379] Fps is (10 sec: 5734.5, 60 sec: 3768.4, 300 sec: 3768.4). Total num frames: 2744320. Throughput: 0: 838.2. Samples: 21062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 13:23:58,809][09379] Avg episode reward: [(0, '4.253')] +[2024-11-07 13:24:03,808][09379] Fps is (10 sec: 5324.7, 60 sec: 3959.5, 300 sec: 3959.5). Total num frames: 2768896. Throughput: 0: 961.1. Samples: 28942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 13:24:03,811][09379] Avg episode reward: [(0, '4.408')] +[2024-11-07 13:24:03,980][09680] Updated weights for policy 0, policy_version 677 (0.0052) +[2024-11-07 13:24:08,807][09379] Fps is (10 sec: 5324.7, 60 sec: 4213.0, 300 sec: 4213.0). Total num frames: 2797568. Throughput: 0: 1070.4. Samples: 37572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:24:08,809][09379] Avg episode reward: [(0, '4.407')] +[2024-11-07 13:24:13,168][09680] Updated weights for policy 0, policy_version 687 (0.0044) +[2024-11-07 13:24:13,808][09379] Fps is (10 sec: 4505.2, 60 sec: 4095.9, 300 sec: 4095.9). Total num frames: 2813952. Throughput: 0: 1009.0. Samples: 40470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 13:24:13,826][09379] Avg episode reward: [(0, '4.455')] +[2024-11-07 13:24:18,813][09379] Fps is (10 sec: 4503.2, 60 sec: 4277.6, 300 sec: 4277.6). Total num frames: 2842624. Throughput: 0: 1037.6. Samples: 46806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 13:24:18,814][09379] Avg episode reward: [(0, '4.384')] +[2024-11-07 13:24:20,307][09680] Updated weights for policy 0, policy_version 697 (0.0037) +[2024-11-07 13:24:23,808][09379] Fps is (10 sec: 6144.6, 60 sec: 4505.6, 300 sec: 4505.6). Total num frames: 2875392. Throughput: 0: 1211.8. 
Samples: 57094. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 13:24:23,809][09379] Avg episode reward: [(0, '4.468')] +[2024-11-07 13:24:26,144][09680] Updated weights for policy 0, policy_version 707 (0.0054) +[2024-11-07 13:24:28,808][09379] Fps is (10 sec: 6966.5, 60 sec: 4766.2, 300 sec: 4766.2). Total num frames: 2912256. Throughput: 0: 1328.8. Samples: 62362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 13:24:28,811][09379] Avg episode reward: [(0, '4.492')] +[2024-11-07 13:24:32,975][09680] Updated weights for policy 0, policy_version 717 (0.0042) +[2024-11-07 13:24:33,808][09379] Fps is (10 sec: 6144.0, 60 sec: 4778.7, 300 sec: 4778.7). Total num frames: 2936832. Throughput: 0: 1416.7. Samples: 71610. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 13:24:33,810][09379] Avg episode reward: [(0, '4.530')] +[2024-11-07 13:24:38,807][09379] Fps is (10 sec: 5734.7, 60 sec: 5324.8, 300 sec: 4915.2). Total num frames: 2969600. Throughput: 0: 1418.6. Samples: 80310. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 13:24:38,809][09379] Avg episode reward: [(0, '4.484')] +[2024-11-07 13:24:40,028][09680] Updated weights for policy 0, policy_version 727 (0.0044) +[2024-11-07 13:24:43,808][09379] Fps is (10 sec: 6143.8, 60 sec: 5666.1, 300 sec: 4973.7). Total num frames: 2998272. Throughput: 0: 1412.3. Samples: 84614. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 13:24:43,810][09379] Avg episode reward: [(0, '4.236')] +[2024-11-07 13:24:47,449][09680] Updated weights for policy 0, policy_version 737 (0.0033) +[2024-11-07 13:24:48,807][09379] Fps is (10 sec: 5734.5, 60 sec: 5666.1, 300 sec: 5024.4). Total num frames: 3026944. Throughput: 0: 1418.0. Samples: 92750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:24:48,809][09379] Avg episode reward: [(0, '4.404')] +[2024-11-07 13:24:53,808][09379] Fps is (10 sec: 5324.9, 60 sec: 5597.9, 300 sec: 5017.6). Total num frames: 3051520. 
Throughput: 0: 1420.4. Samples: 101492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 13:24:53,809][09379] Avg episode reward: [(0, '4.505')] +[2024-11-07 13:24:55,139][09680] Updated weights for policy 0, policy_version 747 (0.0029) +[2024-11-07 13:24:58,810][09379] Fps is (10 sec: 4504.3, 60 sec: 5461.1, 300 sec: 4963.2). Total num frames: 3072000. Throughput: 0: 1418.6. Samples: 104310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:24:58,816][09379] Avg episode reward: [(0, '4.553')] +[2024-11-07 13:25:03,808][09379] Fps is (10 sec: 4096.0, 60 sec: 5393.1, 300 sec: 4915.2). Total num frames: 3092480. Throughput: 0: 1403.0. Samples: 109932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:25:03,810][09379] Avg episode reward: [(0, '4.461')] +[2024-11-07 13:25:05,380][09680] Updated weights for policy 0, policy_version 757 (0.0053) +[2024-11-07 13:25:08,808][09379] Fps is (10 sec: 4096.9, 60 sec: 5256.5, 300 sec: 4872.1). Total num frames: 3112960. Throughput: 0: 1321.0. Samples: 116540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:25:08,812][09379] Avg episode reward: [(0, '4.396')] +[2024-11-07 13:25:13,828][09379] Fps is (10 sec: 4496.5, 60 sec: 5391.3, 300 sec: 4873.2). Total num frames: 3137536. Throughput: 0: 1286.8. Samples: 120296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:25:13,830][09379] Avg episode reward: [(0, '4.375')] +[2024-11-07 13:25:13,866][09667] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000766_3137536.pth... +[2024-11-07 13:25:14,472][09667] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000645_2641920.pth +[2024-11-07 13:25:14,866][09680] Updated weights for policy 0, policy_version 767 (0.0033) +[2024-11-07 13:25:18,903][09379] Fps is (10 sec: 4868.8, 60 sec: 5316.8, 300 sec: 4871.8). Total num frames: 3162112. Throughput: 0: 1233.0. Samples: 127214. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:25:18,908][09379] Avg episode reward: [(0, '4.390')] +[2024-11-07 13:25:22,340][09680] Updated weights for policy 0, policy_version 777 (0.0048) +[2024-11-07 13:25:23,808][09379] Fps is (10 sec: 5335.7, 60 sec: 5256.5, 300 sec: 4915.2). Total num frames: 3190784. Throughput: 0: 1221.7. Samples: 135288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 13:25:23,810][09379] Avg episode reward: [(0, '4.556')] +[2024-11-07 13:25:28,808][09379] Fps is (10 sec: 5789.6, 60 sec: 5120.0, 300 sec: 4950.8). Total num frames: 3219456. Throughput: 0: 1224.6. Samples: 139722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 13:25:28,810][09379] Avg episode reward: [(0, '4.370')] +[2024-11-07 13:25:29,662][09680] Updated weights for policy 0, policy_version 787 (0.0035) +[2024-11-07 13:25:33,808][09379] Fps is (10 sec: 5324.9, 60 sec: 5120.0, 300 sec: 4949.3). Total num frames: 3244032. Throughput: 0: 1216.3. Samples: 147484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:25:33,809][09379] Avg episode reward: [(0, '4.519')] +[2024-11-07 13:25:38,682][09680] Updated weights for policy 0, policy_version 797 (0.0058) +[2024-11-07 13:25:38,807][09379] Fps is (10 sec: 4505.8, 60 sec: 4915.2, 300 sec: 4915.2). Total num frames: 3264512. Throughput: 0: 1161.2. Samples: 153746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:25:38,809][09379] Avg episode reward: [(0, '4.491')] +[2024-11-07 13:25:43,807][09379] Fps is (10 sec: 5324.9, 60 sec: 4983.5, 300 sec: 4978.2). Total num frames: 3297280. Throughput: 0: 1204.4. Samples: 158504. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:25:43,811][09379] Avg episode reward: [(0, '4.404')] +[2024-11-07 13:25:44,859][09680] Updated weights for policy 0, policy_version 807 (0.0036) +[2024-11-07 13:25:48,807][09379] Fps is (10 sec: 6553.6, 60 sec: 5051.7, 300 sec: 5036.6). Total num frames: 3330048. Throughput: 0: 1305.8. 
Samples: 168692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:25:48,810][09379] Avg episode reward: [(0, '4.359')] +[2024-11-07 13:25:50,715][09680] Updated weights for policy 0, policy_version 817 (0.0028) +[2024-11-07 13:25:53,809][09379] Fps is (10 sec: 6143.3, 60 sec: 5119.9, 300 sec: 5061.4). Total num frames: 3358720. Throughput: 0: 1362.7. Samples: 177862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 13:25:53,812][09379] Avg episode reward: [(0, '4.424')] +[2024-11-07 13:25:57,499][09680] Updated weights for policy 0, policy_version 827 (0.0033) +[2024-11-07 13:25:58,808][09379] Fps is (10 sec: 6553.2, 60 sec: 5393.3, 300 sec: 5141.2). Total num frames: 3395584. Throughput: 0: 1392.9. Samples: 182950. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:25:58,811][09379] Avg episode reward: [(0, '4.532')] +[2024-11-07 13:26:03,807][09379] Fps is (10 sec: 6144.7, 60 sec: 5461.4, 300 sec: 5133.7). Total num frames: 3420160. Throughput: 0: 1444.6. Samples: 192082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 13:26:03,810][09379] Avg episode reward: [(0, '4.664')] +[2024-11-07 13:26:04,512][09680] Updated weights for policy 0, policy_version 837 (0.0026) +[2024-11-07 13:26:08,807][09379] Fps is (10 sec: 5325.1, 60 sec: 5597.9, 300 sec: 5153.0). Total num frames: 3448832. Throughput: 0: 1446.3. Samples: 200370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 13:26:08,809][09379] Avg episode reward: [(0, '4.565')] +[2024-11-07 13:26:11,306][09680] Updated weights for policy 0, policy_version 847 (0.0035) +[2024-11-07 13:26:13,807][09379] Fps is (10 sec: 6144.0, 60 sec: 5736.4, 300 sec: 5196.8). Total num frames: 3481600. Throughput: 0: 1459.3. Samples: 205392. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 13:26:13,809][09379] Avg episode reward: [(0, '4.378')] +[2024-11-07 13:26:18,016][09680] Updated weights for policy 0, policy_version 857 (0.0042) +[2024-11-07 13:26:18,810][09379] Fps is (10 sec: 6143.8, 60 sec: 5811.9, 300 sec: 5213.1). Total num frames: 3510272. Throughput: 0: 1491.9. Samples: 214618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 13:26:18,815][09379] Avg episode reward: [(0, '4.483')] +[2024-11-07 13:26:23,812][09379] Fps is (10 sec: 6550.9, 60 sec: 5938.8, 300 sec: 5276.5). Total num frames: 3547136. Throughput: 0: 1572.7. Samples: 224524. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 13:26:23,816][09379] Avg episode reward: [(0, '4.256')] +[2024-11-07 13:26:25,085][09680] Updated weights for policy 0, policy_version 867 (0.0039) +[2024-11-07 13:26:28,807][09379] Fps is (10 sec: 5734.6, 60 sec: 5802.7, 300 sec: 5242.9). Total num frames: 3567616. Throughput: 0: 1534.2. Samples: 227542. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:26:28,812][09379] Avg episode reward: [(0, '4.291')] +[2024-11-07 13:26:32,256][09379] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 9379], exiting... +[2024-11-07 13:26:32,269][09379] Runner profile tree view: +main_loop: 193.3628 +[2024-11-07 13:26:32,284][09667] Stopping Batcher_0... +[2024-11-07 13:26:32,285][09667] Loop batcher_evt_loop terminating... +[2024-11-07 13:26:32,272][09379] Collected {0: 3588096}, FPS: 4850.9 +[2024-11-07 13:26:32,366][09680] Weights refcount: 2 0 +[2024-11-07 13:26:32,371][09680] Stopping InferenceWorker_p0-w0... +[2024-11-07 13:26:32,372][09680] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 13:26:32,392][09667] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000877_3592192.pth... 
+[2024-11-07 13:26:32,520][09667] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000647_2650112.pth +[2024-11-07 13:26:32,521][09690] Stopping RolloutWorker_w3... +[2024-11-07 13:26:32,521][09690] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 13:26:32,525][09667] Stopping LearnerWorker_p0... +[2024-11-07 13:26:32,526][09667] Loop learner_proc0_evt_loop terminating... +[2024-11-07 13:26:32,530][09693] Stopping RolloutWorker_w6... +[2024-11-07 13:26:32,531][09693] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 13:26:32,534][09692] Stopping RolloutWorker_w5... +[2024-11-07 13:26:32,535][09692] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 13:26:32,578][09688] Stopping RolloutWorker_w1... +[2024-11-07 13:26:32,580][09688] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 13:26:32,586][09691] Stopping RolloutWorker_w4... +[2024-11-07 13:26:32,587][09691] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 13:26:32,714][09694] Stopping RolloutWorker_w7... +[2024-11-07 13:26:32,724][09694] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 13:26:32,772][09689] Stopping RolloutWorker_w2... +[2024-11-07 13:26:32,773][09689] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 13:26:32,823][09687] Stopping RolloutWorker_w0... +[2024-11-07 13:26:32,884][09687] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 13:28:07,838][10732] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
+[2024-11-07 13:28:07,840][10732] Rollout worker 0 uses device cpu +[2024-11-07 13:28:07,842][10732] Rollout worker 1 uses device cpu +[2024-11-07 13:28:07,842][10732] Rollout worker 2 uses device cpu +[2024-11-07 13:28:07,845][10732] Rollout worker 3 uses device cpu +[2024-11-07 13:28:07,848][10732] Rollout worker 4 uses device cpu +[2024-11-07 13:28:07,852][10732] Rollout worker 5 uses device cpu +[2024-11-07 13:28:07,853][10732] Rollout worker 6 uses device cpu +[2024-11-07 13:28:07,854][10732] Rollout worker 7 uses device cpu +[2024-11-07 13:28:07,916][10732] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:28:07,918][10732] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 13:28:07,963][10732] Starting all processes... +[2024-11-07 13:28:07,964][10732] Starting process learner_proc0 +[2024-11-07 13:28:08,065][10732] Starting all processes... +[2024-11-07 13:28:08,074][10732] Starting process inference_proc0-0 +[2024-11-07 13:28:08,074][10732] Starting process rollout_proc0 +[2024-11-07 13:28:08,076][10732] Starting process rollout_proc1 +[2024-11-07 13:28:08,077][10732] Starting process rollout_proc2 +[2024-11-07 13:28:08,080][10732] Starting process rollout_proc3 +[2024-11-07 13:28:08,183][10732] Starting process rollout_proc4 +[2024-11-07 13:28:08,186][10732] Starting process rollout_proc5 +[2024-11-07 13:28:08,187][10732] Starting process rollout_proc6 +[2024-11-07 13:28:08,194][10732] Starting process rollout_proc7 +[2024-11-07 13:28:13,465][11017] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:28:13,465][11017] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 13:28:13,666][11017] Num visible devices: 1 +[2024-11-07 13:28:13,735][11017] Starting seed is not provided +[2024-11-07 13:28:13,736][11017] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:28:13,736][11017] Initializing actor-critic model on device cuda:0 +[2024-11-07 
13:28:13,737][11017] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 13:28:13,739][11017] RunningMeanStd input shape: (1,) +[2024-11-07 13:28:13,944][11017] ConvEncoder: input_channels=3 +[2024-11-07 13:28:14,059][11033] Worker 1 uses CPU cores [1] +[2024-11-07 13:28:14,140][11030] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:28:14,140][11030] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 13:28:14,169][11030] Num visible devices: 1 +[2024-11-07 13:28:14,427][11017] Conv encoder output size: 512 +[2024-11-07 13:28:14,433][11017] Policy head output size: 512 +[2024-11-07 13:28:14,484][11017] Created Actor Critic model with architecture: +[2024-11-07 13:28:14,484][11017] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, 
out_features=5, bias=True) + ) +) +[2024-11-07 13:28:14,689][11035] Worker 5 uses CPU cores [5] +[2024-11-07 13:28:14,789][11034] Worker 3 uses CPU cores [3] +[2024-11-07 13:28:14,913][11037] Worker 6 uses CPU cores [6] +[2024-11-07 13:28:14,986][11031] Worker 0 uses CPU cores [0] +[2024-11-07 13:28:15,019][11032] Worker 2 uses CPU cores [2] +[2024-11-07 13:28:15,049][11036] Worker 4 uses CPU cores [4] +[2024-11-07 13:28:15,097][11038] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 13:28:15,542][11017] Using optimizer +[2024-11-07 13:28:17,478][11017] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000877_3592192.pth... +[2024-11-07 13:28:17,542][11017] Loading model from checkpoint +[2024-11-07 13:28:17,546][11017] Loaded experiment state at self.train_step=877, self.env_steps=3592192 +[2024-11-07 13:28:17,546][11017] Initialized policy 0 weights for model version 877 +[2024-11-07 13:28:17,552][11017] LearnerWorker_p0 finished initialization! +[2024-11-07 13:28:17,552][11017] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:28:17,765][11030] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 13:28:17,766][11030] RunningMeanStd input shape: (1,) +[2024-11-07 13:28:17,790][11030] ConvEncoder: input_channels=3 +[2024-11-07 13:28:17,962][11030] Conv encoder output size: 512 +[2024-11-07 13:28:17,963][11030] Policy head output size: 512 +[2024-11-07 13:28:18,029][10732] Inference worker 0-0 is ready! +[2024-11-07 13:28:18,030][10732] All inference workers are ready! Signal rollout workers to start! 
+[2024-11-07 13:28:18,116][11032] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:28:18,120][11034] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:28:18,141][11031] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:28:18,148][11033] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:28:18,149][11037] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:28:18,169][11036] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:28:18,241][11035] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:28:18,242][11038] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:28:18,777][11032] Decorrelating experience for 0 frames... +[2024-11-07 13:28:18,778][11034] Decorrelating experience for 0 frames... +[2024-11-07 13:28:18,779][11036] Decorrelating experience for 0 frames... +[2024-11-07 13:28:18,782][11037] Decorrelating experience for 0 frames... +[2024-11-07 13:28:18,835][11033] Decorrelating experience for 0 frames... +[2024-11-07 13:28:19,234][11034] Decorrelating experience for 32 frames... +[2024-11-07 13:28:19,251][11032] Decorrelating experience for 32 frames... +[2024-11-07 13:28:19,285][11036] Decorrelating experience for 32 frames... +[2024-11-07 13:28:19,362][11035] Decorrelating experience for 0 frames... +[2024-11-07 13:28:19,452][11031] Decorrelating experience for 0 frames... +[2024-11-07 13:28:19,497][11033] Decorrelating experience for 32 frames... +[2024-11-07 13:28:19,812][11038] Decorrelating experience for 0 frames... +[2024-11-07 13:28:19,849][11035] Decorrelating experience for 32 frames... +[2024-11-07 13:28:19,965][11036] Decorrelating experience for 64 frames... +[2024-11-07 13:28:20,085][11033] Decorrelating experience for 64 frames... +[2024-11-07 13:28:20,335][11034] Decorrelating experience for 64 frames... +[2024-11-07 13:28:20,397][11032] Decorrelating experience for 64 frames... 
+[2024-11-07 13:28:20,421][11038] Decorrelating experience for 32 frames... +[2024-11-07 13:28:20,513][11036] Decorrelating experience for 96 frames... +[2024-11-07 13:28:20,810][11035] Decorrelating experience for 64 frames... +[2024-11-07 13:28:20,917][11034] Decorrelating experience for 96 frames... +[2024-11-07 13:28:20,964][11032] Decorrelating experience for 96 frames... +[2024-11-07 13:28:20,997][11037] Decorrelating experience for 32 frames... +[2024-11-07 13:28:21,136][11038] Decorrelating experience for 64 frames... +[2024-11-07 13:28:21,137][11033] Decorrelating experience for 96 frames... +[2024-11-07 13:28:21,444][11031] Decorrelating experience for 32 frames... +[2024-11-07 13:28:21,567][11037] Decorrelating experience for 64 frames... +[2024-11-07 13:28:21,570][11035] Decorrelating experience for 96 frames... +[2024-11-07 13:28:21,912][10732] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 3592192. Throughput: 0: nan. Samples: 20. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:28:22,075][11031] Decorrelating experience for 64 frames... +[2024-11-07 13:28:22,074][11038] Decorrelating experience for 96 frames... +[2024-11-07 13:28:22,658][11031] Decorrelating experience for 96 frames... +[2024-11-07 13:28:22,698][11037] Decorrelating experience for 96 frames... +[2024-11-07 13:28:23,828][11017] Signal inference workers to stop experience collection... +[2024-11-07 13:28:23,842][11030] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 13:28:26,912][10732] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 3592192. Throughput: 0: 499.2. Samples: 2516. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:28:26,915][10732] Avg episode reward: [(0, '2.447')] +[2024-11-07 13:28:27,907][10732] Heartbeat connected on Batcher_0 +[2024-11-07 13:28:27,916][10732] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 13:28:27,925][10732] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 13:28:27,932][10732] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 13:28:27,937][10732] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 13:28:27,949][10732] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 13:28:27,951][10732] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 13:28:27,953][10732] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 13:28:27,957][10732] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 13:28:27,962][10732] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 13:28:28,660][11017] Signal inference workers to resume experience collection... +[2024-11-07 13:28:28,661][11030] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 13:28:29,107][10732] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 13:28:31,912][10732] Fps is (10 sec: 2457.6, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 3616768. Throughput: 0: 463.4. Samples: 4654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 13:28:31,914][10732] Avg episode reward: [(0, '3.662')] +[2024-11-07 13:28:35,934][11030] Updated weights for policy 0, policy_version 887 (0.0180) +[2024-11-07 13:28:36,967][10732] Fps is (10 sec: 4073.5, 60 sec: 2720.7, 300 sec: 2720.7). Total num frames: 3633152. Throughput: 0: 732.2. Samples: 11044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 13:28:36,971][10732] Avg episode reward: [(0, '4.103')] +[2024-11-07 13:28:41,912][10732] Fps is (10 sec: 3686.3, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 3653632. Throughput: 0: 672.3. Samples: 13466. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 13:28:41,921][10732] Avg episode reward: [(0, '4.330')] +[2024-11-07 13:28:46,604][11030] Updated weights for policy 0, policy_version 897 (0.0061) +[2024-11-07 13:28:46,912][10732] Fps is (10 sec: 4118.6, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 3674112. Throughput: 0: 772.6. Samples: 19336. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 13:28:46,914][10732] Avg episode reward: [(0, '4.329')] +[2024-11-07 13:28:51,912][10732] Fps is (10 sec: 4915.4, 60 sec: 3686.4, 300 sec: 3686.4). Total num frames: 3702784. Throughput: 0: 914.3. Samples: 27448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 13:28:51,914][10732] Avg episode reward: [(0, '4.308')] +[2024-11-07 13:28:53,575][11030] Updated weights for policy 0, policy_version 907 (0.0034) +[2024-11-07 13:28:56,912][10732] Fps is (10 sec: 6144.1, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3735552. Throughput: 0: 919.2. Samples: 32192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:28:56,914][10732] Avg episode reward: [(0, '4.552')] +[2024-11-07 13:28:59,822][11030] Updated weights for policy 0, policy_version 917 (0.0037) +[2024-11-07 13:29:01,912][10732] Fps is (10 sec: 6143.8, 60 sec: 4300.8, 300 sec: 4300.8). Total num frames: 3764224. Throughput: 0: 1046.4. Samples: 41876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 13:29:01,919][10732] Avg episode reward: [(0, '4.401')] +[2024-11-07 13:29:06,286][11030] Updated weights for policy 0, policy_version 927 (0.0031) +[2024-11-07 13:29:06,912][10732] Fps is (10 sec: 6144.1, 60 sec: 4551.1, 300 sec: 4551.1). Total num frames: 3796992. Throughput: 0: 1145.6. Samples: 51570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:29:06,913][10732] Avg episode reward: [(0, '4.400')] +[2024-11-07 13:29:11,912][10732] Fps is (10 sec: 6144.2, 60 sec: 4669.5, 300 sec: 4669.5). Total num frames: 3825664. Throughput: 0: 1200.0. 
Samples: 56514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 13:29:11,915][10732] Avg episode reward: [(0, '4.256')] +[2024-11-07 13:29:13,526][11030] Updated weights for policy 0, policy_version 937 (0.0024) +[2024-11-07 13:29:16,912][10732] Fps is (10 sec: 6144.0, 60 sec: 4840.8, 300 sec: 4840.8). Total num frames: 3858432. Throughput: 0: 1328.4. Samples: 64430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:29:16,913][10732] Avg episode reward: [(0, '4.483')] +[2024-11-07 13:29:20,208][11030] Updated weights for policy 0, policy_version 947 (0.0058) +[2024-11-07 13:29:21,912][10732] Fps is (10 sec: 6553.5, 60 sec: 4983.5, 300 sec: 4983.5). Total num frames: 3891200. Throughput: 0: 1397.2. Samples: 73840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 13:29:21,914][10732] Avg episode reward: [(0, '4.370')] +[2024-11-07 13:29:25,891][11030] Updated weights for policy 0, policy_version 957 (0.0021) +[2024-11-07 13:29:26,912][10732] Fps is (10 sec: 6553.6, 60 sec: 5529.6, 300 sec: 5104.3). Total num frames: 3923968. Throughput: 0: 1463.5. Samples: 79324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-07 13:29:26,913][10732] Avg episode reward: [(0, '4.497')] +[2024-11-07 13:29:31,912][10732] Fps is (10 sec: 6553.7, 60 sec: 5666.2, 300 sec: 5207.8). Total num frames: 3956736. Throughput: 0: 1552.5. Samples: 89200. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 13:29:31,914][10732] Avg episode reward: [(0, '4.394')] +[2024-11-07 13:29:32,158][11030] Updated weights for policy 0, policy_version 967 (0.0034) +[2024-11-07 13:29:36,913][10732] Fps is (10 sec: 6552.9, 60 sec: 5944.6, 300 sec: 5297.4). Total num frames: 3989504. Throughput: 0: 1599.2. Samples: 99414. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 13:29:36,918][10732] Avg episode reward: [(0, '4.530')] +[2024-11-07 13:29:38,554][11030] Updated weights for policy 0, policy_version 977 (0.0027) +[2024-11-07 13:29:39,066][11017] Stopping Batcher_0... +[2024-11-07 13:29:39,067][11017] Loop batcher_evt_loop terminating... +[2024-11-07 13:29:39,068][10732] Component Batcher_0 stopped! +[2024-11-07 13:29:39,072][11017] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-11-07 13:29:39,102][11030] Weights refcount: 2 0 +[2024-11-07 13:29:39,104][11030] Stopping InferenceWorker_p0-w0... +[2024-11-07 13:29:39,105][11030] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 13:29:39,105][10732] Component InferenceWorker_p0-w0 stopped! +[2024-11-07 13:29:39,175][11034] Stopping RolloutWorker_w3... +[2024-11-07 13:29:39,176][11034] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 13:29:39,175][10732] Component RolloutWorker_w3 stopped! +[2024-11-07 13:29:39,187][11031] Stopping RolloutWorker_w0... +[2024-11-07 13:29:39,188][11031] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 13:29:39,187][10732] Component RolloutWorker_w0 stopped! +[2024-11-07 13:29:39,194][11036] Stopping RolloutWorker_w4... +[2024-11-07 13:29:39,196][11036] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 13:29:39,195][10732] Component RolloutWorker_w4 stopped! +[2024-11-07 13:29:39,209][10732] Component RolloutWorker_w1 stopped! +[2024-11-07 13:29:39,209][11033] Stopping RolloutWorker_w1... +[2024-11-07 13:29:39,213][11033] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 13:29:39,218][11017] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000766_3137536.pth +[2024-11-07 13:29:39,235][11017] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... 
+[2024-11-07 13:29:39,256][11038] Stopping RolloutWorker_w7... +[2024-11-07 13:29:39,255][10732] Component RolloutWorker_w7 stopped! +[2024-11-07 13:29:39,263][11038] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 13:29:39,269][10732] Component RolloutWorker_w6 stopped! +[2024-11-07 13:29:39,268][11037] Stopping RolloutWorker_w6... +[2024-11-07 13:29:39,281][11037] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 13:29:39,390][11032] Stopping RolloutWorker_w2... +[2024-11-07 13:29:39,392][11032] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 13:29:39,390][10732] Component RolloutWorker_w2 stopped! +[2024-11-07 13:29:39,486][11035] Stopping RolloutWorker_w5... +[2024-11-07 13:29:39,485][10732] Component RolloutWorker_w5 stopped! +[2024-11-07 13:29:39,489][11035] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 13:29:39,645][10732] Component LearnerWorker_p0 stopped! +[2024-11-07 13:29:39,651][10732] Waiting for process learner_proc0 to stop... +[2024-11-07 13:29:39,645][11017] Stopping LearnerWorker_p0... +[2024-11-07 13:29:39,654][11017] Loop learner_proc0_evt_loop terminating... +[2024-11-07 13:29:41,382][10732] Waiting for process inference_proc0-0 to join... +[2024-11-07 13:29:41,383][10732] Waiting for process rollout_proc0 to join... +[2024-11-07 13:29:41,384][10732] Waiting for process rollout_proc1 to join... +[2024-11-07 13:29:41,386][10732] Waiting for process rollout_proc2 to join... +[2024-11-07 13:29:41,387][10732] Waiting for process rollout_proc3 to join... +[2024-11-07 13:29:41,388][10732] Waiting for process rollout_proc4 to join... +[2024-11-07 13:29:41,389][10732] Waiting for process rollout_proc5 to join... +[2024-11-07 13:29:41,391][10732] Waiting for process rollout_proc6 to join... +[2024-11-07 13:29:41,392][10732] Waiting for process rollout_proc7 to join... 
+[2024-11-07 13:29:41,393][10732] Batcher 0 profile tree view: +batching: 4.1772, releasing_batches: 0.0071 +[2024-11-07 13:29:41,394][10732] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0001 + wait_policy_total: 3.3427 +update_model: 1.2367 + weight_update: 0.0027 +one_step: 0.0111 + handle_policy_step: 68.2333 + deserialize: 1.8008, stack: 0.3402, obs_to_device_normalize: 19.5773, forward: 29.6476, send_messages: 4.4855 + prepare_outputs: 10.0586 + to_cpu: 6.9987 +[2024-11-07 13:29:41,396][10732] Learner 0 profile tree view: +misc: 0.0005, prepare_batch: 3.8501 +train: 16.4720 + epoch_init: 0.0008, minibatch_init: 0.0049, losses_postprocess: 0.1092, kl_divergence: 0.1493, after_optimizer: 0.7284 + calculate_losses: 4.1094 + losses_init: 0.0007, forward_head: 0.6633, bptt_initial: 2.3525, tail: 0.1590, advantages_returns: 0.0443, losses: 0.4608 + bptt: 0.3909 + bptt_forward_core: 0.3798 + update: 11.2589 + clip: 0.2380 +[2024-11-07 13:29:41,396][10732] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.0249, enqueue_policy_requests: 1.6630, env_step: 18.9067, overhead: 1.4135, complete_rollouts: 0.0593 +save_policy_outputs: 2.1104 + split_output_tensors: 0.7211 +[2024-11-07 13:29:41,397][10732] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.0229, enqueue_policy_requests: 1.5401, env_step: 31.2942, overhead: 1.4459, complete_rollouts: 0.0523 +save_policy_outputs: 2.0803 + split_output_tensors: 0.6932 +[2024-11-07 13:29:41,401][10732] Loop Runner_EvtLoop terminating... +[2024-11-07 13:29:41,403][10732] Runner profile tree view: +main_loop: 93.4405 +[2024-11-07 13:29:41,404][10732] Collected {0: 4005888}, FPS: 4427.4 +[2024-11-07 13:39:17,255][11922] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
+[2024-11-07 13:39:17,258][11922] Rollout worker 0 uses device cpu +[2024-11-07 13:39:17,259][11922] Rollout worker 1 uses device cpu +[2024-11-07 13:39:17,261][11922] Rollout worker 2 uses device cpu +[2024-11-07 13:39:17,263][11922] Rollout worker 3 uses device cpu +[2024-11-07 13:39:17,265][11922] Rollout worker 4 uses device cpu +[2024-11-07 13:39:17,266][11922] Rollout worker 5 uses device cpu +[2024-11-07 13:39:17,268][11922] Rollout worker 6 uses device cpu +[2024-11-07 13:39:17,269][11922] Rollout worker 7 uses device cpu +[2024-11-07 13:39:17,330][11922] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:39:17,332][11922] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 13:39:17,371][11922] Starting all processes... +[2024-11-07 13:39:17,373][11922] Starting process learner_proc0 +[2024-11-07 13:39:17,483][11922] Starting all processes... +[2024-11-07 13:39:17,490][11922] Starting process inference_proc0-0 +[2024-11-07 13:39:17,561][11922] Starting process rollout_proc0 +[2024-11-07 13:39:17,570][11922] Starting process rollout_proc4 +[2024-11-07 13:39:17,564][11922] Starting process rollout_proc2 +[2024-11-07 13:39:17,569][11922] Starting process rollout_proc3 +[2024-11-07 13:39:17,571][11922] Starting process rollout_proc5 +[2024-11-07 13:39:17,563][11922] Starting process rollout_proc1 +[2024-11-07 13:39:17,571][11922] Starting process rollout_proc6 +[2024-11-07 13:39:17,578][11922] Starting process rollout_proc7 +[2024-11-07 13:39:23,020][13795] Worker 2 uses CPU cores [2] +[2024-11-07 13:39:23,164][13792] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:39:23,165][13792] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 13:39:23,266][13798] Worker 1 uses CPU cores [1] +[2024-11-07 13:39:23,338][13792] Num visible devices: 1 +[2024-11-07 13:39:23,429][13793] Worker 0 uses CPU cores [0] +[2024-11-07 13:39:23,606][13779] Using GPUs [0] for process 0 
(actually maps to GPUs [0]) +[2024-11-07 13:39:23,607][13779] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 13:39:23,643][13779] Num visible devices: 1 +[2024-11-07 13:39:23,662][13779] Starting seed is not provided +[2024-11-07 13:39:23,662][13779] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:39:23,663][13779] Initializing actor-critic model on device cuda:0 +[2024-11-07 13:39:23,663][13779] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 13:39:23,664][13779] RunningMeanStd input shape: (1,) +[2024-11-07 13:39:23,687][13779] ConvEncoder: input_channels=3 +[2024-11-07 13:39:23,929][13794] Worker 4 uses CPU cores [4] +[2024-11-07 13:39:24,026][13779] Conv encoder output size: 512 +[2024-11-07 13:39:24,027][13779] Policy head output size: 512 +[2024-11-07 13:39:24,029][13797] Worker 5 uses CPU cores [5] +[2024-11-07 13:39:24,044][13779] Created Actor Critic model with architecture: +[2024-11-07 13:39:24,045][13779] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + 
(core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 13:39:24,052][13806] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 13:39:24,184][13796] Worker 3 uses CPU cores [3] +[2024-11-07 13:39:24,380][13805] Worker 6 uses CPU cores [6] +[2024-11-07 13:39:24,922][13779] Using optimizer +[2024-11-07 13:39:26,175][13779] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2024-11-07 13:39:26,246][13779] Loading model from checkpoint +[2024-11-07 13:39:26,249][13779] Loaded experiment state at self.train_step=978, self.env_steps=4005888 +[2024-11-07 13:39:26,249][13779] Initialized policy 0 weights for model version 978 +[2024-11-07 13:39:26,260][13779] LearnerWorker_p0 finished initialization! +[2024-11-07 13:39:26,261][13779] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:39:26,511][13792] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 13:39:26,512][13792] RunningMeanStd input shape: (1,) +[2024-11-07 13:39:26,539][13792] ConvEncoder: input_channels=3 +[2024-11-07 13:39:26,678][13792] Conv encoder output size: 512 +[2024-11-07 13:39:26,678][13792] Policy head output size: 512 +[2024-11-07 13:39:26,736][11922] Inference worker 0-0 is ready! +[2024-11-07 13:39:26,738][11922] All inference workers are ready! Signal rollout workers to start! 
+[2024-11-07 13:39:26,827][13795] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:39:26,837][13796] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:39:26,845][13794] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:39:26,898][13797] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:39:26,909][13798] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:39:26,923][13805] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:39:26,975][13806] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:39:27,065][13793] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:39:27,367][11922] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:39:27,764][13795] Decorrelating experience for 0 frames... +[2024-11-07 13:39:27,768][13796] Decorrelating experience for 0 frames... +[2024-11-07 13:39:27,990][13798] Decorrelating experience for 0 frames... +[2024-11-07 13:39:28,044][13797] Decorrelating experience for 0 frames... +[2024-11-07 13:39:28,045][13793] Decorrelating experience for 0 frames... +[2024-11-07 13:39:28,337][13795] Decorrelating experience for 32 frames... +[2024-11-07 13:39:28,360][13806] Decorrelating experience for 0 frames... +[2024-11-07 13:39:28,566][13798] Decorrelating experience for 32 frames... +[2024-11-07 13:39:28,636][13805] Decorrelating experience for 0 frames... +[2024-11-07 13:39:28,639][13793] Decorrelating experience for 32 frames... +[2024-11-07 13:39:28,863][13797] Decorrelating experience for 32 frames... +[2024-11-07 13:39:28,870][13794] Decorrelating experience for 0 frames... +[2024-11-07 13:39:28,905][13795] Decorrelating experience for 64 frames... +[2024-11-07 13:39:29,069][13796] Decorrelating experience for 32 frames... 
+[2024-11-07 13:39:29,786][13795] Decorrelating experience for 96 frames... +[2024-11-07 13:39:29,787][13794] Decorrelating experience for 32 frames... +[2024-11-07 13:39:29,824][13805] Decorrelating experience for 32 frames... +[2024-11-07 13:39:29,869][13806] Decorrelating experience for 32 frames... +[2024-11-07 13:39:29,878][13798] Decorrelating experience for 64 frames... +[2024-11-07 13:39:30,063][13796] Decorrelating experience for 64 frames... +[2024-11-07 13:39:30,112][13797] Decorrelating experience for 64 frames... +[2024-11-07 13:39:30,492][13794] Decorrelating experience for 64 frames... +[2024-11-07 13:39:30,540][13806] Decorrelating experience for 64 frames... +[2024-11-07 13:39:30,598][13798] Decorrelating experience for 96 frames... +[2024-11-07 13:39:30,629][13796] Decorrelating experience for 96 frames... +[2024-11-07 13:39:30,669][13797] Decorrelating experience for 96 frames... +[2024-11-07 13:39:30,869][13793] Decorrelating experience for 64 frames... +[2024-11-07 13:39:31,079][13794] Decorrelating experience for 96 frames... +[2024-11-07 13:39:31,145][13806] Decorrelating experience for 96 frames... +[2024-11-07 13:39:31,182][13805] Decorrelating experience for 64 frames... +[2024-11-07 13:39:31,415][13793] Decorrelating experience for 96 frames... +[2024-11-07 13:39:31,709][13805] Decorrelating experience for 96 frames... +[2024-11-07 13:39:32,367][11922] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:39:32,368][11922] Avg episode reward: [(0, '1.136')] +[2024-11-07 13:39:33,228][13779] Signal inference workers to stop experience collection... 
+[2024-11-07 13:39:33,240][13792] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 13:39:38,026][11922] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 13:39:38,028][11922] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 13:39:38,030][11922] Heartbeat connected on Batcher_0 +[2024-11-07 13:39:38,032][11922] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 13:39:38,034][11922] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 13:39:38,036][11922] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 13:39:38,039][11922] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 13:39:38,040][11922] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 13:39:38,046][11922] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 13:39:38,048][11922] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 243.6. Samples: 2602. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:39:38,051][11922] Avg episode reward: [(0, '2.131')] +[2024-11-07 13:39:38,063][11922] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 13:39:42,367][11922] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 173.5. Samples: 2602. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:39:42,369][11922] Avg episode reward: [(0, '2.131')] +[2024-11-07 13:39:43,444][13779] Signal inference workers to resume experience collection... +[2024-11-07 13:39:43,445][13779] Stopping Batcher_0... +[2024-11-07 13:39:43,445][13792] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 13:39:43,446][13779] Loop batcher_evt_loop terminating... +[2024-11-07 13:39:43,453][11922] Component Batcher_0 stopped! +[2024-11-07 13:39:43,517][13792] Weights refcount: 2 0 +[2024-11-07 13:39:43,545][13792] Stopping InferenceWorker_p0-w0... +[2024-11-07 13:39:43,546][13792] Loop inference_proc0-0_evt_loop terminating... 
+[2024-11-07 13:39:43,549][11922] Component InferenceWorker_p0-w0 stopped! +[2024-11-07 13:39:44,148][11922] Component RolloutWorker_w2 stopped! +[2024-11-07 13:39:44,150][13795] Stopping RolloutWorker_w2... +[2024-11-07 13:39:44,151][13795] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 13:39:44,236][13794] Stopping RolloutWorker_w4... +[2024-11-07 13:39:44,236][13794] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 13:39:44,235][11922] Component RolloutWorker_w4 stopped! +[2024-11-07 13:39:44,241][13793] Stopping RolloutWorker_w0... +[2024-11-07 13:39:44,242][13793] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 13:39:44,242][11922] Component RolloutWorker_w0 stopped! +[2024-11-07 13:39:44,288][13806] Stopping RolloutWorker_w7... +[2024-11-07 13:39:44,288][11922] Component RolloutWorker_w7 stopped! +[2024-11-07 13:39:44,295][13806] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 13:39:44,331][13779] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... +[2024-11-07 13:39:44,336][11922] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 13:39:44,381][11922] Component RolloutWorker_w6 stopped! +[2024-11-07 13:39:44,398][13797] Stopping RolloutWorker_w5... +[2024-11-07 13:39:44,399][13797] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 13:39:44,400][11922] Component RolloutWorker_w5 stopped! +[2024-11-07 13:39:44,380][13805] Stopping RolloutWorker_w6... +[2024-11-07 13:39:44,412][13805] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 13:39:44,561][13796] Stopping RolloutWorker_w3... +[2024-11-07 13:39:44,561][13796] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 13:39:44,561][11922] Component RolloutWorker_w3 stopped! +[2024-11-07 13:39:44,604][13798] Stopping RolloutWorker_w1... +[2024-11-07 13:39:44,609][13798] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 13:39:44,605][11922] Component RolloutWorker_w1 stopped! 
+[2024-11-07 13:39:44,765][13779] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000877_3592192.pth +[2024-11-07 13:39:44,816][13779] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... +[2024-11-07 13:39:45,460][13779] Stopping LearnerWorker_p0... +[2024-11-07 13:39:45,460][13779] Loop learner_proc0_evt_loop terminating... +[2024-11-07 13:39:45,461][11922] Component LearnerWorker_p0 stopped! +[2024-11-07 13:39:45,466][11922] Waiting for process learner_proc0 to stop... +[2024-11-07 13:39:47,174][11922] Waiting for process inference_proc0-0 to join... +[2024-11-07 13:39:47,176][11922] Waiting for process rollout_proc0 to join... +[2024-11-07 13:39:47,178][11922] Waiting for process rollout_proc1 to join... +[2024-11-07 13:39:47,180][11922] Waiting for process rollout_proc2 to join... +[2024-11-07 13:39:47,182][11922] Waiting for process rollout_proc3 to join... +[2024-11-07 13:39:47,184][11922] Waiting for process rollout_proc4 to join... +[2024-11-07 13:39:47,186][11922] Waiting for process rollout_proc5 to join... +[2024-11-07 13:39:47,189][11922] Waiting for process rollout_proc6 to join... +[2024-11-07 13:39:47,195][11922] Waiting for process rollout_proc7 to join... 
+[2024-11-07 13:39:47,199][11922] Batcher 0 profile tree view: +batching: 0.0833, releasing_batches: 0.0013 +[2024-11-07 13:39:47,201][11922] InferenceWorker_p0-w0 profile tree view: +update_model: 0.0149 +wait_policy: 0.0000 + wait_policy_total: 3.0989 +one_step: 0.0127 + handle_policy_step: 3.2174 + deserialize: 0.0602, stack: 0.0094, obs_to_device_normalize: 0.7060, forward: 1.9784, send_messages: 0.1164 + prepare_outputs: 0.2735 + to_cpu: 0.2112 +[2024-11-07 13:39:47,204][11922] Learner 0 profile tree view: +misc: 0.0000, prepare_batch: 1.8149 +train: 10.0389 + epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0008, kl_divergence: 0.0238, after_optimizer: 3.8432 + calculate_losses: 0.8068 + losses_init: 0.0000, forward_head: 0.5561, bptt_initial: 0.1521, tail: 0.0237, advantages_returns: 0.0014, losses: 0.0411 + bptt: 0.0316 + bptt_forward_core: 0.0313 + update: 5.3565 + clip: 0.0752 +[2024-11-07 13:39:47,207][11922] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.0008, enqueue_policy_requests: 0.0580, env_step: 0.4822, overhead: 0.0318, complete_rollouts: 0.0011 +save_policy_outputs: 0.0730 + split_output_tensors: 0.0210 +[2024-11-07 13:39:47,210][11922] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.0010, enqueue_policy_requests: 0.0571, env_step: 0.8575, overhead: 0.0384, complete_rollouts: 0.0041 +save_policy_outputs: 0.0486 + split_output_tensors: 0.0154 +[2024-11-07 13:39:47,215][11922] Loop Runner_EvtLoop terminating... +[2024-11-07 13:39:47,218][11922] Runner profile tree view: +main_loop: 29.8465 +[2024-11-07 13:39:47,222][11922] Collected {0: 4014080}, FPS: 274.5 +[2024-11-07 13:46:19,573][11922] Environment doom_basic already registered, overwriting... +[2024-11-07 13:46:19,579][11922] Environment doom_two_colors_easy already registered, overwriting... +[2024-11-07 13:46:19,582][11922] Environment doom_two_colors_hard already registered, overwriting... 
+[2024-11-07 13:46:19,583][11922] Environment doom_dm already registered, overwriting... +[2024-11-07 13:46:19,586][11922] Environment doom_dwango5 already registered, overwriting... +[2024-11-07 13:46:19,589][11922] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-11-07 13:46:19,593][11922] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-11-07 13:46:19,595][11922] Environment doom_my_way_home already registered, overwriting... +[2024-11-07 13:46:19,597][11922] Environment doom_deadly_corridor already registered, overwriting... +[2024-11-07 13:46:19,599][11922] Environment doom_defend_the_center already registered, overwriting... +[2024-11-07 13:46:19,601][11922] Environment doom_defend_the_line already registered, overwriting... +[2024-11-07 13:46:19,603][11922] Environment doom_health_gathering already registered, overwriting... +[2024-11-07 13:46:19,605][11922] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-11-07 13:46:19,607][11922] Environment doom_battle already registered, overwriting... +[2024-11-07 13:46:19,608][11922] Environment doom_battle2 already registered, overwriting... +[2024-11-07 13:46:19,611][11922] Environment doom_duel_bots already registered, overwriting... +[2024-11-07 13:46:19,613][11922] Environment doom_deathmatch_bots already registered, overwriting... +[2024-11-07 13:46:19,616][11922] Environment doom_duel already registered, overwriting... +[2024-11-07 13:46:19,620][11922] Environment doom_deathmatch_full already registered, overwriting... +[2024-11-07 13:46:19,622][11922] Environment doom_benchmark already registered, overwriting... 
+[2024-11-07 13:46:19,624][11922] register_encoder_factory: +[2024-11-07 13:46:19,794][11922] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 13:46:19,803][11922] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! +[2024-11-07 13:46:19,805][11922] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... +[2024-11-07 13:46:19,806][11922] Weights and Biases integration disabled +[2024-11-07 13:46:19,812][11922] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-11-07 13:46:25,974][11922] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/root/hfRL/ml/LunarLander-v2/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 
+log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=4000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2024-11-07 13:46:25,976][11922] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
+[2024-11-07 13:46:25,978][11922] Rollout worker 0 uses device cpu +[2024-11-07 13:46:25,979][11922] Rollout worker 1 uses device cpu +[2024-11-07 13:46:25,981][11922] Rollout worker 2 uses device cpu +[2024-11-07 13:46:25,983][11922] Rollout worker 3 uses device cpu +[2024-11-07 13:46:25,984][11922] Rollout worker 4 uses device cpu +[2024-11-07 13:46:25,985][11922] Rollout worker 5 uses device cpu +[2024-11-07 13:46:25,986][11922] Rollout worker 6 uses device cpu +[2024-11-07 13:46:25,988][11922] Rollout worker 7 uses device cpu +[2024-11-07 13:46:26,047][11922] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:46:26,048][11922] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 13:46:26,084][11922] Starting all processes... +[2024-11-07 13:46:26,085][11922] Starting process learner_proc0 +[2024-11-07 13:46:26,128][11922] Starting all processes... +[2024-11-07 13:46:26,134][11922] Starting process inference_proc0-0 +[2024-11-07 13:46:26,135][11922] Starting process rollout_proc0 +[2024-11-07 13:46:26,136][11922] Starting process rollout_proc1 +[2024-11-07 13:46:26,137][11922] Starting process rollout_proc2 +[2024-11-07 13:46:26,138][11922] Starting process rollout_proc3 +[2024-11-07 13:46:26,142][11922] Starting process rollout_proc4 +[2024-11-07 13:46:26,142][11922] Starting process rollout_proc5 +[2024-11-07 13:46:26,148][11922] Starting process rollout_proc6 +[2024-11-07 13:46:26,150][11922] Starting process rollout_proc7 +[2024-11-07 13:46:32,861][15894] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:46:32,862][15894] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 13:46:33,101][15894] Num visible devices: 1 +[2024-11-07 13:46:33,152][15894] Starting seed is not provided +[2024-11-07 13:46:33,152][15894] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:46:33,153][15894] Initializing actor-critic model on device cuda:0 +[2024-11-07 
13:46:33,153][15894] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 13:46:33,156][15894] RunningMeanStd input shape: (1,) +[2024-11-07 13:46:33,188][15894] ConvEncoder: input_channels=3 +[2024-11-07 13:46:33,934][15894] Conv encoder output size: 512 +[2024-11-07 13:46:33,935][15894] Policy head output size: 512 +[2024-11-07 13:46:33,989][15894] Created Actor Critic model with architecture: +[2024-11-07 13:46:34,012][15894] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 13:46:34,270][15914] Worker 6 uses CPU cores [6] +[2024-11-07 13:46:34,717][15915] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 13:46:34,760][15912] Worker 4 uses CPU cores [4] +[2024-11-07 13:46:35,118][15911] Worker 3 uses CPU cores [3] +[2024-11-07 13:46:35,310][15910] 
Worker 2 uses CPU cores [2] +[2024-11-07 13:46:35,910][15913] Worker 5 uses CPU cores [5] +[2024-11-07 13:46:36,133][15894] Using optimizer +[2024-11-07 13:46:36,385][15907] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:46:36,386][15907] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 13:46:36,392][15908] Worker 0 uses CPU cores [0] +[2024-11-07 13:46:36,413][15907] Num visible devices: 1 +[2024-11-07 13:46:36,420][15909] Worker 1 uses CPU cores [1] +[2024-11-07 13:46:37,928][15894] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... +[2024-11-07 13:46:38,045][15894] Loading model from checkpoint +[2024-11-07 13:46:38,049][15894] Loaded experiment state at self.train_step=980, self.env_steps=4014080 +[2024-11-07 13:46:38,050][15894] Initialized policy 0 weights for model version 980 +[2024-11-07 13:46:38,059][15894] LearnerWorker_p0 finished initialization! +[2024-11-07 13:46:38,059][15894] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:46:38,256][15907] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 13:46:38,257][15907] RunningMeanStd input shape: (1,) +[2024-11-07 13:46:38,272][15907] ConvEncoder: input_channels=3 +[2024-11-07 13:46:38,433][15907] Conv encoder output size: 512 +[2024-11-07 13:46:38,434][15907] Policy head output size: 512 +[2024-11-07 13:46:38,489][11922] Inference worker 0-0 is ready! +[2024-11-07 13:46:38,491][11922] All inference workers are ready! Signal rollout workers to start! 
+[2024-11-07 13:46:38,572][15912] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:46:38,588][15910] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:46:38,592][15909] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:46:38,605][15908] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:46:38,607][15913] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:46:38,615][15911] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:46:38,671][15914] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:46:38,691][15915] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:46:39,151][15910] Decorrelating experience for 0 frames... +[2024-11-07 13:46:39,153][15911] Decorrelating experience for 0 frames... +[2024-11-07 13:46:39,154][15912] Decorrelating experience for 0 frames... +[2024-11-07 13:46:39,214][15908] Decorrelating experience for 0 frames... +[2024-11-07 13:46:39,218][15913] Decorrelating experience for 0 frames... +[2024-11-07 13:46:39,320][15914] Decorrelating experience for 0 frames... +[2024-11-07 13:46:39,597][15912] Decorrelating experience for 32 frames... +[2024-11-07 13:46:39,601][15911] Decorrelating experience for 32 frames... +[2024-11-07 13:46:39,804][15909] Decorrelating experience for 0 frames... +[2024-11-07 13:46:39,813][11922] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4014080. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:46:39,812][15910] Decorrelating experience for 32 frames... +[2024-11-07 13:46:39,815][15908] Decorrelating experience for 32 frames... +[2024-11-07 13:46:39,818][15913] Decorrelating experience for 32 frames... +[2024-11-07 13:46:40,179][15914] Decorrelating experience for 32 frames... +[2024-11-07 13:46:40,448][15912] Decorrelating experience for 64 frames... 
+[2024-11-07 13:46:40,454][15909] Decorrelating experience for 32 frames... +[2024-11-07 13:46:40,491][15910] Decorrelating experience for 64 frames... +[2024-11-07 13:46:40,532][15908] Decorrelating experience for 64 frames... +[2024-11-07 13:46:40,548][15913] Decorrelating experience for 64 frames... +[2024-11-07 13:46:40,806][15915] Decorrelating experience for 0 frames... +[2024-11-07 13:46:41,100][15912] Decorrelating experience for 96 frames... +[2024-11-07 13:46:41,140][15910] Decorrelating experience for 96 frames... +[2024-11-07 13:46:41,259][15909] Decorrelating experience for 64 frames... +[2024-11-07 13:46:41,267][15914] Decorrelating experience for 64 frames... +[2024-11-07 13:46:41,381][15913] Decorrelating experience for 96 frames... +[2024-11-07 13:46:41,381][15911] Decorrelating experience for 64 frames... +[2024-11-07 13:46:41,744][15908] Decorrelating experience for 96 frames... +[2024-11-07 13:46:41,760][15915] Decorrelating experience for 32 frames... +[2024-11-07 13:46:41,947][15914] Decorrelating experience for 96 frames... +[2024-11-07 13:46:42,091][15909] Decorrelating experience for 96 frames... +[2024-11-07 13:46:42,535][15911] Decorrelating experience for 96 frames... +[2024-11-07 13:46:42,731][15915] Decorrelating experience for 64 frames... +[2024-11-07 13:46:43,672][15915] Decorrelating experience for 96 frames... +[2024-11-07 13:46:44,816][11922] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 383.4. Samples: 1918. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:46:44,824][11922] Avg episode reward: [(0, '1.815')] +[2024-11-07 13:46:45,370][15894] Signal inference workers to stop experience collection... 
+[2024-11-07 13:46:45,393][15907] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 13:46:46,038][11922] Heartbeat connected on Batcher_0 +[2024-11-07 13:46:46,047][11922] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 13:46:46,054][11922] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 13:46:46,061][11922] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 13:46:46,068][11922] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 13:46:46,072][11922] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 13:46:46,076][11922] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 13:46:46,077][11922] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 13:46:46,080][11922] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 13:46:46,084][11922] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 13:46:49,814][11922] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4014080. Throughput: 0: 280.0. Samples: 2800. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:46:49,815][11922] Avg episode reward: [(0, '2.198')] +[2024-11-07 13:46:53,479][15894] Signal inference workers to resume experience collection... +[2024-11-07 13:46:53,479][15907] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 13:46:53,483][15894] Stopping Batcher_0... +[2024-11-07 13:46:53,483][15894] Loop batcher_evt_loop terminating... +[2024-11-07 13:46:53,491][11922] Component Batcher_0 stopped! +[2024-11-07 13:46:53,503][15907] Weights refcount: 2 0 +[2024-11-07 13:46:53,506][15907] Stopping InferenceWorker_p0-w0... +[2024-11-07 13:46:53,507][15907] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 13:46:53,507][11922] Component InferenceWorker_p0-w0 stopped! +[2024-11-07 13:46:53,609][15911] Stopping RolloutWorker_w3... +[2024-11-07 13:46:53,610][15911] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 13:46:53,609][11922] Component RolloutWorker_w3 stopped! 
+[2024-11-07 13:46:53,681][11922] Component RolloutWorker_w1 stopped! +[2024-11-07 13:46:53,685][15909] Stopping RolloutWorker_w1... +[2024-11-07 13:46:53,686][15909] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 13:46:53,700][11922] Component RolloutWorker_w5 stopped! +[2024-11-07 13:46:53,699][15913] Stopping RolloutWorker_w5... +[2024-11-07 13:46:53,705][15913] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 13:46:53,765][15912] Stopping RolloutWorker_w4... +[2024-11-07 13:46:53,773][11922] Component RolloutWorker_w4 stopped! +[2024-11-07 13:46:53,776][15912] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 13:46:54,091][15910] Stopping RolloutWorker_w2... +[2024-11-07 13:46:54,091][15910] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 13:46:54,096][11922] Component RolloutWorker_w2 stopped! +[2024-11-07 13:46:54,158][15915] Stopping RolloutWorker_w7... +[2024-11-07 13:46:54,160][11922] Component RolloutWorker_w7 stopped! +[2024-11-07 13:46:54,176][15915] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 13:46:54,256][15914] Stopping RolloutWorker_w6... +[2024-11-07 13:46:54,245][11922] Component RolloutWorker_w6 stopped! +[2024-11-07 13:46:54,270][15914] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 13:46:54,464][11922] Component RolloutWorker_w0 stopped! +[2024-11-07 13:46:54,467][15908] Stopping RolloutWorker_w0... +[2024-11-07 13:46:54,469][15908] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 13:46:54,999][15894] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000982_4022272.pth... +[2024-11-07 13:46:55,359][15894] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth +[2024-11-07 13:46:55,376][15894] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000982_4022272.pth... 
+[2024-11-07 13:46:55,375][11922] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 13:46:55,741][15894] Stopping LearnerWorker_p0... +[2024-11-07 13:46:55,742][15894] Loop learner_proc0_evt_loop terminating... +[2024-11-07 13:46:55,743][11922] Component LearnerWorker_p0 stopped! +[2024-11-07 13:46:55,751][11922] Waiting for process learner_proc0 to stop... +[2024-11-07 13:46:57,104][11922] Waiting for process inference_proc0-0 to join... +[2024-11-07 13:46:57,106][11922] Waiting for process rollout_proc0 to join... +[2024-11-07 13:46:57,108][11922] Waiting for process rollout_proc1 to join... +[2024-11-07 13:46:57,109][11922] Waiting for process rollout_proc2 to join... +[2024-11-07 13:46:57,111][11922] Waiting for process rollout_proc3 to join... +[2024-11-07 13:46:57,113][11922] Waiting for process rollout_proc4 to join... +[2024-11-07 13:46:57,114][11922] Waiting for process rollout_proc5 to join... +[2024-11-07 13:46:57,116][11922] Waiting for process rollout_proc6 to join... +[2024-11-07 13:46:57,117][11922] Waiting for process rollout_proc7 to join... 
+[2024-11-07 13:46:57,124][11922] Batcher 0 profile tree view: +batching: 0.0647, releasing_batches: 0.0037 +[2024-11-07 13:46:57,127][11922] InferenceWorker_p0-w0 profile tree view: +update_model: 0.0205 +wait_policy: 0.0001 + wait_policy_total: 2.6671 +one_step: 0.0122 + handle_policy_step: 4.0411 + deserialize: 0.0722, stack: 0.0210, obs_to_device_normalize: 0.9497, forward: 2.4204, send_messages: 0.1187 + prepare_outputs: 0.3446 + to_cpu: 0.2395 +[2024-11-07 13:46:57,128][11922] Learner 0 profile tree view: +misc: 0.0001, prepare_batch: 2.5911 +train: 8.0508 + epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0011, kl_divergence: 0.0721, after_optimizer: 0.4508 + calculate_losses: 1.1483 + losses_init: 0.0000, forward_head: 0.5330, bptt_initial: 0.3452, tail: 0.0913, advantages_returns: 0.0017, losses: 0.1113 + bptt: 0.0650 + bptt_forward_core: 0.0649 + update: 6.3767 + clip: 0.1802 +[2024-11-07 13:46:57,130][11922] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.0006, enqueue_policy_requests: 0.0646, env_step: 0.7793, overhead: 0.0623, complete_rollouts: 0.0010 +save_policy_outputs: 0.0912 + split_output_tensors: 0.0297 +[2024-11-07 13:46:57,131][11922] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.0009, enqueue_policy_requests: 0.0323, env_step: 0.6692, overhead: 0.0294, complete_rollouts: 0.0012 +save_policy_outputs: 0.0352 + split_output_tensors: 0.0111 +[2024-11-07 13:46:57,134][11922] Loop Runner_EvtLoop terminating... +[2024-11-07 13:46:57,136][11922] Runner profile tree view: +main_loop: 31.0520 +[2024-11-07 13:46:57,138][11922] Collected {0: 4022272}, FPS: 263.8 +[2024-11-07 13:56:29,353][01021] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
+[2024-11-07 13:56:29,375][01021] Rollout worker 0 uses device cpu +[2024-11-07 13:56:29,376][01021] Rollout worker 1 uses device cpu +[2024-11-07 13:56:29,378][01021] Rollout worker 2 uses device cpu +[2024-11-07 13:56:29,379][01021] Rollout worker 3 uses device cpu +[2024-11-07 13:56:29,380][01021] Rollout worker 4 uses device cpu +[2024-11-07 13:56:29,381][01021] Rollout worker 5 uses device cpu +[2024-11-07 13:56:29,382][01021] Rollout worker 6 uses device cpu +[2024-11-07 13:56:29,382][01021] Rollout worker 7 uses device cpu +[2024-11-07 13:56:29,753][01021] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:56:29,755][01021] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 13:56:29,791][01021] Starting all processes... +[2024-11-07 13:56:29,792][01021] Starting process learner_proc0 +[2024-11-07 13:56:29,946][01021] Starting all processes... +[2024-11-07 13:56:29,999][01021] Starting process inference_proc0-0 +[2024-11-07 13:56:30,001][01021] Starting process rollout_proc0 +[2024-11-07 13:56:30,001][01021] Starting process rollout_proc1 +[2024-11-07 13:56:30,005][01021] Starting process rollout_proc2 +[2024-11-07 13:56:30,005][01021] Starting process rollout_proc3 +[2024-11-07 13:56:30,006][01021] Starting process rollout_proc4 +[2024-11-07 13:56:30,011][01021] Starting process rollout_proc5 +[2024-11-07 13:56:30,013][01021] Starting process rollout_proc6 +[2024-11-07 13:56:30,017][01021] Starting process rollout_proc7 +[2024-11-07 13:56:36,174][01326] Worker 2 uses CPU cores [2] +[2024-11-07 13:56:36,430][01327] Worker 3 uses CPU cores [3] +[2024-11-07 13:56:36,676][01310] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:56:36,677][01310] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 13:56:36,769][01328] Worker 4 uses CPU cores [4] +[2024-11-07 13:56:36,883][01310] Num visible devices: 1 +[2024-11-07 13:56:36,934][01310] Starting seed is not provided 
+[2024-11-07 13:56:36,935][01310] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:56:36,935][01310] Initializing actor-critic model on device cuda:0 +[2024-11-07 13:56:36,935][01310] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 13:56:36,950][01310] RunningMeanStd input shape: (1,) +[2024-11-07 13:56:36,985][01310] ConvEncoder: input_channels=3 +[2024-11-07 13:56:37,147][01331] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 13:56:37,496][01324] Worker 0 uses CPU cores [0] +[2024-11-07 13:56:37,671][01323] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:56:37,671][01323] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 13:56:37,733][01323] Num visible devices: 1 +[2024-11-07 13:56:37,916][01325] Worker 1 uses CPU cores [1] +[2024-11-07 13:56:37,960][01330] Worker 6 uses CPU cores [6] +[2024-11-07 13:56:38,125][01329] Worker 5 uses CPU cores [5] +[2024-11-07 13:56:38,455][01310] Conv encoder output size: 512 +[2024-11-07 13:56:38,456][01310] Policy head output size: 512 +[2024-11-07 13:56:38,662][01310] Created Actor Critic model with architecture: +[2024-11-07 13:56:38,662][01310] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + 
(mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 13:56:41,671][01310] Using optimizer +[2024-11-07 13:56:49,591][01310] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000982_4022272.pth... +[2024-11-07 13:56:49,745][01021] Heartbeat connected on Batcher_0 +[2024-11-07 13:56:49,754][01021] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 13:56:49,761][01021] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 13:56:49,764][01021] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 13:56:49,769][01021] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 13:56:49,773][01021] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 13:56:49,778][01021] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 13:56:49,782][01021] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 13:56:49,787][01021] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 13:56:49,790][01021] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 13:56:49,951][01310] Loading model from checkpoint +[2024-11-07 13:56:49,954][01310] Loaded experiment state at self.train_step=982, self.env_steps=4022272 +[2024-11-07 13:56:49,955][01310] Initialized policy 0 weights for model version 982 +[2024-11-07 13:56:49,964][01310] LearnerWorker_p0 finished initialization! 
+[2024-11-07 13:56:49,965][01310] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 13:56:49,966][01021] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 13:56:50,158][01323] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 13:56:50,159][01323] RunningMeanStd input shape: (1,) +[2024-11-07 13:56:50,174][01323] ConvEncoder: input_channels=3 +[2024-11-07 13:56:50,310][01323] Conv encoder output size: 512 +[2024-11-07 13:56:50,310][01323] Policy head output size: 512 +[2024-11-07 13:56:50,362][01021] Inference worker 0-0 is ready! +[2024-11-07 13:56:50,363][01021] All inference workers are ready! Signal rollout workers to start! +[2024-11-07 13:56:50,460][01328] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:56:50,476][01324] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:56:50,479][01327] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:56:50,484][01325] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:56:50,491][01326] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:56:50,500][01329] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:56:50,563][01330] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:56:50,596][01331] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 13:56:53,662][01328] Decorrelating experience for 0 frames... +[2024-11-07 13:56:53,662][01330] Decorrelating experience for 0 frames... +[2024-11-07 13:56:53,662][01324] Decorrelating experience for 0 frames... +[2024-11-07 13:56:53,662][01325] Decorrelating experience for 0 frames... +[2024-11-07 13:56:53,668][01327] Decorrelating experience for 0 frames... +[2024-11-07 13:56:53,979][01326] Decorrelating experience for 0 frames... +[2024-11-07 13:56:53,996][01331] Decorrelating experience for 0 frames... +[2024-11-07 13:56:54,023][01329] Decorrelating experience for 0 frames... 
+[2024-11-07 13:56:54,091][01324] Decorrelating experience for 32 frames... +[2024-11-07 13:56:54,109][01325] Decorrelating experience for 32 frames... +[2024-11-07 13:56:54,131][01021] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4022272. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:56:54,431][01331] Decorrelating experience for 32 frames... +[2024-11-07 13:56:54,452][01328] Decorrelating experience for 32 frames... +[2024-11-07 13:56:54,542][01330] Decorrelating experience for 32 frames... +[2024-11-07 13:56:54,805][01329] Decorrelating experience for 32 frames... +[2024-11-07 13:56:54,843][01324] Decorrelating experience for 64 frames... +[2024-11-07 13:56:54,872][01325] Decorrelating experience for 64 frames... +[2024-11-07 13:56:55,034][01331] Decorrelating experience for 64 frames... +[2024-11-07 13:56:55,041][01328] Decorrelating experience for 64 frames... +[2024-11-07 13:56:55,044][01327] Decorrelating experience for 32 frames... +[2024-11-07 13:56:55,174][01330] Decorrelating experience for 64 frames... +[2024-11-07 13:56:55,333][01324] Decorrelating experience for 96 frames... +[2024-11-07 13:56:55,463][01326] Decorrelating experience for 32 frames... +[2024-11-07 13:56:55,464][01328] Decorrelating experience for 96 frames... +[2024-11-07 13:56:55,520][01331] Decorrelating experience for 96 frames... +[2024-11-07 13:56:55,555][01327] Decorrelating experience for 64 frames... +[2024-11-07 13:56:55,867][01329] Decorrelating experience for 64 frames... +[2024-11-07 13:56:55,898][01330] Decorrelating experience for 96 frames... +[2024-11-07 13:56:55,935][01326] Decorrelating experience for 64 frames... +[2024-11-07 13:56:55,947][01327] Decorrelating experience for 96 frames... +[2024-11-07 13:56:56,231][01329] Decorrelating experience for 96 frames... +[2024-11-07 13:56:56,233][01325] Decorrelating experience for 96 frames... 
+[2024-11-07 13:56:56,289][01326] Decorrelating experience for 96 frames... +[2024-11-07 13:56:59,130][01021] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4022272. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:57:04,131][01021] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4022272. Throughput: 0: 8.2. Samples: 82. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:57:04,134][01021] Avg episode reward: [(0, '1.040')] +[2024-11-07 13:57:04,869][01310] Signal inference workers to stop experience collection... +[2024-11-07 13:57:04,898][01323] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 13:57:09,130][01021] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4022272. Throughput: 0: 161.6. Samples: 2424. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:57:09,132][01021] Avg episode reward: [(0, '1.992')] +[2024-11-07 13:57:14,905][01021] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4022272. Throughput: 0: 116.7. Samples: 2424. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:57:14,908][01021] Avg episode reward: [(0, '1.992')] +[2024-11-07 13:57:19,131][01021] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4022272. Throughput: 0: 97.0. Samples: 2424. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 13:57:19,134][01021] Avg episode reward: [(0, '1.992')] +[2024-11-07 13:57:19,956][01310] Signal inference workers to resume experience collection... +[2024-11-07 13:57:19,957][01323] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 13:57:19,958][01310] Stopping Batcher_0... +[2024-11-07 13:57:19,958][01310] Loop batcher_evt_loop terminating... +[2024-11-07 13:57:19,969][01021] Component Batcher_0 stopped! +[2024-11-07 13:57:19,983][01323] Weights refcount: 2 0 +[2024-11-07 13:57:19,997][01323] Stopping InferenceWorker_p0-w0... 
+[2024-11-07 13:57:19,997][01323] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 13:57:19,997][01021] Component InferenceWorker_p0-w0 stopped! +[2024-11-07 13:57:20,142][01328] Stopping RolloutWorker_w4... +[2024-11-07 13:57:20,142][01021] Component RolloutWorker_w4 stopped! +[2024-11-07 13:57:20,145][01327] Stopping RolloutWorker_w3... +[2024-11-07 13:57:20,146][01327] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 13:57:20,149][01328] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 13:57:20,157][01325] Stopping RolloutWorker_w1... +[2024-11-07 13:57:20,158][01325] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 13:57:20,145][01021] Component RolloutWorker_w3 stopped! +[2024-11-07 13:57:20,159][01021] Component RolloutWorker_w1 stopped! +[2024-11-07 13:57:20,203][01021] Component RolloutWorker_w2 stopped! +[2024-11-07 13:57:20,203][01326] Stopping RolloutWorker_w2... +[2024-11-07 13:57:20,205][01326] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 13:57:20,207][01324] Stopping RolloutWorker_w0... +[2024-11-07 13:57:20,207][01324] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 13:57:20,207][01021] Component RolloutWorker_w0 stopped! +[2024-11-07 13:57:20,215][01329] Stopping RolloutWorker_w5... +[2024-11-07 13:57:20,215][01329] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 13:57:20,214][01021] Component RolloutWorker_w6 stopped! +[2024-11-07 13:57:20,217][01021] Component RolloutWorker_w5 stopped! +[2024-11-07 13:57:20,219][01330] Stopping RolloutWorker_w6... +[2024-11-07 13:57:20,220][01330] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 13:57:20,230][01021] Component RolloutWorker_w7 stopped! +[2024-11-07 13:57:20,232][01331] Stopping RolloutWorker_w7... +[2024-11-07 13:57:20,232][01331] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 13:57:21,122][01310] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000984_4030464.pth... 
+[2024-11-07 13:57:21,556][01310] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth +[2024-11-07 13:57:21,559][01310] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000984_4030464.pth... +[2024-11-07 13:57:21,776][01310] Stopping LearnerWorker_p0... +[2024-11-07 13:57:21,777][01310] Loop learner_proc0_evt_loop terminating... +[2024-11-07 13:57:21,777][01021] Component LearnerWorker_p0 stopped! +[2024-11-07 13:57:21,787][01021] Waiting for process learner_proc0 to stop... +[2024-11-07 13:57:23,265][01021] Waiting for process inference_proc0-0 to join... +[2024-11-07 13:57:23,267][01021] Waiting for process rollout_proc0 to join... +[2024-11-07 13:57:23,268][01021] Waiting for process rollout_proc1 to join... +[2024-11-07 13:57:23,269][01021] Waiting for process rollout_proc2 to join... +[2024-11-07 13:57:23,271][01021] Waiting for process rollout_proc3 to join... +[2024-11-07 13:57:23,274][01021] Waiting for process rollout_proc4 to join... +[2024-11-07 13:57:23,276][01021] Waiting for process rollout_proc5 to join... +[2024-11-07 13:57:23,278][01021] Waiting for process rollout_proc6 to join... +[2024-11-07 13:57:23,279][01021] Waiting for process rollout_proc7 to join... 
+[2024-11-07 13:57:23,281][01021] Batcher 0 profile tree view: +batching: 0.2353, releasing_batches: 0.0013 +[2024-11-07 13:57:23,283][01021] InferenceWorker_p0-w0 profile tree view: +update_model: 0.0166 +wait_policy: 0.0001 + wait_policy_total: 4.9637 +one_step: 0.0133 + handle_policy_step: 9.3400 + deserialize: 0.0446, stack: 0.0056, obs_to_device_normalize: 1.9853, forward: 6.9578, send_messages: 0.0880 + prepare_outputs: 0.2132 + to_cpu: 0.1588 +[2024-11-07 13:57:23,284][01021] Learner 0 profile tree view: +misc: 0.0000, prepare_batch: 5.2481 +train: 11.8101 + epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0014, kl_divergence: 0.0295, after_optimizer: 0.4496 + calculate_losses: 2.7341 + losses_init: 0.0000, forward_head: 0.5470, bptt_initial: 1.6040, tail: 0.0654, advantages_returns: 0.0018, losses: 0.2425 + bptt: 0.2468 + bptt_forward_core: 0.2465 + update: 8.5930 + clip: 0.9877 +[2024-11-07 13:57:23,287][01021] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.0012, enqueue_policy_requests: 0.0547, env_step: 0.3752, overhead: 0.0338, complete_rollouts: 0.0017 +save_policy_outputs: 0.0544 + split_output_tensors: 0.0182 +[2024-11-07 13:57:23,291][01021] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.0007, enqueue_policy_requests: 0.0555, env_step: 0.7234, overhead: 0.0269, complete_rollouts: 0.0012 +save_policy_outputs: 0.0563 + split_output_tensors: 0.0150 +[2024-11-07 13:57:23,299][01021] Loop Runner_EvtLoop terminating... 
+[2024-11-07 13:57:23,303][01021] Runner profile tree view: +main_loop: 53.5122 +[2024-11-07 13:57:23,305][01021] Collected {0: 4030464}, FPS: 153.1 +[2024-11-07 14:00:00,289][01021] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 14:00:00,291][01021] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 14:00:00,292][01021] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 14:00:00,293][01021] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 14:00:00,294][01021] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 14:00:00,295][01021] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 14:00:00,295][01021] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 14:00:00,296][01021] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 14:00:00,297][01021] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-07 14:00:00,298][01021] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-07 14:00:00,299][01021] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 14:00:00,299][01021] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 14:00:00,300][01021] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 14:00:00,303][01021] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2024-11-07 14:00:00,305][01021] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 14:00:00,346][01021] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:00:00,350][01021] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:00:00,353][01021] RunningMeanStd input shape: (1,) +[2024-11-07 14:00:00,390][01021] ConvEncoder: input_channels=3 +[2024-11-07 14:00:00,786][01021] Conv encoder output size: 512 +[2024-11-07 14:00:00,788][01021] Policy head output size: 512 +[2024-11-07 14:00:01,587][01021] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000984_4030464.pth... +[2024-11-07 14:00:04,356][01021] Num frames 100... +[2024-11-07 14:00:04,562][01021] Num frames 200... +[2024-11-07 14:00:04,746][01021] Num frames 300... +[2024-11-07 14:00:04,936][01021] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:00:04,939][01021] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:00:04,988][01021] Num frames 400... +[2024-11-07 14:00:05,172][01021] Num frames 500... +[2024-11-07 14:00:05,341][01021] Num frames 600... +[2024-11-07 14:00:05,508][01021] Num frames 700... +[2024-11-07 14:00:05,673][01021] Num frames 800... +[2024-11-07 14:00:05,789][01021] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-07 14:00:05,793][01021] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-07 14:00:05,918][01021] Num frames 900... +[2024-11-07 14:00:06,069][01021] Num frames 1000... +[2024-11-07 14:00:06,227][01021] Num frames 1100... +[2024-11-07 14:00:06,391][01021] Num frames 1200... +[2024-11-07 14:00:06,479][01021] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-07 14:00:06,483][01021] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-07 14:00:06,692][01021] Num frames 1300... +[2024-11-07 14:00:06,916][01021] Num frames 1400... 
+[2024-11-07 14:00:07,079][01021] Num frames 1500... +[2024-11-07 14:00:07,271][01021] Num frames 1600... +[2024-11-07 14:00:07,323][01021] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000 +[2024-11-07 14:00:07,325][01021] Avg episode reward: 4.250, avg true_objective: 4.000 +[2024-11-07 14:00:07,479][01021] Num frames 1700... +[2024-11-07 14:00:07,634][01021] Num frames 1800... +[2024-11-07 14:00:07,786][01021] Num frames 1900... +[2024-11-07 14:00:08,004][01021] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968 +[2024-11-07 14:00:08,008][01021] Avg episode reward: 4.168, avg true_objective: 3.968 +[2024-11-07 14:00:08,059][01021] Num frames 2000... +[2024-11-07 14:00:08,245][01021] Num frames 2100... +[2024-11-07 14:00:08,406][01021] Num frames 2200... +[2024-11-07 14:00:08,566][01021] Num frames 2300... +[2024-11-07 14:00:08,737][01021] Avg episode rewards: #0: 4.113, true rewards: #0: 3.947 +[2024-11-07 14:00:08,742][01021] Avg episode reward: 4.113, avg true_objective: 3.947 +[2024-11-07 14:00:08,812][01021] Num frames 2400... +[2024-11-07 14:00:08,961][01021] Num frames 2500... +[2024-11-07 14:00:09,102][01021] Num frames 2600... +[2024-11-07 14:00:09,253][01021] Num frames 2700... +[2024-11-07 14:00:09,408][01021] Num frames 2800... +[2024-11-07 14:00:09,493][01021] Avg episode rewards: #0: 4.309, true rewards: #0: 4.023 +[2024-11-07 14:00:09,496][01021] Avg episode reward: 4.309, avg true_objective: 4.023 +[2024-11-07 14:00:09,649][01021] Num frames 2900... +[2024-11-07 14:00:09,796][01021] Num frames 3000... +[2024-11-07 14:00:09,947][01021] Num frames 3100... +[2024-11-07 14:00:10,098][01021] Num frames 3200... +[2024-11-07 14:00:10,151][01021] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000 +[2024-11-07 14:00:10,153][01021] Avg episode reward: 4.250, avg true_objective: 4.000 +[2024-11-07 14:00:10,317][01021] Num frames 3300... +[2024-11-07 14:00:10,471][01021] Num frames 3400... +[2024-11-07 14:00:10,628][01021] Num frames 3500... 
+[2024-11-07 14:00:10,807][01021] Avg episode rewards: #0: 4.204, true rewards: #0: 3.982 +[2024-11-07 14:00:10,808][01021] Avg episode reward: 4.204, avg true_objective: 3.982 +[2024-11-07 14:00:10,841][01021] Num frames 3600... +[2024-11-07 14:00:11,029][01021] Num frames 3700... +[2024-11-07 14:00:11,179][01021] Num frames 3800... +[2024-11-07 14:00:11,327][01021] Num frames 3900... +[2024-11-07 14:00:11,471][01021] Num frames 4000... +[2024-11-07 14:00:11,524][01021] Avg episode rewards: #0: 4.300, true rewards: #0: 4.000 +[2024-11-07 14:00:11,526][01021] Avg episode reward: 4.300, avg true_objective: 4.000 +[2024-11-07 14:00:25,803][01021] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 14:19:59,772][01364] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-07 14:19:59,796][01364] Rollout worker 0 uses device cpu +[2024-11-07 14:19:59,799][01364] Rollout worker 1 uses device cpu +[2024-11-07 14:19:59,801][01364] Rollout worker 2 uses device cpu +[2024-11-07 14:19:59,804][01364] Rollout worker 3 uses device cpu +[2024-11-07 14:19:59,806][01364] Rollout worker 4 uses device cpu +[2024-11-07 14:19:59,811][01364] Rollout worker 5 uses device cpu +[2024-11-07 14:19:59,815][01364] Rollout worker 6 uses device cpu +[2024-11-07 14:19:59,819][01364] Rollout worker 7 uses device cpu +[2024-11-07 14:20:00,277][01364] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:20:00,278][01364] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 14:20:00,326][01364] Starting all processes... +[2024-11-07 14:20:00,328][01364] Starting process learner_proc0 +[2024-11-07 14:20:00,463][01364] Starting all processes... 
+[2024-11-07 14:20:00,522][01364] Starting process inference_proc0-0 +[2024-11-07 14:20:00,524][01364] Starting process rollout_proc0 +[2024-11-07 14:20:00,525][01364] Starting process rollout_proc1 +[2024-11-07 14:20:00,531][01364] Starting process rollout_proc2 +[2024-11-07 14:20:00,532][01364] Starting process rollout_proc3 +[2024-11-07 14:20:00,533][01364] Starting process rollout_proc4 +[2024-11-07 14:20:00,534][01364] Starting process rollout_proc5 +[2024-11-07 14:20:00,534][01364] Starting process rollout_proc6 +[2024-11-07 14:20:00,539][01364] Starting process rollout_proc7 +[2024-11-07 14:20:09,203][01593] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:20:09,203][01593] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 14:20:09,347][01608] Worker 2 uses CPU cores [2] +[2024-11-07 14:20:09,365][01617] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 14:20:09,426][01615] Worker 6 uses CPU cores [6] +[2024-11-07 14:20:09,451][01609] Worker 1 uses CPU cores [1] +[2024-11-07 14:20:09,547][01593] Num visible devices: 1 +[2024-11-07 14:20:09,599][01593] Starting seed is not provided +[2024-11-07 14:20:09,600][01593] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:20:09,601][01593] Initializing actor-critic model on device cuda:0 +[2024-11-07 14:20:09,610][01593] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:20:09,636][01593] RunningMeanStd input shape: (1,) +[2024-11-07 14:20:09,721][01593] ConvEncoder: input_channels=3 +[2024-11-07 14:20:09,749][01614] Worker 5 uses CPU cores [5] +[2024-11-07 14:20:10,207][01610] Worker 3 uses CPU cores [3] +[2024-11-07 14:20:10,222][01612] Worker 4 uses CPU cores [4] +[2024-11-07 14:20:10,250][01606] Worker 0 uses CPU cores [0] +[2024-11-07 14:20:10,328][01607] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:20:10,328][01607] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU 
indices [0]) for inference process 0 +[2024-11-07 14:20:10,365][01607] Num visible devices: 1 +[2024-11-07 14:20:11,011][01593] Conv encoder output size: 512 +[2024-11-07 14:20:11,012][01593] Policy head output size: 512 +[2024-11-07 14:20:11,256][01593] Created Actor Critic model with architecture: +[2024-11-07 14:20:11,256][01593] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 14:20:15,178][01593] Using optimizer +[2024-11-07 14:20:20,270][01364] Heartbeat connected on Batcher_0 +[2024-11-07 14:20:20,280][01364] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 14:20:20,287][01364] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 14:20:20,293][01364] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 14:20:20,305][01364] Heartbeat connected 
on RolloutWorker_w3 +[2024-11-07 14:20:20,308][01364] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 14:20:20,312][01364] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 14:20:20,318][01364] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 14:20:20,327][01364] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 14:20:20,328][01364] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 14:20:22,849][01593] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000984_4030464.pth... +[2024-11-07 14:20:23,407][01593] Loading model from checkpoint +[2024-11-07 14:20:23,412][01593] Loaded experiment state at self.train_step=984, self.env_steps=4030464 +[2024-11-07 14:20:23,413][01593] Initialized policy 0 weights for model version 984 +[2024-11-07 14:20:23,423][01593] LearnerWorker_p0 finished initialization! +[2024-11-07 14:20:23,423][01593] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:20:23,425][01364] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 14:20:23,701][01607] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:20:23,702][01607] RunningMeanStd input shape: (1,) +[2024-11-07 14:20:23,723][01607] ConvEncoder: input_channels=3 +[2024-11-07 14:20:23,896][01607] Conv encoder output size: 512 +[2024-11-07 14:20:23,897][01607] Policy head output size: 512 +[2024-11-07 14:20:23,967][01364] Inference worker 0-0 is ready! +[2024-11-07 14:20:23,969][01364] All inference workers are ready! Signal rollout workers to start! 
+[2024-11-07 14:20:24,129][01612] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:20:24,149][01608] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:20:24,151][01610] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:20:24,184][01614] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:20:24,211][01609] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:20:24,261][01617] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:20:24,269][01615] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:20:24,270][01606] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:20:24,824][01364] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4030464. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 14:20:27,103][01617] Decorrelating experience for 0 frames... +[2024-11-07 14:20:27,104][01608] Decorrelating experience for 0 frames... +[2024-11-07 14:20:27,104][01610] Decorrelating experience for 0 frames... +[2024-11-07 14:20:27,104][01609] Decorrelating experience for 0 frames... +[2024-11-07 14:20:27,103][01614] Decorrelating experience for 0 frames... +[2024-11-07 14:20:27,569][01612] Decorrelating experience for 0 frames... +[2024-11-07 14:20:27,573][01615] Decorrelating experience for 0 frames... +[2024-11-07 14:20:27,611][01617] Decorrelating experience for 32 frames... +[2024-11-07 14:20:27,652][01609] Decorrelating experience for 32 frames... +[2024-11-07 14:20:27,843][01614] Decorrelating experience for 32 frames... +[2024-11-07 14:20:28,161][01608] Decorrelating experience for 32 frames... +[2024-11-07 14:20:28,192][01615] Decorrelating experience for 32 frames... +[2024-11-07 14:20:28,215][01612] Decorrelating experience for 32 frames... +[2024-11-07 14:20:28,243][01606] Decorrelating experience for 0 frames... 
+[2024-11-07 14:20:28,384][01610] Decorrelating experience for 32 frames... +[2024-11-07 14:20:28,438][01617] Decorrelating experience for 64 frames... +[2024-11-07 14:20:28,446][01609] Decorrelating experience for 64 frames... +[2024-11-07 14:20:28,616][01614] Decorrelating experience for 64 frames... +[2024-11-07 14:20:28,992][01608] Decorrelating experience for 64 frames... +[2024-11-07 14:20:29,095][01612] Decorrelating experience for 64 frames... +[2024-11-07 14:20:29,309][01606] Decorrelating experience for 32 frames... +[2024-11-07 14:20:29,709][01610] Decorrelating experience for 64 frames... +[2024-11-07 14:20:29,735][01609] Decorrelating experience for 96 frames... +[2024-11-07 14:20:29,821][01364] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4030464. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 14:20:29,941][01614] Decorrelating experience for 96 frames... +[2024-11-07 14:20:29,954][01608] Decorrelating experience for 96 frames... +[2024-11-07 14:20:29,975][01615] Decorrelating experience for 64 frames... +[2024-11-07 14:20:30,490][01612] Decorrelating experience for 96 frames... +[2024-11-07 14:20:30,705][01617] Decorrelating experience for 96 frames... +[2024-11-07 14:20:31,022][01610] Decorrelating experience for 96 frames... +[2024-11-07 14:20:31,053][01606] Decorrelating experience for 64 frames... +[2024-11-07 14:20:31,060][01615] Decorrelating experience for 96 frames... +[2024-11-07 14:20:31,473][01606] Decorrelating experience for 96 frames... +[2024-11-07 14:20:34,819][01364] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4030464. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 14:20:38,893][01593] Signal inference workers to stop experience collection... 
+[2024-11-07 14:20:38,915][01607] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 14:20:39,820][01364] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4030464. Throughput: 0: 151.9. Samples: 2278. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 14:20:39,822][01364] Avg episode reward: [(0, '1.962')] +[2024-11-07 14:20:44,819][01364] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4030464. Throughput: 0: 113.9. Samples: 2278. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 14:20:44,823][01364] Avg episode reward: [(0, '1.962')] +[2024-11-07 14:20:49,819][01364] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4030464. Throughput: 0: 91.1. Samples: 2278. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 14:20:49,821][01364] Avg episode reward: [(0, '1.962')] +[2024-11-07 14:20:51,836][01593] Signal inference workers to resume experience collection... +[2024-11-07 14:20:51,842][01593] Stopping Batcher_0... +[2024-11-07 14:20:51,842][01593] Loop batcher_evt_loop terminating... +[2024-11-07 14:20:51,882][01364] Component Batcher_0 stopped! +[2024-11-07 14:20:51,951][01614] Stopping RolloutWorker_w5... +[2024-11-07 14:20:51,976][01614] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 14:20:51,977][01608] Stopping RolloutWorker_w2... +[2024-11-07 14:20:51,977][01608] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 14:20:51,978][01609] Stopping RolloutWorker_w1... +[2024-11-07 14:20:51,979][01609] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 14:20:51,952][01364] Component RolloutWorker_w5 stopped! +[2024-11-07 14:20:52,051][01364] Component RolloutWorker_w2 stopped! +[2024-11-07 14:20:52,053][01364] Component RolloutWorker_w1 stopped! +[2024-11-07 14:20:52,062][01606] Stopping RolloutWorker_w0... +[2024-11-07 14:20:52,063][01606] Loop rollout_proc0_evt_loop terminating... 
+[2024-11-07 14:20:52,067][01364] Component RolloutWorker_w0 stopped! +[2024-11-07 14:20:52,133][01364] Component RolloutWorker_w6 stopped! +[2024-11-07 14:20:52,135][01615] Stopping RolloutWorker_w6... +[2024-11-07 14:20:52,136][01615] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 14:20:52,214][01607] Weights refcount: 2 0 +[2024-11-07 14:20:52,217][01607] Stopping InferenceWorker_p0-w0... +[2024-11-07 14:20:52,217][01607] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 14:20:52,217][01364] Component InferenceWorker_p0-w0 stopped! +[2024-11-07 14:20:52,251][01610] Stopping RolloutWorker_w3... +[2024-11-07 14:20:52,252][01610] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 14:20:52,245][01364] Component RolloutWorker_w3 stopped! +[2024-11-07 14:20:52,289][01617] Stopping RolloutWorker_w7... +[2024-11-07 14:20:52,288][01364] Component RolloutWorker_w7 stopped! +[2024-11-07 14:20:52,290][01617] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 14:20:52,398][01612] Stopping RolloutWorker_w4... +[2024-11-07 14:20:52,399][01612] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 14:20:52,398][01364] Component RolloutWorker_w4 stopped! +[2024-11-07 14:20:53,361][01593] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth... +[2024-11-07 14:20:53,602][01593] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000982_4022272.pth +[2024-11-07 14:20:53,608][01593] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth... +[2024-11-07 14:20:53,773][01364] Component LearnerWorker_p0 stopped! +[2024-11-07 14:20:53,776][01364] Waiting for process learner_proc0 to stop... +[2024-11-07 14:20:53,778][01593] Stopping LearnerWorker_p0... +[2024-11-07 14:20:53,779][01593] Loop learner_proc0_evt_loop terminating... 
+[2024-11-07 14:20:55,232][01364] Waiting for process inference_proc0-0 to join... +[2024-11-07 14:20:55,234][01364] Waiting for process rollout_proc0 to join... +[2024-11-07 14:20:55,237][01364] Waiting for process rollout_proc1 to join... +[2024-11-07 14:20:55,238][01364] Waiting for process rollout_proc2 to join... +[2024-11-07 14:20:55,240][01364] Waiting for process rollout_proc3 to join... +[2024-11-07 14:20:55,242][01364] Waiting for process rollout_proc4 to join... +[2024-11-07 14:20:55,244][01364] Waiting for process rollout_proc5 to join... +[2024-11-07 14:20:55,246][01364] Waiting for process rollout_proc6 to join... +[2024-11-07 14:20:55,248][01364] Waiting for process rollout_proc7 to join... +[2024-11-07 14:20:55,251][01364] Batcher 0 profile tree view: +batching: 0.1728, releasing_batches: 0.0057 +[2024-11-07 14:20:55,254][01364] InferenceWorker_p0-w0 profile tree view: +update_model: 0.0126 +wait_policy: 0.0001 + wait_policy_total: 5.8211 +one_step: 0.0248 + handle_policy_step: 8.8489 + deserialize: 0.0802, stack: 0.0286, obs_to_device_normalize: 1.8886, forward: 6.2329, send_messages: 0.1620 + prepare_outputs: 0.4037 + to_cpu: 0.3303 +[2024-11-07 14:20:55,256][01364] Learner 0 profile tree view: +misc: 0.0000, prepare_batch: 5.2266 +train: 10.5341 + epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0017, kl_divergence: 0.0164, after_optimizer: 0.4198 + calculate_losses: 2.3312 + losses_init: 0.0000, forward_head: 0.6032, bptt_initial: 1.2839, tail: 0.0609, advantages_returns: 0.0013, losses: 0.2480 + bptt: 0.1068 + bptt_forward_core: 0.1065 + update: 7.7635 + clip: 0.7057 +[2024-11-07 14:20:55,257][01364] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.0005, enqueue_policy_requests: 0.0795, env_step: 0.4782, overhead: 0.0679, complete_rollouts: 0.0116 +save_policy_outputs: 0.0594 + split_output_tensors: 0.0209 +[2024-11-07 14:20:55,259][01364] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.0008, 
enqueue_policy_requests: 0.0926, env_step: 0.8226, overhead: 0.0360, complete_rollouts: 0.0009 +save_policy_outputs: 0.0530 + split_output_tensors: 0.0154 +[2024-11-07 14:20:55,262][01364] Loop Runner_EvtLoop terminating... +[2024-11-07 14:20:55,264][01364] Runner profile tree view: +main_loop: 54.9388 +[2024-11-07 14:20:55,268][01364] Collected {0: 4038656}, FPS: 149.1 +[2024-11-07 14:20:56,973][01364] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 14:20:56,975][01364] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 14:20:56,977][01364] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 14:20:56,980][01364] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 14:20:56,982][01364] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 14:20:56,985][01364] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 14:20:56,992][01364] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 14:20:56,995][01364] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 14:20:56,999][01364] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-07 14:20:57,001][01364] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-07 14:20:57,004][01364] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 14:20:57,006][01364] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 14:20:57,008][01364] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 14:20:57,010][01364] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2024-11-07 14:20:57,013][01364] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 14:20:57,069][01364] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:20:57,073][01364] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:20:57,077][01364] RunningMeanStd input shape: (1,) +[2024-11-07 14:20:57,119][01364] ConvEncoder: input_channels=3 +[2024-11-07 14:20:57,361][01364] Conv encoder output size: 512 +[2024-11-07 14:20:57,363][01364] Policy head output size: 512 +[2024-11-07 14:20:58,389][01364] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth... +[2024-11-07 14:20:59,532][01364] Num frames 100... +[2024-11-07 14:20:59,794][01364] Num frames 200... +[2024-11-07 14:21:00,049][01364] Num frames 300... +[2024-11-07 14:21:00,361][01364] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:21:00,366][01364] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:21:00,413][01364] Num frames 400... +[2024-11-07 14:21:00,720][01364] Num frames 500... +[2024-11-07 14:21:01,029][01364] Num frames 600... +[2024-11-07 14:21:01,338][01364] Num frames 700... +[2024-11-07 14:21:01,669][01364] Num frames 800... +[2024-11-07 14:21:01,723][01364] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000 +[2024-11-07 14:21:01,726][01364] Avg episode reward: 4.500, avg true_objective: 4.000 +[2024-11-07 14:21:02,061][01364] Num frames 900... +[2024-11-07 14:21:02,385][01364] Num frames 1000... +[2024-11-07 14:21:02,682][01364] Num frames 1100... +[2024-11-07 14:21:02,997][01364] Avg episode rewards: #0: 4.280, true rewards: #0: 3.947 +[2024-11-07 14:21:02,998][01364] Avg episode reward: 4.280, avg true_objective: 3.947 +[2024-11-07 14:21:03,057][01364] Num frames 1200... +[2024-11-07 14:21:03,486][01364] Num frames 1300... +[2024-11-07 14:21:03,816][01364] Num frames 1400... 
+[2024-11-07 14:21:04,086][01364] Num frames 1500... +[2024-11-07 14:21:04,305][01364] Avg episode rewards: #0: 4.170, true rewards: #0: 3.920 +[2024-11-07 14:21:04,310][01364] Avg episode reward: 4.170, avg true_objective: 3.920 +[2024-11-07 14:21:04,411][01364] Num frames 1600... +[2024-11-07 14:21:04,677][01364] Num frames 1700... +[2024-11-07 14:21:04,958][01364] Num frames 1800... +[2024-11-07 14:21:05,230][01364] Num frames 1900... +[2024-11-07 14:21:05,420][01364] Avg episode rewards: #0: 4.104, true rewards: #0: 3.904 +[2024-11-07 14:21:05,422][01364] Avg episode reward: 4.104, avg true_objective: 3.904 +[2024-11-07 14:21:05,555][01364] Num frames 2000... +[2024-11-07 14:21:05,791][01364] Num frames 2100... +[2024-11-07 14:21:06,071][01364] Num frames 2200... +[2024-11-07 14:21:06,353][01364] Num frames 2300... +[2024-11-07 14:21:06,653][01364] Num frames 2400... +[2024-11-07 14:21:06,706][01364] Avg episode rewards: #0: 4.333, true rewards: #0: 4.000 +[2024-11-07 14:21:06,707][01364] Avg episode reward: 4.333, avg true_objective: 4.000 +[2024-11-07 14:21:06,949][01364] Num frames 2500... +[2024-11-07 14:21:07,174][01364] Num frames 2600... +[2024-11-07 14:21:07,368][01364] Num frames 2700... +[2024-11-07 14:21:07,468][01364] Avg episode rewards: #0: 4.170, true rewards: #0: 3.884 +[2024-11-07 14:21:07,470][01364] Avg episode reward: 4.170, avg true_objective: 3.884 +[2024-11-07 14:21:07,648][01364] Num frames 2800... +[2024-11-07 14:21:08,033][01364] Num frames 2900... +[2024-11-07 14:21:08,231][01364] Num frames 3000... +[2024-11-07 14:21:08,439][01364] Num frames 3100... +[2024-11-07 14:21:08,507][01364] Avg episode rewards: #0: 4.129, true rewards: #0: 3.879 +[2024-11-07 14:21:08,510][01364] Avg episode reward: 4.129, avg true_objective: 3.879 +[2024-11-07 14:21:08,729][01364] Num frames 3200... +[2024-11-07 14:21:08,921][01364] Num frames 3300... +[2024-11-07 14:21:09,156][01364] Num frames 3400... +[2024-11-07 14:21:09,370][01364] Num frames 3500... 
+[2024-11-07 14:21:09,590][01364] Num frames 3600... +[2024-11-07 14:21:09,769][01364] Avg episode rewards: #0: 4.497, true rewards: #0: 4.052 +[2024-11-07 14:21:09,771][01364] Avg episode reward: 4.497, avg true_objective: 4.052 +[2024-11-07 14:21:09,918][01364] Num frames 3700... +[2024-11-07 14:21:10,199][01364] Num frames 3800... +[2024-11-07 14:21:10,507][01364] Num frames 3900... +[2024-11-07 14:21:10,841][01364] Num frames 4000... +[2024-11-07 14:21:11,242][01364] Avg episode rewards: #0: 4.595, true rewards: #0: 4.095 +[2024-11-07 14:21:11,244][01364] Avg episode reward: 4.595, avg true_objective: 4.095 +[2024-11-07 14:21:30,385][01364] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 14:21:30,983][01364] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 14:21:30,986][01364] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 14:21:30,987][01364] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 14:21:30,989][01364] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 14:21:30,991][01364] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 14:21:30,992][01364] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 14:21:30,996][01364] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-11-07 14:21:31,003][01364] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 14:21:31,008][01364] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-07 14:21:31,015][01364] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! 
+[2024-11-07 14:21:31,016][01364] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 14:21:31,019][01364] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 14:21:31,022][01364] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 14:21:31,024][01364] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 14:21:31,027][01364] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 14:21:31,140][01364] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:21:31,144][01364] RunningMeanStd input shape: (1,) +[2024-11-07 14:21:31,185][01364] ConvEncoder: input_channels=3 +[2024-11-07 14:21:31,424][01364] Conv encoder output size: 512 +[2024-11-07 14:21:31,427][01364] Policy head output size: 512 +[2024-11-07 14:21:31,517][01364] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth... +[2024-11-07 14:21:32,709][01364] Num frames 100... +[2024-11-07 14:21:33,141][01364] Num frames 200... +[2024-11-07 14:21:33,553][01364] Num frames 300... +[2024-11-07 14:21:33,980][01364] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:21:33,982][01364] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:21:34,090][01364] Num frames 400... +[2024-11-07 14:21:34,592][01364] Num frames 500... +[2024-11-07 14:21:34,983][01364] Num frames 600... +[2024-11-07 14:21:35,204][01364] Num frames 700... +[2024-11-07 14:21:35,568][01364] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:21:35,569][01364] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:21:35,685][01364] Num frames 800... +[2024-11-07 14:21:35,875][01364] Num frames 900... +[2024-11-07 14:21:36,122][01364] Num frames 1000... +[2024-11-07 14:21:36,392][01364] Num frames 1100... 
+[2024-11-07 14:21:36,594][01364] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:21:36,600][01364] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:21:36,708][01364] Num frames 1200... +[2024-11-07 14:21:36,914][01364] Num frames 1300... +[2024-11-07 14:21:37,102][01364] Num frames 1400... +[2024-11-07 14:21:37,285][01364] Num frames 1500... +[2024-11-07 14:21:37,408][01364] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:21:37,410][01364] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:21:37,556][01364] Num frames 1600... +[2024-11-07 14:21:37,767][01364] Num frames 1700... +[2024-11-07 14:21:37,971][01364] Num frames 1800... +[2024-11-07 14:21:38,265][01364] Num frames 1900... +[2024-11-07 14:21:38,537][01364] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968 +[2024-11-07 14:21:38,539][01364] Avg episode reward: 4.168, avg true_objective: 3.968 +[2024-11-07 14:21:38,587][01364] Num frames 2000... +[2024-11-07 14:21:38,827][01364] Num frames 2100... +[2024-11-07 14:21:39,012][01364] Num frames 2200... +[2024-11-07 14:21:39,203][01364] Num frames 2300... +[2024-11-07 14:21:39,384][01364] Avg episode rewards: #0: 4.113, true rewards: #0: 3.947 +[2024-11-07 14:21:39,386][01364] Avg episode reward: 4.113, avg true_objective: 3.947 +[2024-11-07 14:21:39,445][01364] Num frames 2400... +[2024-11-07 14:21:39,646][01364] Num frames 2500... +[2024-11-07 14:21:39,981][01364] Num frames 2600... +[2024-11-07 14:21:40,214][01364] Num frames 2700... +[2024-11-07 14:21:40,359][01364] Avg episode rewards: #0: 4.074, true rewards: #0: 3.931 +[2024-11-07 14:21:40,364][01364] Avg episode reward: 4.074, avg true_objective: 3.931 +[2024-11-07 14:21:40,462][01364] Num frames 2800... +[2024-11-07 14:21:40,724][01364] Num frames 2900... +[2024-11-07 14:21:40,965][01364] Num frames 3000... +[2024-11-07 14:21:41,149][01364] Num frames 3100... 
+[2024-11-07 14:21:41,273][01364] Avg episode rewards: #0: 4.045, true rewards: #0: 3.920 +[2024-11-07 14:21:41,276][01364] Avg episode reward: 4.045, avg true_objective: 3.920 +[2024-11-07 14:21:41,466][01364] Num frames 3200... +[2024-11-07 14:21:41,748][01364] Num frames 3300... +[2024-11-07 14:21:42,245][01364] Num frames 3400... +[2024-11-07 14:21:42,632][01364] Num frames 3500... +[2024-11-07 14:21:43,045][01364] Avg episode rewards: #0: 4.204, true rewards: #0: 3.982 +[2024-11-07 14:21:43,049][01364] Avg episode reward: 4.204, avg true_objective: 3.982 +[2024-11-07 14:21:43,134][01364] Num frames 3600... +[2024-11-07 14:21:43,343][01364] Num frames 3700... +[2024-11-07 14:21:43,558][01364] Num frames 3800... +[2024-11-07 14:21:43,775][01364] Num frames 3900... +[2024-11-07 14:21:43,963][01364] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968 +[2024-11-07 14:21:43,965][01364] Avg episode reward: 4.168, avg true_objective: 3.968 +[2024-11-07 14:21:56,432][01364] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 14:22:10,194][01364] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 14:22:17,362][01364] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 14:22:17,364][01364] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 14:22:17,367][01364] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 14:22:17,370][01364] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 14:22:17,372][01364] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 14:22:17,375][01364] Adding new argument 'video_name'=None that is not in the saved config file! 
+[2024-11-07 14:22:17,377][01364] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-11-07 14:22:17,382][01364] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 14:22:17,384][01364] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-07 14:22:17,385][01364] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2024-11-07 14:22:17,388][01364] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 14:22:17,389][01364] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 14:22:17,392][01364] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 14:22:17,396][01364] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 14:22:17,398][01364] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 14:22:17,445][01364] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:22:17,450][01364] RunningMeanStd input shape: (1,) +[2024-11-07 14:22:17,485][01364] ConvEncoder: input_channels=3 +[2024-11-07 14:22:17,580][01364] Conv encoder output size: 512 +[2024-11-07 14:22:17,582][01364] Policy head output size: 512 +[2024-11-07 14:22:17,619][01364] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth... +[2024-11-07 14:22:18,441][01364] Num frames 100... +[2024-11-07 14:22:18,875][01364] Num frames 200... +[2024-11-07 14:22:19,235][01364] Num frames 300... +[2024-11-07 14:22:19,612][01364] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:22:19,614][01364] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:22:19,694][01364] Num frames 400... +[2024-11-07 14:22:20,083][01364] Num frames 500... 
+[2024-11-07 14:22:20,493][01364] Num frames 600... +[2024-11-07 14:22:20,960][01364] Num frames 700... +[2024-11-07 14:22:21,365][01364] Num frames 800... +[2024-11-07 14:22:21,518][01364] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-07 14:22:21,519][01364] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-07 14:22:21,722][01364] Num frames 900... +[2024-11-07 14:22:22,052][01364] Num frames 1000... +[2024-11-07 14:22:22,332][01364] Num frames 1100... +[2024-11-07 14:22:22,612][01364] Num frames 1200... +[2024-11-07 14:22:22,926][01364] Avg episode rewards: #0: 4.933, true rewards: #0: 4.267 +[2024-11-07 14:22:22,927][01364] Avg episode reward: 4.933, avg true_objective: 4.267 +[2024-11-07 14:22:23,001][01364] Num frames 1300... +[2024-11-07 14:22:23,320][01364] Num frames 1400... +[2024-11-07 14:22:23,606][01364] Num frames 1500... +[2024-11-07 14:22:23,880][01364] Num frames 1600... +[2024-11-07 14:22:24,135][01364] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-07 14:22:24,136][01364] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-07 14:22:24,254][01364] Num frames 1700... +[2024-11-07 14:22:24,530][01364] Num frames 1800... +[2024-11-07 14:22:24,780][01364] Num frames 1900... +[2024-11-07 14:22:25,066][01364] Num frames 2000... +[2024-11-07 14:22:25,268][01364] Avg episode rewards: #0: 4.496, true rewards: #0: 4.096 +[2024-11-07 14:22:25,274][01364] Avg episode reward: 4.496, avg true_objective: 4.096 +[2024-11-07 14:22:25,413][01364] Num frames 2100... +[2024-11-07 14:22:25,642][01364] Num frames 2200... +[2024-11-07 14:22:25,905][01364] Num frames 2300... +[2024-11-07 14:22:26,208][01364] Num frames 2400... +[2024-11-07 14:22:26,349][01364] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-07 14:22:26,350][01364] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-07 14:22:26,528][01364] Num frames 2500... +[2024-11-07 14:22:26,795][01364] Num frames 2600... 
+[2024-11-07 14:22:27,084][01364] Num frames 2700... +[2024-11-07 14:22:27,366][01364] Num frames 2800... +[2024-11-07 14:22:29,207][01364] Num frames 2900... +[2024-11-07 14:22:29,387][01364] Avg episode rewards: #0: 4.777, true rewards: #0: 4.206 +[2024-11-07 14:22:29,392][01364] Avg episode reward: 4.777, avg true_objective: 4.206 +[2024-11-07 14:22:29,605][01364] Num frames 3000... +[2024-11-07 14:22:29,954][01364] Num frames 3100... +[2024-11-07 14:22:30,304][01364] Num frames 3200... +[2024-11-07 14:22:30,773][01364] Num frames 3300... +[2024-11-07 14:22:30,997][01364] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-07 14:22:30,999][01364] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-07 14:22:31,199][01364] Num frames 3400... +[2024-11-07 14:22:31,513][01364] Num frames 3500... +[2024-11-07 14:22:31,765][01364] Num frames 3600... +[2024-11-07 14:22:32,016][01364] Num frames 3700... +[2024-11-07 14:22:32,260][01364] Avg episode rewards: #0: 4.751, true rewards: #0: 4.196 +[2024-11-07 14:22:32,266][01364] Avg episode reward: 4.751, avg true_objective: 4.196 +[2024-11-07 14:22:32,344][01364] Num frames 3800... +[2024-11-07 14:22:32,570][01364] Num frames 3900... +[2024-11-07 14:22:32,817][01364] Num frames 4000... +[2024-11-07 14:22:33,072][01364] Num frames 4100... +[2024-11-07 14:22:33,260][01364] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-07 14:22:33,261][01364] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-07 14:22:46,334][01364] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 14:22:51,976][01364] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 14:26:53,545][01364] Environment doom_basic already registered, overwriting... +[2024-11-07 14:26:53,548][01364] Environment doom_two_colors_easy already registered, overwriting... 
+[2024-11-07 14:26:53,550][01364] Environment doom_two_colors_hard already registered, overwriting... +[2024-11-07 14:26:53,552][01364] Environment doom_dm already registered, overwriting... +[2024-11-07 14:26:53,553][01364] Environment doom_dwango5 already registered, overwriting... +[2024-11-07 14:26:53,555][01364] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-11-07 14:26:53,556][01364] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-11-07 14:26:53,558][01364] Environment doom_my_way_home already registered, overwriting... +[2024-11-07 14:26:53,559][01364] Environment doom_deadly_corridor already registered, overwriting... +[2024-11-07 14:26:53,561][01364] Environment doom_defend_the_center already registered, overwriting... +[2024-11-07 14:26:53,564][01364] Environment doom_defend_the_line already registered, overwriting... +[2024-11-07 14:26:53,565][01364] Environment doom_health_gathering already registered, overwriting... +[2024-11-07 14:26:53,565][01364] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-11-07 14:26:53,568][01364] Environment doom_battle already registered, overwriting... +[2024-11-07 14:26:53,569][01364] Environment doom_battle2 already registered, overwriting... +[2024-11-07 14:26:53,570][01364] Environment doom_duel_bots already registered, overwriting... +[2024-11-07 14:26:53,572][01364] Environment doom_deathmatch_bots already registered, overwriting... +[2024-11-07 14:26:53,574][01364] Environment doom_duel already registered, overwriting... +[2024-11-07 14:26:53,575][01364] Environment doom_deathmatch_full already registered, overwriting... +[2024-11-07 14:26:53,577][01364] Environment doom_benchmark already registered, overwriting... 
+[2024-11-07 14:26:53,579][01364] register_encoder_factory: +[2024-11-07 14:26:53,595][01364] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 14:26:53,597][01364] Overriding arg 'env' with value 'LunarLander-v2' passed from command line +[2024-11-07 14:26:53,603][01364] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! +[2024-11-07 14:26:53,604][01364] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... +[2024-11-07 14:26:53,607][01364] Weights and Biases integration disabled +[2024-11-07 14:26:53,612][01364] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-11-07 14:32:23,905][03851] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-07 14:32:23,907][03851] Rollout worker 0 uses device cpu +[2024-11-07 14:32:23,908][03851] Rollout worker 1 uses device cpu +[2024-11-07 14:32:23,909][03851] Rollout worker 2 uses device cpu +[2024-11-07 14:32:23,911][03851] Rollout worker 3 uses device cpu +[2024-11-07 14:32:23,913][03851] Rollout worker 4 uses device cpu +[2024-11-07 14:32:23,914][03851] Rollout worker 5 uses device cpu +[2024-11-07 14:32:23,916][03851] Rollout worker 6 uses device cpu +[2024-11-07 14:32:23,918][03851] Rollout worker 7 uses device cpu +[2024-11-07 14:32:23,982][03851] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:32:23,983][03851] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 14:32:24,069][03851] Starting all processes... +[2024-11-07 14:32:24,070][03851] Starting process learner_proc0 +[2024-11-07 14:32:24,119][03851] Starting all processes... 
+[2024-11-07 14:32:24,126][03851] Starting process inference_proc0-0 +[2024-11-07 14:32:24,126][03851] Starting process rollout_proc0 +[2024-11-07 14:32:24,127][03851] Starting process rollout_proc1 +[2024-11-07 14:32:24,128][03851] Starting process rollout_proc2 +[2024-11-07 14:32:24,130][03851] Starting process rollout_proc3 +[2024-11-07 14:32:24,131][03851] Starting process rollout_proc4 +[2024-11-07 14:32:24,132][03851] Starting process rollout_proc5 +[2024-11-07 14:32:24,133][03851] Starting process rollout_proc6 +[2024-11-07 14:32:24,134][03851] Starting process rollout_proc7 +[2024-11-07 14:32:28,240][04103] Worker 3 uses CPU cores [3] +[2024-11-07 14:32:28,599][04099] Worker 0 uses CPU cores [0] +[2024-11-07 14:32:28,800][04102] Worker 2 uses CPU cores [2] +[2024-11-07 14:32:28,820][04086] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:32:28,820][04086] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 14:32:28,984][04086] Num visible devices: 1 +[2024-11-07 14:32:29,023][04086] Starting seed is not provided +[2024-11-07 14:32:29,023][04086] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:32:29,023][04086] Initializing actor-critic model on device cuda:0 +[2024-11-07 14:32:29,024][04086] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:32:29,030][04086] RunningMeanStd input shape: (1,) +[2024-11-07 14:32:29,031][04101] Worker 1 uses CPU cores [1] +[2024-11-07 14:32:29,065][04086] ConvEncoder: input_channels=3 +[2024-11-07 14:32:29,179][04105] Worker 5 uses CPU cores [5] +[2024-11-07 14:32:29,418][04086] Conv encoder output size: 512 +[2024-11-07 14:32:29,418][04086] Policy head output size: 512 +[2024-11-07 14:32:29,488][04086] Created Actor Critic model with architecture: +[2024-11-07 14:32:29,488][04086] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): 
ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=4, bias=True) + ) +) +[2024-11-07 14:32:29,721][04100] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:32:29,722][04100] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 14:32:29,759][04100] Num visible devices: 1 +[2024-11-07 14:32:29,806][04106] Worker 6 uses CPU cores [6] +[2024-11-07 14:32:29,878][04107] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 14:32:29,920][04104] Worker 4 uses CPU cores [4] +[2024-11-07 14:32:30,421][04086] Using optimizer +[2024-11-07 14:32:31,436][04086] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth... 
+[2024-11-07 14:32:31,484][04086] Loading model from checkpoint +[2024-11-07 14:32:31,485][04086] EvtLoop [learner_proc0_evt_loop, process=learner_proc0] unhandled exception in slot='init' connected to emitter=Emitter(object_id='Runner_EvtLoop', signal_name='start'), args=() +Traceback (most recent call last): + File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/signal_slot/signal_slot.py", line 355, in _process_signal + slot_callable(*args) + File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/sample_factory/algo/learning/learner_worker.py", line 139, in init + init_model_data = self.learner.init() + File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 245, in init + self.load_from_checkpoint(self.policy_id) + File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 307, in load_from_checkpoint + self._load_state(checkpoint_dict, load_progress=load_progress) + File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 291, in _load_state + self.actor_critic.load_state_dict(checkpoint_dict["model"]) + File "/root/miniconda3/envs/unit8/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2584, in load_state_dict + raise RuntimeError( +RuntimeError: Error(s) in loading state_dict for ActorCriticSharedWeights: + size mismatch for action_parameterization.distribution_linear.weight: copying a param with shape torch.Size([5, 512]) from checkpoint, the shape in current model is torch.Size([4, 512]). + size mismatch for action_parameterization.distribution_linear.bias: copying a param with shape torch.Size([5]) from checkpoint, the shape in current model is torch.Size([4]). 
+[2024-11-07 14:32:31,488][04086] Unhandled exception Error(s) in loading state_dict for ActorCriticSharedWeights: + size mismatch for action_parameterization.distribution_linear.weight: copying a param with shape torch.Size([5, 512]) from checkpoint, the shape in current model is torch.Size([4, 512]). + size mismatch for action_parameterization.distribution_linear.bias: copying a param with shape torch.Size([5]) from checkpoint, the shape in current model is torch.Size([4]). in evt loop learner_proc0_evt_loop +[2024-11-07 14:32:43,973][03851] Heartbeat connected on Batcher_0 +[2024-11-07 14:32:43,982][03851] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 14:32:43,989][03851] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 14:32:43,992][03851] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 14:32:43,997][03851] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 14:32:44,000][03851] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 14:32:44,004][03851] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 14:32:44,007][03851] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 14:32:44,067][03851] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 14:32:44,068][03851] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 14:34:40,019][03851] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 3851], exiting... +[2024-11-07 14:34:40,023][04100] Stopping InferenceWorker_p0-w0... +[2024-11-07 14:34:40,023][04106] Stopping RolloutWorker_w6... +[2024-11-07 14:34:40,023][04099] Stopping RolloutWorker_w0... +[2024-11-07 14:34:40,023][04106] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 14:34:40,023][04100] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 14:34:40,023][04099] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 14:34:40,022][04107] Stopping RolloutWorker_w7... +[2024-11-07 14:34:40,024][04107] Loop rollout_proc7_evt_loop terminating... 
+[2024-11-07 14:34:40,024][04104] Stopping RolloutWorker_w4... +[2024-11-07 14:34:40,026][04104] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 14:34:40,023][03851] Runner profile tree view: +main_loop: 135.9546 +[2024-11-07 14:34:40,027][04086] Stopping Batcher_0... +[2024-11-07 14:34:40,028][04086] Loop batcher_evt_loop terminating... +[2024-11-07 14:34:40,027][03851] Collected {}, FPS: 0.0 +[2024-11-07 14:34:40,029][04102] Stopping RolloutWorker_w2... +[2024-11-07 14:34:40,029][04102] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 14:34:40,030][04105] Stopping RolloutWorker_w5... +[2024-11-07 14:34:40,031][04105] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 14:34:40,034][04101] Stopping RolloutWorker_w1... +[2024-11-07 14:34:40,040][04101] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 14:34:40,042][04103] Stopping RolloutWorker_w3... +[2024-11-07 14:34:40,046][04103] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 14:35:11,133][04584] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-07 14:35:11,135][04584] Rollout worker 0 uses device cpu +[2024-11-07 14:35:11,136][04584] Rollout worker 1 uses device cpu +[2024-11-07 14:35:11,138][04584] Rollout worker 2 uses device cpu +[2024-11-07 14:35:11,139][04584] Rollout worker 3 uses device cpu +[2024-11-07 14:35:11,141][04584] Rollout worker 4 uses device cpu +[2024-11-07 14:35:11,142][04584] Rollout worker 5 uses device cpu +[2024-11-07 14:35:11,144][04584] Rollout worker 6 uses device cpu +[2024-11-07 14:35:11,146][04584] Rollout worker 7 uses device cpu +[2024-11-07 14:35:11,205][04584] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:35:11,206][04584] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 14:35:11,239][04584] Starting all processes... +[2024-11-07 14:35:11,241][04584] Starting process learner_proc0 +[2024-11-07 14:35:11,376][04584] Starting all processes... 
+[2024-11-07 14:35:11,383][04584] Starting process inference_proc0-0 +[2024-11-07 14:35:11,384][04584] Starting process rollout_proc0 +[2024-11-07 14:35:11,385][04584] Starting process rollout_proc1 +[2024-11-07 14:35:11,385][04584] Starting process rollout_proc2 +[2024-11-07 14:35:11,386][04584] Starting process rollout_proc3 +[2024-11-07 14:35:11,387][04584] Starting process rollout_proc4 +[2024-11-07 14:35:11,388][04584] Starting process rollout_proc5 +[2024-11-07 14:35:11,390][04584] Starting process rollout_proc6 +[2024-11-07 14:35:11,390][04584] Starting process rollout_proc7 +[2024-11-07 14:35:16,833][04708] Worker 6 uses CPU cores [6] +[2024-11-07 14:35:16,839][04706] Worker 4 uses CPU cores [4] +[2024-11-07 14:35:17,166][04709] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 14:35:17,188][04705] Worker 2 uses CPU cores [2] +[2024-11-07 14:35:17,642][04702] Worker 0 uses CPU cores [0] +[2024-11-07 14:35:17,899][04707] Worker 5 uses CPU cores [5] +[2024-11-07 14:35:17,907][04701] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:35:17,907][04701] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 14:35:17,932][04703] Worker 1 uses CPU cores [1] +[2024-11-07 14:35:17,951][04688] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:35:17,952][04688] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 14:35:17,978][04688] Num visible devices: 1 +[2024-11-07 14:35:17,978][04701] Num visible devices: 1 +[2024-11-07 14:35:17,993][04688] Starting seed is not provided +[2024-11-07 14:35:17,993][04688] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:35:17,994][04688] Initializing actor-critic model on device cuda:0 +[2024-11-07 14:35:17,994][04688] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:35:17,995][04688] RunningMeanStd input shape: (1,) +[2024-11-07 14:35:18,006][04688] 
ConvEncoder: input_channels=3 +[2024-11-07 14:35:18,132][04688] Conv encoder output size: 512 +[2024-11-07 14:35:18,133][04688] Policy head output size: 512 +[2024-11-07 14:35:18,148][04688] Created Actor Critic model with architecture: +[2024-11-07 14:35:18,148][04688] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 14:35:18,249][04704] Worker 3 uses CPU cores [3] +[2024-11-07 14:35:18,791][04688] Using optimizer +[2024-11-07 14:35:19,775][04688] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth... 
+[2024-11-07 14:35:19,840][04688] Loading model from checkpoint +[2024-11-07 14:35:19,842][04688] Loaded experiment state at self.train_step=986, self.env_steps=4038656 +[2024-11-07 14:35:19,842][04688] Initialized policy 0 weights for model version 986 +[2024-11-07 14:35:19,848][04688] LearnerWorker_p0 finished initialization! +[2024-11-07 14:35:19,849][04688] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:35:20,005][04701] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:35:20,006][04701] RunningMeanStd input shape: (1,) +[2024-11-07 14:35:20,019][04701] ConvEncoder: input_channels=3 +[2024-11-07 14:35:20,124][04701] Conv encoder output size: 512 +[2024-11-07 14:35:20,124][04701] Policy head output size: 512 +[2024-11-07 14:35:20,165][04584] Inference worker 0-0 is ready! +[2024-11-07 14:35:20,167][04584] All inference workers are ready! Signal rollout workers to start! +[2024-11-07 14:35:20,253][04705] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:35:20,255][04703] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:35:20,257][04706] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:35:20,259][04707] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:35:20,265][04704] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:35:20,283][04708] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:35:20,317][04709] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:35:20,332][04702] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:35:20,647][04705] Decorrelating experience for 0 frames... +[2024-11-07 14:35:20,650][04703] Decorrelating experience for 0 frames... +[2024-11-07 14:35:20,663][04707] Decorrelating experience for 0 frames... +[2024-11-07 14:35:20,681][04708] Decorrelating experience for 0 frames... 
+[2024-11-07 14:35:20,693][04709] Decorrelating experience for 0 frames... +[2024-11-07 14:35:20,977][04704] Decorrelating experience for 0 frames... +[2024-11-07 14:35:21,024][04708] Decorrelating experience for 32 frames... +[2024-11-07 14:35:21,043][04709] Decorrelating experience for 32 frames... +[2024-11-07 14:35:21,063][04705] Decorrelating experience for 32 frames... +[2024-11-07 14:35:21,064][04707] Decorrelating experience for 32 frames... +[2024-11-07 14:35:21,066][04703] Decorrelating experience for 32 frames... +[2024-11-07 14:35:21,373][04702] Decorrelating experience for 0 frames... +[2024-11-07 14:35:21,407][04704] Decorrelating experience for 32 frames... +[2024-11-07 14:35:21,456][04705] Decorrelating experience for 64 frames... +[2024-11-07 14:35:21,457][04708] Decorrelating experience for 64 frames... +[2024-11-07 14:35:21,525][04703] Decorrelating experience for 64 frames... +[2024-11-07 14:35:21,694][04702] Decorrelating experience for 32 frames... +[2024-11-07 14:35:21,817][04704] Decorrelating experience for 64 frames... +[2024-11-07 14:35:21,843][04705] Decorrelating experience for 96 frames... +[2024-11-07 14:35:21,844][04708] Decorrelating experience for 96 frames... +[2024-11-07 14:35:22,066][04703] Decorrelating experience for 96 frames... +[2024-11-07 14:35:22,068][04706] Decorrelating experience for 0 frames... +[2024-11-07 14:35:22,108][04584] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4038656. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 14:35:22,184][04704] Decorrelating experience for 96 frames... +[2024-11-07 14:35:22,274][04702] Decorrelating experience for 64 frames... +[2024-11-07 14:35:22,403][04706] Decorrelating experience for 32 frames... +[2024-11-07 14:35:22,687][04702] Decorrelating experience for 96 frames... +[2024-11-07 14:35:22,831][04707] Decorrelating experience for 64 frames... 
+[2024-11-07 14:35:23,152][04706] Decorrelating experience for 64 frames... +[2024-11-07 14:35:23,300][04707] Decorrelating experience for 96 frames... +[2024-11-07 14:35:23,313][04709] Decorrelating experience for 64 frames... +[2024-11-07 14:35:23,775][04706] Decorrelating experience for 96 frames... +[2024-11-07 14:35:24,023][04709] Decorrelating experience for 96 frames... +[2024-11-07 14:35:24,784][04688] Signal inference workers to stop experience collection... +[2024-11-07 14:35:24,831][04701] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 14:35:27,108][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4038656. Throughput: 0: 546.0. Samples: 2730. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 14:35:27,109][04584] Avg episode reward: [(0, '2.632')] +[2024-11-07 14:35:29,009][04688] Signal inference workers to resume experience collection... +[2024-11-07 14:35:29,010][04701] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 14:35:31,197][04584] Heartbeat connected on Batcher_0 +[2024-11-07 14:35:31,200][04584] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 14:35:31,212][04584] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 14:35:31,216][04584] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 14:35:31,223][04584] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 14:35:31,227][04584] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 14:35:31,235][04584] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 14:35:31,238][04584] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 14:35:31,244][04584] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 14:35:31,250][04584] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 14:35:31,255][04584] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 14:35:32,108][04584] Fps is (10 sec: 2867.1, 60 sec: 2867.1, 300 sec: 2867.1). Total num frames: 4067328. Throughput: 0: 299.6. Samples: 2996. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 14:35:32,111][04584] Avg episode reward: [(0, '3.943')] +[2024-11-07 14:35:35,402][04701] Updated weights for policy 0, policy_version 996 (0.0025) +[2024-11-07 14:35:37,108][04584] Fps is (10 sec: 5324.7, 60 sec: 3549.8, 300 sec: 3549.8). Total num frames: 4091904. Throughput: 0: 700.9. Samples: 10514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 14:35:37,112][04584] Avg episode reward: [(0, '4.473')] +[2024-11-07 14:35:40,471][04701] Updated weights for policy 0, policy_version 1006 (0.0023) +[2024-11-07 14:35:42,108][04584] Fps is (10 sec: 6553.5, 60 sec: 4710.3, 300 sec: 4710.3). Total num frames: 4132864. Throughput: 0: 1134.7. Samples: 22694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:35:42,112][04584] Avg episode reward: [(0, '4.427')] +[2024-11-07 14:35:45,559][04701] Updated weights for policy 0, policy_version 1016 (0.0033) +[2024-11-07 14:35:47,108][04584] Fps is (10 sec: 8192.2, 60 sec: 5406.7, 300 sec: 5406.7). Total num frames: 4173824. Throughput: 0: 1148.1. Samples: 28702. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:35:47,110][04584] Avg episode reward: [(0, '4.231')] +[2024-11-07 14:35:50,462][04701] Updated weights for policy 0, policy_version 1026 (0.0025) +[2024-11-07 14:35:52,108][04584] Fps is (10 sec: 8192.2, 60 sec: 5870.9, 300 sec: 5870.9). Total num frames: 4214784. Throughput: 0: 1364.1. Samples: 40924. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-07 14:35:52,111][04584] Avg episode reward: [(0, '4.551')] +[2024-11-07 14:35:55,421][04701] Updated weights for policy 0, policy_version 1036 (0.0029) +[2024-11-07 14:35:57,108][04584] Fps is (10 sec: 8192.0, 60 sec: 6202.5, 300 sec: 6202.5). Total num frames: 4255744. Throughput: 0: 1530.4. Samples: 53564. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:35:57,109][04584] Avg episode reward: [(0, '4.498')] +[2024-11-07 14:36:00,664][04701] Updated weights for policy 0, policy_version 1046 (0.0037) +[2024-11-07 14:36:02,109][04584] Fps is (10 sec: 7781.9, 60 sec: 6348.7, 300 sec: 6348.7). Total num frames: 4292608. Throughput: 0: 1490.7. Samples: 59628. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:36:02,118][04584] Avg episode reward: [(0, '4.486')] +[2024-11-07 14:36:07,108][04584] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6280.5). Total num frames: 4321280. Throughput: 0: 1538.8. Samples: 69244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 14:36:07,110][04584] Avg episode reward: [(0, '4.349')] +[2024-11-07 14:36:07,113][04701] Updated weights for policy 0, policy_version 1056 (0.0030) +[2024-11-07 14:36:12,109][04584] Fps is (10 sec: 4915.4, 60 sec: 6062.0, 300 sec: 6062.0). Total num frames: 4341760. Throughput: 0: 1618.7. Samples: 75570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 14:36:12,111][04584] Avg episode reward: [(0, '4.336')] +[2024-11-07 14:36:15,164][04701] Updated weights for policy 0, policy_version 1066 (0.0029) +[2024-11-07 14:36:17,109][04584] Fps is (10 sec: 5734.2, 60 sec: 6181.2, 300 sec: 6181.2). Total num frames: 4378624. Throughput: 0: 1729.5. Samples: 80826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 14:36:17,114][04584] Avg episode reward: [(0, '4.405')] +[2024-11-07 14:36:20,267][04701] Updated weights for policy 0, policy_version 1076 (0.0026) +[2024-11-07 14:36:22,108][04584] Fps is (10 sec: 7782.7, 60 sec: 6348.8, 300 sec: 6348.8). Total num frames: 4419584. Throughput: 0: 1830.0. Samples: 92864. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:36:22,111][04584] Avg episode reward: [(0, '4.417')] +[2024-11-07 14:36:25,223][04701] Updated weights for policy 0, policy_version 1086 (0.0024) +[2024-11-07 14:36:27,108][04584] Fps is (10 sec: 8192.4, 60 sec: 7031.5, 300 sec: 6490.6). Total num frames: 4460544. Throughput: 0: 1831.8. Samples: 105126. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:36:27,110][04584] Avg episode reward: [(0, '4.564')] +[2024-11-07 14:36:30,159][04701] Updated weights for policy 0, policy_version 1096 (0.0032) +[2024-11-07 14:36:32,108][04584] Fps is (10 sec: 8192.0, 60 sec: 7236.3, 300 sec: 6612.1). Total num frames: 4501504. Throughput: 0: 1834.5. Samples: 111254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:36:32,110][04584] Avg episode reward: [(0, '4.676')] +[2024-11-07 14:36:35,225][04701] Updated weights for policy 0, policy_version 1106 (0.0022) +[2024-11-07 14:36:37,108][04584] Fps is (10 sec: 8191.8, 60 sec: 7509.3, 300 sec: 6717.4). Total num frames: 4542464. Throughput: 0: 1839.1. Samples: 123682. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:36:37,110][04584] Avg episode reward: [(0, '4.525')] +[2024-11-07 14:36:40,134][04701] Updated weights for policy 0, policy_version 1116 (0.0024) +[2024-11-07 14:36:43,724][04584] Fps is (10 sec: 6699.8, 60 sec: 7246.0, 300 sec: 6624.6). Total num frames: 4579328. Throughput: 0: 1768.0. Samples: 135982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 14:36:43,725][04584] Avg episode reward: [(0, '4.556')] +[2024-11-07 14:36:47,108][04584] Fps is (10 sec: 6553.6, 60 sec: 7236.2, 300 sec: 6698.1). Total num frames: 4608000. Throughput: 0: 1733.3. Samples: 137624. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 14:36:47,113][04584] Avg episode reward: [(0, '4.348')] +[2024-11-07 14:36:47,253][04701] Updated weights for policy 0, policy_version 1126 (0.0024) +[2024-11-07 14:36:52,108][04584] Fps is (10 sec: 8305.3, 60 sec: 7236.3, 300 sec: 6781.2). Total num frames: 4648960. Throughput: 0: 1783.3. Samples: 149492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-07 14:36:52,109][04584] Avg episode reward: [(0, '4.319')] +[2024-11-07 14:36:52,633][04701] Updated weights for policy 0, policy_version 1136 (0.0028) +[2024-11-07 14:36:57,108][04584] Fps is (10 sec: 7782.6, 60 sec: 7168.0, 300 sec: 6812.3). Total num frames: 4685824. Throughput: 0: 1909.7. Samples: 161508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:36:57,110][04584] Avg episode reward: [(0, '4.504')] +[2024-11-07 14:36:57,814][04701] Updated weights for policy 0, policy_version 1146 (0.0030) +[2024-11-07 14:37:02,108][04584] Fps is (10 sec: 7372.7, 60 sec: 7168.1, 300 sec: 6840.3). Total num frames: 4722688. Throughput: 0: 1923.7. Samples: 167394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:37:02,110][04584] Avg episode reward: [(0, '4.470')] +[2024-11-07 14:37:03,385][04701] Updated weights for policy 0, policy_version 1156 (0.0031) +[2024-11-07 14:37:07,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7372.8, 300 sec: 6904.7). Total num frames: 4763648. Throughput: 0: 1904.9. Samples: 178584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:37:07,111][04584] Avg episode reward: [(0, '4.431')] +[2024-11-07 14:37:07,126][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001163_4763648.pth... 
+[2024-11-07 14:37:07,332][04688] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000984_4030464.pth +[2024-11-07 14:37:08,663][04701] Updated weights for policy 0, policy_version 1166 (0.0024) +[2024-11-07 14:37:12,108][04584] Fps is (10 sec: 7372.8, 60 sec: 7577.6, 300 sec: 6888.7). Total num frames: 4796416. Throughput: 0: 1873.3. Samples: 189424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:37:12,110][04584] Avg episode reward: [(0, '4.392')] +[2024-11-07 14:37:14,273][04701] Updated weights for policy 0, policy_version 1176 (0.0024) +[2024-11-07 14:37:18,138][04584] Fps is (10 sec: 5941.6, 60 sec: 7382.6, 300 sec: 6813.1). Total num frames: 4829184. Throughput: 0: 1821.9. Samples: 195116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:37:18,142][04584] Avg episode reward: [(0, '4.604')] +[2024-11-07 14:37:21,528][04701] Updated weights for policy 0, policy_version 1186 (0.0027) +[2024-11-07 14:37:22,108][04584] Fps is (10 sec: 6553.5, 60 sec: 7372.8, 300 sec: 6860.8). Total num frames: 4861952. Throughput: 0: 1756.1. Samples: 202708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 14:37:22,110][04584] Avg episode reward: [(0, '4.486')] +[2024-11-07 14:37:26,690][04701] Updated weights for policy 0, policy_version 1196 (0.0022) +[2024-11-07 14:37:27,108][04584] Fps is (10 sec: 7763.0, 60 sec: 7304.5, 300 sec: 6881.3). Total num frames: 4898816. Throughput: 0: 1816.0. Samples: 214768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:37:27,110][04584] Avg episode reward: [(0, '4.418')] +[2024-11-07 14:37:31,942][04701] Updated weights for policy 0, policy_version 1206 (0.0030) +[2024-11-07 14:37:32,108][04584] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 6931.7). Total num frames: 4939776. Throughput: 0: 1847.4. Samples: 220758. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 14:37:32,110][04584] Avg episode reward: [(0, '4.316')] +[2024-11-07 14:37:36,986][04701] Updated weights for policy 0, policy_version 1216 (0.0022) +[2024-11-07 14:37:37,108][04584] Fps is (10 sec: 8192.1, 60 sec: 7304.6, 300 sec: 6978.4). Total num frames: 4980736. Throughput: 0: 1848.0. Samples: 232650. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:37:37,109][04584] Avg episode reward: [(0, '4.397')] +[2024-11-07 14:37:41,959][04701] Updated weights for policy 0, policy_version 1226 (0.0024) +[2024-11-07 14:37:42,109][04584] Fps is (10 sec: 8191.4, 60 sec: 7576.7, 300 sec: 7021.7). Total num frames: 5021696. Throughput: 0: 1853.3. Samples: 244906. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:37:42,111][04584] Avg episode reward: [(0, '4.489')] +[2024-11-07 14:37:46,940][04701] Updated weights for policy 0, policy_version 1236 (0.0028) +[2024-11-07 14:37:47,108][04584] Fps is (10 sec: 8191.8, 60 sec: 7577.6, 300 sec: 7062.1). Total num frames: 5062656. Throughput: 0: 1859.2. Samples: 251056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:37:47,110][04584] Avg episode reward: [(0, '4.449')] +[2024-11-07 14:37:52,554][04584] Fps is (10 sec: 6274.2, 60 sec: 7250.6, 300 sec: 6969.8). Total num frames: 5087232. Throughput: 0: 1865.9. Samples: 263382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:37:52,559][04584] Avg episode reward: [(0, '4.337')] +[2024-11-07 14:37:54,172][04701] Updated weights for policy 0, policy_version 1246 (0.0027) +[2024-11-07 14:37:57,108][04584] Fps is (10 sec: 6144.0, 60 sec: 7304.5, 300 sec: 7002.8). Total num frames: 5124096. Throughput: 0: 1812.0. Samples: 270964. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:37:57,110][04584] Avg episode reward: [(0, '4.357')] +[2024-11-07 14:37:59,230][04701] Updated weights for policy 0, policy_version 1256 (0.0026) +[2024-11-07 14:38:02,108][04584] Fps is (10 sec: 8145.9, 60 sec: 7372.8, 300 sec: 7040.0). Total num frames: 5165056. Throughput: 0: 1866.2. Samples: 277174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:38:02,109][04584] Avg episode reward: [(0, '4.437')] +[2024-11-07 14:38:04,744][04701] Updated weights for policy 0, policy_version 1266 (0.0023) +[2024-11-07 14:38:07,108][04584] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 7050.1). Total num frames: 5201920. Throughput: 0: 1901.1. Samples: 288256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 14:38:07,110][04584] Avg episode reward: [(0, '4.331')] +[2024-11-07 14:38:09,768][04701] Updated weights for policy 0, policy_version 1276 (0.0026) +[2024-11-07 14:38:12,108][04584] Fps is (10 sec: 7782.4, 60 sec: 7441.1, 300 sec: 7083.7). Total num frames: 5242880. Throughput: 0: 1908.4. Samples: 300644. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:38:12,110][04584] Avg episode reward: [(0, '4.525')] +[2024-11-07 14:38:14,720][04701] Updated weights for policy 0, policy_version 1286 (0.0019) +[2024-11-07 14:38:17,108][04584] Fps is (10 sec: 8191.9, 60 sec: 7710.0, 300 sec: 7115.3). Total num frames: 5283840. Throughput: 0: 1914.2. Samples: 306896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:38:17,110][04584] Avg episode reward: [(0, '4.575')] +[2024-11-07 14:38:19,757][04701] Updated weights for policy 0, policy_version 1296 (0.0029) +[2024-11-07 14:38:22,108][04584] Fps is (10 sec: 8192.0, 60 sec: 7714.2, 300 sec: 7145.2). Total num frames: 5324800. Throughput: 0: 1920.4. Samples: 319070. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:38:22,112][04584] Avg episode reward: [(0, '4.339')] +[2024-11-07 14:38:24,651][04701] Updated weights for policy 0, policy_version 1306 (0.0030) +[2024-11-07 14:38:27,108][04584] Fps is (10 sec: 6553.5, 60 sec: 7509.3, 300 sec: 7085.0). Total num frames: 5349376. Throughput: 0: 1862.0. Samples: 328694. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:38:27,110][04584] Avg episode reward: [(0, '4.349')] +[2024-11-07 14:38:31,931][04701] Updated weights for policy 0, policy_version 1316 (0.0027) +[2024-11-07 14:38:32,108][04584] Fps is (10 sec: 6553.6, 60 sec: 7509.4, 300 sec: 7114.1). Total num frames: 5390336. Throughput: 0: 1825.5. Samples: 333202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 14:38:32,110][04584] Avg episode reward: [(0, '4.554')] +[2024-11-07 14:38:36,771][04701] Updated weights for policy 0, policy_version 1326 (0.0026) +[2024-11-07 14:38:37,108][04584] Fps is (10 sec: 8192.1, 60 sec: 7509.3, 300 sec: 7141.7). Total num frames: 5431296. Throughput: 0: 1844.8. Samples: 345574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:38:37,111][04584] Avg episode reward: [(0, '4.685')] +[2024-11-07 14:38:41,807][04701] Updated weights for policy 0, policy_version 1336 (0.0021) +[2024-11-07 14:38:42,109][04584] Fps is (10 sec: 8191.4, 60 sec: 7509.4, 300 sec: 7168.0). Total num frames: 5472256. Throughput: 0: 1933.9. Samples: 357990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 14:38:42,111][04584] Avg episode reward: [(0, '4.322')] +[2024-11-07 14:38:46,713][04701] Updated weights for policy 0, policy_version 1346 (0.0026) +[2024-11-07 14:38:47,109][04584] Fps is (10 sec: 8191.8, 60 sec: 7509.3, 300 sec: 7193.0). Total num frames: 5513216. Throughput: 0: 1934.4. Samples: 364222. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:38:47,112][04584] Avg episode reward: [(0, '4.330')] +[2024-11-07 14:38:51,629][04701] Updated weights for policy 0, policy_version 1356 (0.0022) +[2024-11-07 14:38:52,108][04584] Fps is (10 sec: 8192.4, 60 sec: 7840.7, 300 sec: 7216.8). Total num frames: 5554176. Throughput: 0: 1963.5. Samples: 376612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:38:52,110][04584] Avg episode reward: [(0, '4.316')] +[2024-11-07 14:38:56,630][04701] Updated weights for policy 0, policy_version 1366 (0.0025) +[2024-11-07 14:38:57,109][04584] Fps is (10 sec: 8192.0, 60 sec: 7850.7, 300 sec: 7239.4). Total num frames: 5595136. Throughput: 0: 1961.9. Samples: 388932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:38:57,111][04584] Avg episode reward: [(0, '4.435')] +[2024-11-07 14:39:02,108][04584] Fps is (10 sec: 6553.8, 60 sec: 7577.6, 300 sec: 7186.6). Total num frames: 5619712. Throughput: 0: 1958.1. Samples: 395012. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:39:02,110][04584] Avg episode reward: [(0, '4.515')] +[2024-11-07 14:39:04,078][04701] Updated weights for policy 0, policy_version 1376 (0.0027) +[2024-11-07 14:39:07,108][04584] Fps is (10 sec: 6144.1, 60 sec: 7577.6, 300 sec: 7190.7). Total num frames: 5656576. Throughput: 0: 1847.9. Samples: 402224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:39:07,110][04584] Avg episode reward: [(0, '4.473')] +[2024-11-07 14:39:07,120][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001382_5660672.pth... +[2024-11-07 14:39:07,214][04688] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000000986_4038656.pth +[2024-11-07 14:39:09,137][04701] Updated weights for policy 0, policy_version 1386 (0.0028) +[2024-11-07 14:39:12,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7577.6, 300 sec: 7212.5). Total num frames: 5697536. 
Throughput: 0: 1907.0. Samples: 414508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:39:12,111][04584] Avg episode reward: [(0, '4.401')] +[2024-11-07 14:39:14,203][04701] Updated weights for policy 0, policy_version 1396 (0.0024) +[2024-11-07 14:39:17,110][04584] Fps is (10 sec: 8600.6, 60 sec: 7645.7, 300 sec: 7250.7). Total num frames: 5742592. Throughput: 0: 1942.3. Samples: 420606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:39:17,112][04584] Avg episode reward: [(0, '4.251')] +[2024-11-07 14:39:19,131][04701] Updated weights for policy 0, policy_version 1406 (0.0027) +[2024-11-07 14:39:22,108][04584] Fps is (10 sec: 8601.8, 60 sec: 7645.9, 300 sec: 7270.4). Total num frames: 5783552. Throughput: 0: 1941.7. Samples: 432950. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:39:22,110][04584] Avg episode reward: [(0, '4.408')] +[2024-11-07 14:39:24,033][04701] Updated weights for policy 0, policy_version 1416 (0.0026) +[2024-11-07 14:39:27,108][04584] Fps is (10 sec: 8193.0, 60 sec: 7918.9, 300 sec: 7289.2). Total num frames: 5824512. Throughput: 0: 1942.5. Samples: 445400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 14:39:27,112][04584] Avg episode reward: [(0, '4.368')] +[2024-11-07 14:39:29,061][04701] Updated weights for policy 0, policy_version 1426 (0.0028) +[2024-11-07 14:39:32,109][04584] Fps is (10 sec: 8191.0, 60 sec: 7918.8, 300 sec: 7307.2). Total num frames: 5865472. Throughput: 0: 1942.4. Samples: 451632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:39:32,112][04584] Avg episode reward: [(0, '4.311')] +[2024-11-07 14:39:36,164][04701] Updated weights for policy 0, policy_version 1436 (0.0029) +[2024-11-07 14:39:37,108][04584] Fps is (10 sec: 6144.0, 60 sec: 7577.6, 300 sec: 7244.3). Total num frames: 5885952. Throughput: 0: 1871.7. Samples: 460840. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:39:37,110][04584] Avg episode reward: [(0, '4.291')] +[2024-11-07 14:39:41,153][04701] Updated weights for policy 0, policy_version 1446 (0.0026) +[2024-11-07 14:39:42,108][04584] Fps is (10 sec: 6144.7, 60 sec: 7577.7, 300 sec: 7262.5). Total num frames: 5926912. Throughput: 0: 1838.3. Samples: 471654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 14:39:42,110][04584] Avg episode reward: [(0, '4.399')] +[2024-11-07 14:39:46,287][04701] Updated weights for policy 0, policy_version 1456 (0.0024) +[2024-11-07 14:39:47,109][04584] Fps is (10 sec: 8191.7, 60 sec: 7577.6, 300 sec: 7280.0). Total num frames: 5967872. Throughput: 0: 1836.7. Samples: 477666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:39:47,111][04584] Avg episode reward: [(0, '4.391')] +[2024-11-07 14:39:51,647][04701] Updated weights for policy 0, policy_version 1466 (0.0024) +[2024-11-07 14:39:52,108][04584] Fps is (10 sec: 8192.0, 60 sec: 7577.6, 300 sec: 7296.9). Total num frames: 6008832. Throughput: 0: 1936.8. Samples: 489382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:39:52,110][04584] Avg episode reward: [(0, '4.306')] +[2024-11-07 14:39:56,638][04701] Updated weights for policy 0, policy_version 1476 (0.0027) +[2024-11-07 14:39:57,109][04584] Fps is (10 sec: 7782.4, 60 sec: 7509.3, 300 sec: 7298.3). Total num frames: 6045696. Throughput: 0: 1930.7. Samples: 501392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:39:57,111][04584] Avg episode reward: [(0, '4.586')] +[2024-11-07 14:40:01,928][04701] Updated weights for policy 0, policy_version 1486 (0.0022) +[2024-11-07 14:40:02,108][04584] Fps is (10 sec: 7782.5, 60 sec: 7782.4, 300 sec: 7314.3). Total num frames: 6086656. Throughput: 0: 1933.2. Samples: 507596. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:40:02,111][04584] Avg episode reward: [(0, '4.157')] +[2024-11-07 14:40:07,045][04701] Updated weights for policy 0, policy_version 1496 (0.0028) +[2024-11-07 14:40:07,108][04584] Fps is (10 sec: 8192.4, 60 sec: 7850.7, 300 sec: 7329.7). Total num frames: 6127616. Throughput: 0: 1921.9. Samples: 519434. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:40:07,110][04584] Avg episode reward: [(0, '4.650')] +[2024-11-07 14:40:12,108][04584] Fps is (10 sec: 5734.3, 60 sec: 7441.1, 300 sec: 7259.8). Total num frames: 6144000. Throughput: 0: 1798.1. Samples: 526316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:40:12,115][04584] Avg episode reward: [(0, '4.644')] +[2024-11-07 14:40:15,781][04701] Updated weights for policy 0, policy_version 1506 (0.0037) +[2024-11-07 14:40:17,108][04584] Fps is (10 sec: 4915.2, 60 sec: 7236.4, 300 sec: 7247.8). Total num frames: 6176768. Throughput: 0: 1762.2. Samples: 530928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:40:17,111][04584] Avg episode reward: [(0, '4.390')] +[2024-11-07 14:40:21,601][04701] Updated weights for policy 0, policy_version 1516 (0.0034) +[2024-11-07 14:40:22,108][04584] Fps is (10 sec: 6553.7, 60 sec: 7099.7, 300 sec: 7358.9). Total num frames: 6209536. Throughput: 0: 1769.1. Samples: 540450. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:40:22,110][04584] Avg episode reward: [(0, '4.347')] +[2024-11-07 14:40:26,748][04701] Updated weights for policy 0, policy_version 1526 (0.0027) +[2024-11-07 14:40:27,109][04584] Fps is (10 sec: 7372.5, 60 sec: 7099.7, 300 sec: 7400.6). Total num frames: 6250496. Throughput: 0: 1796.7. Samples: 552504. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:40:27,110][04584] Avg episode reward: [(0, '4.378')] +[2024-11-07 14:40:31,804][04701] Updated weights for policy 0, policy_version 1536 (0.0030) +[2024-11-07 14:40:32,108][04584] Fps is (10 sec: 8191.9, 60 sec: 7099.9, 300 sec: 7456.1). Total num frames: 6291456. Throughput: 0: 1797.6. Samples: 558558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 14:40:32,111][04584] Avg episode reward: [(0, '4.402')] +[2024-11-07 14:40:36,996][04701] Updated weights for policy 0, policy_version 1546 (0.0025) +[2024-11-07 14:40:37,108][04584] Fps is (10 sec: 8192.1, 60 sec: 7441.1, 300 sec: 7456.1). Total num frames: 6332416. Throughput: 0: 1804.4. Samples: 570582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:40:37,110][04584] Avg episode reward: [(0, '4.223')] +[2024-11-07 14:40:42,112][04584] Fps is (10 sec: 7779.5, 60 sec: 7372.4, 300 sec: 7442.1). Total num frames: 6369280. Throughput: 0: 1800.2. Samples: 582408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:40:42,115][04584] Avg episode reward: [(0, '4.626')] +[2024-11-07 14:40:42,246][04701] Updated weights for policy 0, policy_version 1556 (0.0021) +[2024-11-07 14:40:47,109][04584] Fps is (10 sec: 5734.3, 60 sec: 7031.5, 300 sec: 7372.8). Total num frames: 6389760. Throughput: 0: 1715.1. Samples: 584778. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:40:47,111][04584] Avg episode reward: [(0, '4.667')] +[2024-11-07 14:40:50,289][04701] Updated weights for policy 0, policy_version 1566 (0.0036) +[2024-11-07 14:40:52,108][04584] Fps is (10 sec: 5736.4, 60 sec: 6963.2, 300 sec: 7358.9). Total num frames: 6426624. Throughput: 0: 1668.8. Samples: 594530. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:40:52,111][04584] Avg episode reward: [(0, '4.452')] +[2024-11-07 14:40:55,426][04701] Updated weights for policy 0, policy_version 1576 (0.0025) +[2024-11-07 14:40:57,108][04584] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 7372.8). Total num frames: 6467584. Throughput: 0: 1779.1. Samples: 606376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:40:57,110][04584] Avg episode reward: [(0, '4.567')] +[2024-11-07 14:41:00,789][04701] Updated weights for policy 0, policy_version 1586 (0.0028) +[2024-11-07 14:41:02,109][04584] Fps is (10 sec: 7781.9, 60 sec: 6963.1, 300 sec: 7400.5). Total num frames: 6504448. Throughput: 0: 1799.3. Samples: 611900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 14:41:02,112][04584] Avg episode reward: [(0, '4.513')] +[2024-11-07 14:41:06,194][04701] Updated weights for policy 0, policy_version 1596 (0.0034) +[2024-11-07 14:41:07,108][04584] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 7456.1). Total num frames: 6541312. Throughput: 0: 1839.9. Samples: 623244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:41:07,110][04584] Avg episode reward: [(0, '4.449')] +[2024-11-07 14:41:07,219][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001598_6545408.pth... +[2024-11-07 14:41:07,348][04688] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001163_4763648.pth +[2024-11-07 14:41:11,911][04701] Updated weights for policy 0, policy_version 1606 (0.0033) +[2024-11-07 14:41:12,108][04584] Fps is (10 sec: 7373.5, 60 sec: 7236.3, 300 sec: 7456.1). Total num frames: 6578176. Throughput: 0: 1821.6. Samples: 634474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:41:12,110][04584] Avg episode reward: [(0, '4.328')] +[2024-11-07 14:41:18,681][04584] Fps is (10 sec: 6370.7, 60 sec: 7117.9, 300 sec: 7402.7). Total num frames: 6615040. 
Throughput: 0: 1743.5. Samples: 639758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:41:18,683][04584] Avg episode reward: [(0, '4.432')] +[2024-11-07 14:41:18,837][04701] Updated weights for policy 0, policy_version 1616 (0.0024) +[2024-11-07 14:41:22,108][04584] Fps is (10 sec: 6144.0, 60 sec: 7168.0, 300 sec: 7386.7). Total num frames: 6639616. Throughput: 0: 1727.3. Samples: 648310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:41:22,112][04584] Avg episode reward: [(0, '4.391')] +[2024-11-07 14:41:24,288][04701] Updated weights for policy 0, policy_version 1626 (0.0036) +[2024-11-07 14:41:27,108][04584] Fps is (10 sec: 7777.0, 60 sec: 7168.0, 300 sec: 7386.7). Total num frames: 6680576. Throughput: 0: 1725.2. Samples: 660036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:41:27,110][04584] Avg episode reward: [(0, '4.320')] +[2024-11-07 14:41:29,172][04701] Updated weights for policy 0, policy_version 1636 (0.0028) +[2024-11-07 14:41:32,108][04584] Fps is (10 sec: 8192.0, 60 sec: 7168.0, 300 sec: 7386.7). Total num frames: 6721536. Throughput: 0: 1813.1. Samples: 666366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 14:41:32,110][04584] Avg episode reward: [(0, '4.490')] +[2024-11-07 14:41:34,208][04701] Updated weights for policy 0, policy_version 1646 (0.0023) +[2024-11-07 14:41:37,108][04584] Fps is (10 sec: 8192.0, 60 sec: 7168.0, 300 sec: 7441.3). Total num frames: 6762496. Throughput: 0: 1863.7. Samples: 678394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 14:41:37,109][04584] Avg episode reward: [(0, '4.537')] +[2024-11-07 14:41:39,513][04701] Updated weights for policy 0, policy_version 1656 (0.0026) +[2024-11-07 14:41:42,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7168.5, 300 sec: 7428.3). Total num frames: 6799360. Throughput: 0: 1856.8. Samples: 689932. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:41:42,110][04584] Avg episode reward: [(0, '4.371')] +[2024-11-07 14:41:44,930][04701] Updated weights for policy 0, policy_version 1666 (0.0028) +[2024-11-07 14:41:47,108][04584] Fps is (10 sec: 7782.4, 60 sec: 7509.4, 300 sec: 7428.3). Total num frames: 6840320. Throughput: 0: 1863.2. Samples: 695744. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:41:47,110][04584] Avg episode reward: [(0, '4.330')] +[2024-11-07 14:41:50,403][04701] Updated weights for policy 0, policy_version 1676 (0.0030) +[2024-11-07 14:41:53,381][04584] Fps is (10 sec: 6177.0, 60 sec: 7219.7, 300 sec: 7368.8). Total num frames: 6868992. Throughput: 0: 1810.9. Samples: 707038. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:41:53,382][04584] Avg episode reward: [(0, '4.407')] +[2024-11-07 14:41:57,108][04584] Fps is (10 sec: 5734.4, 60 sec: 7168.0, 300 sec: 7372.8). Total num frames: 6897664. Throughput: 0: 1764.1. Samples: 713860. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:41:57,110][04584] Avg episode reward: [(0, '4.690')] +[2024-11-07 14:41:58,056][04701] Updated weights for policy 0, policy_version 1686 (0.0034) +[2024-11-07 14:42:02,109][04584] Fps is (10 sec: 7509.1, 60 sec: 7168.1, 300 sec: 7358.9). Total num frames: 6934528. Throughput: 0: 1843.5. Samples: 719816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:42:02,110][04584] Avg episode reward: [(0, '4.442')] +[2024-11-07 14:42:03,920][04701] Updated weights for policy 0, policy_version 1696 (0.0041) +[2024-11-07 14:42:07,108][04584] Fps is (10 sec: 6963.1, 60 sec: 7099.7, 300 sec: 7358.9). Total num frames: 6967296. Throughput: 0: 1819.1. Samples: 730168. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:42:07,111][04584] Avg episode reward: [(0, '4.615')] +[2024-11-07 14:42:09,344][04701] Updated weights for policy 0, policy_version 1706 (0.0034) +[2024-11-07 14:42:12,108][04584] Fps is (10 sec: 6963.5, 60 sec: 7099.7, 300 sec: 7398.6). Total num frames: 7004160. Throughput: 0: 1801.8. Samples: 741116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:42:12,111][04584] Avg episode reward: [(0, '4.542')] +[2024-11-07 14:42:15,345][04701] Updated weights for policy 0, policy_version 1716 (0.0029) +[2024-11-07 14:42:17,109][04584] Fps is (10 sec: 7372.2, 60 sec: 7290.8, 300 sec: 7386.7). Total num frames: 7041024. Throughput: 0: 1774.6. Samples: 746224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:42:17,130][04584] Avg episode reward: [(0, '4.596')] +[2024-11-07 14:42:20,616][04701] Updated weights for policy 0, policy_version 1726 (0.0025) +[2024-11-07 14:42:22,108][04584] Fps is (10 sec: 7372.7, 60 sec: 7304.5, 300 sec: 7386.7). Total num frames: 7077888. Throughput: 0: 1764.3. Samples: 757786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:42:22,110][04584] Avg episode reward: [(0, '4.395')] +[2024-11-07 14:42:27,773][04584] Fps is (10 sec: 6145.8, 60 sec: 7022.0, 300 sec: 7328.5). Total num frames: 7106560. Throughput: 0: 1616.7. Samples: 763760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:42:27,774][04584] Avg episode reward: [(0, '4.528')] +[2024-11-07 14:42:27,814][04701] Updated weights for policy 0, policy_version 1736 (0.0034) +[2024-11-07 14:42:32,108][04584] Fps is (10 sec: 6553.6, 60 sec: 7031.4, 300 sec: 7331.1). Total num frames: 7143424. Throughput: 0: 1680.9. Samples: 771384. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:42:32,113][04584] Avg episode reward: [(0, '4.745')] +[2024-11-07 14:42:33,100][04701] Updated weights for policy 0, policy_version 1746 (0.0023) +[2024-11-07 14:42:37,108][04584] Fps is (10 sec: 8336.2, 60 sec: 7031.4, 300 sec: 7331.2). Total num frames: 7184384. Throughput: 0: 1741.0. Samples: 783168. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:42:37,110][04584] Avg episode reward: [(0, '4.300')] +[2024-11-07 14:42:38,140][04701] Updated weights for policy 0, policy_version 1756 (0.0034) +[2024-11-07 14:42:42,108][04584] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 7303.4). Total num frames: 7217152. Throughput: 0: 1784.7. Samples: 794170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 14:42:42,110][04584] Avg episode reward: [(0, '4.610')] +[2024-11-07 14:42:43,824][04701] Updated weights for policy 0, policy_version 1766 (0.0026) +[2024-11-07 14:42:47,108][04584] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 7370.1). Total num frames: 7258112. Throughput: 0: 1789.5. Samples: 800344. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 14:42:47,110][04584] Avg episode reward: [(0, '4.393')] +[2024-11-07 14:42:49,006][04701] Updated weights for policy 0, policy_version 1776 (0.0030) +[2024-11-07 14:42:52,109][04584] Fps is (10 sec: 7781.6, 60 sec: 7253.5, 300 sec: 7358.9). Total num frames: 7294976. Throughput: 0: 1817.0. Samples: 811934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:42:52,113][04584] Avg episode reward: [(0, '4.440')] +[2024-11-07 14:42:54,207][04701] Updated weights for policy 0, policy_version 1786 (0.0029) +[2024-11-07 14:42:57,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7304.5, 300 sec: 7358.9). Total num frames: 7335936. Throughput: 0: 1841.5. Samples: 823984. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:42:57,110][04584] Avg episode reward: [(0, '4.300')] +[2024-11-07 14:42:59,392][04701] Updated weights for policy 0, policy_version 1796 (0.0028) +[2024-11-07 14:43:02,199][04584] Fps is (10 sec: 6495.4, 60 sec: 7089.1, 300 sec: 7315.0). Total num frames: 7360512. Throughput: 0: 1856.0. Samples: 829910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:43:02,202][04584] Avg episode reward: [(0, '4.381')] +[2024-11-07 14:43:06,958][04701] Updated weights for policy 0, policy_version 1806 (0.0026) +[2024-11-07 14:43:07,108][04584] Fps is (10 sec: 6144.0, 60 sec: 7168.0, 300 sec: 7303.4). Total num frames: 7397376. Throughput: 0: 1761.0. Samples: 837030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:43:07,110][04584] Avg episode reward: [(0, '4.652')] +[2024-11-07 14:43:07,127][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001806_7397376.pth... +[2024-11-07 14:43:07,318][04688] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001382_5660672.pth +[2024-11-07 14:43:12,082][04701] Updated weights for policy 0, policy_version 1816 (0.0029) +[2024-11-07 14:43:12,108][04584] Fps is (10 sec: 7853.5, 60 sec: 7236.3, 300 sec: 7303.4). Total num frames: 7438336. Throughput: 0: 1917.2. Samples: 848760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:43:12,110][04584] Avg episode reward: [(0, '4.544')] +[2024-11-07 14:43:17,108][04584] Fps is (10 sec: 7782.6, 60 sec: 7236.4, 300 sec: 7289.5). Total num frames: 7475200. Throughput: 0: 1853.6. Samples: 854794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 14:43:17,111][04584] Avg episode reward: [(0, '4.361')] +[2024-11-07 14:43:17,153][04701] Updated weights for policy 0, policy_version 1826 (0.0025) +[2024-11-07 14:43:22,108][04584] Fps is (10 sec: 7782.4, 60 sec: 7304.5, 300 sec: 7345.0). Total num frames: 7516160. 
Throughput: 0: 1864.3. Samples: 867062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:43:22,112][04584] Avg episode reward: [(0, '4.248')] +[2024-11-07 14:43:22,238][04701] Updated weights for policy 0, policy_version 1836 (0.0026) +[2024-11-07 14:43:27,109][04584] Fps is (10 sec: 8191.7, 60 sec: 7593.4, 300 sec: 7345.0). Total num frames: 7557120. Throughput: 0: 1885.3. Samples: 879010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:43:27,111][04584] Avg episode reward: [(0, '4.410')] +[2024-11-07 14:43:27,492][04701] Updated weights for policy 0, policy_version 1846 (0.0023) +[2024-11-07 14:43:32,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7509.3, 300 sec: 7331.1). Total num frames: 7593984. Throughput: 0: 1875.4. Samples: 884738. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:43:32,110][04584] Avg episode reward: [(0, '4.527')] +[2024-11-07 14:43:32,694][04701] Updated weights for policy 0, policy_version 1856 (0.0022) +[2024-11-07 14:43:37,108][04584] Fps is (10 sec: 6144.2, 60 sec: 7236.3, 300 sec: 7275.6). Total num frames: 7618560. Throughput: 0: 1849.5. Samples: 895158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:43:37,110][04584] Avg episode reward: [(0, '4.513')] +[2024-11-07 14:43:39,970][04701] Updated weights for policy 0, policy_version 1866 (0.0035) +[2024-11-07 14:43:42,108][04584] Fps is (10 sec: 6553.6, 60 sec: 7372.8, 300 sec: 7275.6). Total num frames: 7659520. Throughput: 0: 1785.7. Samples: 904342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:43:42,110][04584] Avg episode reward: [(0, '4.196')] +[2024-11-07 14:43:45,251][04701] Updated weights for policy 0, policy_version 1876 (0.0029) +[2024-11-07 14:43:47,108][04584] Fps is (10 sec: 7782.4, 60 sec: 7304.5, 300 sec: 7261.7). Total num frames: 7696384. Throughput: 0: 1783.9. Samples: 910026. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:43:47,110][04584] Avg episode reward: [(0, '4.252')] +[2024-11-07 14:43:50,305][04701] Updated weights for policy 0, policy_version 1886 (0.0024) +[2024-11-07 14:43:52,108][04584] Fps is (10 sec: 7782.5, 60 sec: 7372.9, 300 sec: 7261.7). Total num frames: 7737344. Throughput: 0: 1890.9. Samples: 922118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:43:52,110][04584] Avg episode reward: [(0, '4.414')] +[2024-11-07 14:43:55,379][04701] Updated weights for policy 0, policy_version 1896 (0.0020) +[2024-11-07 14:43:57,109][04584] Fps is (10 sec: 8191.5, 60 sec: 7372.7, 300 sec: 7317.2). Total num frames: 7778304. Throughput: 0: 1901.1. Samples: 934312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:43:57,111][04584] Avg episode reward: [(0, '4.448')] +[2024-11-07 14:44:00,489][04701] Updated weights for policy 0, policy_version 1906 (0.0023) +[2024-11-07 14:44:02,108][04584] Fps is (10 sec: 7782.3, 60 sec: 7589.0, 300 sec: 7317.3). Total num frames: 7815168. Throughput: 0: 1898.8. Samples: 940240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 14:44:02,111][04584] Avg episode reward: [(0, '4.370')] +[2024-11-07 14:44:05,857][04701] Updated weights for policy 0, policy_version 1916 (0.0025) +[2024-11-07 14:44:07,108][04584] Fps is (10 sec: 7782.9, 60 sec: 7645.9, 300 sec: 7317.3). Total num frames: 7856128. Throughput: 0: 1880.7. Samples: 951692. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 14:44:07,110][04584] Avg episode reward: [(0, '4.466')] +[2024-11-07 14:44:12,108][04584] Fps is (10 sec: 6144.1, 60 sec: 7304.6, 300 sec: 7234.0). Total num frames: 7876608. Throughput: 0: 1785.3. Samples: 959350. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 14:44:12,110][04584] Avg episode reward: [(0, '4.323')] +[2024-11-07 14:44:13,147][04701] Updated weights for policy 0, policy_version 1926 (0.0023) +[2024-11-07 14:44:17,108][04584] Fps is (10 sec: 6143.9, 60 sec: 7372.8, 300 sec: 7233.9). Total num frames: 7917568. Throughput: 0: 1789.9. Samples: 965282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 14:44:17,111][04584] Avg episode reward: [(0, '4.247')] +[2024-11-07 14:44:18,201][04701] Updated weights for policy 0, policy_version 1936 (0.0026) +[2024-11-07 14:44:22,109][04584] Fps is (10 sec: 8191.6, 60 sec: 7372.8, 300 sec: 7233.9). Total num frames: 7958528. Throughput: 0: 1830.9. Samples: 977550. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 14:44:22,114][04584] Avg episode reward: [(0, '4.299')] +[2024-11-07 14:44:23,245][04701] Updated weights for policy 0, policy_version 1946 (0.0026) +[2024-11-07 14:44:27,109][04584] Fps is (10 sec: 8601.1, 60 sec: 7441.0, 300 sec: 7247.8). Total num frames: 8003584. Throughput: 0: 1905.0. Samples: 990068. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 14:44:27,111][04584] Avg episode reward: [(0, '4.466')] +[2024-11-07 14:44:27,617][04688] Stopping Batcher_0... +[2024-11-07 14:44:27,618][04688] Loop batcher_evt_loop terminating... +[2024-11-07 14:44:27,620][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... +[2024-11-07 14:44:27,619][04584] Component Batcher_0 stopped! +[2024-11-07 14:44:27,676][04701] Weights refcount: 2 0 +[2024-11-07 14:44:27,678][04701] Stopping InferenceWorker_p0-w0... +[2024-11-07 14:44:27,679][04701] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 14:44:27,678][04584] Component InferenceWorker_p0-w0 stopped! 
+[2024-11-07 14:44:27,709][04688] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001598_6545408.pth +[2024-11-07 14:44:27,722][04688] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... +[2024-11-07 14:44:27,750][04584] Component RolloutWorker_w0 stopped! +[2024-11-07 14:44:27,751][04702] Stopping RolloutWorker_w0... +[2024-11-07 14:44:27,752][04702] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 14:44:27,756][04703] Stopping RolloutWorker_w1... +[2024-11-07 14:44:27,756][04584] Component RolloutWorker_w1 stopped! +[2024-11-07 14:44:27,759][04703] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 14:44:27,765][04707] Stopping RolloutWorker_w5... +[2024-11-07 14:44:27,768][04707] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 14:44:27,772][04706] Stopping RolloutWorker_w4... +[2024-11-07 14:44:27,774][04706] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 14:44:27,765][04584] Component RolloutWorker_w5 stopped! +[2024-11-07 14:44:27,796][04584] Component RolloutWorker_w4 stopped! +[2024-11-07 14:44:27,812][04708] Stopping RolloutWorker_w6... +[2024-11-07 14:44:27,814][04708] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 14:44:27,812][04584] Component RolloutWorker_w6 stopped! +[2024-11-07 14:44:27,845][04688] Stopping LearnerWorker_p0... +[2024-11-07 14:44:27,846][04688] Loop learner_proc0_evt_loop terminating... +[2024-11-07 14:44:27,846][04584] Component LearnerWorker_p0 stopped! +[2024-11-07 14:44:27,888][04709] Stopping RolloutWorker_w7... +[2024-11-07 14:44:27,889][04584] Component RolloutWorker_w7 stopped! +[2024-11-07 14:44:27,908][04709] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 14:44:27,913][04704] Stopping RolloutWorker_w3... +[2024-11-07 14:44:27,913][04584] Component RolloutWorker_w3 stopped! +[2024-11-07 14:44:27,916][04704] Loop rollout_proc3_evt_loop terminating... 
+[2024-11-07 14:44:28,016][04705] Stopping RolloutWorker_w2... +[2024-11-07 14:44:28,016][04584] Component RolloutWorker_w2 stopped! +[2024-11-07 14:44:28,019][04705] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 14:44:28,019][04584] Waiting for process learner_proc0 to stop... +[2024-11-07 14:44:29,498][04584] Waiting for process inference_proc0-0 to join... +[2024-11-07 14:44:29,500][04584] Waiting for process rollout_proc0 to join... +[2024-11-07 14:44:29,501][04584] Waiting for process rollout_proc1 to join... +[2024-11-07 14:44:29,504][04584] Waiting for process rollout_proc2 to join... +[2024-11-07 14:44:29,653][04584] Waiting for process rollout_proc3 to join... +[2024-11-07 14:44:29,655][04584] Waiting for process rollout_proc4 to join... +[2024-11-07 14:44:29,656][04584] Waiting for process rollout_proc5 to join... +[2024-11-07 14:44:29,657][04584] Waiting for process rollout_proc6 to join... +[2024-11-07 14:44:29,660][04584] Waiting for process rollout_proc7 to join... 
+[2024-11-07 14:44:29,661][04584] Batcher 0 profile tree view: +batching: 27.4377, releasing_batches: 0.0459 +[2024-11-07 14:44:29,663][04584] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0000 + wait_policy_total: 6.5812 +update_model: 7.6567 + weight_update: 0.0027 +one_step: 0.0070 + handle_policy_step: 510.3365 + deserialize: 13.7673, stack: 2.4265, obs_to_device_normalize: 150.3995, forward: 227.4887, send_messages: 31.2471 + prepare_outputs: 68.6568 + to_cpu: 53.1126 +[2024-11-07 14:44:29,665][04584] Learner 0 profile tree view: +misc: 0.0053, prepare_batch: 30.8301 +train: 106.7453 + epoch_init: 0.0079, minibatch_init: 0.0131, losses_postprocess: 0.8354, kl_divergence: 0.9434, after_optimizer: 4.0108 + calculate_losses: 31.3550 + losses_init: 0.0058, forward_head: 1.9126, bptt_initial: 21.8044, tail: 1.0207, advantages_returns: 0.3056, losses: 3.3075 + bptt: 2.6948 + bptt_forward_core: 2.5765 + update: 68.9694 + clip: 1.2501 +[2024-11-07 14:44:29,666][04584] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.2375, enqueue_policy_requests: 11.9428, env_step: 149.5260, overhead: 10.7908, complete_rollouts: 0.6476 +save_policy_outputs: 16.0346 + split_output_tensors: 5.4611 +[2024-11-07 14:44:29,670][04584] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.2022, enqueue_policy_requests: 12.7636, env_step: 249.0214, overhead: 10.6673, complete_rollouts: 0.3962 +save_policy_outputs: 18.4068 + split_output_tensors: 7.8369 +[2024-11-07 14:44:29,672][04584] Loop Runner_EvtLoop terminating... 
+[2024-11-07 14:44:29,675][04584] Runner profile tree view: +main_loop: 558.4363 +[2024-11-07 14:44:29,676][04584] Collected {0: 8007680}, FPS: 7107.4 +[2024-11-07 14:44:30,065][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 14:44:30,066][04584] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 14:44:30,067][04584] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 14:44:30,068][04584] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 14:44:30,071][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 14:44:30,072][04584] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 14:44:30,074][04584] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 14:44:30,075][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 14:44:30,077][04584] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-07 14:44:30,079][04584] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-07 14:44:30,081][04584] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 14:44:30,082][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 14:44:30,084][04584] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 14:44:30,085][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2024-11-07 14:44:30,088][04584] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 14:44:30,176][04584] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:44:30,189][04584] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:44:30,193][04584] RunningMeanStd input shape: (1,) +[2024-11-07 14:44:30,218][04584] ConvEncoder: input_channels=3 +[2024-11-07 14:44:30,407][04584] Conv encoder output size: 512 +[2024-11-07 14:44:30,408][04584] Policy head output size: 512 +[2024-11-07 14:44:31,399][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... +[2024-11-07 14:44:32,337][04584] Num frames 100... +[2024-11-07 14:44:32,546][04584] Num frames 200... +[2024-11-07 14:44:32,757][04584] Num frames 300... +[2024-11-07 14:44:32,967][04584] Num frames 400... +[2024-11-07 14:44:33,113][04584] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 +[2024-11-07 14:44:33,114][04584] Avg episode reward: 5.480, avg true_objective: 4.480 +[2024-11-07 14:44:33,216][04584] Num frames 500... +[2024-11-07 14:44:33,402][04584] Num frames 600... +[2024-11-07 14:44:33,611][04584] Num frames 700... +[2024-11-07 14:44:33,809][04584] Num frames 800... +[2024-11-07 14:44:34,017][04584] Num frames 900... +[2024-11-07 14:44:34,144][04584] Avg episode rewards: #0: 6.140, true rewards: #0: 4.640 +[2024-11-07 14:44:34,146][04584] Avg episode reward: 6.140, avg true_objective: 4.640 +[2024-11-07 14:44:34,289][04584] Num frames 1000... +[2024-11-07 14:44:34,485][04584] Num frames 1100... +[2024-11-07 14:44:34,693][04584] Num frames 1200... +[2024-11-07 14:44:34,909][04584] Avg episode rewards: #0: 5.600, true rewards: #0: 4.267 +[2024-11-07 14:44:34,910][04584] Avg episode reward: 5.600, avg true_objective: 4.267 +[2024-11-07 14:44:34,955][04584] Num frames 1300... +[2024-11-07 14:44:35,146][04584] Num frames 1400... 
+[2024-11-07 14:44:35,332][04584] Num frames 1500... +[2024-11-07 14:44:35,527][04584] Num frames 1600... +[2024-11-07 14:44:35,702][04584] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160 +[2024-11-07 14:44:35,706][04584] Avg episode reward: 5.160, avg true_objective: 4.160 +[2024-11-07 14:44:35,791][04584] Num frames 1700... +[2024-11-07 14:44:35,959][04584] Num frames 1800... +[2024-11-07 14:44:36,136][04584] Num frames 1900... +[2024-11-07 14:44:36,333][04584] Num frames 2000... +[2024-11-07 14:44:36,475][04584] Avg episode rewards: #0: 4.896, true rewards: #0: 4.096 +[2024-11-07 14:44:36,477][04584] Avg episode reward: 4.896, avg true_objective: 4.096 +[2024-11-07 14:44:36,590][04584] Num frames 2100... +[2024-11-07 14:44:36,787][04584] Num frames 2200... +[2024-11-07 14:44:36,980][04584] Num frames 2300... +[2024-11-07 14:44:37,181][04584] Num frames 2400... +[2024-11-07 14:44:37,439][04584] Avg episode rewards: #0: 4.993, true rewards: #0: 4.160 +[2024-11-07 14:44:37,440][04584] Avg episode reward: 4.993, avg true_objective: 4.160 +[2024-11-07 14:44:37,450][04584] Num frames 2500... +[2024-11-07 14:44:37,658][04584] Num frames 2600... +[2024-11-07 14:44:37,847][04584] Num frames 2700... +[2024-11-07 14:44:38,030][04584] Num frames 2800... +[2024-11-07 14:44:38,229][04584] Avg episode rewards: #0: 4.829, true rewards: #0: 4.114 +[2024-11-07 14:44:38,232][04584] Avg episode reward: 4.829, avg true_objective: 4.114 +[2024-11-07 14:44:38,291][04584] Num frames 2900... +[2024-11-07 14:44:38,471][04584] Num frames 3000... +[2024-11-07 14:44:38,681][04584] Num frames 3100... +[2024-11-07 14:44:38,872][04584] Num frames 3200... +[2024-11-07 14:44:39,064][04584] Avg episode rewards: #0: 4.705, true rewards: #0: 4.080 +[2024-11-07 14:44:39,066][04584] Avg episode reward: 4.705, avg true_objective: 4.080 +[2024-11-07 14:44:39,137][04584] Num frames 3300... +[2024-11-07 14:44:39,304][04584] Num frames 3400... +[2024-11-07 14:44:39,485][04584] Num frames 3500... 
+[2024-11-07 14:44:39,725][04584] Num frames 3600... +[2024-11-07 14:44:39,856][04584] Avg episode rewards: #0: 4.609, true rewards: #0: 4.053 +[2024-11-07 14:44:39,861][04584] Avg episode reward: 4.609, avg true_objective: 4.053 +[2024-11-07 14:44:39,965][04584] Num frames 3700... +[2024-11-07 14:44:40,133][04584] Num frames 3800... +[2024-11-07 14:44:40,308][04584] Num frames 3900... +[2024-11-07 14:44:40,493][04584] Num frames 4000... +[2024-11-07 14:44:40,602][04584] Avg episode rewards: #0: 4.532, true rewards: #0: 4.032 +[2024-11-07 14:44:40,606][04584] Avg episode reward: 4.532, avg true_objective: 4.032 +[2024-11-07 14:44:51,260][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 14:44:53,903][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 14:44:53,904][04584] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 14:44:53,906][04584] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 14:44:53,908][04584] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 14:44:53,909][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 14:44:53,910][04584] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 14:44:53,913][04584] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-11-07 14:44:53,915][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 14:44:53,917][04584] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-07 14:44:53,920][04584] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! 
+[2024-11-07 14:44:53,922][04584] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 14:44:53,923][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 14:44:53,926][04584] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 14:44:53,928][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 14:44:53,931][04584] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 14:44:53,958][04584] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:44:53,960][04584] RunningMeanStd input shape: (1,) +[2024-11-07 14:44:53,973][04584] ConvEncoder: input_channels=3 +[2024-11-07 14:44:54,021][04584] Conv encoder output size: 512 +[2024-11-07 14:44:54,023][04584] Policy head output size: 512 +[2024-11-07 14:44:54,046][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... +[2024-11-07 14:44:54,568][04584] Num frames 100... +[2024-11-07 14:44:54,745][04584] Num frames 200... +[2024-11-07 14:44:54,907][04584] Num frames 300... +[2024-11-07 14:44:55,060][04584] Num frames 400... +[2024-11-07 14:44:55,186][04584] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 +[2024-11-07 14:44:55,187][04584] Avg episode reward: 5.480, avg true_objective: 4.480 +[2024-11-07 14:44:55,276][04584] Num frames 500... +[2024-11-07 14:44:55,445][04584] Num frames 600... +[2024-11-07 14:44:55,620][04584] Num frames 700... +[2024-11-07 14:44:55,759][04584] Num frames 800... +[2024-11-07 14:44:55,862][04584] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-07 14:44:55,863][04584] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-07 14:44:55,968][04584] Num frames 900... +[2024-11-07 14:44:56,121][04584] Num frames 1000... +[2024-11-07 14:44:56,271][04584] Num frames 1100... 
+[2024-11-07 14:44:56,427][04584] Num frames 1200... +[2024-11-07 14:44:56,511][04584] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-07 14:44:56,512][04584] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-07 14:44:56,656][04584] Num frames 1300... +[2024-11-07 14:44:56,815][04584] Num frames 1400... +[2024-11-07 14:44:56,966][04584] Num frames 1500... +[2024-11-07 14:44:57,111][04584] Num frames 1600... +[2024-11-07 14:44:57,215][04584] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080 +[2024-11-07 14:44:57,215][04584] Avg episode reward: 4.580, avg true_objective: 4.080 +[2024-11-07 14:44:57,321][04584] Num frames 1700... +[2024-11-07 14:44:57,471][04584] Num frames 1800... +[2024-11-07 14:44:57,627][04584] Num frames 1900... +[2024-11-07 14:44:57,776][04584] Num frames 2000... +[2024-11-07 14:44:57,856][04584] Avg episode rewards: #0: 4.432, true rewards: #0: 4.032 +[2024-11-07 14:44:57,857][04584] Avg episode reward: 4.432, avg true_objective: 4.032 +[2024-11-07 14:44:57,981][04584] Num frames 2100... +[2024-11-07 14:44:58,132][04584] Num frames 2200... +[2024-11-07 14:44:58,301][04584] Num frames 2300... +[2024-11-07 14:44:58,455][04584] Num frames 2400... +[2024-11-07 14:44:58,615][04584] Avg episode rewards: #0: 4.607, true rewards: #0: 4.107 +[2024-11-07 14:44:58,616][04584] Avg episode reward: 4.607, avg true_objective: 4.107 +[2024-11-07 14:44:58,681][04584] Num frames 2500... +[2024-11-07 14:44:58,837][04584] Num frames 2600... +[2024-11-07 14:44:58,996][04584] Num frames 2700... +[2024-11-07 14:44:59,149][04584] Num frames 2800... +[2024-11-07 14:44:59,330][04584] Avg episode rewards: #0: 4.686, true rewards: #0: 4.114 +[2024-11-07 14:44:59,331][04584] Avg episode reward: 4.686, avg true_objective: 4.114 +[2024-11-07 14:44:59,363][04584] Num frames 2900... +[2024-11-07 14:44:59,538][04584] Num frames 3000... +[2024-11-07 14:44:59,703][04584] Num frames 3100... +[2024-11-07 14:44:59,853][04584] Num frames 3200... 
+[2024-11-07 14:45:00,003][04584] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080 +[2024-11-07 14:45:00,005][04584] Avg episode reward: 4.580, avg true_objective: 4.080 +[2024-11-07 14:45:00,066][04584] Num frames 3300... +[2024-11-07 14:45:00,231][04584] Num frames 3400... +[2024-11-07 14:45:00,389][04584] Num frames 3500... +[2024-11-07 14:45:00,565][04584] Num frames 3600... +[2024-11-07 14:45:00,743][04584] Num frames 3700... +[2024-11-07 14:45:00,972][04584] Avg episode rewards: #0: 4.862, true rewards: #0: 4.196 +[2024-11-07 14:45:00,974][04584] Avg episode reward: 4.862, avg true_objective: 4.196 +[2024-11-07 14:45:01,026][04584] Num frames 3800... +[2024-11-07 14:45:01,224][04584] Num frames 3900... +[2024-11-07 14:45:01,414][04584] Num frames 4000... +[2024-11-07 14:45:01,597][04584] Num frames 4100... +[2024-11-07 14:45:01,777][04584] Avg episode rewards: #0: 4.760, true rewards: #0: 4.160 +[2024-11-07 14:45:01,778][04584] Avg episode reward: 4.760, avg true_objective: 4.160 +[2024-11-07 14:45:10,932][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 14:45:22,820][04584] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 14:52:22,743][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 14:52:22,744][04584] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 14:52:22,746][04584] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 14:52:22,747][04584] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 14:52:22,749][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 14:52:22,750][04584] Adding new argument 'video_name'=None that is not in the saved config file! 
+[2024-11-07 14:52:22,753][04584] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-11-07 14:52:22,755][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 14:52:22,756][04584] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-07 14:52:22,757][04584] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2024-11-07 14:52:22,758][04584] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 14:52:22,761][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 14:52:22,762][04584] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 14:52:22,764][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 14:52:22,765][04584] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 14:52:22,805][04584] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:52:22,807][04584] RunningMeanStd input shape: (1,) +[2024-11-07 14:52:22,823][04584] ConvEncoder: input_channels=3 +[2024-11-07 14:52:22,886][04584] Conv encoder output size: 512 +[2024-11-07 14:52:22,887][04584] Policy head output size: 512 +[2024-11-07 14:52:22,925][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... +[2024-11-07 14:52:23,495][04584] Num frames 100... +[2024-11-07 14:52:23,750][04584] Num frames 200... +[2024-11-07 14:52:23,933][04584] Num frames 300... +[2024-11-07 14:52:24,160][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:52:24,166][04584] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:52:24,207][04584] Num frames 400... +[2024-11-07 14:52:24,395][04584] Num frames 500... 
+[2024-11-07 14:52:24,568][04584] Num frames 600... +[2024-11-07 14:52:24,735][04584] Num frames 700... +[2024-11-07 14:52:24,905][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:52:24,908][04584] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:52:24,982][04584] Num frames 800... +[2024-11-07 14:52:25,141][04584] Num frames 900... +[2024-11-07 14:52:25,293][04584] Num frames 1000... +[2024-11-07 14:52:25,495][04584] Num frames 1100... +[2024-11-07 14:52:25,703][04584] Num frames 1200... +[2024-11-07 14:52:25,787][04584] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-07 14:52:25,789][04584] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-07 14:52:25,973][04584] Num frames 1300... +[2024-11-07 14:52:26,162][04584] Num frames 1400... +[2024-11-07 14:52:26,327][04584] Avg episode rewards: #0: 3.925, true rewards: #0: 3.675 +[2024-11-07 14:52:26,328][04584] Avg episode reward: 3.925, avg true_objective: 3.675 +[2024-11-07 14:52:26,390][04584] Num frames 1500... +[2024-11-07 14:52:26,547][04584] Num frames 1600... +[2024-11-07 14:52:26,703][04584] Num frames 1700... +[2024-11-07 14:52:26,865][04584] Num frames 1800... +[2024-11-07 14:52:27,004][04584] Avg episode rewards: #0: 3.908, true rewards: #0: 3.708 +[2024-11-07 14:52:27,009][04584] Avg episode reward: 3.908, avg true_objective: 3.708 +[2024-11-07 14:52:27,106][04584] Num frames 1900... +[2024-11-07 14:52:27,292][04584] Num frames 2000... +[2024-11-07 14:52:27,460][04584] Num frames 2100... +[2024-11-07 14:52:27,618][04584] Num frames 2200... +[2024-11-07 14:52:27,770][04584] Num frames 2300... +[2024-11-07 14:52:27,829][04584] Avg episode rewards: #0: 4.170, true rewards: #0: 3.837 +[2024-11-07 14:52:27,830][04584] Avg episode reward: 4.170, avg true_objective: 3.837 +[2024-11-07 14:52:28,026][04584] Num frames 2400... +[2024-11-07 14:52:28,195][04584] Num frames 2500... +[2024-11-07 14:52:28,412][04584] Num frames 2600... 
+[2024-11-07 14:52:28,639][04584] Num frames 2700... +[2024-11-07 14:52:28,757][04584] Avg episode rewards: #0: 4.169, true rewards: #0: 3.883 +[2024-11-07 14:52:28,759][04584] Avg episode reward: 4.169, avg true_objective: 3.883 +[2024-11-07 14:52:28,999][04584] Num frames 2800... +[2024-11-07 14:52:29,214][04584] Num frames 2900... +[2024-11-07 14:52:29,416][04584] Num frames 3000... +[2024-11-07 14:52:29,700][04584] Num frames 3100... +[2024-11-07 14:52:30,033][04584] Avg episode rewards: #0: 4.498, true rewards: #0: 3.997 +[2024-11-07 14:52:30,037][04584] Avg episode reward: 4.498, avg true_objective: 3.997 +[2024-11-07 14:52:30,057][04584] Num frames 3200... +[2024-11-07 14:52:30,268][04584] Num frames 3300... +[2024-11-07 14:52:30,520][04584] Num frames 3400... +[2024-11-07 14:52:30,893][04584] Num frames 3500... +[2024-11-07 14:52:31,203][04584] Avg episode rewards: #0: 4.424, true rewards: #0: 3.980 +[2024-11-07 14:52:31,209][04584] Avg episode reward: 4.424, avg true_objective: 3.980 +[2024-11-07 14:52:31,268][04584] Num frames 3600... +[2024-11-07 14:52:31,510][04584] Num frames 3700... +[2024-11-07 14:52:31,731][04584] Num frames 3800... +[2024-11-07 14:52:31,955][04584] Num frames 3900... +[2024-11-07 14:52:32,276][04584] Avg episode rewards: #0: 4.498, true rewards: #0: 3.998 +[2024-11-07 14:52:32,278][04584] Avg episode reward: 4.498, avg true_objective: 3.998 +[2024-11-07 14:52:32,283][04584] Num frames 4000... +[2024-11-07 14:52:40,832][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 14:52:50,207][04584] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 14:55:34,046][04584] Environment doom_basic already registered, overwriting... +[2024-11-07 14:55:34,050][04584] Environment doom_two_colors_easy already registered, overwriting... 
+[2024-11-07 14:55:34,053][04584] Environment doom_two_colors_hard already registered, overwriting... +[2024-11-07 14:55:34,054][04584] Environment doom_dm already registered, overwriting... +[2024-11-07 14:55:34,055][04584] Environment doom_dwango5 already registered, overwriting... +[2024-11-07 14:55:34,057][04584] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-11-07 14:55:34,058][04584] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-11-07 14:55:34,059][04584] Environment doom_my_way_home already registered, overwriting... +[2024-11-07 14:55:34,060][04584] Environment doom_deadly_corridor already registered, overwriting... +[2024-11-07 14:55:34,063][04584] Environment doom_defend_the_center already registered, overwriting... +[2024-11-07 14:55:34,065][04584] Environment doom_defend_the_line already registered, overwriting... +[2024-11-07 14:55:34,067][04584] Environment doom_health_gathering already registered, overwriting... +[2024-11-07 14:55:34,069][04584] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-11-07 14:55:34,070][04584] Environment doom_battle already registered, overwriting... +[2024-11-07 14:55:34,072][04584] Environment doom_battle2 already registered, overwriting... +[2024-11-07 14:55:34,073][04584] Environment doom_duel_bots already registered, overwriting... +[2024-11-07 14:55:34,075][04584] Environment doom_deathmatch_bots already registered, overwriting... +[2024-11-07 14:55:34,075][04584] Environment doom_duel already registered, overwriting... +[2024-11-07 14:55:34,078][04584] Environment doom_deathmatch_full already registered, overwriting... +[2024-11-07 14:55:34,079][04584] Environment doom_benchmark already registered, overwriting... 
+[2024-11-07 14:55:34,081][04584] register_encoder_factory: +[2024-11-07 14:55:34,099][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 14:55:34,107][04584] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! +[2024-11-07 14:55:34,109][04584] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... +[2024-11-07 14:55:34,110][04584] Weights and Biases integration disabled +[2024-11-07 14:55:34,116][04584] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-11-07 14:55:36,623][04584] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/root/hfRL/ml/LunarLander-v2/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 
+log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=8000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2024-11-07 14:55:36,625][04584] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
+[2024-11-07 14:55:36,628][04584] Rollout worker 0 uses device cpu +[2024-11-07 14:55:36,629][04584] Rollout worker 1 uses device cpu +[2024-11-07 14:55:36,631][04584] Rollout worker 2 uses device cpu +[2024-11-07 14:55:36,633][04584] Rollout worker 3 uses device cpu +[2024-11-07 14:55:36,635][04584] Rollout worker 4 uses device cpu +[2024-11-07 14:55:36,637][04584] Rollout worker 5 uses device cpu +[2024-11-07 14:55:36,639][04584] Rollout worker 6 uses device cpu +[2024-11-07 14:55:36,641][04584] Rollout worker 7 uses device cpu +[2024-11-07 14:55:36,708][04584] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:55:36,710][04584] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 14:55:36,746][04584] Starting all processes... +[2024-11-07 14:55:36,748][04584] Starting process learner_proc0 +[2024-11-07 14:55:36,796][04584] Starting all processes... +[2024-11-07 14:55:36,802][04584] Starting process inference_proc0-0 +[2024-11-07 14:55:36,803][04584] Starting process rollout_proc0 +[2024-11-07 14:55:36,803][04584] Starting process rollout_proc1 +[2024-11-07 14:55:36,803][04584] Starting process rollout_proc2 +[2024-11-07 14:55:36,804][04584] Starting process rollout_proc3 +[2024-11-07 14:55:36,805][04584] Starting process rollout_proc4 +[2024-11-07 14:55:36,806][04584] Starting process rollout_proc5 +[2024-11-07 14:55:36,807][04584] Starting process rollout_proc6 +[2024-11-07 14:55:36,808][04584] Starting process rollout_proc7 +[2024-11-07 14:55:43,371][07866] Worker 0 uses CPU cores [0] +[2024-11-07 14:55:44,070][07873] Worker 1 uses CPU cores [1] +[2024-11-07 14:55:44,330][07871] Worker 3 uses CPU cores [3] +[2024-11-07 14:55:44,346][07852] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:55:44,347][07852] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 14:55:44,350][07865] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 
14:55:44,351][07865] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 14:55:44,383][07852] Num visible devices: 1 +[2024-11-07 14:55:44,418][07865] Num visible devices: 1 +[2024-11-07 14:55:44,418][07852] Starting seed is not provided +[2024-11-07 14:55:44,418][07852] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:55:44,418][07852] Initializing actor-critic model on device cuda:0 +[2024-11-07 14:55:44,419][07852] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:55:44,422][07852] RunningMeanStd input shape: (1,) +[2024-11-07 14:55:44,488][07852] ConvEncoder: input_channels=3 +[2024-11-07 14:55:44,600][07874] Worker 5 uses CPU cores [5] +[2024-11-07 14:55:44,619][07885] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 14:55:44,664][07884] Worker 6 uses CPU cores [6] +[2024-11-07 14:55:44,740][07870] Worker 2 uses CPU cores [2] +[2024-11-07 14:55:44,747][07852] Conv encoder output size: 512 +[2024-11-07 14:55:44,747][07852] Policy head output size: 512 +[2024-11-07 14:55:44,767][07852] Created Actor Critic model with architecture: +[2024-11-07 14:55:44,768][07852] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + 
original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 14:55:45,005][07852] Using optimizer +[2024-11-07 14:55:45,120][07872] Worker 4 uses CPU cores [4] +[2024-11-07 14:55:46,269][07852] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth... +[2024-11-07 14:55:46,317][07852] Loading model from checkpoint +[2024-11-07 14:55:46,320][07852] Loaded experiment state at self.train_step=1955, self.env_steps=8007680 +[2024-11-07 14:55:46,320][07852] Initialized policy 0 weights for model version 1955 +[2024-11-07 14:55:46,327][07852] LearnerWorker_p0 finished initialization! +[2024-11-07 14:55:46,327][07852] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 14:55:46,561][07865] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:55:46,563][07865] RunningMeanStd input shape: (1,) +[2024-11-07 14:55:46,580][07865] ConvEncoder: input_channels=3 +[2024-11-07 14:55:46,746][07865] Conv encoder output size: 512 +[2024-11-07 14:55:46,747][07865] Policy head output size: 512 +[2024-11-07 14:55:46,824][04584] Inference worker 0-0 is ready! +[2024-11-07 14:55:46,826][04584] All inference workers are ready! Signal rollout workers to start! 
+[2024-11-07 14:55:47,036][07866] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:55:47,038][07872] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:55:47,069][07874] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:55:47,090][07871] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:55:47,104][07873] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:55:47,104][07870] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:55:47,247][07885] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:55:47,261][07884] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 14:55:48,282][07871] Decorrelating experience for 0 frames... +[2024-11-07 14:55:48,298][07872] Decorrelating experience for 0 frames... +[2024-11-07 14:55:48,386][07885] Decorrelating experience for 0 frames... +[2024-11-07 14:55:48,436][07866] Decorrelating experience for 0 frames... +[2024-11-07 14:55:48,449][07884] Decorrelating experience for 0 frames... +[2024-11-07 14:55:48,975][07871] Decorrelating experience for 32 frames... +[2024-11-07 14:55:48,987][07870] Decorrelating experience for 0 frames... +[2024-11-07 14:55:49,060][07872] Decorrelating experience for 32 frames... +[2024-11-07 14:55:49,080][07885] Decorrelating experience for 32 frames... +[2024-11-07 14:55:49,117][04584] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 8007680. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 14:55:49,212][07884] Decorrelating experience for 32 frames... +[2024-11-07 14:55:49,240][07866] Decorrelating experience for 32 frames... +[2024-11-07 14:55:49,253][07874] Decorrelating experience for 0 frames... +[2024-11-07 14:55:49,700][07870] Decorrelating experience for 32 frames... +[2024-11-07 14:55:49,826][07872] Decorrelating experience for 64 frames... 
+[2024-11-07 14:55:49,939][07871] Decorrelating experience for 64 frames... +[2024-11-07 14:55:50,064][07885] Decorrelating experience for 64 frames... +[2024-11-07 14:55:50,088][07874] Decorrelating experience for 32 frames... +[2024-11-07 14:55:50,292][07870] Decorrelating experience for 64 frames... +[2024-11-07 14:55:50,415][07873] Decorrelating experience for 0 frames... +[2024-11-07 14:55:50,424][07866] Decorrelating experience for 64 frames... +[2024-11-07 14:55:50,533][07872] Decorrelating experience for 96 frames... +[2024-11-07 14:55:50,673][07884] Decorrelating experience for 64 frames... +[2024-11-07 14:55:50,733][07871] Decorrelating experience for 96 frames... +[2024-11-07 14:55:50,841][07885] Decorrelating experience for 96 frames... +[2024-11-07 14:55:50,955][07870] Decorrelating experience for 96 frames... +[2024-11-07 14:55:50,987][07873] Decorrelating experience for 32 frames... +[2024-11-07 14:55:51,133][07866] Decorrelating experience for 96 frames... +[2024-11-07 14:55:51,244][07874] Decorrelating experience for 64 frames... +[2024-11-07 14:55:51,659][07873] Decorrelating experience for 64 frames... +[2024-11-07 14:55:51,785][07874] Decorrelating experience for 96 frames... +[2024-11-07 14:55:51,817][07884] Decorrelating experience for 96 frames... +[2024-11-07 14:55:52,517][07873] Decorrelating experience for 96 frames... +[2024-11-07 14:55:54,118][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8007680. Throughput: 0: 231.1. Samples: 1156. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 14:55:54,125][07852] Signal inference workers to stop experience collection... 
+[2024-11-07 14:55:54,125][04584] Avg episode reward: [(0, '2.241')] +[2024-11-07 14:55:54,133][07865] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 14:55:56,697][04584] Heartbeat connected on Batcher_0 +[2024-11-07 14:55:56,708][04584] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 14:55:56,717][04584] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 14:55:56,721][04584] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 14:55:56,726][04584] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 14:55:56,738][04584] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 14:55:56,742][04584] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 14:55:56,744][04584] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 14:55:56,746][04584] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 14:55:56,749][04584] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 14:55:59,116][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8007680. Throughput: 0: 272.4. Samples: 2724. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 14:55:59,117][04584] Avg episode reward: [(0, '2.386')] +[2024-11-07 14:56:04,116][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8007680. Throughput: 0: 181.6. Samples: 2724. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 14:56:04,121][04584] Avg episode reward: [(0, '2.386')] +[2024-11-07 14:56:06,231][07852] Signal inference workers to resume experience collection... +[2024-11-07 14:56:06,233][07865] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 14:56:06,236][07852] Stopping Batcher_0... +[2024-11-07 14:56:06,237][07852] Loop batcher_evt_loop terminating... +[2024-11-07 14:56:06,247][04584] Component Batcher_0 stopped! +[2024-11-07 14:56:06,361][04584] Component RolloutWorker_w0 stopped! +[2024-11-07 14:56:06,362][07866] Stopping RolloutWorker_w0... 
+[2024-11-07 14:56:06,365][07866] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 14:56:06,366][07874] Stopping RolloutWorker_w5... +[2024-11-07 14:56:06,367][07874] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 14:56:06,366][04584] Component RolloutWorker_w5 stopped! +[2024-11-07 14:56:06,377][07865] Weights refcount: 2 0 +[2024-11-07 14:56:06,422][07865] Stopping InferenceWorker_p0-w0... +[2024-11-07 14:56:06,423][07865] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 14:56:06,422][04584] Component InferenceWorker_p0-w0 stopped! +[2024-11-07 14:56:06,461][07884] Stopping RolloutWorker_w6... +[2024-11-07 14:56:06,462][07884] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 14:56:06,461][04584] Component RolloutWorker_w6 stopped! +[2024-11-07 14:56:06,467][07871] Stopping RolloutWorker_w3... +[2024-11-07 14:56:06,468][07871] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 14:56:06,468][04584] Component RolloutWorker_w3 stopped! +[2024-11-07 14:56:06,479][07872] Stopping RolloutWorker_w4... +[2024-11-07 14:56:06,479][07872] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 14:56:06,479][04584] Component RolloutWorker_w4 stopped! +[2024-11-07 14:56:06,516][07885] Stopping RolloutWorker_w7... +[2024-11-07 14:56:06,515][04584] Component RolloutWorker_w7 stopped! +[2024-11-07 14:56:06,517][07885] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 14:56:06,608][07870] Stopping RolloutWorker_w2... +[2024-11-07 14:56:06,609][07870] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 14:56:06,610][04584] Component RolloutWorker_w2 stopped! +[2024-11-07 14:56:06,634][07873] Stopping RolloutWorker_w1... +[2024-11-07 14:56:06,635][04584] Component RolloutWorker_w1 stopped! +[2024-11-07 14:56:06,640][07873] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 14:56:07,253][07852] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001957_8015872.pth... 
+[2024-11-07 14:56:07,252][04584] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 14:56:07,807][07852] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001806_7397376.pth +[2024-11-07 14:56:07,821][07852] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001957_8015872.pth... +[2024-11-07 14:56:08,058][07852] Stopping LearnerWorker_p0... +[2024-11-07 14:56:08,058][07852] Loop learner_proc0_evt_loop terminating... +[2024-11-07 14:56:08,073][04584] Component LearnerWorker_p0 stopped! +[2024-11-07 14:56:08,075][04584] Waiting for process learner_proc0 to stop... +[2024-11-07 14:56:09,968][04584] Waiting for process inference_proc0-0 to join... +[2024-11-07 14:56:09,970][04584] Waiting for process rollout_proc0 to join... +[2024-11-07 14:56:09,971][04584] Waiting for process rollout_proc1 to join... +[2024-11-07 14:56:09,973][04584] Waiting for process rollout_proc2 to join... +[2024-11-07 14:56:09,975][04584] Waiting for process rollout_proc3 to join... +[2024-11-07 14:56:09,978][04584] Waiting for process rollout_proc4 to join... +[2024-11-07 14:56:09,980][04584] Waiting for process rollout_proc5 to join... +[2024-11-07 14:56:09,982][04584] Waiting for process rollout_proc6 to join... +[2024-11-07 14:56:09,985][04584] Waiting for process rollout_proc7 to join... 
+[2024-11-07 14:56:09,988][04584] Batcher 0 profile tree view: +batching: 0.0548, releasing_batches: 0.0018 +[2024-11-07 14:56:09,990][04584] InferenceWorker_p0-w0 profile tree view: +update_model: 0.0160 +wait_policy: 0.0005 + wait_policy_total: 3.7567 +one_step: 0.0108 + handle_policy_step: 3.3550 + deserialize: 0.0684, stack: 0.0110, obs_to_device_normalize: 0.7746, forward: 2.0215, send_messages: 0.1367 + prepare_outputs: 0.2665 + to_cpu: 0.1882 +[2024-11-07 14:56:09,992][04584] Learner 0 profile tree view: +misc: 0.0001, prepare_batch: 2.3351 +train: 11.7240 + epoch_init: 0.0001, minibatch_init: 0.0000, losses_postprocess: 0.0015, kl_divergence: 0.4367, after_optimizer: 0.9795 + calculate_losses: 2.5855 + losses_init: 0.0000, forward_head: 0.4665, bptt_initial: 1.1746, tail: 0.3188, advantages_returns: 0.0022, losses: 0.4083 + bptt: 0.2146 + bptt_forward_core: 0.2144 + update: 7.7171 + clip: 0.6435 +[2024-11-07 14:56:09,993][04584] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.0039, enqueue_policy_requests: 0.0693, env_step: 0.7017, overhead: 0.0387, complete_rollouts: 0.0014 +save_policy_outputs: 0.0719 + split_output_tensors: 0.0275 +[2024-11-07 14:56:09,996][04584] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.0008, enqueue_policy_requests: 0.0552, env_step: 1.1240, overhead: 0.0403, complete_rollouts: 0.0010 +save_policy_outputs: 0.0683 + split_output_tensors: 0.0209 +[2024-11-07 14:56:10,000][04584] Loop Runner_EvtLoop terminating... 
+[2024-11-07 14:56:10,003][04584] Runner profile tree view: +main_loop: 33.2571 +[2024-11-07 14:56:10,006][04584] Collected {0: 8015872}, FPS: 246.3 +[2024-11-07 14:56:10,173][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 14:56:10,175][04584] Overriding arg 'num_workers' with value 4 passed from command line +[2024-11-07 14:56:10,177][04584] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 14:56:10,179][04584] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 14:56:10,182][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 14:56:10,184][04584] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 14:56:10,185][04584] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 14:56:10,187][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 14:56:10,189][04584] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-07 14:56:10,191][04584] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-07 14:56:10,193][04584] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 14:56:10,194][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 14:56:10,195][04584] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 14:56:10,197][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2024-11-07 14:56:10,202][04584] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 14:56:10,254][04584] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:56:10,256][04584] RunningMeanStd input shape: (1,) +[2024-11-07 14:56:10,293][04584] ConvEncoder: input_channels=3 +[2024-11-07 14:56:10,355][04584] Conv encoder output size: 512 +[2024-11-07 14:56:10,357][04584] Policy head output size: 512 +[2024-11-07 14:56:10,406][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001957_8015872.pth... +[2024-11-07 14:56:13,878][04584] Num frames 100... +[2024-11-07 14:56:14,106][04584] Num frames 200... +[2024-11-07 14:56:14,341][04584] Num frames 300... +[2024-11-07 14:56:14,572][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:56:14,574][04584] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:56:14,609][04584] Num frames 400... +[2024-11-07 14:56:14,867][04584] Num frames 500... +[2024-11-07 14:56:15,349][04584] Num frames 600... +[2024-11-07 14:56:16,008][04584] Num frames 700... +[2024-11-07 14:56:16,669][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:56:16,683][04584] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:56:16,990][04584] Num frames 800... +[2024-11-07 14:56:17,996][04584] Num frames 900... +[2024-11-07 14:56:18,675][04584] Num frames 1000... +[2024-11-07 14:56:19,397][04584] Num frames 1100... +[2024-11-07 14:56:19,814][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:56:19,818][04584] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:56:20,211][04584] Num frames 1200... +[2024-11-07 14:56:20,758][04584] Num frames 1300... +[2024-11-07 14:56:21,379][04584] Num frames 1400... +[2024-11-07 14:56:21,844][04584] Num frames 1500... 
+[2024-11-07 14:56:22,003][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:56:22,005][04584] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:56:22,207][04584] Num frames 1600... +[2024-11-07 14:56:22,423][04584] Num frames 1700... +[2024-11-07 14:56:22,654][04584] Num frames 1800... +[2024-11-07 14:56:22,864][04584] Num frames 1900... +[2024-11-07 14:56:22,961][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:56:22,962][04584] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:56:23,201][04584] Num frames 2000... +[2024-11-07 14:56:23,415][04584] Num frames 2100... +[2024-11-07 14:56:23,717][04584] Num frames 2200... +[2024-11-07 14:56:23,940][04584] Num frames 2300... +[2024-11-07 14:56:24,003][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:56:24,005][04584] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:56:24,222][04584] Num frames 2400... +[2024-11-07 14:56:24,428][04584] Num frames 2500... +[2024-11-07 14:56:24,677][04584] Num frames 2600... +[2024-11-07 14:56:24,929][04584] Num frames 2700... +[2024-11-07 14:56:25,159][04584] Num frames 2800... +[2024-11-07 14:56:25,249][04584] Avg episode rewards: #0: 4.309, true rewards: #0: 4.023 +[2024-11-07 14:56:25,250][04584] Avg episode reward: 4.309, avg true_objective: 4.023 +[2024-11-07 14:56:25,442][04584] Num frames 2900... +[2024-11-07 14:56:25,633][04584] Num frames 3000... +[2024-11-07 14:56:25,835][04584] Num frames 3100... +[2024-11-07 14:56:26,055][04584] Num frames 3200... +[2024-11-07 14:56:26,248][04584] Avg episode rewards: #0: 4.455, true rewards: #0: 4.080 +[2024-11-07 14:56:26,251][04584] Avg episode reward: 4.455, avg true_objective: 4.080 +[2024-11-07 14:56:26,359][04584] Num frames 3300... +[2024-11-07 14:56:26,561][04584] Num frames 3400... +[2024-11-07 14:56:26,732][04584] Num frames 3500... +[2024-11-07 14:56:26,889][04584] Num frames 3600... 
+[2024-11-07 14:56:27,027][04584] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-07 14:56:27,030][04584] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-07 14:56:27,125][04584] Num frames 3700... +[2024-11-07 14:56:27,303][04584] Num frames 3800... +[2024-11-07 14:56:27,472][04584] Num frames 3900... +[2024-11-07 14:56:27,634][04584] Num frames 4000... +[2024-11-07 14:56:27,752][04584] Avg episode rewards: #0: 4.332, true rewards: #0: 4.032 +[2024-11-07 14:56:27,754][04584] Avg episode reward: 4.332, avg true_objective: 4.032 +[2024-11-07 14:56:40,409][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 14:56:40,960][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 14:56:40,962][04584] Overriding arg 'num_workers' with value 4 passed from command line +[2024-11-07 14:56:40,964][04584] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 14:56:40,966][04584] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 14:56:40,973][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 14:56:40,976][04584] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 14:56:40,979][04584] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-11-07 14:56:40,988][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 14:56:40,990][04584] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-07 14:56:40,993][04584] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! 
+[2024-11-07 14:56:40,995][04584] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 14:56:40,996][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 14:56:41,005][04584] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 14:56:41,007][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 14:56:41,009][04584] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 14:56:41,091][04584] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 14:56:41,095][04584] RunningMeanStd input shape: (1,) +[2024-11-07 14:56:41,124][04584] ConvEncoder: input_channels=3 +[2024-11-07 14:56:41,187][04584] Conv encoder output size: 512 +[2024-11-07 14:56:41,188][04584] Policy head output size: 512 +[2024-11-07 14:56:41,211][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001957_8015872.pth... +[2024-11-07 14:56:41,745][04584] Num frames 100... +[2024-11-07 14:56:41,957][04584] Num frames 200... +[2024-11-07 14:56:42,160][04584] Num frames 300... +[2024-11-07 14:56:42,410][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 14:56:42,414][04584] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 14:56:42,475][04584] Num frames 400... +[2024-11-07 14:56:42,737][04584] Num frames 500... +[2024-11-07 14:56:43,040][04584] Num frames 600... +[2024-11-07 14:56:43,312][04584] Num frames 700... +[2024-11-07 14:56:43,592][04584] Num frames 800... +[2024-11-07 14:56:43,713][04584] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-07 14:56:43,714][04584] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-07 14:56:43,872][04584] Num frames 900... +[2024-11-07 14:56:44,191][04584] Num frames 1000... +[2024-11-07 14:56:44,433][04584] Num frames 1100... 
+[2024-11-07 14:56:44,628][04584] Num frames 1200... +[2024-11-07 14:56:44,766][04584] Avg episode rewards: #0: 4.827, true rewards: #0: 4.160 +[2024-11-07 14:56:44,772][04584] Avg episode reward: 4.827, avg true_objective: 4.160 +[2024-11-07 14:56:44,879][04584] Num frames 1300... +[2024-11-07 14:56:45,064][04584] Num frames 1400... +[2024-11-07 14:56:45,236][04584] Num frames 1500... +[2024-11-07 14:56:45,421][04584] Num frames 1600... +[2024-11-07 14:56:47,592][04584] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080 +[2024-11-07 14:56:47,594][04584] Avg episode reward: 4.580, avg true_objective: 4.080 +[2024-11-07 14:56:47,724][04584] Num frames 1700... +[2024-11-07 14:56:47,905][04584] Num frames 1800... +[2024-11-07 14:56:48,109][04584] Num frames 1900... +[2024-11-07 14:56:48,295][04584] Num frames 2000... +[2024-11-07 14:56:48,425][04584] Avg episode rewards: #0: 4.680, true rewards: #0: 4.080 +[2024-11-07 14:56:48,428][04584] Avg episode reward: 4.680, avg true_objective: 4.080 +[2024-11-07 14:56:48,562][04584] Num frames 2100... +[2024-11-07 14:56:48,738][04584] Num frames 2200... +[2024-11-07 14:56:48,913][04584] Num frames 2300... +[2024-11-07 14:56:49,095][04584] Num frames 2400... +[2024-11-07 14:56:49,191][04584] Avg episode rewards: #0: 4.540, true rewards: #0: 4.040 +[2024-11-07 14:56:49,193][04584] Avg episode reward: 4.540, avg true_objective: 4.040 +[2024-11-07 14:56:49,333][04584] Num frames 2500... +[2024-11-07 14:56:49,501][04584] Num frames 2600... +[2024-11-07 14:56:49,658][04584] Num frames 2700... +[2024-11-07 14:56:49,848][04584] Num frames 2800... +[2024-11-07 14:56:49,918][04584] Avg episode rewards: #0: 4.440, true rewards: #0: 4.011 +[2024-11-07 14:56:49,921][04584] Avg episode reward: 4.440, avg true_objective: 4.011 +[2024-11-07 14:56:50,094][04584] Num frames 2900... +[2024-11-07 14:56:50,431][04584] Num frames 3000... +[2024-11-07 14:56:50,666][04584] Num frames 3100... 
+[2024-11-07 14:56:50,917][04584] Avg episode rewards: #0: 4.365, true rewards: #0: 3.990 +[2024-11-07 14:56:50,920][04584] Avg episode reward: 4.365, avg true_objective: 3.990 +[2024-11-07 14:56:50,945][04584] Num frames 3200... +[2024-11-07 14:56:51,105][04584] Num frames 3300... +[2024-11-07 14:56:51,283][04584] Num frames 3400... +[2024-11-07 14:56:51,493][04584] Num frames 3500... +[2024-11-07 14:56:51,666][04584] Num frames 3600... +[2024-11-07 14:56:51,793][04584] Avg episode rewards: #0: 4.489, true rewards: #0: 4.044 +[2024-11-07 14:56:51,795][04584] Avg episode reward: 4.489, avg true_objective: 4.044 +[2024-11-07 14:56:51,923][04584] Num frames 3700... +[2024-11-07 14:56:52,126][04584] Num frames 3800... +[2024-11-07 14:56:52,324][04584] Num frames 3900... +[2024-11-07 14:56:52,514][04584] Num frames 4000... +[2024-11-07 14:56:52,732][04584] Avg episode rewards: #0: 4.588, true rewards: #0: 4.088 +[2024-11-07 14:56:52,736][04584] Avg episode reward: 4.588, avg true_objective: 4.088 +[2024-11-07 14:57:02,444][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 14:57:10,868][04584] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 14:59:42,796][04584] Environment doom_basic already registered, overwriting... +[2024-11-07 14:59:42,798][04584] Environment doom_two_colors_easy already registered, overwriting... +[2024-11-07 14:59:42,800][04584] Environment doom_two_colors_hard already registered, overwriting... +[2024-11-07 14:59:42,802][04584] Environment doom_dm already registered, overwriting... +[2024-11-07 14:59:42,803][04584] Environment doom_dwango5 already registered, overwriting... +[2024-11-07 14:59:42,804][04584] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2024-11-07 14:59:42,805][04584] Environment doom_defend_the_center_flat_actions already registered, overwriting... 
+[2024-11-07 14:59:42,806][04584] Environment doom_my_way_home already registered, overwriting... +[2024-11-07 14:59:42,808][04584] Environment doom_deadly_corridor already registered, overwriting... +[2024-11-07 14:59:42,809][04584] Environment doom_defend_the_center already registered, overwriting... +[2024-11-07 14:59:42,813][04584] Environment doom_defend_the_line already registered, overwriting... +[2024-11-07 14:59:42,814][04584] Environment doom_health_gathering already registered, overwriting... +[2024-11-07 14:59:42,815][04584] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-11-07 14:59:42,817][04584] Environment doom_battle already registered, overwriting... +[2024-11-07 14:59:42,820][04584] Environment doom_battle2 already registered, overwriting... +[2024-11-07 14:59:42,822][04584] Environment doom_duel_bots already registered, overwriting... +[2024-11-07 14:59:42,825][04584] Environment doom_deathmatch_bots already registered, overwriting... +[2024-11-07 14:59:42,828][04584] Environment doom_duel already registered, overwriting... +[2024-11-07 14:59:42,829][04584] Environment doom_deathmatch_full already registered, overwriting... +[2024-11-07 14:59:42,831][04584] Environment doom_benchmark already registered, overwriting... +[2024-11-07 14:59:42,833][04584] register_encoder_factory: +[2024-11-07 15:01:10,944][04584] Environment doom_basic already registered, overwriting... +[2024-11-07 15:01:10,947][04584] Environment doom_two_colors_easy already registered, overwriting... +[2024-11-07 15:01:10,949][04584] Environment doom_two_colors_hard already registered, overwriting... +[2024-11-07 15:01:10,950][04584] Environment doom_dm already registered, overwriting... +[2024-11-07 15:01:10,951][04584] Environment doom_dwango5 already registered, overwriting... +[2024-11-07 15:01:10,953][04584] Environment doom_my_way_home_flat_actions already registered, overwriting... 
+[2024-11-07 15:01:10,954][04584] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2024-11-07 15:01:10,956][04584] Environment doom_my_way_home already registered, overwriting... +[2024-11-07 15:01:10,958][04584] Environment doom_deadly_corridor already registered, overwriting... +[2024-11-07 15:01:10,960][04584] Environment doom_defend_the_center already registered, overwriting... +[2024-11-07 15:01:10,962][04584] Environment doom_defend_the_line already registered, overwriting... +[2024-11-07 15:01:10,963][04584] Environment doom_health_gathering already registered, overwriting... +[2024-11-07 15:01:10,965][04584] Environment doom_health_gathering_supreme already registered, overwriting... +[2024-11-07 15:01:10,967][04584] Environment doom_battle already registered, overwriting... +[2024-11-07 15:01:10,969][04584] Environment doom_battle2 already registered, overwriting... +[2024-11-07 15:01:10,971][04584] Environment doom_duel_bots already registered, overwriting... +[2024-11-07 15:01:10,974][04584] Environment doom_deathmatch_bots already registered, overwriting... +[2024-11-07 15:01:10,975][04584] Environment doom_duel already registered, overwriting... +[2024-11-07 15:01:10,976][04584] Environment doom_deathmatch_full already registered, overwriting... +[2024-11-07 15:01:10,979][04584] Environment doom_benchmark already registered, overwriting... 
+[2024-11-07 15:01:10,983][04584] register_encoder_factory: +[2024-11-07 15:01:11,005][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 15:01:11,008][04584] Overriding arg 'num_workers' with value 10 passed from command line +[2024-11-07 15:01:11,010][04584] Overriding arg 'num_envs_per_worker' with value 6 passed from command line +[2024-11-07 15:01:11,011][04584] Overriding arg 'train_for_env_steps' with value 16000000 passed from command line +[2024-11-07 15:01:11,021][04584] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! +[2024-11-07 15:01:11,022][04584] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... +[2024-11-07 15:01:11,024][04584] Weights and Biases integration disabled +[2024-11-07 15:01:11,027][04584] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-11-07 15:01:16,848][04584] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/root/hfRL/ml/LunarLander-v2/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=10 +num_envs_per_worker=6 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant 
+lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=16000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 
--train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2024-11-07 15:01:16,849][04584] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-07 15:01:16,851][04584] Rollout worker 0 uses device cpu +[2024-11-07 15:01:16,852][04584] Rollout worker 1 uses device cpu +[2024-11-07 15:01:16,854][04584] Rollout worker 2 uses device cpu +[2024-11-07 15:01:16,855][04584] Rollout worker 3 uses device cpu +[2024-11-07 15:01:16,857][04584] Rollout worker 4 uses device cpu +[2024-11-07 15:01:16,859][04584] Rollout worker 5 uses device cpu +[2024-11-07 15:01:16,862][04584] Rollout worker 6 uses device cpu +[2024-11-07 15:01:16,863][04584] Rollout worker 7 uses device cpu +[2024-11-07 15:01:16,866][04584] Rollout worker 8 uses device cpu +[2024-11-07 15:01:16,868][04584] Rollout worker 9 uses device cpu +[2024-11-07 15:01:17,011][04584] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 15:01:17,012][04584] InferenceWorker_p0-w0: min num requests: 3 +[2024-11-07 15:01:17,055][04584] Starting all processes... +[2024-11-07 15:01:17,056][04584] Starting process learner_proc0 +[2024-11-07 15:01:17,097][04584] Starting all processes... 
+[2024-11-07 15:01:17,104][04584] Starting process inference_proc0-0 +[2024-11-07 15:01:17,106][04584] Starting process rollout_proc0 +[2024-11-07 15:01:17,106][04584] Starting process rollout_proc1 +[2024-11-07 15:01:17,106][04584] Starting process rollout_proc2 +[2024-11-07 15:01:17,107][04584] Starting process rollout_proc3 +[2024-11-07 15:01:17,109][04584] Starting process rollout_proc4 +[2024-11-07 15:01:17,109][04584] Starting process rollout_proc5 +[2024-11-07 15:01:17,112][04584] Starting process rollout_proc6 +[2024-11-07 15:01:17,113][04584] Starting process rollout_proc7 +[2024-11-07 15:01:17,114][04584] Starting process rollout_proc8 +[2024-11-07 15:01:17,125][04584] Starting process rollout_proc9 +[2024-11-07 15:01:25,913][09025] Worker 0 uses CPU cores [0] +[2024-11-07 15:01:25,954][09009] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 15:01:25,954][09009] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 15:01:26,005][09009] Num visible devices: 1 +[2024-11-07 15:01:26,051][09009] Starting seed is not provided +[2024-11-07 15:01:26,051][09009] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 15:01:26,051][09009] Initializing actor-critic model on device cuda:0 +[2024-11-07 15:01:26,052][09009] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 15:01:26,053][09009] RunningMeanStd input shape: (1,) +[2024-11-07 15:01:26,083][09009] ConvEncoder: input_channels=3 +[2024-11-07 15:01:26,334][09028] Worker 5 uses CPU cores [5] +[2024-11-07 15:01:26,676][09009] Conv encoder output size: 512 +[2024-11-07 15:01:26,677][09009] Policy head output size: 512 +[2024-11-07 15:01:26,705][09009] Created Actor Critic model with architecture: +[2024-11-07 15:01:26,706][09009] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + 
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 15:01:26,934][09037] Worker 6 uses CPU cores [6] +[2024-11-07 15:01:27,144][09029] Worker 1 uses CPU cores [1] +[2024-11-07 15:01:27,256][09024] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 15:01:27,257][09024] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 15:01:27,288][09038] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 15:01:27,300][09024] Num visible devices: 1 +[2024-11-07 15:01:27,315][09009] Using optimizer +[2024-11-07 15:01:27,485][09026] Worker 3 uses CPU cores [3] +[2024-11-07 15:01:27,594][09027] Worker 2 uses CPU cores [2] +[2024-11-07 15:01:27,608][09040] Worker 8 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 15:01:27,695][09039] Worker 9 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 15:01:27,770][09030] Worker 4 uses CPU cores [4] +[2024-11-07 15:01:28,594][09009] Loading state from 
checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001957_8015872.pth... +[2024-11-07 15:01:28,657][09009] Loading model from checkpoint +[2024-11-07 15:01:28,659][09009] Loaded experiment state at self.train_step=1957, self.env_steps=8015872 +[2024-11-07 15:01:28,659][09009] Initialized policy 0 weights for model version 1957 +[2024-11-07 15:01:28,667][09009] LearnerWorker_p0 finished initialization! +[2024-11-07 15:01:28,667][09009] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 15:01:28,872][09024] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 15:01:28,873][09024] RunningMeanStd input shape: (1,) +[2024-11-07 15:01:28,885][09024] ConvEncoder: input_channels=3 +[2024-11-07 15:01:28,989][09024] Conv encoder output size: 512 +[2024-11-07 15:01:28,990][09024] Policy head output size: 512 +[2024-11-07 15:01:29,034][04584] Inference worker 0-0 is ready! +[2024-11-07 15:01:29,035][04584] All inference workers are ready! Signal rollout workers to start! 
+[2024-11-07 15:01:29,114][09030] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:01:29,123][09028] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:01:29,124][09029] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:01:29,146][09037] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:01:29,171][09027] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:01:29,179][09026] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:01:29,195][09038] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:01:29,196][09040] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:01:29,201][09039] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:01:29,234][09025] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:01:29,686][09030] Decorrelating experience for 0 frames... +[2024-11-07 15:01:29,772][09028] Decorrelating experience for 0 frames... +[2024-11-07 15:01:29,822][09026] Decorrelating experience for 0 frames... +[2024-11-07 15:01:29,854][09029] Decorrelating experience for 0 frames... +[2024-11-07 15:01:29,856][09039] Decorrelating experience for 0 frames... +[2024-11-07 15:01:29,892][09025] Decorrelating experience for 0 frames... +[2024-11-07 15:01:30,044][09037] Decorrelating experience for 0 frames... +[2024-11-07 15:01:30,158][09028] Decorrelating experience for 32 frames... +[2024-11-07 15:01:30,194][09029] Decorrelating experience for 32 frames... +[2024-11-07 15:01:30,250][09025] Decorrelating experience for 32 frames... +[2024-11-07 15:01:30,266][09026] Decorrelating experience for 32 frames... +[2024-11-07 15:01:30,378][09030] Decorrelating experience for 32 frames... +[2024-11-07 15:01:30,568][09029] Decorrelating experience for 64 frames... +[2024-11-07 15:01:30,574][09038] Decorrelating experience for 0 frames... 
+[2024-11-07 15:01:30,755][09037] Decorrelating experience for 32 frames... +[2024-11-07 15:01:30,762][09025] Decorrelating experience for 64 frames... +[2024-11-07 15:01:30,781][09028] Decorrelating experience for 64 frames... +[2024-11-07 15:01:30,817][09040] Decorrelating experience for 0 frames... +[2024-11-07 15:01:31,015][09038] Decorrelating experience for 32 frames... +[2024-11-07 15:01:31,028][04584] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 8015872. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 15:01:31,099][09029] Decorrelating experience for 96 frames... +[2024-11-07 15:01:31,165][09037] Decorrelating experience for 64 frames... +[2024-11-07 15:01:31,187][09039] Decorrelating experience for 32 frames... +[2024-11-07 15:01:31,232][09027] Decorrelating experience for 0 frames... +[2024-11-07 15:01:31,277][09040] Decorrelating experience for 32 frames... +[2024-11-07 15:01:31,623][09025] Decorrelating experience for 96 frames... +[2024-11-07 15:01:31,645][09030] Decorrelating experience for 64 frames... +[2024-11-07 15:01:31,670][09027] Decorrelating experience for 32 frames... +[2024-11-07 15:01:31,724][09038] Decorrelating experience for 64 frames... +[2024-11-07 15:01:31,779][09039] Decorrelating experience for 64 frames... +[2024-11-07 15:01:31,820][09037] Decorrelating experience for 96 frames... +[2024-11-07 15:01:32,124][09028] Decorrelating experience for 96 frames... +[2024-11-07 15:01:32,158][09040] Decorrelating experience for 64 frames... +[2024-11-07 15:01:32,236][09025] Decorrelating experience for 128 frames... +[2024-11-07 15:01:32,377][09037] Decorrelating experience for 128 frames... +[2024-11-07 15:01:32,378][09039] Decorrelating experience for 96 frames... +[2024-11-07 15:01:32,382][09027] Decorrelating experience for 64 frames... +[2024-11-07 15:01:32,595][09029] Decorrelating experience for 128 frames... 
+[2024-11-07 15:01:32,641][09038] Decorrelating experience for 96 frames... +[2024-11-07 15:01:32,642][09028] Decorrelating experience for 128 frames... +[2024-11-07 15:01:32,801][09025] Decorrelating experience for 160 frames... +[2024-11-07 15:01:32,914][09037] Decorrelating experience for 160 frames... +[2024-11-07 15:01:33,079][09039] Decorrelating experience for 128 frames... +[2024-11-07 15:01:33,111][09029] Decorrelating experience for 160 frames... +[2024-11-07 15:01:33,115][09040] Decorrelating experience for 96 frames... +[2024-11-07 15:01:33,257][09030] Decorrelating experience for 96 frames... +[2024-11-07 15:01:33,332][09028] Decorrelating experience for 160 frames... +[2024-11-07 15:01:33,679][09027] Decorrelating experience for 96 frames... +[2024-11-07 15:01:33,725][09038] Decorrelating experience for 128 frames... +[2024-11-07 15:01:34,031][09040] Decorrelating experience for 128 frames... +[2024-11-07 15:01:34,034][09026] Decorrelating experience for 64 frames... +[2024-11-07 15:01:34,045][09039] Decorrelating experience for 160 frames... +[2024-11-07 15:01:34,590][09027] Decorrelating experience for 128 frames... +[2024-11-07 15:01:34,691][09038] Decorrelating experience for 160 frames... +[2024-11-07 15:01:34,869][09030] Decorrelating experience for 128 frames... +[2024-11-07 15:01:35,419][09026] Decorrelating experience for 96 frames... +[2024-11-07 15:01:35,430][09027] Decorrelating experience for 160 frames... +[2024-11-07 15:01:35,434][09040] Decorrelating experience for 160 frames... +[2024-11-07 15:01:36,027][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8015872. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 15:01:36,611][09026] Decorrelating experience for 128 frames... +[2024-11-07 15:01:36,611][09030] Decorrelating experience for 160 frames... 
+[2024-11-07 15:01:37,001][04584] Heartbeat connected on Batcher_0 +[2024-11-07 15:01:37,006][04584] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 15:01:37,020][04584] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 15:01:37,022][04584] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 15:01:37,030][04584] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 15:01:37,037][04584] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 15:01:37,040][04584] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 15:01:37,047][04584] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 15:01:37,050][04584] Heartbeat connected on RolloutWorker_w8 +[2024-11-07 15:01:37,054][04584] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 15:01:37,056][04584] Heartbeat connected on RolloutWorker_w9 +[2024-11-07 15:01:37,083][04584] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 15:01:37,535][09026] Decorrelating experience for 160 frames... +[2024-11-07 15:01:37,748][04584] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 15:01:39,624][09009] Signal inference workers to stop experience collection... +[2024-11-07 15:01:39,636][09024] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 15:01:41,028][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8015872. Throughput: 0: 266.1. Samples: 2661. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 15:01:41,030][04584] Avg episode reward: [(0, '1.997')] +[2024-11-07 15:01:46,027][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8015872. Throughput: 0: 322.8. Samples: 4842. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 15:01:46,029][04584] Avg episode reward: [(0, '1.997')] +[2024-11-07 15:01:49,711][09009] Signal inference workers to resume experience collection... 
+[2024-11-07 15:01:49,726][09024] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 15:01:51,029][04584] Fps is (10 sec: 1638.4, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 8032256. Throughput: 0: 248.1. Samples: 4962. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-11-07 15:01:51,034][04584] Avg episode reward: [(0, '2.057')] +[2024-11-07 15:01:56,244][04584] Fps is (10 sec: 3207.3, 60 sec: 1299.5, 300 sec: 1299.5). Total num frames: 8048640. Throughput: 0: 315.9. Samples: 7965. Policy #0 lag: (min: 0.0, avg: 1.5, max: 5.0) +[2024-11-07 15:01:56,253][04584] Avg episode reward: [(0, '3.230')] +[2024-11-07 15:01:58,030][09024] Updated weights for policy 0, policy_version 1967 (0.0072) +[2024-11-07 15:02:01,029][04584] Fps is (10 sec: 3276.4, 60 sec: 1638.3, 300 sec: 1638.3). Total num frames: 8065024. Throughput: 0: 400.0. Samples: 12000. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:02:01,049][04584] Avg episode reward: [(0, '4.097')] +[2024-11-07 15:02:06,035][04584] Fps is (10 sec: 2928.5, 60 sec: 1755.1, 300 sec: 1755.1). Total num frames: 8077312. Throughput: 0: 478.1. Samples: 16737. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) +[2024-11-07 15:02:06,038][04584] Avg episode reward: [(0, '4.522')] +[2024-11-07 15:02:09,709][09024] Updated weights for policy 0, policy_version 1977 (0.0075) +[2024-11-07 15:02:11,028][04584] Fps is (10 sec: 3686.6, 60 sec: 2150.4, 300 sec: 2150.4). Total num frames: 8101888. Throughput: 0: 483.9. Samples: 19356. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:02:11,034][04584] Avg episode reward: [(0, '4.531')] +[2024-11-07 15:02:16,028][04584] Fps is (10 sec: 5328.5, 60 sec: 2548.6, 300 sec: 2548.6). Total num frames: 8130560. Throughput: 0: 634.1. Samples: 28533. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:02:16,037][04584] Avg episode reward: [(0, '4.269')] +[2024-11-07 15:02:16,968][09024] Updated weights for policy 0, policy_version 1987 (0.0049) +[2024-11-07 15:02:21,028][04584] Fps is (10 sec: 5325.0, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 8155136. Throughput: 0: 791.9. Samples: 35634. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) +[2024-11-07 15:02:21,040][04584] Avg episode reward: [(0, '4.001')] +[2024-11-07 15:02:24,670][09024] Updated weights for policy 0, policy_version 1997 (0.0064) +[2024-11-07 15:02:26,032][04584] Fps is (10 sec: 5731.8, 60 sec: 3127.6, 300 sec: 3127.6). Total num frames: 8187904. Throughput: 0: 829.0. Samples: 39969. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:02:26,037][04584] Avg episode reward: [(0, '4.182')] +[2024-11-07 15:02:31,028][04584] Fps is (10 sec: 4505.7, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 8200192. Throughput: 0: 954.2. Samples: 47781. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) +[2024-11-07 15:02:31,029][04584] Avg episode reward: [(0, '4.225')] +[2024-11-07 15:02:34,783][09024] Updated weights for policy 0, policy_version 2007 (0.0084) +[2024-11-07 15:02:36,032][04584] Fps is (10 sec: 3686.7, 60 sec: 3481.4, 300 sec: 3213.6). Total num frames: 8224768. Throughput: 0: 1066.8. Samples: 52974. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:02:36,045][04584] Avg episode reward: [(0, '4.560')] +[2024-11-07 15:02:41,030][04584] Fps is (10 sec: 5733.4, 60 sec: 4027.6, 300 sec: 3452.2). Total num frames: 8257536. Throughput: 0: 1122.3. Samples: 58227. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:02:41,181][04584] Avg episode reward: [(0, '4.499')] +[2024-11-07 15:02:42,375][09024] Updated weights for policy 0, policy_version 2017 (0.0076) +[2024-11-07 15:02:46,028][04584] Fps is (10 sec: 5736.4, 60 sec: 4437.3, 300 sec: 3549.8). Total num frames: 8282112. Throughput: 0: 1179.7. 
Samples: 65085. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:02:46,033][04584] Avg episode reward: [(0, '4.458')] +[2024-11-07 15:02:48,966][09024] Updated weights for policy 0, policy_version 2027 (0.0057) +[2024-11-07 15:02:51,044][04584] Fps is (10 sec: 5317.1, 60 sec: 4640.9, 300 sec: 3685.6). Total num frames: 8310784. Throughput: 0: 1271.3. Samples: 73959. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:02:51,050][04584] Avg episode reward: [(0, '4.488')] +[2024-11-07 15:02:55,463][09024] Updated weights for policy 0, policy_version 2037 (0.0045) +[2024-11-07 15:02:56,027][04584] Fps is (10 sec: 6553.9, 60 sec: 5001.5, 300 sec: 3903.3). Total num frames: 8347648. Throughput: 0: 1327.8. Samples: 79104. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:02:56,029][04584] Avg episode reward: [(0, '4.301')] +[2024-11-07 15:03:01,028][04584] Fps is (10 sec: 6974.4, 60 sec: 5256.6, 300 sec: 4050.5). Total num frames: 8380416. Throughput: 0: 1384.7. Samples: 90843. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) +[2024-11-07 15:03:01,035][04584] Avg episode reward: [(0, '4.534')] +[2024-11-07 15:03:01,342][09024] Updated weights for policy 0, policy_version 2047 (0.0066) +[2024-11-07 15:03:06,028][04584] Fps is (10 sec: 4505.6, 60 sec: 5257.2, 300 sec: 3966.7). Total num frames: 8392704. Throughput: 0: 1321.1. Samples: 95085. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) +[2024-11-07 15:03:06,035][04584] Avg episode reward: [(0, '4.484')] +[2024-11-07 15:03:11,030][04584] Fps is (10 sec: 4095.1, 60 sec: 5324.6, 300 sec: 4054.9). Total num frames: 8421376. Throughput: 0: 1306.7. Samples: 98769. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:03:11,033][04584] Avg episode reward: [(0, '4.455')] +[2024-11-07 15:03:11,076][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002056_8421376.pth... 
+[2024-11-07 15:03:12,209][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth +[2024-11-07 15:03:12,448][09024] Updated weights for policy 0, policy_version 2057 (0.0053) +[2024-11-07 15:03:16,028][04584] Fps is (10 sec: 5324.7, 60 sec: 5256.5, 300 sec: 4096.0). Total num frames: 8445952. Throughput: 0: 1310.5. Samples: 106755. Policy #0 lag: (min: 0.0, avg: 1.9, max: 3.0) +[2024-11-07 15:03:16,033][04584] Avg episode reward: [(0, '4.398')] +[2024-11-07 15:03:18,413][09024] Updated weights for policy 0, policy_version 2067 (0.0046) +[2024-11-07 15:03:21,028][04584] Fps is (10 sec: 6145.6, 60 sec: 5461.4, 300 sec: 4244.9). Total num frames: 8482816. Throughput: 0: 1428.1. Samples: 117234. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:03:21,032][04584] Avg episode reward: [(0, '4.426')] +[2024-11-07 15:03:23,406][09024] Updated weights for policy 0, policy_version 2077 (0.0053) +[2024-11-07 15:03:26,029][04584] Fps is (10 sec: 8191.4, 60 sec: 5666.5, 300 sec: 4452.1). Total num frames: 8527872. Throughput: 0: 1449.0. Samples: 123429. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:03:26,044][04584] Avg episode reward: [(0, '4.548')] +[2024-11-07 15:03:28,644][09024] Updated weights for policy 0, policy_version 2087 (0.0052) +[2024-11-07 15:03:31,028][04584] Fps is (10 sec: 8601.3, 60 sec: 6144.0, 300 sec: 4608.0). Total num frames: 8568832. Throughput: 0: 1560.3. Samples: 135300. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:03:31,038][04584] Avg episode reward: [(0, '4.514')] +[2024-11-07 15:03:33,631][09024] Updated weights for policy 0, policy_version 2097 (0.0036) +[2024-11-07 15:03:36,147][04584] Fps is (10 sec: 6882.2, 60 sec: 6200.4, 300 sec: 4648.6). Total num frames: 8597504. Throughput: 0: 1612.3. Samples: 146679. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:03:36,153][04584] Avg episode reward: [(0, '4.475')] +[2024-11-07 15:03:41,028][04584] Fps is (10 sec: 4096.1, 60 sec: 5871.1, 300 sec: 4568.6). Total num frames: 8609792. Throughput: 0: 1535.9. Samples: 148221. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2024-11-07 15:03:41,049][04584] Avg episode reward: [(0, '4.297')] +[2024-11-07 15:03:43,494][09024] Updated weights for policy 0, policy_version 2107 (0.0056) +[2024-11-07 15:03:46,039][04584] Fps is (10 sec: 4554.6, 60 sec: 6006.4, 300 sec: 4641.8). Total num frames: 8642560. Throughput: 0: 1440.9. Samples: 155700. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:03:46,062][04584] Avg episode reward: [(0, '4.360')] +[2024-11-07 15:03:50,512][09024] Updated weights for policy 0, policy_version 2117 (0.0062) +[2024-11-07 15:03:51,029][04584] Fps is (10 sec: 6552.9, 60 sec: 6077.3, 300 sec: 4710.4). Total num frames: 8675328. Throughput: 0: 1533.7. Samples: 164103. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:03:51,034][04584] Avg episode reward: [(0, '4.391')] +[2024-11-07 15:03:56,028][04584] Fps is (10 sec: 6560.9, 60 sec: 6007.5, 300 sec: 4774.0). Total num frames: 8708096. Throughput: 0: 1600.7. Samples: 170796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 15:03:56,034][04584] Avg episode reward: [(0, '4.322')] +[2024-11-07 15:03:56,064][09024] Updated weights for policy 0, policy_version 2127 (0.0079) +[2024-11-07 15:04:01,028][04584] Fps is (10 sec: 6554.4, 60 sec: 6007.5, 300 sec: 4833.3). Total num frames: 8740864. Throughput: 0: 1627.9. Samples: 180012. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:04:01,029][04584] Avg episode reward: [(0, '4.624')] +[2024-11-07 15:04:03,142][09024] Updated weights for policy 0, policy_version 2137 (0.0080) +[2024-11-07 15:04:06,031][04584] Fps is (10 sec: 6551.5, 60 sec: 6348.5, 300 sec: 4888.7). Total num frames: 8773632. Throughput: 0: 1591.7. 
Samples: 188865. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:04:06,033][04584] Avg episode reward: [(0, '4.471')] +[2024-11-07 15:04:09,383][09024] Updated weights for policy 0, policy_version 2147 (0.0045) +[2024-11-07 15:04:11,030][04584] Fps is (10 sec: 6142.3, 60 sec: 6348.8, 300 sec: 4915.1). Total num frames: 8802304. Throughput: 0: 1567.7. Samples: 193977. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) +[2024-11-07 15:04:11,034][04584] Avg episode reward: [(0, '4.431')] +[2024-11-07 15:04:16,028][04584] Fps is (10 sec: 5326.3, 60 sec: 6348.8, 300 sec: 4915.2). Total num frames: 8826880. Throughput: 0: 1439.4. Samples: 200073. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:04:16,031][04584] Avg episode reward: [(0, '4.391')] +[2024-11-07 15:04:17,708][09024] Updated weights for policy 0, policy_version 2157 (0.0058) +[2024-11-07 15:04:21,028][04584] Fps is (10 sec: 6145.7, 60 sec: 6348.8, 300 sec: 4987.5). Total num frames: 8863744. Throughput: 0: 1450.2. Samples: 211764. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:04:21,030][04584] Avg episode reward: [(0, '4.511')] +[2024-11-07 15:04:22,029][09024] Updated weights for policy 0, policy_version 2167 (0.0046) +[2024-11-07 15:04:26,028][04584] Fps is (10 sec: 7373.2, 60 sec: 6212.4, 300 sec: 5055.6). Total num frames: 8900608. Throughput: 0: 1557.3. Samples: 218301. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:04:26,031][04584] Avg episode reward: [(0, '4.462')] +[2024-11-07 15:04:28,346][09024] Updated weights for policy 0, policy_version 2177 (0.0073) +[2024-11-07 15:04:31,028][04584] Fps is (10 sec: 7372.6, 60 sec: 6144.0, 300 sec: 5120.0). Total num frames: 8937472. Throughput: 0: 1611.6. Samples: 228204. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:04:31,029][04584] Avg episode reward: [(0, '4.386')] +[2024-11-07 15:04:33,371][09024] Updated weights for policy 0, policy_version 2187 (0.0044) +[2024-11-07 15:04:36,034][04584] Fps is (10 sec: 8186.5, 60 sec: 6429.1, 300 sec: 5225.0). Total num frames: 8982528. Throughput: 0: 1702.5. Samples: 240723. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:04:36,038][04584] Avg episode reward: [(0, '4.766')] +[2024-11-07 15:04:37,745][09024] Updated weights for policy 0, policy_version 2197 (0.0031) +[2024-11-07 15:04:41,028][04584] Fps is (10 sec: 9011.4, 60 sec: 6963.2, 300 sec: 5324.8). Total num frames: 9027584. Throughput: 0: 1710.0. Samples: 247746. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:04:41,030][04584] Avg episode reward: [(0, '4.323')] +[2024-11-07 15:04:42,365][09024] Updated weights for policy 0, policy_version 2207 (0.0034) +[2024-11-07 15:04:47,914][04584] Fps is (10 sec: 7240.3, 60 sec: 6884.5, 300 sec: 5346.6). Total num frames: 9068544. Throughput: 0: 1731.7. Samples: 261207. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2024-11-07 15:04:47,916][04584] Avg episode reward: [(0, '4.443')] +[2024-11-07 15:04:48,941][09024] Updated weights for policy 0, policy_version 2217 (0.0039) +[2024-11-07 15:04:51,029][04584] Fps is (10 sec: 6143.1, 60 sec: 6894.9, 300 sec: 5365.7). Total num frames: 9089024. Throughput: 0: 1786.5. Samples: 269256. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:04:51,038][04584] Avg episode reward: [(0, '4.315')] +[2024-11-07 15:04:55,889][09024] Updated weights for policy 0, policy_version 2227 (0.0049) +[2024-11-07 15:04:56,028][04584] Fps is (10 sec: 6563.0, 60 sec: 6894.9, 300 sec: 5394.7). Total num frames: 9121792. Throughput: 0: 1776.3. Samples: 273906. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:04:56,030][04584] Avg episode reward: [(0, '4.585')] +[2024-11-07 15:05:01,028][04584] Fps is (10 sec: 6145.0, 60 sec: 6826.7, 300 sec: 5402.8). Total num frames: 9150464. Throughput: 0: 1833.8. Samples: 282591. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:05:01,030][04584] Avg episode reward: [(0, '4.626')] +[2024-11-07 15:05:02,542][09024] Updated weights for policy 0, policy_version 2237 (0.0078) +[2024-11-07 15:05:06,028][04584] Fps is (10 sec: 7372.5, 60 sec: 7031.8, 300 sec: 5486.7). Total num frames: 9195520. Throughput: 0: 1828.4. Samples: 294042. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:05:06,031][04584] Avg episode reward: [(0, '4.594')] +[2024-11-07 15:05:06,896][09024] Updated weights for policy 0, policy_version 2247 (0.0041) +[2024-11-07 15:05:11,028][04584] Fps is (10 sec: 8601.0, 60 sec: 7236.5, 300 sec: 5548.2). Total num frames: 9236480. Throughput: 0: 1833.2. Samples: 300798. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) +[2024-11-07 15:05:11,030][04584] Avg episode reward: [(0, '4.237')] +[2024-11-07 15:05:11,039][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002255_9236480.pth... +[2024-11-07 15:05:11,180][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000001957_8015872.pth +[2024-11-07 15:05:11,798][09024] Updated weights for policy 0, policy_version 2257 (0.0037) +[2024-11-07 15:05:16,028][04584] Fps is (10 sec: 6963.7, 60 sec: 7304.6, 300 sec: 5552.4). Total num frames: 9265152. Throughput: 0: 1843.3. Samples: 311151. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:05:16,029][04584] Avg episode reward: [(0, '4.511')] +[2024-11-07 15:05:18,230][09024] Updated weights for policy 0, policy_version 2267 (0.0036) +[2024-11-07 15:05:22,243][04584] Fps is (10 sec: 5478.3, 60 sec: 7092.5, 300 sec: 5544.8). Total num frames: 9297920. 
Throughput: 0: 1763.7. Samples: 322221. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:05:22,263][04584] Avg episode reward: [(0, '4.497')] +[2024-11-07 15:05:25,579][09024] Updated weights for policy 0, policy_version 2277 (0.0035) +[2024-11-07 15:05:26,028][04584] Fps is (10 sec: 6553.6, 60 sec: 7168.0, 300 sec: 5595.0). Total num frames: 9330688. Throughput: 0: 1691.6. Samples: 323868. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:05:26,029][04584] Avg episode reward: [(0, '4.590')] +[2024-11-07 15:05:30,294][09024] Updated weights for policy 0, policy_version 2287 (0.0041) +[2024-11-07 15:05:31,028][04584] Fps is (10 sec: 8393.2, 60 sec: 7236.3, 300 sec: 5649.1). Total num frames: 9371648. Throughput: 0: 1751.2. Samples: 336708. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:05:31,030][04584] Avg episode reward: [(0, '4.708')] +[2024-11-07 15:05:34,912][09024] Updated weights for policy 0, policy_version 2297 (0.0042) +[2024-11-07 15:05:36,027][04584] Fps is (10 sec: 9011.3, 60 sec: 7305.4, 300 sec: 5734.4). Total num frames: 9420800. Throughput: 0: 1796.1. Samples: 350079. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:05:36,029][04584] Avg episode reward: [(0, '4.454')] +[2024-11-07 15:05:39,037][09024] Updated weights for policy 0, policy_version 2307 (0.0063) +[2024-11-07 15:05:41,029][04584] Fps is (10 sec: 8600.8, 60 sec: 7167.9, 300 sec: 5767.1). Total num frames: 9457664. Throughput: 0: 1853.6. Samples: 357318. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2024-11-07 15:05:41,031][04584] Avg episode reward: [(0, '4.661')] +[2024-11-07 15:05:45,685][09024] Updated weights for policy 0, policy_version 2317 (0.0065) +[2024-11-07 15:05:46,028][04584] Fps is (10 sec: 7372.7, 60 sec: 7330.3, 300 sec: 5798.7). Total num frames: 9494528. Throughput: 0: 1890.9. Samples: 367683. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:05:46,030][04584] Avg episode reward: [(0, '4.410')] +[2024-11-07 15:05:50,912][09024] Updated weights for policy 0, policy_version 2327 (0.0035) +[2024-11-07 15:05:51,028][04584] Fps is (10 sec: 7373.4, 60 sec: 7373.0, 300 sec: 5828.9). Total num frames: 9531392. Throughput: 0: 1887.5. Samples: 378978. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:05:51,031][04584] Avg episode reward: [(0, '4.357')] +[2024-11-07 15:05:56,548][04584] Fps is (10 sec: 5450.5, 60 sec: 7106.3, 300 sec: 5784.9). Total num frames: 9551872. Throughput: 0: 1821.6. Samples: 383718. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:05:56,552][04584] Avg episode reward: [(0, '4.499')] +[2024-11-07 15:05:59,046][09024] Updated weights for policy 0, policy_version 2337 (0.0071) +[2024-11-07 15:06:01,028][04584] Fps is (10 sec: 5324.8, 60 sec: 7236.3, 300 sec: 5810.3). Total num frames: 9584640. Throughput: 0: 1762.5. Samples: 390465. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) +[2024-11-07 15:06:01,030][04584] Avg episode reward: [(0, '4.315')] +[2024-11-07 15:06:04,144][09024] Updated weights for policy 0, policy_version 2347 (0.0033) +[2024-11-07 15:06:06,029][04584] Fps is (10 sec: 7777.0, 60 sec: 7168.0, 300 sec: 5853.5). Total num frames: 9625600. Throughput: 0: 1835.6. Samples: 402594. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:06:06,031][04584] Avg episode reward: [(0, '4.349')] +[2024-11-07 15:06:09,038][09024] Updated weights for policy 0, policy_version 2357 (0.0025) +[2024-11-07 15:06:11,028][04584] Fps is (10 sec: 8192.0, 60 sec: 7168.1, 300 sec: 5895.3). Total num frames: 9666560. Throughput: 0: 1899.2. Samples: 409332. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:06:11,033][04584] Avg episode reward: [(0, '4.490')] +[2024-11-07 15:06:14,224][09024] Updated weights for policy 0, policy_version 2367 (0.0037) +[2024-11-07 15:06:16,028][04584] Fps is (10 sec: 8192.1, 60 sec: 7372.7, 300 sec: 5935.6). Total num frames: 9707520. Throughput: 0: 1871.7. Samples: 420936. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:06:16,030][04584] Avg episode reward: [(0, '4.367')] +[2024-11-07 15:06:19,411][09024] Updated weights for policy 0, policy_version 2377 (0.0041) +[2024-11-07 15:06:21,029][04584] Fps is (10 sec: 8190.8, 60 sec: 7664.4, 300 sec: 5974.5). Total num frames: 9748480. Throughput: 0: 1848.9. Samples: 433281. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:06:21,032][04584] Avg episode reward: [(0, '4.296')] +[2024-11-07 15:06:24,504][09024] Updated weights for policy 0, policy_version 2387 (0.0053) +[2024-11-07 15:06:26,027][04584] Fps is (10 sec: 7783.2, 60 sec: 7577.6, 300 sec: 5998.2). Total num frames: 9785344. Throughput: 0: 1812.0. Samples: 438855. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:06:26,031][04584] Avg episode reward: [(0, '4.495')] +[2024-11-07 15:06:31,028][04584] Fps is (10 sec: 5735.2, 60 sec: 7236.3, 300 sec: 6067.6). Total num frames: 9805824. Throughput: 0: 1818.3. Samples: 449505. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) +[2024-11-07 15:06:31,030][04584] Avg episode reward: [(0, '4.282')] +[2024-11-07 15:06:33,393][09024] Updated weights for policy 0, policy_version 2397 (0.0054) +[2024-11-07 15:06:36,028][04584] Fps is (10 sec: 4505.4, 60 sec: 6826.6, 300 sec: 6150.9). Total num frames: 9830400. Throughput: 0: 1667.3. Samples: 454008. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) +[2024-11-07 15:06:36,033][04584] Avg episode reward: [(0, '4.324')] +[2024-11-07 15:06:41,028][04584] Fps is (10 sec: 4505.6, 60 sec: 6553.7, 300 sec: 6220.4). Total num frames: 9850880. Throughput: 0: 1647.4. 
Samples: 456993. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:06:41,030][04584] Avg episode reward: [(0, '4.430')] +[2024-11-07 15:06:42,724][09024] Updated weights for policy 0, policy_version 2407 (0.0111) +[2024-11-07 15:06:46,037][04584] Fps is (10 sec: 4092.3, 60 sec: 6279.5, 300 sec: 6234.1). Total num frames: 9871360. Throughput: 0: 1631.1. Samples: 463878. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) +[2024-11-07 15:06:46,049][04584] Avg episode reward: [(0, '4.396')] +[2024-11-07 15:06:50,217][09024] Updated weights for policy 0, policy_version 2417 (0.0070) +[2024-11-07 15:06:51,028][04584] Fps is (10 sec: 5324.8, 60 sec: 6212.3, 300 sec: 6294.4). Total num frames: 9904128. Throughput: 0: 1541.4. Samples: 471957. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:06:51,033][04584] Avg episode reward: [(0, '4.471')] +[2024-11-07 15:06:56,028][04584] Fps is (10 sec: 6559.7, 60 sec: 6473.3, 300 sec: 6345.4). Total num frames: 9936896. Throughput: 0: 1495.2. Samples: 476616. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:06:56,030][04584] Avg episode reward: [(0, '4.333')] +[2024-11-07 15:06:56,741][09024] Updated weights for policy 0, policy_version 2427 (0.0055) +[2024-11-07 15:07:01,028][04584] Fps is (10 sec: 6143.9, 60 sec: 6348.8, 300 sec: 6401.0). Total num frames: 9965568. Throughput: 0: 1452.2. Samples: 486282. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:07:01,030][04584] Avg episode reward: [(0, '4.301')] +[2024-11-07 15:07:06,028][04584] Fps is (10 sec: 4096.0, 60 sec: 5871.0, 300 sec: 6359.2). Total num frames: 9977856. Throughput: 0: 1286.4. Samples: 491169. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:07:06,029][04584] Avg episode reward: [(0, '4.232')] +[2024-11-07 15:07:06,836][09024] Updated weights for policy 0, policy_version 2437 (0.0050) +[2024-11-07 15:07:11,028][04584] Fps is (10 sec: 2867.1, 60 sec: 5461.3, 300 sec: 6317.6). Total num frames: 9994240. 
Throughput: 0: 1219.8. Samples: 493749. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:07:11,049][04584] Avg episode reward: [(0, '4.415')] +[2024-11-07 15:07:11,110][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002440_9994240.pth... +[2024-11-07 15:07:12,466][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002056_8421376.pth +[2024-11-07 15:07:16,031][04584] Fps is (10 sec: 3685.2, 60 sec: 5119.8, 300 sec: 6303.6). Total num frames: 10014720. Throughput: 0: 1105.2. Samples: 499242. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:07:16,033][04584] Avg episode reward: [(0, '4.528')] +[2024-11-07 15:07:17,973][09024] Updated weights for policy 0, policy_version 2447 (0.0103) +[2024-11-07 15:07:21,028][04584] Fps is (10 sec: 4096.0, 60 sec: 4778.8, 300 sec: 6262.1). Total num frames: 10035200. Throughput: 0: 1144.8. Samples: 505524. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:07:21,031][04584] Avg episode reward: [(0, '4.479')] +[2024-11-07 15:07:26,028][04584] Fps is (10 sec: 4097.4, 60 sec: 4505.6, 300 sec: 6289.8). Total num frames: 10055680. Throughput: 0: 1155.7. Samples: 509001. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:07:26,030][04584] Avg episode reward: [(0, '4.388')] +[2024-11-07 15:07:26,954][09024] Updated weights for policy 0, policy_version 2457 (0.0073) +[2024-11-07 15:07:31,029][04584] Fps is (10 sec: 4914.9, 60 sec: 4642.1, 300 sec: 6303.7). Total num frames: 10084352. Throughput: 0: 1154.4. Samples: 515817. Policy #0 lag: (min: 0.0, avg: 1.3, max: 4.0) +[2024-11-07 15:07:31,031][04584] Avg episode reward: [(0, '4.512')] +[2024-11-07 15:07:34,484][09024] Updated weights for policy 0, policy_version 2467 (0.0077) +[2024-11-07 15:07:36,039][04584] Fps is (10 sec: 5318.9, 60 sec: 4641.3, 300 sec: 6275.7). Total num frames: 10108928. Throughput: 0: 1161.7. Samples: 524247. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) +[2024-11-07 15:07:36,041][04584] Avg episode reward: [(0, '4.336')] +[2024-11-07 15:07:41,029][04584] Fps is (10 sec: 4095.9, 60 sec: 4573.8, 300 sec: 6248.1). Total num frames: 10125312. Throughput: 0: 1109.8. Samples: 526557. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) +[2024-11-07 15:07:41,033][04584] Avg episode reward: [(0, '4.377')] +[2024-11-07 15:07:44,723][09024] Updated weights for policy 0, policy_version 2477 (0.0055) +[2024-11-07 15:07:46,028][04584] Fps is (10 sec: 4510.3, 60 sec: 4711.1, 300 sec: 6248.5). Total num frames: 10153984. Throughput: 0: 1039.3. Samples: 533052. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:07:46,030][04584] Avg episode reward: [(0, '4.577')] +[2024-11-07 15:07:51,028][04584] Fps is (10 sec: 5325.4, 60 sec: 4573.9, 300 sec: 6206.5). Total num frames: 10178560. Throughput: 0: 1120.3. Samples: 541584. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:07:51,030][04584] Avg episode reward: [(0, '4.321')] +[2024-11-07 15:07:52,827][09024] Updated weights for policy 0, policy_version 2487 (0.0129) +[2024-11-07 15:07:56,028][04584] Fps is (10 sec: 4915.5, 60 sec: 4437.3, 300 sec: 6178.7). Total num frames: 10203136. Throughput: 0: 1124.5. Samples: 544350. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:07:56,030][04584] Avg episode reward: [(0, '4.500')] +[2024-11-07 15:07:59,814][09024] Updated weights for policy 0, policy_version 2497 (0.0061) +[2024-11-07 15:08:01,029][04584] Fps is (10 sec: 5324.1, 60 sec: 4437.2, 300 sec: 6234.2). Total num frames: 10231808. Throughput: 0: 1192.4. Samples: 552897. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:08:01,033][04584] Avg episode reward: [(0, '4.706')] +[2024-11-07 15:08:06,028][04584] Fps is (10 sec: 6144.0, 60 sec: 4778.7, 300 sec: 6248.2). Total num frames: 10264576. Throughput: 0: 1246.2. Samples: 561603. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:08:06,029][04584] Avg episode reward: [(0, '4.515')] +[2024-11-07 15:08:06,822][09024] Updated weights for policy 0, policy_version 2507 (0.0059) +[2024-11-07 15:08:11,035][04584] Fps is (10 sec: 6140.4, 60 sec: 4982.9, 300 sec: 6261.9). Total num frames: 10293248. Throughput: 0: 1278.5. Samples: 566541. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:08:11,038][04584] Avg episode reward: [(0, '4.344')] +[2024-11-07 15:08:15,470][09024] Updated weights for policy 0, policy_version 2517 (0.0058) +[2024-11-07 15:08:16,028][04584] Fps is (10 sec: 4915.1, 60 sec: 4983.7, 300 sec: 6206.5). Total num frames: 10313728. Throughput: 0: 1262.9. Samples: 572646. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:08:16,040][04584] Avg episode reward: [(0, '4.524')] +[2024-11-07 15:08:21,027][04584] Fps is (10 sec: 4918.9, 60 sec: 5120.0, 300 sec: 6151.0). Total num frames: 10342400. Throughput: 0: 1276.5. Samples: 581676. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:08:21,030][04584] Avg episode reward: [(0, '4.594')] +[2024-11-07 15:08:22,084][09024] Updated weights for policy 0, policy_version 2527 (0.0055) +[2024-11-07 15:08:26,028][04584] Fps is (10 sec: 6143.8, 60 sec: 5324.8, 300 sec: 6123.2). Total num frames: 10375168. Throughput: 0: 1339.8. Samples: 586845. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:08:26,030][04584] Avg episode reward: [(0, '4.716')] +[2024-11-07 15:08:28,099][09024] Updated weights for policy 0, policy_version 2537 (0.0069) +[2024-11-07 15:08:31,028][04584] Fps is (10 sec: 6553.3, 60 sec: 5393.1, 300 sec: 6139.5). Total num frames: 10407936. Throughput: 0: 1419.3. Samples: 596922. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:08:31,031][04584] Avg episode reward: [(0, '4.377')] +[2024-11-07 15:08:34,340][09024] Updated weights for policy 0, policy_version 2547 (0.0045) +[2024-11-07 15:08:36,028][04584] Fps is (10 sec: 6553.8, 60 sec: 5530.6, 300 sec: 6206.5). Total num frames: 10440704. Throughput: 0: 1443.5. Samples: 606540. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:08:36,030][04584] Avg episode reward: [(0, '4.421')] +[2024-11-07 15:08:40,592][09024] Updated weights for policy 0, policy_version 2557 (0.0049) +[2024-11-07 15:08:41,028][04584] Fps is (10 sec: 6553.6, 60 sec: 5802.8, 300 sec: 6206.7). Total num frames: 10473472. Throughput: 0: 1493.3. Samples: 611547. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:08:41,034][04584] Avg episode reward: [(0, '4.527')] +[2024-11-07 15:08:48,061][04584] Fps is (10 sec: 5446.4, 60 sec: 5678.6, 300 sec: 6164.0). Total num frames: 10506240. Throughput: 0: 1458.3. Samples: 621483. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:08:48,062][04584] Avg episode reward: [(0, '4.465')] +[2024-11-07 15:08:48,629][09024] Updated weights for policy 0, policy_version 2567 (0.0053) +[2024-11-07 15:08:51,028][04584] Fps is (10 sec: 5324.6, 60 sec: 5802.6, 300 sec: 6164.8). Total num frames: 10526720. Throughput: 0: 1481.7. Samples: 628281. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:08:51,037][04584] Avg episode reward: [(0, '4.470')] +[2024-11-07 15:08:54,930][09024] Updated weights for policy 0, policy_version 2577 (0.0040) +[2024-11-07 15:08:56,028][04584] Fps is (10 sec: 6683.4, 60 sec: 5939.2, 300 sec: 6164.8). Total num frames: 10559488. Throughput: 0: 1483.7. Samples: 633297. 
Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) +[2024-11-07 15:08:56,035][04584] Avg episode reward: [(0, '4.549')] +[2024-11-07 15:09:00,969][09024] Updated weights for policy 0, policy_version 2587 (0.0046) +[2024-11-07 15:09:01,028][04584] Fps is (10 sec: 6963.3, 60 sec: 6075.8, 300 sec: 6178.8). Total num frames: 10596352. Throughput: 0: 1563.5. Samples: 643005. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:09:01,030][04584] Avg episode reward: [(0, '4.366')] +[2024-11-07 15:09:06,030][04584] Fps is (10 sec: 6552.4, 60 sec: 6007.3, 300 sec: 6178.7). Total num frames: 10625024. Throughput: 0: 1577.3. Samples: 652659. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:09:06,049][04584] Avg episode reward: [(0, '4.493')] +[2024-11-07 15:09:08,985][09024] Updated weights for policy 0, policy_version 2597 (0.0064) +[2024-11-07 15:09:11,030][04584] Fps is (10 sec: 4914.3, 60 sec: 5871.5, 300 sec: 6164.8). Total num frames: 10645504. Throughput: 0: 1533.3. Samples: 655848. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:09:11,036][04584] Avg episode reward: [(0, '4.463')] +[2024-11-07 15:09:11,070][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002599_10645504.pth... +[2024-11-07 15:09:11,734][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002255_9236480.pth +[2024-11-07 15:09:16,028][04584] Fps is (10 sec: 4096.8, 60 sec: 5870.9, 300 sec: 6109.3). Total num frames: 10665984. Throughput: 0: 1445.5. Samples: 661971. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:09:16,032][04584] Avg episode reward: [(0, '4.449')] +[2024-11-07 15:09:18,879][09024] Updated weights for policy 0, policy_version 2607 (0.0099) +[2024-11-07 15:09:22,383][04584] Fps is (10 sec: 3247.1, 60 sec: 5541.0, 300 sec: 6012.2). Total num frames: 10682368. Throughput: 0: 1330.0. Samples: 668193. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:09:22,396][04584] Avg episode reward: [(0, '4.305')] +[2024-11-07 15:09:26,028][04584] Fps is (10 sec: 3276.8, 60 sec: 5393.1, 300 sec: 5970.4). Total num frames: 10698752. Throughput: 0: 1279.7. Samples: 669135. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:09:26,030][04584] Avg episode reward: [(0, '4.290')] +[2024-11-07 15:09:29,818][09024] Updated weights for policy 0, policy_version 2617 (0.0067) +[2024-11-07 15:09:31,028][04584] Fps is (10 sec: 4738.2, 60 sec: 5256.6, 300 sec: 5901.2). Total num frames: 10723328. Throughput: 0: 1272.1. Samples: 676140. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:09:31,030][04584] Avg episode reward: [(0, '4.222')] +[2024-11-07 15:09:36,028][04584] Fps is (10 sec: 4505.6, 60 sec: 5051.7, 300 sec: 5817.7). Total num frames: 10743808. Throughput: 0: 1216.6. Samples: 683025. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:09:36,029][04584] Avg episode reward: [(0, '4.278')] +[2024-11-07 15:09:39,151][09024] Updated weights for policy 0, policy_version 2627 (0.0095) +[2024-11-07 15:09:41,028][04584] Fps is (10 sec: 4505.4, 60 sec: 4915.2, 300 sec: 5799.3). Total num frames: 10768384. Throughput: 0: 1181.2. Samples: 686451. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2024-11-07 15:09:41,032][04584] Avg episode reward: [(0, '4.383')] +[2024-11-07 15:09:46,028][04584] Fps is (10 sec: 4505.5, 60 sec: 4875.6, 300 sec: 5762.2). Total num frames: 10788864. Throughput: 0: 1113.2. Samples: 693099. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:09:46,043][04584] Avg episode reward: [(0, '4.335')] +[2024-11-07 15:09:49,553][09024] Updated weights for policy 0, policy_version 2637 (0.0102) +[2024-11-07 15:09:51,028][04584] Fps is (10 sec: 3686.5, 60 sec: 4642.2, 300 sec: 5706.6). Total num frames: 10805248. Throughput: 0: 1009.0. Samples: 698064. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:09:51,030][04584] Avg episode reward: [(0, '4.319')] +[2024-11-07 15:09:56,739][04584] Fps is (10 sec: 2676.9, 60 sec: 4250.4, 300 sec: 5637.5). Total num frames: 10817536. Throughput: 0: 974.1. Samples: 700371. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2024-11-07 15:09:56,745][04584] Avg episode reward: [(0, '4.441')] +[2024-11-07 15:10:01,028][04584] Fps is (10 sec: 2867.2, 60 sec: 3959.5, 300 sec: 5553.9). Total num frames: 10833920. Throughput: 0: 930.9. Samples: 703860. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:10:01,033][04584] Avg episode reward: [(0, '4.393')] +[2024-11-07 15:10:02,874][09024] Updated weights for policy 0, policy_version 2647 (0.0087) +[2024-11-07 15:10:06,066][04584] Fps is (10 sec: 3513.6, 60 sec: 3752.5, 300 sec: 5469.9). Total num frames: 10850304. Throughput: 0: 949.9. Samples: 709686. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:10:06,070][04584] Avg episode reward: [(0, '4.337')] +[2024-11-07 15:10:11,031][04584] Fps is (10 sec: 3685.4, 60 sec: 3754.6, 300 sec: 5442.8). Total num frames: 10870784. Throughput: 0: 963.9. Samples: 712512. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:10:11,047][04584] Avg episode reward: [(0, '4.329')] +[2024-11-07 15:10:13,800][09024] Updated weights for policy 0, policy_version 2657 (0.0062) +[2024-11-07 15:10:16,028][04584] Fps is (10 sec: 4111.2, 60 sec: 3754.7, 300 sec: 5423.5). Total num frames: 10891264. Throughput: 0: 943.7. Samples: 718608. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:10:16,060][04584] Avg episode reward: [(0, '4.453')] +[2024-11-07 15:10:21,029][04584] Fps is (10 sec: 4096.4, 60 sec: 3911.2, 300 sec: 5359.5). Total num frames: 10911744. Throughput: 0: 910.6. Samples: 724002. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 4.0) +[2024-11-07 15:10:21,035][04584] Avg episode reward: [(0, '4.501')] +[2024-11-07 15:10:24,256][09024] Updated weights for policy 0, policy_version 2667 (0.0070) +[2024-11-07 15:10:26,028][04584] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 5276.2). Total num frames: 10928128. Throughput: 0: 899.3. Samples: 726918. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:10:26,030][04584] Avg episode reward: [(0, '4.423')] +[2024-11-07 15:10:31,028][04584] Fps is (10 sec: 2458.0, 60 sec: 3549.9, 300 sec: 5137.4). Total num frames: 10936320. Throughput: 0: 820.8. Samples: 730035. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) +[2024-11-07 15:10:31,030][04584] Avg episode reward: [(0, '4.404')] +[2024-11-07 15:10:36,028][04584] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 5067.9). Total num frames: 10952704. Throughput: 0: 815.3. Samples: 734751. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) +[2024-11-07 15:10:36,034][04584] Avg episode reward: [(0, '4.371')] +[2024-11-07 15:10:39,724][09024] Updated weights for policy 0, policy_version 2677 (0.0092) +[2024-11-07 15:10:41,028][04584] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 4998.5). Total num frames: 10969088. Throughput: 0: 831.2. Samples: 737184. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:10:41,030][04584] Avg episode reward: [(0, '4.353')] +[2024-11-07 15:10:46,030][04584] Fps is (10 sec: 3685.4, 60 sec: 3344.9, 300 sec: 4942.9). Total num frames: 10989568. Throughput: 0: 869.6. Samples: 742995. Policy #0 lag: (min: 0.0, avg: 1.0, max: 4.0) +[2024-11-07 15:10:46,045][04584] Avg episode reward: [(0, '4.321')] +[2024-11-07 15:10:49,813][09024] Updated weights for policy 0, policy_version 2687 (0.0083) +[2024-11-07 15:10:51,028][04584] Fps is (10 sec: 4096.0, 60 sec: 3413.3, 300 sec: 4951.7). Total num frames: 11010048. Throughput: 0: 878.9. Samples: 749202. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) +[2024-11-07 15:10:51,037][04584] Avg episode reward: [(0, '4.182')] +[2024-11-07 15:10:56,028][04584] Fps is (10 sec: 4097.0, 60 sec: 3592.4, 300 sec: 4901.3). Total num frames: 11030528. Throughput: 0: 881.2. Samples: 752163. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) +[2024-11-07 15:10:56,038][04584] Avg episode reward: [(0, '4.313')] +[2024-11-07 15:10:59,310][09024] Updated weights for policy 0, policy_version 2697 (0.0070) +[2024-11-07 15:11:01,028][04584] Fps is (10 sec: 4095.9, 60 sec: 3618.1, 300 sec: 4831.9). Total num frames: 11051008. Throughput: 0: 893.9. Samples: 758835. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:11:01,030][04584] Avg episode reward: [(0, '4.453')] +[2024-11-07 15:11:06,028][04584] Fps is (10 sec: 2867.3, 60 sec: 3483.8, 300 sec: 4720.8). Total num frames: 11059200. Throughput: 0: 858.9. Samples: 762651. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) +[2024-11-07 15:11:06,050][04584] Avg episode reward: [(0, '4.518')] +[2024-11-07 15:11:11,039][04584] Fps is (10 sec: 2864.2, 60 sec: 3481.1, 300 sec: 4651.2). Total num frames: 11079680. Throughput: 0: 850.9. Samples: 765216. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) +[2024-11-07 15:11:11,054][04584] Avg episode reward: [(0, '4.563')] +[2024-11-07 15:11:11,654][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002706_11083776.pth... +[2024-11-07 15:11:12,410][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002440_9994240.pth +[2024-11-07 15:11:12,676][09024] Updated weights for policy 0, policy_version 2707 (0.0102) +[2024-11-07 15:11:16,028][04584] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 4595.9). Total num frames: 11104256. Throughput: 0: 909.9. Samples: 770982. 
Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0) +[2024-11-07 15:11:16,035][04584] Avg episode reward: [(0, '4.511')] +[2024-11-07 15:11:20,515][09024] Updated weights for policy 0, policy_version 2717 (0.0108) +[2024-11-07 15:11:21,029][04584] Fps is (10 sec: 4919.9, 60 sec: 3618.2, 300 sec: 4554.2). Total num frames: 11128832. Throughput: 0: 974.7. Samples: 778614. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:11:21,032][04584] Avg episode reward: [(0, '4.433')] +[2024-11-07 15:11:26,028][04584] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 4554.2). Total num frames: 11149312. Throughput: 0: 994.3. Samples: 781929. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:11:26,032][04584] Avg episode reward: [(0, '4.463')] +[2024-11-07 15:11:30,644][09024] Updated weights for policy 0, policy_version 2727 (0.0103) +[2024-11-07 15:11:31,036][04584] Fps is (10 sec: 4092.9, 60 sec: 3890.6, 300 sec: 4540.2). Total num frames: 11169792. Throughput: 0: 1013.8. Samples: 788622. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:11:31,051][04584] Avg episode reward: [(0, '4.541')] +[2024-11-07 15:11:36,028][04584] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4526.4). Total num frames: 11186176. Throughput: 0: 990.2. Samples: 793761. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) +[2024-11-07 15:11:36,095][04584] Avg episode reward: [(0, '4.513')] +[2024-11-07 15:11:41,029][04584] Fps is (10 sec: 2869.4, 60 sec: 3822.9, 300 sec: 4498.8). Total num frames: 11198464. Throughput: 0: 979.4. Samples: 796236. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) +[2024-11-07 15:11:41,032][04584] Avg episode reward: [(0, '4.443')] +[2024-11-07 15:11:44,286][09024] Updated weights for policy 0, policy_version 2737 (0.0075) +[2024-11-07 15:11:46,028][04584] Fps is (10 sec: 2867.2, 60 sec: 3754.8, 300 sec: 4443.1). Total num frames: 11214848. Throughput: 0: 908.7. Samples: 799725. 
Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2024-11-07 15:11:46,036][04584] Avg episode reward: [(0, '4.426')] +[2024-11-07 15:11:51,028][04584] Fps is (10 sec: 4096.3, 60 sec: 3822.9, 300 sec: 4415.3). Total num frames: 11239424. Throughput: 0: 965.5. Samples: 806097. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:11:51,030][04584] Avg episode reward: [(0, '4.502')] +[2024-11-07 15:11:53,698][09024] Updated weights for policy 0, policy_version 2747 (0.0120) +[2024-11-07 15:11:56,031][04584] Fps is (10 sec: 4094.8, 60 sec: 3754.5, 300 sec: 4373.7). Total num frames: 11255808. Throughput: 0: 989.4. Samples: 809733. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:11:56,035][04584] Avg episode reward: [(0, '4.506')] +[2024-11-07 15:12:01,028][04584] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 4387.6). Total num frames: 11272192. Throughput: 0: 975.5. Samples: 814878. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:12:01,061][04584] Avg episode reward: [(0, '4.458')] +[2024-11-07 15:12:05,292][09024] Updated weights for policy 0, policy_version 2757 (0.0095) +[2024-11-07 15:12:06,028][04584] Fps is (10 sec: 3687.5, 60 sec: 3891.2, 300 sec: 4401.5). Total num frames: 11292672. Throughput: 0: 924.8. Samples: 820227. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:12:06,030][04584] Avg episode reward: [(0, '4.342')] +[2024-11-07 15:12:11,028][04584] Fps is (10 sec: 5324.8, 60 sec: 4096.7, 300 sec: 4443.2). Total num frames: 11325440. Throughput: 0: 946.0. Samples: 824499. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:12:11,035][04584] Avg episode reward: [(0, '4.363')] +[2024-11-07 15:12:11,709][09024] Updated weights for policy 0, policy_version 2767 (0.0038) +[2024-11-07 15:12:16,028][04584] Fps is (10 sec: 4915.2, 60 sec: 3959.5, 300 sec: 4429.2). Total num frames: 11341824. Throughput: 0: 948.7. Samples: 831306. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:12:16,036][04584] Avg episode reward: [(0, '4.397')] +[2024-11-07 15:12:21,030][04584] Fps is (10 sec: 4505.0, 60 sec: 4027.7, 300 sec: 4457.0). Total num frames: 11370496. Throughput: 0: 1013.9. Samples: 839388. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:12:21,033][04584] Avg episode reward: [(0, '4.267')] +[2024-11-07 15:12:21,441][09024] Updated weights for policy 0, policy_version 2777 (0.0056) +[2024-11-07 15:12:26,029][04584] Fps is (10 sec: 5733.9, 60 sec: 4164.2, 300 sec: 4457.0). Total num frames: 11399168. Throughput: 0: 1048.7. Samples: 843426. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) +[2024-11-07 15:12:26,040][04584] Avg episode reward: [(0, '4.437')] +[2024-11-07 15:12:27,849][09024] Updated weights for policy 0, policy_version 2787 (0.0065) +[2024-11-07 15:12:31,033][04584] Fps is (10 sec: 6141.7, 60 sec: 4369.3, 300 sec: 4484.9). Total num frames: 11431936. Throughput: 0: 1182.1. Samples: 852924. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) +[2024-11-07 15:12:31,045][04584] Avg episode reward: [(0, '4.275')] +[2024-11-07 15:12:34,138][09024] Updated weights for policy 0, policy_version 2797 (0.0074) +[2024-11-07 15:12:36,035][04584] Fps is (10 sec: 6958.7, 60 sec: 4709.8, 300 sec: 4554.1). Total num frames: 11468800. Throughput: 0: 1266.2. Samples: 863085. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:12:36,038][04584] Avg episode reward: [(0, '4.393')] +[2024-11-07 15:12:40,396][09024] Updated weights for policy 0, policy_version 2807 (0.0059) +[2024-11-07 15:12:41,029][04584] Fps is (10 sec: 6556.5, 60 sec: 4983.5, 300 sec: 4554.2). Total num frames: 11497472. Throughput: 0: 1290.9. Samples: 867822. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:12:41,032][04584] Avg episode reward: [(0, '4.360')] +[2024-11-07 15:12:46,028][04584] Fps is (10 sec: 6558.3, 60 sec: 5324.8, 300 sec: 4595.9). Total num frames: 11534336. 
Throughput: 0: 1392.6. Samples: 877545. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:12:46,030][04584] Avg episode reward: [(0, '4.605')] +[2024-11-07 15:12:48,617][09024] Updated weights for policy 0, policy_version 2817 (0.0046) +[2024-11-07 15:12:51,028][04584] Fps is (10 sec: 5325.1, 60 sec: 5188.3, 300 sec: 4568.1). Total num frames: 11550720. Throughput: 0: 1424.7. Samples: 884337. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:12:51,031][04584] Avg episode reward: [(0, '4.664')] +[2024-11-07 15:12:54,695][09024] Updated weights for policy 0, policy_version 2827 (0.0045) +[2024-11-07 15:12:56,028][04584] Fps is (10 sec: 5325.0, 60 sec: 5529.9, 300 sec: 4595.9). Total num frames: 11587584. Throughput: 0: 1441.0. Samples: 889341. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:12:56,033][04584] Avg episode reward: [(0, '4.448')] +[2024-11-07 15:13:00,537][09024] Updated weights for policy 0, policy_version 2837 (0.0054) +[2024-11-07 15:13:01,030][04584] Fps is (10 sec: 6961.8, 60 sec: 5802.5, 300 sec: 4595.8). Total num frames: 11620352. Throughput: 0: 1518.0. Samples: 899619. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:13:01,037][04584] Avg episode reward: [(0, '4.399')] +[2024-11-07 15:13:06,034][04584] Fps is (10 sec: 6139.8, 60 sec: 5938.5, 300 sec: 4595.9). Total num frames: 11649024. Throughput: 0: 1536.9. Samples: 908556. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:13:06,038][04584] Avg episode reward: [(0, '4.576')] +[2024-11-07 15:13:07,824][09024] Updated weights for policy 0, policy_version 2847 (0.0046) +[2024-11-07 15:13:11,028][04584] Fps is (10 sec: 5735.9, 60 sec: 5871.0, 300 sec: 4623.6). Total num frames: 11677696. Throughput: 0: 1546.8. Samples: 913032. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:13:11,030][04584] Avg episode reward: [(0, '4.395')] +[2024-11-07 15:13:11,245][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002852_11681792.pth... +[2024-11-07 15:13:11,431][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002599_10645504.pth +[2024-11-07 15:13:14,223][09024] Updated weights for policy 0, policy_version 2857 (0.0048) +[2024-11-07 15:13:16,028][04584] Fps is (10 sec: 6148.2, 60 sec: 6144.0, 300 sec: 4637.5). Total num frames: 11710464. Throughput: 0: 1545.3. Samples: 922455. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:13:16,030][04584] Avg episode reward: [(0, '4.332')] +[2024-11-07 15:13:20,452][09024] Updated weights for policy 0, policy_version 2867 (0.0048) +[2024-11-07 15:13:22,886][04584] Fps is (10 sec: 5526.7, 60 sec: 6025.9, 300 sec: 4608.5). Total num frames: 11743232. Throughput: 0: 1485.1. Samples: 932661. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:13:22,894][04584] Avg episode reward: [(0, '4.160')] +[2024-11-07 15:13:26,028][04584] Fps is (10 sec: 5734.4, 60 sec: 6144.1, 300 sec: 4609.7). Total num frames: 11767808. Throughput: 0: 1475.2. Samples: 934206. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:13:26,030][04584] Avg episode reward: [(0, '4.440')] +[2024-11-07 15:13:28,531][09024] Updated weights for policy 0, policy_version 2877 (0.0041) +[2024-11-07 15:13:31,027][04584] Fps is (10 sec: 7043.1, 60 sec: 6144.6, 300 sec: 4609.7). Total num frames: 11800576. Throughput: 0: 1484.0. Samples: 944325. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) +[2024-11-07 15:13:31,030][04584] Avg episode reward: [(0, '4.327')] +[2024-11-07 15:13:35,141][09024] Updated weights for policy 0, policy_version 2887 (0.0033) +[2024-11-07 15:13:36,033][04584] Fps is (10 sec: 6140.8, 60 sec: 6007.7, 300 sec: 4595.8). 
Total num frames: 11829248. Throughput: 0: 1536.6. Samples: 953490. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:13:36,034][04584] Avg episode reward: [(0, '4.494')] +[2024-11-07 15:13:41,035][04584] Fps is (10 sec: 6139.4, 60 sec: 6075.1, 300 sec: 4627.6). Total num frames: 11862016. Throughput: 0: 1525.2. Samples: 957987. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:13:41,037][04584] Avg episode reward: [(0, '4.347')] +[2024-11-07 15:13:41,220][09024] Updated weights for policy 0, policy_version 2897 (0.0051) +[2024-11-07 15:13:46,028][04584] Fps is (10 sec: 6966.9, 60 sec: 6075.8, 300 sec: 4651.4). Total num frames: 11898880. Throughput: 0: 1542.1. Samples: 969009. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2024-11-07 15:13:46,030][04584] Avg episode reward: [(0, '4.503')] +[2024-11-07 15:13:46,718][09024] Updated weights for policy 0, policy_version 2907 (0.0032) +[2024-11-07 15:13:51,028][04584] Fps is (10 sec: 7377.9, 60 sec: 6417.1, 300 sec: 4665.3). Total num frames: 11935744. Throughput: 0: 1589.0. Samples: 980049. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) +[2024-11-07 15:13:51,031][04584] Avg episode reward: [(0, '4.431')] +[2024-11-07 15:13:52,617][09024] Updated weights for policy 0, policy_version 2917 (0.0039) +[2024-11-07 15:13:57,206][04584] Fps is (10 sec: 5863.0, 60 sec: 6159.6, 300 sec: 4619.1). Total num frames: 11964416. Throughput: 0: 1561.5. Samples: 985140. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2024-11-07 15:13:57,212][04584] Avg episode reward: [(0, '4.592')] +[2024-11-07 15:14:00,312][09024] Updated weights for policy 0, policy_version 2927 (0.0028) +[2024-11-07 15:14:01,028][04584] Fps is (10 sec: 5734.7, 60 sec: 6212.5, 300 sec: 4637.5). Total num frames: 11993088. Throughput: 0: 1551.5. Samples: 992271. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) +[2024-11-07 15:14:01,029][04584] Avg episode reward: [(0, '4.379')] +[2024-11-07 15:14:06,028][04584] Fps is (10 sec: 6964.1, 60 sec: 6281.2, 300 sec: 4679.2). Total num frames: 12025856. Throughput: 0: 1607.6. Samples: 1002018. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:14:06,030][04584] Avg episode reward: [(0, '4.472')] +[2024-11-07 15:14:06,718][09024] Updated weights for policy 0, policy_version 2937 (0.0058) +[2024-11-07 15:14:11,028][04584] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 4734.7). Total num frames: 12062720. Throughput: 0: 1629.3. Samples: 1007526. Policy #0 lag: (min: 0.0, avg: 1.3, max: 4.0) +[2024-11-07 15:14:11,033][04584] Avg episode reward: [(0, '4.317')] +[2024-11-07 15:14:11,886][09024] Updated weights for policy 0, policy_version 2947 (0.0051) +[2024-11-07 15:14:16,029][04584] Fps is (10 sec: 7372.4, 60 sec: 6485.2, 300 sec: 4826.3). Total num frames: 12099584. Throughput: 0: 1657.5. Samples: 1018914. Policy #0 lag: (min: 0.0, avg: 1.3, max: 4.0) +[2024-11-07 15:14:16,031][04584] Avg episode reward: [(0, '4.249')] +[2024-11-07 15:14:17,578][09024] Updated weights for policy 0, policy_version 2957 (0.0039) +[2024-11-07 15:14:21,028][04584] Fps is (10 sec: 7372.8, 60 sec: 6763.0, 300 sec: 4873.5). Total num frames: 12136448. Throughput: 0: 1692.9. Samples: 1029663. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:14:21,029][04584] Avg episode reward: [(0, '4.486')] +[2024-11-07 15:14:23,230][09024] Updated weights for policy 0, policy_version 2967 (0.0057) +[2024-11-07 15:14:26,028][04584] Fps is (10 sec: 6963.8, 60 sec: 6690.1, 300 sec: 4901.3). Total num frames: 12169216. Throughput: 0: 1719.2. Samples: 1035336. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:14:26,034][04584] Avg episode reward: [(0, '4.601')] +[2024-11-07 15:14:28,687][09024] Updated weights for policy 0, policy_version 2977 (0.0045) +[2024-11-07 15:14:31,524][04584] Fps is (10 sec: 5463.0, 60 sec: 6499.8, 300 sec: 4906.9). Total num frames: 12193792. Throughput: 0: 1696.9. Samples: 1046211. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:14:31,526][04584] Avg episode reward: [(0, '4.651')] +[2024-11-07 15:14:36,028][04584] Fps is (10 sec: 6143.9, 60 sec: 6690.7, 300 sec: 4956.9). Total num frames: 12230656. Throughput: 0: 1634.1. Samples: 1053585. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:14:36,030][04584] Avg episode reward: [(0, '4.456')] +[2024-11-07 15:14:36,569][09024] Updated weights for policy 0, policy_version 2987 (0.0041) +[2024-11-07 15:14:41,028][04584] Fps is (10 sec: 6896.1, 60 sec: 6622.7, 300 sec: 4984.6). Total num frames: 12259328. Throughput: 0: 1685.2. Samples: 1058991. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:14:41,031][04584] Avg episode reward: [(0, '4.358')] +[2024-11-07 15:14:44,208][09024] Updated weights for policy 0, policy_version 2997 (0.0079) +[2024-11-07 15:14:46,028][04584] Fps is (10 sec: 4915.3, 60 sec: 6348.8, 300 sec: 4998.5). Total num frames: 12279808. Throughput: 0: 1637.1. Samples: 1065942. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:14:46,036][04584] Avg episode reward: [(0, '4.494')] +[2024-11-07 15:14:51,037][04584] Fps is (10 sec: 5319.7, 60 sec: 6279.6, 300 sec: 5080.0). Total num frames: 12312576. Throughput: 0: 1610.5. Samples: 1074504. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:14:51,040][04584] Avg episode reward: [(0, '4.573')] +[2024-11-07 15:14:51,137][09024] Updated weights for policy 0, policy_version 3007 (0.0068) +[2024-11-07 15:14:56,028][04584] Fps is (10 sec: 7372.9, 60 sec: 6615.2, 300 sec: 5151.2). Total num frames: 12353536. 
Throughput: 0: 1614.2. Samples: 1080165. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:14:56,030][04584] Avg episode reward: [(0, '4.318')] +[2024-11-07 15:14:56,610][09024] Updated weights for policy 0, policy_version 3017 (0.0036) +[2024-11-07 15:15:01,028][04584] Fps is (10 sec: 7379.8, 60 sec: 6553.6, 300 sec: 5207.4). Total num frames: 12386304. Throughput: 0: 1603.7. Samples: 1091079. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:15:01,033][04584] Avg episode reward: [(0, '4.576')] +[2024-11-07 15:15:02,672][09024] Updated weights for policy 0, policy_version 3027 (0.0045) +[2024-11-07 15:15:06,031][04584] Fps is (10 sec: 5323.2, 60 sec: 6348.5, 300 sec: 5206.8). Total num frames: 12406784. Throughput: 0: 1540.1. Samples: 1098972. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:15:06,033][04584] Avg episode reward: [(0, '4.521')] +[2024-11-07 15:15:10,389][09024] Updated weights for policy 0, policy_version 3037 (0.0042) +[2024-11-07 15:15:11,029][04584] Fps is (10 sec: 5324.3, 60 sec: 6280.4, 300 sec: 5248.4). Total num frames: 12439552. Throughput: 0: 1499.1. Samples: 1102797. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:15:11,031][04584] Avg episode reward: [(0, '4.354')] +[2024-11-07 15:15:11,268][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003038_12443648.pth... +[2024-11-07 15:15:11,509][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002706_11083776.pth +[2024-11-07 15:15:16,028][04584] Fps is (10 sec: 6555.2, 60 sec: 6212.3, 300 sec: 5290.1). Total num frames: 12472320. Throughput: 0: 1503.8. Samples: 1113135. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:15:16,032][04584] Avg episode reward: [(0, '4.369')] +[2024-11-07 15:15:16,987][09024] Updated weights for policy 0, policy_version 3047 (0.0049) +[2024-11-07 15:15:21,028][04584] Fps is (10 sec: 6554.3, 60 sec: 6144.0, 300 sec: 5345.6). Total num frames: 12505088. Throughput: 0: 1518.1. Samples: 1121901. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:15:21,030][04584] Avg episode reward: [(0, '4.181')] +[2024-11-07 15:15:24,057][09024] Updated weights for policy 0, policy_version 3057 (0.0030) +[2024-11-07 15:15:26,029][04584] Fps is (10 sec: 6143.3, 60 sec: 6075.6, 300 sec: 5415.0). Total num frames: 12533760. Throughput: 0: 1497.6. Samples: 1126386. Policy #0 lag: (min: 0.0, avg: 1.2, max: 4.0) +[2024-11-07 15:15:26,031][04584] Avg episode reward: [(0, '4.241')] +[2024-11-07 15:15:29,604][09024] Updated weights for policy 0, policy_version 3067 (0.0041) +[2024-11-07 15:15:31,030][04584] Fps is (10 sec: 6551.9, 60 sec: 6332.7, 300 sec: 5484.4). Total num frames: 12570624. Throughput: 0: 1579.4. Samples: 1137018. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:15:31,040][04584] Avg episode reward: [(0, '4.564')] +[2024-11-07 15:15:35,284][09024] Updated weights for policy 0, policy_version 3077 (0.0032) +[2024-11-07 15:15:36,030][04584] Fps is (10 sec: 7372.0, 60 sec: 6280.3, 300 sec: 5553.8). Total num frames: 12607488. Throughput: 0: 1630.2. Samples: 1147851. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:15:36,032][04584] Avg episode reward: [(0, '4.505')] +[2024-11-07 15:15:41,028][04584] Fps is (10 sec: 5735.8, 60 sec: 6144.0, 300 sec: 5553.9). Total num frames: 12627968. Throughput: 0: 1625.3. Samples: 1153302. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:15:41,030][04584] Avg episode reward: [(0, '4.570')] +[2024-11-07 15:15:42,967][09024] Updated weights for policy 0, policy_version 3087 (0.0043) +[2024-11-07 15:15:46,028][04584] Fps is (10 sec: 5735.9, 60 sec: 6417.0, 300 sec: 5609.4). Total num frames: 12664832. Throughput: 0: 1542.5. Samples: 1160490. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:15:46,030][04584] Avg episode reward: [(0, '4.759')] +[2024-11-07 15:15:49,866][09024] Updated weights for policy 0, policy_version 3097 (0.0048) +[2024-11-07 15:15:51,032][04584] Fps is (10 sec: 6141.4, 60 sec: 6281.1, 300 sec: 5623.2). Total num frames: 12689408. Throughput: 0: 1555.1. Samples: 1168953. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:15:51,035][04584] Avg episode reward: [(0, '4.603')] +[2024-11-07 15:15:56,030][04584] Fps is (10 sec: 4914.2, 60 sec: 6007.2, 300 sec: 5637.2). Total num frames: 12713984. Throughput: 0: 1541.6. Samples: 1172169. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:15:56,034][04584] Avg episode reward: [(0, '4.540')] +[2024-11-07 15:15:58,053][09024] Updated weights for policy 0, policy_version 3107 (0.0034) +[2024-11-07 15:16:01,028][04584] Fps is (10 sec: 4917.4, 60 sec: 5871.0, 300 sec: 5692.7). Total num frames: 12738560. Throughput: 0: 1491.2. Samples: 1180236. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:16:01,031][04584] Avg episode reward: [(0, '4.267')] +[2024-11-07 15:16:06,033][04584] Fps is (10 sec: 4504.1, 60 sec: 5870.7, 300 sec: 5692.8). Total num frames: 12759040. Throughput: 0: 1437.4. Samples: 1186593. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:16:06,038][04584] Avg episode reward: [(0, '4.296')] +[2024-11-07 15:16:07,478][09024] Updated weights for policy 0, policy_version 3117 (0.0062) +[2024-11-07 15:16:11,030][04584] Fps is (10 sec: 4095.1, 60 sec: 5666.0, 300 sec: 5678.8). Total num frames: 12779520. 
Throughput: 0: 1411.1. Samples: 1189884. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:16:11,032][04584] Avg episode reward: [(0, '4.350')] +[2024-11-07 15:16:16,027][04584] Fps is (10 sec: 3278.6, 60 sec: 5324.9, 300 sec: 5637.2). Total num frames: 12791808. Throughput: 0: 1267.9. Samples: 1194069. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:16:16,039][04584] Avg episode reward: [(0, '4.416')] +[2024-11-07 15:16:19,334][09024] Updated weights for policy 0, policy_version 3127 (0.0075) +[2024-11-07 15:16:21,028][04584] Fps is (10 sec: 3277.4, 60 sec: 5120.0, 300 sec: 5637.2). Total num frames: 12812288. Throughput: 0: 1162.9. Samples: 1200180. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:16:21,035][04584] Avg episode reward: [(0, '4.512')] +[2024-11-07 15:16:26,028][04584] Fps is (10 sec: 4505.5, 60 sec: 5051.9, 300 sec: 5651.3). Total num frames: 12836864. Throughput: 0: 1108.3. Samples: 1203174. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:16:26,031][04584] Avg episode reward: [(0, '4.481')] +[2024-11-07 15:16:29,455][09024] Updated weights for policy 0, policy_version 3137 (0.0076) +[2024-11-07 15:16:31,034][04584] Fps is (10 sec: 4093.6, 60 sec: 4710.1, 300 sec: 5651.0). Total num frames: 12853248. Throughput: 0: 1089.5. Samples: 1209522. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:16:31,035][04584] Avg episode reward: [(0, '4.504')] +[2024-11-07 15:16:36,028][04584] Fps is (10 sec: 4096.1, 60 sec: 4505.8, 300 sec: 5692.8). Total num frames: 12877824. Throughput: 0: 1049.7. Samples: 1216185. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:16:36,038][04584] Avg episode reward: [(0, '4.367')] +[2024-11-07 15:16:37,390][09024] Updated weights for policy 0, policy_version 3147 (0.0072) +[2024-11-07 15:16:41,027][04584] Fps is (10 sec: 6147.7, 60 sec: 4778.7, 300 sec: 5762.2). Total num frames: 12914688. Throughput: 0: 1088.7. Samples: 1221156. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:16:41,029][04584] Avg episode reward: [(0, '4.351')] +[2024-11-07 15:16:42,957][09024] Updated weights for policy 0, policy_version 3157 (0.0072) +[2024-11-07 15:16:46,028][04584] Fps is (10 sec: 7372.8, 60 sec: 4778.7, 300 sec: 5803.8). Total num frames: 12951552. Throughput: 0: 1153.2. Samples: 1232130. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:16:46,029][04584] Avg episode reward: [(0, '4.437')] +[2024-11-07 15:16:50,809][09024] Updated weights for policy 0, policy_version 3167 (0.0060) +[2024-11-07 15:16:51,031][04584] Fps is (10 sec: 5732.6, 60 sec: 4710.5, 300 sec: 5817.7). Total num frames: 12972032. Throughput: 0: 1166.0. Samples: 1239060. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2024-11-07 15:16:51,033][04584] Avg episode reward: [(0, '4.411')] +[2024-11-07 15:16:56,028][04584] Fps is (10 sec: 5734.1, 60 sec: 4915.3, 300 sec: 5887.1). Total num frames: 13008896. Throughput: 0: 1216.6. Samples: 1244631. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:16:56,034][04584] Avg episode reward: [(0, '4.402')] +[2024-11-07 15:16:56,200][09024] Updated weights for policy 0, policy_version 3177 (0.0044) +[2024-11-07 15:17:01,029][04584] Fps is (10 sec: 7783.8, 60 sec: 5188.2, 300 sec: 5956.5). Total num frames: 13049856. Throughput: 0: 1382.7. Samples: 1256292. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:17:01,031][04584] Avg episode reward: [(0, '4.430')] +[2024-11-07 15:17:01,393][09024] Updated weights for policy 0, policy_version 3187 (0.0050) +[2024-11-07 15:17:06,028][04584] Fps is (10 sec: 7782.9, 60 sec: 5461.8, 300 sec: 5970.5). Total num frames: 13086720. Throughput: 0: 1490.1. Samples: 1267233. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:17:06,029][04584] Avg episode reward: [(0, '4.488')] +[2024-11-07 15:17:07,166][09024] Updated weights for policy 0, policy_version 3197 (0.0034) +[2024-11-07 15:17:11,028][04584] Fps is (10 sec: 7373.7, 60 sec: 5734.6, 300 sec: 6039.9). Total num frames: 13123584. Throughput: 0: 1548.8. Samples: 1272870. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:17:11,029][04584] Avg episode reward: [(0, '4.295')] +[2024-11-07 15:17:11,242][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003205_13127680.pth... +[2024-11-07 15:17:11,341][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000002852_11681792.pth +[2024-11-07 15:17:12,310][09024] Updated weights for policy 0, policy_version 3207 (0.0045) +[2024-11-07 15:17:16,028][04584] Fps is (10 sec: 7372.8, 60 sec: 6144.0, 300 sec: 6067.7). Total num frames: 13160448. Throughput: 0: 1663.0. Samples: 1284345. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) +[2024-11-07 15:17:16,029][04584] Avg episode reward: [(0, '4.660')] +[2024-11-07 15:17:17,924][09024] Updated weights for policy 0, policy_version 3217 (0.0044) +[2024-11-07 15:17:21,028][04584] Fps is (10 sec: 6963.1, 60 sec: 6348.8, 300 sec: 6081.5). Total num frames: 13193216. Throughput: 0: 1750.2. Samples: 1294944. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2024-11-07 15:17:21,039][04584] Avg episode reward: [(0, '4.639')] +[2024-11-07 15:17:26,029][04584] Fps is (10 sec: 5324.3, 60 sec: 6280.5, 300 sec: 6040.0). Total num frames: 13213696. Throughput: 0: 1676.6. Samples: 1296603. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:17:26,032][04584] Avg episode reward: [(0, '4.504')] +[2024-11-07 15:17:26,557][09024] Updated weights for policy 0, policy_version 3227 (0.0042) +[2024-11-07 15:17:31,028][04584] Fps is (10 sec: 5734.2, 60 sec: 6622.5, 300 sec: 6040.0). 
Total num frames: 13250560. Throughput: 0: 1658.7. Samples: 1306773. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:17:31,030][04584] Avg episode reward: [(0, '4.591')] +[2024-11-07 15:17:31,592][09024] Updated weights for policy 0, policy_version 3237 (0.0043) +[2024-11-07 15:17:36,028][04584] Fps is (10 sec: 6963.7, 60 sec: 6758.4, 300 sec: 6053.8). Total num frames: 13283328. Throughput: 0: 1743.0. Samples: 1317489. Policy #0 lag: (min: 0.0, avg: 1.1, max: 4.0) +[2024-11-07 15:17:36,039][04584] Avg episode reward: [(0, '4.412')] +[2024-11-07 15:17:40,291][09024] Updated weights for policy 0, policy_version 3247 (0.0078) +[2024-11-07 15:17:41,030][04584] Fps is (10 sec: 4914.0, 60 sec: 6416.8, 300 sec: 5984.3). Total num frames: 13299712. Throughput: 0: 1683.3. Samples: 1320381. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:17:41,034][04584] Avg episode reward: [(0, '4.433')] +[2024-11-07 15:17:46,035][04584] Fps is (10 sec: 3274.6, 60 sec: 6075.0, 300 sec: 5984.2). Total num frames: 13316096. Throughput: 0: 1530.9. Samples: 1325190. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:17:46,038][04584] Avg episode reward: [(0, '4.402')] +[2024-11-07 15:17:51,032][04584] Fps is (10 sec: 3685.8, 60 sec: 6075.6, 300 sec: 5928.7). Total num frames: 13336576. Throughput: 0: 1412.1. Samples: 1330785. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:17:51,040][04584] Avg episode reward: [(0, '4.321')] +[2024-11-07 15:17:51,466][09024] Updated weights for policy 0, policy_version 3257 (0.0066) +[2024-11-07 15:17:57,446][04584] Fps is (10 sec: 2871.7, 60 sec: 5535.4, 300 sec: 5831.4). Total num frames: 13348864. Throughput: 0: 1311.9. Samples: 1333767. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:17:57,457][04584] Avg episode reward: [(0, '4.297')] +[2024-11-07 15:18:01,037][04584] Fps is (10 sec: 2865.7, 60 sec: 5255.8, 300 sec: 5817.6). Total num frames: 13365248. Throughput: 0: 1179.2. 
Samples: 1337421. Policy #0 lag: (min: 0.0, avg: 0.9, max: 4.0) +[2024-11-07 15:18:01,041][04584] Avg episode reward: [(0, '4.410')] +[2024-11-07 15:18:06,028][04584] Fps is (10 sec: 2863.6, 60 sec: 4778.6, 300 sec: 5748.3). Total num frames: 13373440. Throughput: 0: 1028.0. Samples: 1341204. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) +[2024-11-07 15:18:06,035][04584] Avg episode reward: [(0, '4.401')] +[2024-11-07 15:18:06,827][09024] Updated weights for policy 0, policy_version 3267 (0.0073) +[2024-11-07 15:18:11,028][04584] Fps is (10 sec: 3279.8, 60 sec: 4573.8, 300 sec: 5720.5). Total num frames: 13398016. Throughput: 0: 1053.3. Samples: 1344000. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:18:11,035][04584] Avg episode reward: [(0, '4.495')] +[2024-11-07 15:18:16,033][04584] Fps is (10 sec: 4503.3, 60 sec: 4300.4, 300 sec: 5714.8). Total num frames: 13418496. Throughput: 0: 978.6. Samples: 1350813. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:18:16,047][04584] Avg episode reward: [(0, '4.469')] +[2024-11-07 15:18:16,227][09024] Updated weights for policy 0, policy_version 3277 (0.0046) +[2024-11-07 15:18:21,028][04584] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 5665.0). Total num frames: 13438976. Throughput: 0: 873.2. Samples: 1356783. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:18:21,031][04584] Avg episode reward: [(0, '4.397')] +[2024-11-07 15:18:25,955][09024] Updated weights for policy 0, policy_version 3287 (0.0095) +[2024-11-07 15:18:26,028][04584] Fps is (10 sec: 4507.9, 60 sec: 4164.3, 300 sec: 5637.2). Total num frames: 13463552. Throughput: 0: 880.2. Samples: 1359987. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:18:26,029][04584] Avg episode reward: [(0, '4.511')] +[2024-11-07 15:18:31,786][04584] Fps is (10 sec: 3045.8, 60 sec: 3640.4, 300 sec: 5553.6). Total num frames: 13471744. Throughput: 0: 882.3. Samples: 1365555. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:18:31,802][04584] Avg episode reward: [(0, '4.441')] +[2024-11-07 15:18:36,028][04584] Fps is (10 sec: 2457.6, 60 sec: 3413.3, 300 sec: 5512.4). Total num frames: 13488128. Throughput: 0: 844.5. Samples: 1368783. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:18:36,046][04584] Avg episode reward: [(0, '4.248')] +[2024-11-07 15:18:40,005][09024] Updated weights for policy 0, policy_version 3297 (0.0126) +[2024-11-07 15:18:41,028][04584] Fps is (10 sec: 3545.7, 60 sec: 3413.5, 300 sec: 5442.8). Total num frames: 13504512. Throughput: 0: 868.6. Samples: 1371624. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:18:41,031][04584] Avg episode reward: [(0, '4.406')] +[2024-11-07 15:18:46,028][04584] Fps is (10 sec: 4095.8, 60 sec: 3550.2, 300 sec: 5401.2). Total num frames: 13529088. Throughput: 0: 893.7. Samples: 1377630. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:18:46,041][04584] Avg episode reward: [(0, '4.452')] +[2024-11-07 15:18:49,850][09024] Updated weights for policy 0, policy_version 3307 (0.0041) +[2024-11-07 15:18:51,028][04584] Fps is (10 sec: 4095.8, 60 sec: 3481.8, 300 sec: 5381.0). Total num frames: 13545472. Throughput: 0: 942.9. Samples: 1383636. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:18:51,037][04584] Avg episode reward: [(0, '4.478')] +[2024-11-07 15:18:56,031][04584] Fps is (10 sec: 4504.3, 60 sec: 3845.3, 300 sec: 5359.4). Total num frames: 13574144. Throughput: 0: 959.2. Samples: 1387167. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:18:56,040][04584] Avg episode reward: [(0, '4.499')] +[2024-11-07 15:18:57,652][09024] Updated weights for policy 0, policy_version 3317 (0.0054) +[2024-11-07 15:19:01,028][04584] Fps is (10 sec: 5734.7, 60 sec: 3960.1, 300 sec: 5345.6). Total num frames: 13602816. Throughput: 0: 1009.3. Samples: 1396227. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:19:01,042][04584] Avg episode reward: [(0, '4.339')] +[2024-11-07 15:19:06,270][04584] Fps is (10 sec: 4800.8, 60 sec: 4147.6, 300 sec: 5285.8). Total num frames: 13623296. Throughput: 0: 962.7. Samples: 1400337. Policy #0 lag: (min: 0.0, avg: 1.2, max: 4.0) +[2024-11-07 15:19:06,275][04584] Avg episode reward: [(0, '4.364')] +[2024-11-07 15:19:06,706][09024] Updated weights for policy 0, policy_version 3327 (0.0078) +[2024-11-07 15:19:11,028][04584] Fps is (10 sec: 4915.3, 60 sec: 4232.6, 300 sec: 5262.3). Total num frames: 13651968. Throughput: 0: 1026.0. Samples: 1406157. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:19:11,031][04584] Avg episode reward: [(0, '4.456')] +[2024-11-07 15:19:11,165][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003333_13651968.pth... +[2024-11-07 15:19:11,634][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003038_12443648.pth +[2024-11-07 15:19:13,368][09024] Updated weights for policy 0, policy_version 3337 (0.0061) +[2024-11-07 15:19:16,029][04584] Fps is (10 sec: 6295.7, 60 sec: 4437.6, 300 sec: 5248.4). Total num frames: 13684736. Throughput: 0: 1129.8. Samples: 1415538. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:19:16,031][04584] Avg episode reward: [(0, '4.392')] +[2024-11-07 15:19:19,544][09024] Updated weights for policy 0, policy_version 3347 (0.0048) +[2024-11-07 15:19:21,028][04584] Fps is (10 sec: 6553.6, 60 sec: 4642.2, 300 sec: 5248.4). Total num frames: 13717504. Throughput: 0: 1260.9. Samples: 1425525. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:19:21,030][04584] Avg episode reward: [(0, '4.638')] +[2024-11-07 15:19:25,100][09024] Updated weights for policy 0, policy_version 3357 (0.0035) +[2024-11-07 15:19:26,027][04584] Fps is (10 sec: 7373.7, 60 sec: 4915.2, 300 sec: 5312.9). 
Total num frames: 13758464. Throughput: 0: 1307.5. Samples: 1430463. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2024-11-07 15:19:26,029][04584] Avg episode reward: [(0, '4.403')] +[2024-11-07 15:19:30,049][09024] Updated weights for policy 0, policy_version 3367 (0.0042) +[2024-11-07 15:19:31,028][04584] Fps is (10 sec: 8192.0, 60 sec: 5531.3, 300 sec: 5317.9). Total num frames: 13799424. Throughput: 0: 1453.4. Samples: 1443030. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:19:31,030][04584] Avg episode reward: [(0, '4.426')] +[2024-11-07 15:19:35,383][09024] Updated weights for policy 0, policy_version 3377 (0.0046) +[2024-11-07 15:19:36,028][04584] Fps is (10 sec: 7782.2, 60 sec: 5802.7, 300 sec: 5345.6). Total num frames: 13836288. Throughput: 0: 1583.3. Samples: 1454883. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:19:36,030][04584] Avg episode reward: [(0, '4.432')] +[2024-11-07 15:19:41,028][04584] Fps is (10 sec: 5734.0, 60 sec: 5870.9, 300 sec: 5345.6). Total num frames: 13856768. Throughput: 0: 1627.3. Samples: 1460391. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:19:41,031][04584] Avg episode reward: [(0, '4.316')] +[2024-11-07 15:19:43,198][09024] Updated weights for policy 0, policy_version 3387 (0.0046) +[2024-11-07 15:19:46,028][04584] Fps is (10 sec: 5734.4, 60 sec: 6075.8, 300 sec: 5359.7). Total num frames: 13893632. Throughput: 0: 1574.5. Samples: 1467081. Policy #0 lag: (min: 0.0, avg: 1.4, max: 2.0) +[2024-11-07 15:19:46,031][04584] Avg episode reward: [(0, '4.465')] +[2024-11-07 15:19:48,791][09024] Updated weights for policy 0, policy_version 3397 (0.0040) +[2024-11-07 15:19:51,029][04584] Fps is (10 sec: 7372.6, 60 sec: 6417.0, 300 sec: 5345.6). Total num frames: 13930496. Throughput: 0: 1741.6. Samples: 1478289. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:19:51,031][04584] Avg episode reward: [(0, '4.315')] +[2024-11-07 15:19:56,032][04584] Fps is (10 sec: 5732.1, 60 sec: 6280.5, 300 sec: 5303.9). Total num frames: 13950976. Throughput: 0: 1703.6. Samples: 1482825. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) +[2024-11-07 15:19:56,033][04584] Avg episode reward: [(0, '4.279')] +[2024-11-07 15:19:56,173][09024] Updated weights for policy 0, policy_version 3407 (0.0061) +[2024-11-07 15:20:01,028][04584] Fps is (10 sec: 5325.3, 60 sec: 6348.8, 300 sec: 5345.7). Total num frames: 13983744. Throughput: 0: 1658.4. Samples: 1490166. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:20:01,038][04584] Avg episode reward: [(0, '4.491')] +[2024-11-07 15:20:03,670][09024] Updated weights for policy 0, policy_version 3417 (0.0068) +[2024-11-07 15:20:06,028][04584] Fps is (10 sec: 5736.7, 60 sec: 6443.0, 300 sec: 5317.9). Total num frames: 14008320. Throughput: 0: 1613.7. Samples: 1498143. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2024-11-07 15:20:06,033][04584] Avg episode reward: [(0, '4.486')] +[2024-11-07 15:20:10,251][09024] Updated weights for policy 0, policy_version 3427 (0.0055) +[2024-11-07 15:20:11,031][04584] Fps is (10 sec: 5323.5, 60 sec: 6416.8, 300 sec: 5303.9). Total num frames: 14036992. Throughput: 0: 1615.9. Samples: 1503183. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:20:11,033][04584] Avg episode reward: [(0, '4.214')] +[2024-11-07 15:20:16,028][04584] Fps is (10 sec: 4505.5, 60 sec: 6144.1, 300 sec: 5248.4). Total num frames: 14053376. Throughput: 0: 1494.3. Samples: 1510272. Policy #0 lag: (min: 0.0, avg: 1.5, max: 2.0) +[2024-11-07 15:20:16,029][04584] Avg episode reward: [(0, '4.210')] +[2024-11-07 15:20:19,639][09024] Updated weights for policy 0, policy_version 3437 (0.0053) +[2024-11-07 15:20:21,033][04584] Fps is (10 sec: 4915.4, 60 sec: 6143.8, 300 sec: 5262.3). Total num frames: 14086144. 
Throughput: 0: 1395.5. Samples: 1517682. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:20:21,037][04584] Avg episode reward: [(0, '4.434')] +[2024-11-07 15:20:26,028][04584] Fps is (10 sec: 6143.8, 60 sec: 5939.1, 300 sec: 5234.6). Total num frames: 14114816. Throughput: 0: 1373.1. Samples: 1522182. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:20:26,032][04584] Avg episode reward: [(0, '4.349')] +[2024-11-07 15:20:26,131][09024] Updated weights for policy 0, policy_version 3447 (0.0076) +[2024-11-07 15:20:31,028][04584] Fps is (10 sec: 6144.9, 60 sec: 5802.6, 300 sec: 5220.7). Total num frames: 14147584. Throughput: 0: 1439.2. Samples: 1531848. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:20:31,031][04584] Avg episode reward: [(0, '4.535')] +[2024-11-07 15:20:32,703][09024] Updated weights for policy 0, policy_version 3457 (0.0041) +[2024-11-07 15:20:36,028][04584] Fps is (10 sec: 6554.1, 60 sec: 5734.4, 300 sec: 5262.3). Total num frames: 14180352. Throughput: 0: 1398.4. Samples: 1541214. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:20:36,030][04584] Avg episode reward: [(0, '4.496')] +[2024-11-07 15:20:38,720][09024] Updated weights for policy 0, policy_version 3467 (0.0053) +[2024-11-07 15:20:41,028][04584] Fps is (10 sec: 6553.8, 60 sec: 5939.2, 300 sec: 5248.4). Total num frames: 14213120. Throughput: 0: 1413.5. Samples: 1546428. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:20:41,042][04584] Avg episode reward: [(0, '4.401')] +[2024-11-07 15:20:45,336][09024] Updated weights for policy 0, policy_version 3477 (0.0039) +[2024-11-07 15:20:46,028][04584] Fps is (10 sec: 6143.7, 60 sec: 5802.6, 300 sec: 5262.4). Total num frames: 14241792. Throughput: 0: 1449.3. Samples: 1555386. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) +[2024-11-07 15:20:46,041][04584] Avg episode reward: [(0, '4.415')] +[2024-11-07 15:20:51,028][04584] Fps is (10 sec: 4505.7, 60 sec: 5461.4, 300 sec: 5234.6). 
Total num frames: 14258176. Throughput: 0: 1400.9. Samples: 1561185. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:20:51,034][04584] Avg episode reward: [(0, '4.540')] +[2024-11-07 15:20:54,462][09024] Updated weights for policy 0, policy_version 3487 (0.0060) +[2024-11-07 15:20:56,031][04584] Fps is (10 sec: 4913.8, 60 sec: 5666.2, 300 sec: 5262.3). Total num frames: 14290944. Throughput: 0: 1396.8. Samples: 1566039. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) +[2024-11-07 15:20:56,042][04584] Avg episode reward: [(0, '4.524')] +[2024-11-07 15:21:01,028][04584] Fps is (10 sec: 5734.5, 60 sec: 5529.6, 300 sec: 5276.3). Total num frames: 14315520. Throughput: 0: 1413.1. Samples: 1573863. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:21:01,031][04584] Avg episode reward: [(0, '4.498')] +[2024-11-07 15:21:02,337][09024] Updated weights for policy 0, policy_version 3497 (0.0062) +[2024-11-07 15:21:06,028][04584] Fps is (10 sec: 4916.8, 60 sec: 5529.6, 300 sec: 5290.1). Total num frames: 14340096. Throughput: 0: 1409.3. Samples: 1581096. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:21:06,031][04584] Avg episode reward: [(0, '4.470')] +[2024-11-07 15:21:09,697][09024] Updated weights for policy 0, policy_version 3507 (0.0050) +[2024-11-07 15:21:11,028][04584] Fps is (10 sec: 5324.8, 60 sec: 5529.8, 300 sec: 5345.6). Total num frames: 14368768. Throughput: 0: 1419.2. Samples: 1586046. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:21:11,039][04584] Avg episode reward: [(0, '4.391')] +[2024-11-07 15:21:11,183][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003509_14372864.pth... +[2024-11-07 15:21:11,472][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003205_13127680.pth +[2024-11-07 15:21:16,028][04584] Fps is (10 sec: 5734.2, 60 sec: 5734.4, 300 sec: 5373.4). Total num frames: 14397440. 
Throughput: 0: 1388.4. Samples: 1594326. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:21:16,035][04584] Avg episode reward: [(0, '4.430')] +[2024-11-07 15:21:16,959][09024] Updated weights for policy 0, policy_version 3517 (0.0056) +[2024-11-07 15:21:21,028][04584] Fps is (10 sec: 5734.3, 60 sec: 5666.3, 300 sec: 5387.3). Total num frames: 14426112. Throughput: 0: 1373.9. Samples: 1603041. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) +[2024-11-07 15:21:21,030][04584] Avg episode reward: [(0, '4.463')] +[2024-11-07 15:21:26,028][04584] Fps is (10 sec: 4505.6, 60 sec: 5461.4, 300 sec: 5387.4). Total num frames: 14442496. Throughput: 0: 1323.1. Samples: 1605966. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:21:26,031][04584] Avg episode reward: [(0, '4.369')] +[2024-11-07 15:21:26,572][09024] Updated weights for policy 0, policy_version 3527 (0.0072) +[2024-11-07 15:21:31,029][04584] Fps is (10 sec: 4505.0, 60 sec: 5393.0, 300 sec: 5401.1). Total num frames: 14471168. Throughput: 0: 1270.9. Samples: 1612578. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2024-11-07 15:21:31,031][04584] Avg episode reward: [(0, '4.422')] +[2024-11-07 15:21:32,854][09024] Updated weights for policy 0, policy_version 3537 (0.0041) +[2024-11-07 15:21:36,028][04584] Fps is (10 sec: 6553.3, 60 sec: 5461.3, 300 sec: 5401.1). Total num frames: 14508032. Throughput: 0: 1370.6. Samples: 1622862. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:21:36,030][04584] Avg episode reward: [(0, '4.480')] +[2024-11-07 15:21:38,882][09024] Updated weights for policy 0, policy_version 3547 (0.0037) +[2024-11-07 15:21:41,028][04584] Fps is (10 sec: 6964.0, 60 sec: 5461.3, 300 sec: 5387.3). Total num frames: 14540800. Throughput: 0: 1373.6. Samples: 1627848. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:21:41,031][04584] Avg episode reward: [(0, '4.407')] +[2024-11-07 15:21:45,700][09024] Updated weights for policy 0, policy_version 3557 (0.0073) +[2024-11-07 15:21:46,028][04584] Fps is (10 sec: 6144.4, 60 sec: 5461.3, 300 sec: 5415.1). Total num frames: 14569472. Throughput: 0: 1403.5. Samples: 1637022. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:21:46,032][04584] Avg episode reward: [(0, '4.427')] +[2024-11-07 15:21:51,028][04584] Fps is (10 sec: 6553.7, 60 sec: 5802.7, 300 sec: 5415.1). Total num frames: 14606336. Throughput: 0: 1468.6. Samples: 1647183. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:21:51,030][04584] Avg episode reward: [(0, '4.294')] +[2024-11-07 15:21:51,480][09024] Updated weights for policy 0, policy_version 3567 (0.0047) +[2024-11-07 15:21:56,028][04584] Fps is (10 sec: 6963.1, 60 sec: 5802.9, 300 sec: 5387.3). Total num frames: 14639104. Throughput: 0: 1479.7. Samples: 1652631. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:21:56,037][04584] Avg episode reward: [(0, '4.431')] +[2024-11-07 15:21:59,534][09024] Updated weights for policy 0, policy_version 3577 (0.0050) +[2024-11-07 15:22:01,029][04584] Fps is (10 sec: 5324.4, 60 sec: 5734.3, 300 sec: 5331.7). Total num frames: 14659584. Throughput: 0: 1446.8. Samples: 1659432. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:22:01,033][04584] Avg episode reward: [(0, '4.272')] +[2024-11-07 15:22:06,031][04584] Fps is (10 sec: 4913.5, 60 sec: 5802.3, 300 sec: 5303.9). Total num frames: 14688256. Throughput: 0: 1439.5. Samples: 1667823. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2024-11-07 15:22:06,033][04584] Avg episode reward: [(0, '4.296')] +[2024-11-07 15:22:06,795][09024] Updated weights for policy 0, policy_version 3587 (0.0040) +[2024-11-07 15:22:11,027][04584] Fps is (10 sec: 5735.0, 60 sec: 5802.7, 300 sec: 5276.2). Total num frames: 14716928. 
Throughput: 0: 1472.8. Samples: 1672242. Policy #0 lag: (min: 0.0, avg: 1.2, max: 2.0) +[2024-11-07 15:22:11,030][04584] Avg episode reward: [(0, '4.152')] +[2024-11-07 15:22:12,703][09024] Updated weights for policy 0, policy_version 3597 (0.0034) +[2024-11-07 15:22:16,029][04584] Fps is (10 sec: 6145.8, 60 sec: 5870.9, 300 sec: 5276.2). Total num frames: 14749696. Throughput: 0: 1566.5. Samples: 1683072. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:22:16,030][04584] Avg episode reward: [(0, '4.296')] +[2024-11-07 15:22:20,096][09024] Updated weights for policy 0, policy_version 3607 (0.0047) +[2024-11-07 15:22:21,028][04584] Fps is (10 sec: 5734.2, 60 sec: 5802.7, 300 sec: 5290.1). Total num frames: 14774272. Throughput: 0: 1506.8. Samples: 1690665. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) +[2024-11-07 15:22:21,031][04584] Avg episode reward: [(0, '4.260')] +[2024-11-07 15:22:26,028][04584] Fps is (10 sec: 4505.9, 60 sec: 5871.0, 300 sec: 5234.6). Total num frames: 14794752. Throughput: 0: 1458.4. Samples: 1693476. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:22:26,031][04584] Avg episode reward: [(0, '4.190')] +[2024-11-07 15:22:29,189][09024] Updated weights for policy 0, policy_version 3617 (0.0069) +[2024-11-07 15:22:31,030][04584] Fps is (10 sec: 4914.7, 60 sec: 5871.0, 300 sec: 5220.6). Total num frames: 14823424. Throughput: 0: 1418.5. Samples: 1700856. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0) +[2024-11-07 15:22:31,035][04584] Avg episode reward: [(0, '4.454')] +[2024-11-07 15:22:36,031][04584] Fps is (10 sec: 4094.8, 60 sec: 5461.1, 300 sec: 5206.8). Total num frames: 14835712. Throughput: 0: 1289.9. Samples: 1705233. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:22:36,034][04584] Avg episode reward: [(0, '4.470')] +[2024-11-07 15:22:39,738][09024] Updated weights for policy 0, policy_version 3627 (0.0034) +[2024-11-07 15:22:41,030][04584] Fps is (10 sec: 3685.9, 60 sec: 5324.6, 300 sec: 5234.6). 
Total num frames: 14860288. Throughput: 0: 1252.8. Samples: 1709010. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:22:41,035][04584] Avg episode reward: [(0, '4.389')] +[2024-11-07 15:22:46,028][04584] Fps is (10 sec: 5326.5, 60 sec: 5324.8, 300 sec: 5262.4). Total num frames: 14888960. Throughput: 0: 1274.8. Samples: 1716798. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:22:46,038][04584] Avg episode reward: [(0, '4.541')] +[2024-11-07 15:22:47,436][09024] Updated weights for policy 0, policy_version 3637 (0.0068) +[2024-11-07 15:22:51,028][04584] Fps is (10 sec: 5326.1, 60 sec: 5120.0, 300 sec: 5329.6). Total num frames: 14913536. Throughput: 0: 1276.2. Samples: 1725249. Policy #0 lag: (min: 0.0, avg: 1.3, max: 2.0) +[2024-11-07 15:22:51,030][04584] Avg episode reward: [(0, '4.592')] +[2024-11-07 15:22:54,034][09024] Updated weights for policy 0, policy_version 3647 (0.0041) +[2024-11-07 15:22:56,028][04584] Fps is (10 sec: 6144.0, 60 sec: 5188.3, 300 sec: 5373.6). Total num frames: 14950400. Throughput: 0: 1286.1. Samples: 1730118. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:22:56,029][04584] Avg episode reward: [(0, '4.450')] +[2024-11-07 15:22:59,685][09024] Updated weights for policy 0, policy_version 3657 (0.0033) +[2024-11-07 15:23:01,029][04584] Fps is (10 sec: 6962.5, 60 sec: 5393.1, 300 sec: 5456.7). Total num frames: 14983168. Throughput: 0: 1286.4. Samples: 1740960. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:23:01,043][04584] Avg episode reward: [(0, '4.578')] +[2024-11-07 15:23:07,933][04584] Fps is (10 sec: 5160.5, 60 sec: 5227.4, 300 sec: 5435.5). Total num frames: 15011840. Throughput: 0: 1254.4. Samples: 1749504. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:23:07,937][04584] Avg episode reward: [(0, '4.497')] +[2024-11-07 15:23:08,868][09024] Updated weights for policy 0, policy_version 3667 (0.0061) +[2024-11-07 15:23:11,028][04584] Fps is (10 sec: 4915.6, 60 sec: 5256.5, 300 sec: 5470.7). Total num frames: 15032320. Throughput: 0: 1274.7. Samples: 1750839. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:23:11,030][04584] Avg episode reward: [(0, '4.241')] +[2024-11-07 15:23:11,056][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003670_15032320.pth... +[2024-11-07 15:23:11,488][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003333_13651968.pth +[2024-11-07 15:23:15,771][09024] Updated weights for policy 0, policy_version 3677 (0.0062) +[2024-11-07 15:23:16,028][04584] Fps is (10 sec: 6072.5, 60 sec: 5188.3, 300 sec: 5498.4). Total num frames: 15060992. Throughput: 0: 1307.6. Samples: 1759698. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:23:16,030][04584] Avg episode reward: [(0, '4.370')] +[2024-11-07 15:23:21,028][04584] Fps is (10 sec: 5734.5, 60 sec: 5256.6, 300 sec: 5512.2). Total num frames: 15089664. Throughput: 0: 1416.4. Samples: 1768965. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:23:21,030][04584] Avg episode reward: [(0, '4.447')] +[2024-11-07 15:23:22,754][09024] Updated weights for policy 0, policy_version 3687 (0.0042) +[2024-11-07 15:23:26,028][04584] Fps is (10 sec: 6553.6, 60 sec: 5529.6, 300 sec: 5623.9). Total num frames: 15126528. Throughput: 0: 1440.3. Samples: 1773822. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:23:26,031][04584] Avg episode reward: [(0, '4.320')] +[2024-11-07 15:23:27,486][09024] Updated weights for policy 0, policy_version 3697 (0.0048) +[2024-11-07 15:23:31,028][04584] Fps is (10 sec: 7781.8, 60 sec: 5734.4, 300 sec: 5692.7). 
Total num frames: 15167488. Throughput: 0: 1530.0. Samples: 1785651. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:23:31,032][04584] Avg episode reward: [(0, '4.306')] +[2024-11-07 15:23:32,802][09024] Updated weights for policy 0, policy_version 3707 (0.0041) +[2024-11-07 15:23:36,033][04584] Fps is (10 sec: 7778.0, 60 sec: 6143.7, 300 sec: 5762.1). Total num frames: 15204352. Throughput: 0: 1604.4. Samples: 1797456. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:23:36,036][04584] Avg episode reward: [(0, '4.600')] +[2024-11-07 15:23:39,526][09024] Updated weights for policy 0, policy_version 3717 (0.0049) +[2024-11-07 15:23:42,489][04584] Fps is (10 sec: 5361.2, 60 sec: 5998.2, 300 sec: 5733.8). Total num frames: 15228928. Throughput: 0: 1531.2. Samples: 1801260. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:23:42,491][04584] Avg episode reward: [(0, '4.589')] +[2024-11-07 15:23:46,027][04584] Fps is (10 sec: 4508.3, 60 sec: 6007.5, 300 sec: 5776.1). Total num frames: 15249408. Throughput: 0: 1467.6. Samples: 1806999. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0) +[2024-11-07 15:23:46,030][04584] Avg episode reward: [(0, '4.559')] +[2024-11-07 15:23:47,911][09024] Updated weights for policy 0, policy_version 3727 (0.0051) +[2024-11-07 15:23:51,028][04584] Fps is (10 sec: 6715.1, 60 sec: 6212.2, 300 sec: 5803.9). Total num frames: 15286272. Throughput: 0: 1578.8. Samples: 1817544. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:23:51,030][04584] Avg episode reward: [(0, '4.696')] +[2024-11-07 15:23:53,536][09024] Updated weights for policy 0, policy_version 3737 (0.0043) +[2024-11-07 15:23:56,028][04584] Fps is (10 sec: 7782.1, 60 sec: 6280.5, 300 sec: 5845.5). Total num frames: 15327232. Throughput: 0: 1607.7. Samples: 1823187. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:23:56,029][04584] Avg episode reward: [(0, '4.438')] +[2024-11-07 15:23:58,746][09024] Updated weights for policy 0, policy_version 3747 (0.0047) +[2024-11-07 15:24:01,028][04584] Fps is (10 sec: 7782.8, 60 sec: 6348.9, 300 sec: 5905.9). Total num frames: 15364096. Throughput: 0: 1671.8. Samples: 1834929. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:24:01,033][04584] Avg episode reward: [(0, '4.452')] +[2024-11-07 15:24:04,806][09024] Updated weights for policy 0, policy_version 3757 (0.0063) +[2024-11-07 15:24:06,030][04584] Fps is (10 sec: 6961.5, 60 sec: 6627.3, 300 sec: 5914.9). Total num frames: 15396864. Throughput: 0: 1695.1. Samples: 1845249. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:24:06,035][04584] Avg episode reward: [(0, '4.504')] +[2024-11-07 15:24:10,314][09024] Updated weights for policy 0, policy_version 3767 (0.0044) +[2024-11-07 15:24:11,028][04584] Fps is (10 sec: 6963.2, 60 sec: 6690.2, 300 sec: 5928.8). Total num frames: 15433728. Throughput: 0: 1711.3. Samples: 1850832. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:24:11,031][04584] Avg episode reward: [(0, '4.455')] +[2024-11-07 15:24:16,915][04584] Fps is (10 sec: 5644.4, 60 sec: 6525.3, 300 sec: 5883.3). Total num frames: 15458304. Throughput: 0: 1663.4. Samples: 1861977. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:24:16,920][04584] Avg episode reward: [(0, '4.409')] +[2024-11-07 15:24:17,977][09024] Updated weights for policy 0, policy_version 3777 (0.0042) +[2024-11-07 15:24:21,033][04584] Fps is (10 sec: 5734.2, 60 sec: 6690.1, 300 sec: 5873.2). Total num frames: 15491072. Throughput: 0: 1585.9. Samples: 1868814. 
Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2024-11-07 15:24:21,041][04584] Avg episode reward: [(0, '4.502')] +[2024-11-07 15:24:23,334][09024] Updated weights for policy 0, policy_version 3787 (0.0046) +[2024-11-07 15:24:26,028][04584] Fps is (10 sec: 7641.4, 60 sec: 6690.1, 300 sec: 5859.4). Total num frames: 15527936. Throughput: 0: 1676.7. Samples: 1874262. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:24:26,032][04584] Avg episode reward: [(0, '4.557')] +[2024-11-07 15:24:28,965][09024] Updated weights for policy 0, policy_version 3797 (0.0066) +[2024-11-07 15:24:31,028][04584] Fps is (10 sec: 7782.5, 60 sec: 6690.2, 300 sec: 5873.2). Total num frames: 15568896. Throughput: 0: 1750.1. Samples: 1885752. Policy #0 lag: (min: 0.0, avg: 1.4, max: 4.0) +[2024-11-07 15:24:31,030][04584] Avg episode reward: [(0, '4.387')] +[2024-11-07 15:24:33,986][09024] Updated weights for policy 0, policy_version 3807 (0.0035) +[2024-11-07 15:24:36,028][04584] Fps is (10 sec: 7782.2, 60 sec: 6690.7, 300 sec: 5928.8). Total num frames: 15605760. Throughput: 0: 1776.5. Samples: 1897488. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:24:36,031][04584] Avg episode reward: [(0, '4.476')] +[2024-11-07 15:24:39,316][09024] Updated weights for policy 0, policy_version 3817 (0.0047) +[2024-11-07 15:24:41,028][04584] Fps is (10 sec: 7372.9, 60 sec: 7067.0, 300 sec: 5928.8). Total num frames: 15642624. Throughput: 0: 1785.7. Samples: 1903545. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:24:41,030][04584] Avg episode reward: [(0, '4.429')] +[2024-11-07 15:24:45,770][09024] Updated weights for policy 0, policy_version 3827 (0.0057) +[2024-11-07 15:24:46,029][04584] Fps is (10 sec: 6962.5, 60 sec: 7099.6, 300 sec: 5914.9). Total num frames: 15675392. Throughput: 0: 1739.0. Samples: 1913187. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:24:46,033][04584] Avg episode reward: [(0, '4.549')] +[2024-11-07 15:24:51,363][04584] Fps is (10 sec: 5152.1, 60 sec: 6788.8, 300 sec: 5908.3). Total num frames: 15695872. Throughput: 0: 1610.0. Samples: 1918233. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:24:51,366][04584] Avg episode reward: [(0, '4.576')] +[2024-11-07 15:24:55,001][09024] Updated weights for policy 0, policy_version 3837 (0.0038) +[2024-11-07 15:24:56,028][04584] Fps is (10 sec: 4506.1, 60 sec: 6553.6, 300 sec: 5887.1). Total num frames: 15720448. Throughput: 0: 1615.8. Samples: 1923543. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2024-11-07 15:24:56,030][04584] Avg episode reward: [(0, '4.527')] +[2024-11-07 15:25:00,829][09024] Updated weights for policy 0, policy_version 3847 (0.0064) +[2024-11-07 15:25:01,028][04584] Fps is (10 sec: 6357.0, 60 sec: 6553.6, 300 sec: 5928.8). Total num frames: 15757312. Throughput: 0: 1617.7. Samples: 1933338. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:25:01,030][04584] Avg episode reward: [(0, '4.338')] +[2024-11-07 15:25:06,027][04584] Fps is (10 sec: 7373.0, 60 sec: 6622.2, 300 sec: 5956.6). Total num frames: 15794176. Throughput: 0: 1676.0. Samples: 1944231. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:25:06,032][04584] Avg episode reward: [(0, '4.445')] +[2024-11-07 15:25:06,372][09024] Updated weights for policy 0, policy_version 3857 (0.0037) +[2024-11-07 15:25:11,031][04584] Fps is (10 sec: 7370.4, 60 sec: 6621.5, 300 sec: 6025.9). Total num frames: 15831040. Throughput: 0: 1680.4. Samples: 1949886. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:25:11,033][04584] Avg episode reward: [(0, '4.314')] +[2024-11-07 15:25:11,052][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003865_15831040.pth... 
+[2024-11-07 15:25:11,277][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003509_14372864.pth +[2024-11-07 15:25:11,863][09024] Updated weights for policy 0, policy_version 3867 (0.0044) +[2024-11-07 15:25:16,028][04584] Fps is (10 sec: 7372.7, 60 sec: 6929.2, 300 sec: 6039.9). Total num frames: 15867904. Throughput: 0: 1678.3. Samples: 1961277. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2024-11-07 15:25:16,035][04584] Avg episode reward: [(0, '4.238')] +[2024-11-07 15:25:17,024][09024] Updated weights for policy 0, policy_version 3877 (0.0041) +[2024-11-07 15:25:21,030][04584] Fps is (10 sec: 7783.6, 60 sec: 6963.0, 300 sec: 6081.5). Total num frames: 15908864. Throughput: 0: 1676.8. Samples: 1972947. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:25:21,032][04584] Avg episode reward: [(0, '4.292')] +[2024-11-07 15:25:22,877][09024] Updated weights for policy 0, policy_version 3887 (0.0059) +[2024-11-07 15:25:26,028][04584] Fps is (10 sec: 5734.1, 60 sec: 6621.8, 300 sec: 6026.0). Total num frames: 15925248. Throughput: 0: 1657.8. Samples: 1978146. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2024-11-07 15:25:26,030][04584] Avg episode reward: [(0, '4.499')] +[2024-11-07 15:25:30,836][09024] Updated weights for policy 0, policy_version 3897 (0.0053) +[2024-11-07 15:25:31,028][04584] Fps is (10 sec: 5325.6, 60 sec: 6553.6, 300 sec: 6039.9). Total num frames: 15962112. Throughput: 0: 1588.0. Samples: 1984647. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2024-11-07 15:25:31,030][04584] Avg episode reward: [(0, '4.349')] +[2024-11-07 15:25:36,029][04584] Fps is (10 sec: 7372.2, 60 sec: 6553.5, 300 sec: 6053.7). Total num frames: 15998976. Throughput: 0: 1733.7. Samples: 1995669. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2024-11-07 15:25:36,035][04584] Avg episode reward: [(0, '4.612')] +[2024-11-07 15:25:36,308][09024] Updated weights for policy 0, policy_version 3907 (0.0043) +[2024-11-07 15:25:36,861][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth... +[2024-11-07 15:25:36,876][09009] Stopping Batcher_0... +[2024-11-07 15:25:36,877][09009] Loop batcher_evt_loop terminating... +[2024-11-07 15:25:36,874][04584] Component Batcher_0 stopped! +[2024-11-07 15:25:36,971][09024] Weights refcount: 2 0 +[2024-11-07 15:25:36,977][09024] Stopping InferenceWorker_p0-w0... +[2024-11-07 15:25:36,978][09024] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 15:25:36,979][04584] Component InferenceWorker_p0-w0 stopped! +[2024-11-07 15:25:37,003][09009] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003670_15032320.pth +[2024-11-07 15:25:37,010][09009] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth... +[2024-11-07 15:25:37,187][09009] Stopping LearnerWorker_p0... +[2024-11-07 15:25:37,188][09009] Loop learner_proc0_evt_loop terminating... +[2024-11-07 15:25:37,188][04584] Component LearnerWorker_p0 stopped! +[2024-11-07 15:25:37,438][04584] Component RolloutWorker_w3 stopped! +[2024-11-07 15:25:37,451][04584] Component RolloutWorker_w4 stopped! +[2024-11-07 15:25:37,446][09026] Stopping RolloutWorker_w3... +[2024-11-07 15:25:37,460][09026] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 15:25:37,452][09030] Stopping RolloutWorker_w4... +[2024-11-07 15:25:37,467][09030] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 15:25:37,514][04584] Component RolloutWorker_w6 stopped! +[2024-11-07 15:25:37,522][09037] Stopping RolloutWorker_w6... +[2024-11-07 15:25:37,528][09037] Loop rollout_proc6_evt_loop terminating... 
+[2024-11-07 15:25:37,552][04584] Component RolloutWorker_w0 stopped! +[2024-11-07 15:25:37,553][09025] Stopping RolloutWorker_w0... +[2024-11-07 15:25:37,556][09025] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 15:25:37,632][04584] Component RolloutWorker_w1 stopped! +[2024-11-07 15:25:37,633][09029] Stopping RolloutWorker_w1... +[2024-11-07 15:25:37,643][09029] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 15:25:37,666][04584] Component RolloutWorker_w9 stopped! +[2024-11-07 15:25:37,664][09039] Stopping RolloutWorker_w9... +[2024-11-07 15:25:37,702][09039] Loop rollout_proc9_evt_loop terminating... +[2024-11-07 15:25:38,109][04584] Component RolloutWorker_w8 stopped! +[2024-11-07 15:25:38,154][04584] Component RolloutWorker_w7 stopped! +[2024-11-07 15:25:38,111][09040] Stopping RolloutWorker_w8... +[2024-11-07 15:25:38,159][09040] Loop rollout_proc8_evt_loop terminating... +[2024-11-07 15:25:38,156][09038] Stopping RolloutWorker_w7... +[2024-11-07 15:25:38,168][04584] Component RolloutWorker_w5 stopped! +[2024-11-07 15:25:38,172][09038] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 15:25:38,172][09028] Stopping RolloutWorker_w5... +[2024-11-07 15:25:38,178][09028] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 15:25:38,397][04584] Component RolloutWorker_w2 stopped! +[2024-11-07 15:25:38,405][09027] Stopping RolloutWorker_w2... +[2024-11-07 15:25:38,402][04584] Waiting for process learner_proc0 to stop... +[2024-11-07 15:25:38,410][09027] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 15:25:44,086][04584] Waiting for process inference_proc0-0 to join... +[2024-11-07 15:25:44,088][04584] Waiting for process rollout_proc0 to join... +[2024-11-07 15:25:44,090][04584] Waiting for process rollout_proc1 to join... +[2024-11-07 15:25:44,092][04584] Waiting for process rollout_proc2 to join... +[2024-11-07 15:25:44,094][04584] Waiting for process rollout_proc3 to join... 
+[2024-11-07 15:25:44,096][04584] Waiting for process rollout_proc4 to join... +[2024-11-07 15:25:44,098][04584] Waiting for process rollout_proc5 to join... +[2024-11-07 15:25:44,099][04584] Waiting for process rollout_proc6 to join... +[2024-11-07 15:25:44,102][04584] Waiting for process rollout_proc7 to join... +[2024-11-07 15:25:44,106][04584] Waiting for process rollout_proc8 to join... +[2024-11-07 15:25:44,108][04584] Waiting for process rollout_proc9 to join... +[2024-11-07 15:25:44,110][04584] Batcher 0 profile tree view: +batching: 177.8703, releasing_batches: 0.3180 +[2024-11-07 15:25:44,112][04584] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0001 + wait_policy_total: 27.4860 +update_model: 26.3783 + weight_update: 0.0043 +one_step: 0.0063 + handle_policy_step: 1339.9856 + deserialize: 47.4091, stack: 6.2756, obs_to_device_normalize: 399.2838, forward: 569.5053, send_messages: 86.6617 + prepare_outputs: 187.4270 + to_cpu: 142.4986 +[2024-11-07 15:25:44,113][04584] Learner 0 profile tree view: +misc: 0.0142, prepare_batch: 70.5279 +train: 315.6715 + epoch_init: 0.0392, minibatch_init: 0.0518, losses_postprocess: 3.5406, kl_divergence: 4.0413, after_optimizer: 18.4153 + calculate_losses: 108.2507 + losses_init: 0.0171, forward_head: 8.9566, bptt_initial: 63.2059, tail: 4.5197, advantages_returns: 1.3485, losses: 15.0900 + bptt: 13.8609 + bptt_forward_core: 13.4036 + update: 178.6702 + clip: 5.0453 +[2024-11-07 15:25:44,118][04584] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.6293, enqueue_policy_requests: 37.4664, env_step: 596.6403, overhead: 39.0474, complete_rollouts: 2.4262 +save_policy_outputs: 47.7074 + split_output_tensors: 15.5779 +[2024-11-07 15:25:44,120][04584] RolloutWorker_w9 profile tree view: +wait_for_trajectories: 0.5428, enqueue_policy_requests: 33.2939, env_step: 802.9854, overhead: 35.3433, complete_rollouts: 1.0714 +save_policy_outputs: 45.4609 + split_output_tensors: 15.8628 +[2024-11-07 
15:25:44,122][04584] Loop Runner_EvtLoop terminating... +[2024-11-07 15:25:44,126][04584] Runner profile tree view: +main_loop: 1467.0711 +[2024-11-07 15:25:44,131][04584] Collected {0: 16007168}, FPS: 5447.1 +[2024-11-07 15:25:44,737][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 15:25:44,739][04584] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 15:25:44,739][04584] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 15:25:44,741][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 15:25:44,742][04584] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 15:25:44,744][04584] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 15:25:44,748][04584] Adding new argument 'max_num_episodes'=20 that is not in the saved config file! +[2024-11-07 15:25:44,750][04584] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-07 15:25:44,752][04584] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-07 15:25:44,754][04584] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 15:25:44,755][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 15:25:44,758][04584] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 15:25:44,759][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2024-11-07 15:25:44,763][04584] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 15:25:45,048][04584] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 15:25:45,059][04584] RunningMeanStd input shape: (1,) +[2024-11-07 15:25:45,267][04584] ConvEncoder: input_channels=3 +[2024-11-07 15:25:45,537][04584] Conv encoder output size: 512 +[2024-11-07 15:25:45,540][04584] Policy head output size: 512 +[2024-11-07 15:25:45,688][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth... +[2024-11-07 15:25:46,915][04584] Num frames 100... +[2024-11-07 15:25:47,218][04584] Num frames 200... +[2024-11-07 15:25:47,432][04584] Num frames 300... +[2024-11-07 15:25:47,678][04584] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 15:25:47,680][04584] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 15:25:47,727][04584] Num frames 400... +[2024-11-07 15:25:47,953][04584] Num frames 500... +[2024-11-07 15:25:48,204][04584] Num frames 600... +[2024-11-07 15:25:48,378][04584] Avg episode rewards: #0: 3.200, true rewards: #0: 3.200 +[2024-11-07 15:25:48,379][04584] Avg episode reward: 3.200, avg true_objective: 3.200 +[2024-11-07 15:25:48,527][04584] Num frames 700... +[2024-11-07 15:25:48,761][04584] Num frames 800... +[2024-11-07 15:25:48,980][04584] Num frames 900... +[2024-11-07 15:25:49,257][04584] Num frames 1000... +[2024-11-07 15:25:49,367][04584] Avg episode rewards: #0: 3.413, true rewards: #0: 3.413 +[2024-11-07 15:25:49,369][04584] Avg episode reward: 3.413, avg true_objective: 3.413 +[2024-11-07 15:25:49,550][04584] Num frames 1100... +[2024-11-07 15:25:49,786][04584] Num frames 1200... +[2024-11-07 15:25:50,076][04584] Num frames 1300... +[2024-11-07 15:25:50,336][04584] Num frames 1400... 
+[2024-11-07 15:25:50,412][04584] Avg episode rewards: #0: 3.520, true rewards: #0: 3.520 +[2024-11-07 15:25:50,415][04584] Avg episode reward: 3.520, avg true_objective: 3.520 +[2024-11-07 15:25:50,657][04584] Num frames 1500... +[2024-11-07 15:25:50,950][04584] Num frames 1600... +[2024-11-07 15:25:51,197][04584] Num frames 1700... +[2024-11-07 15:25:51,432][04584] Num frames 1800... +[2024-11-07 15:25:51,561][04584] Avg episode rewards: #0: 3.848, true rewards: #0: 3.648 +[2024-11-07 15:25:51,562][04584] Avg episode reward: 3.848, avg true_objective: 3.648 +[2024-11-07 15:25:51,762][04584] Num frames 1900... +[2024-11-07 15:25:52,003][04584] Num frames 2000... +[2024-11-07 15:25:52,256][04584] Num frames 2100... +[2024-11-07 15:25:52,501][04584] Num frames 2200... +[2024-11-07 15:25:52,725][04584] Avg episode rewards: #0: 4.120, true rewards: #0: 3.787 +[2024-11-07 15:25:52,729][04584] Avg episode reward: 4.120, avg true_objective: 3.787 +[2024-11-07 15:25:52,810][04584] Num frames 2300... +[2024-11-07 15:25:53,060][04584] Num frames 2400... +[2024-11-07 15:25:53,296][04584] Num frames 2500... +[2024-11-07 15:25:53,533][04584] Num frames 2600... +[2024-11-07 15:25:53,726][04584] Avg episode rewards: #0: 4.080, true rewards: #0: 3.794 +[2024-11-07 15:25:53,728][04584] Avg episode reward: 4.080, avg true_objective: 3.794 +[2024-11-07 15:25:53,866][04584] Num frames 2700... +[2024-11-07 15:25:54,143][04584] Num frames 2800... +[2024-11-07 15:25:54,389][04584] Num frames 2900... +[2024-11-07 15:25:54,622][04584] Num frames 3000... +[2024-11-07 15:25:54,786][04584] Avg episode rewards: #0: 4.050, true rewards: #0: 3.800 +[2024-11-07 15:25:54,789][04584] Avg episode reward: 4.050, avg true_objective: 3.800 +[2024-11-07 15:25:54,940][04584] Num frames 3100... +[2024-11-07 15:25:55,222][04584] Num frames 3200... +[2024-11-07 15:25:55,492][04584] Num frames 3300... +[2024-11-07 15:25:55,765][04584] Num frames 3400... 
+[2024-11-07 15:25:56,060][04584] Avg episode rewards: #0: 4.209, true rewards: #0: 3.876 +[2024-11-07 15:25:56,062][04584] Avg episode reward: 4.209, avg true_objective: 3.876 +[2024-11-07 15:25:56,108][04584] Num frames 3500... +[2024-11-07 15:25:56,386][04584] Num frames 3600... +[2024-11-07 15:25:56,661][04584] Num frames 3700... +[2024-11-07 15:25:56,965][04584] Num frames 3800... +[2024-11-07 15:25:57,293][04584] Avg episode rewards: #0: 4.172, true rewards: #0: 3.872 +[2024-11-07 15:25:57,295][04584] Avg episode reward: 4.172, avg true_objective: 3.872 +[2024-11-07 15:25:57,407][04584] Num frames 3900... +[2024-11-07 15:25:57,705][04584] Num frames 4000... +[2024-11-07 15:25:57,968][04584] Num frames 4100... +[2024-11-07 15:26:00,392][04584] Num frames 4200... +[2024-11-07 15:26:00,634][04584] Avg episode rewards: #0: 4.142, true rewards: #0: 3.869 +[2024-11-07 15:26:00,639][04584] Avg episode reward: 4.142, avg true_objective: 3.869 +[2024-11-07 15:26:00,803][04584] Num frames 4300... +[2024-11-07 15:26:01,112][04584] Num frames 4400... +[2024-11-07 15:26:01,437][04584] Num frames 4500... +[2024-11-07 15:26:01,718][04584] Num frames 4600... +[2024-11-07 15:26:01,904][04584] Avg episode rewards: #0: 4.200, true rewards: #0: 3.867 +[2024-11-07 15:26:01,905][04584] Avg episode reward: 4.200, avg true_objective: 3.867 +[2024-11-07 15:26:02,104][04584] Num frames 4700... +[2024-11-07 15:26:02,487][04584] Num frames 4800... +[2024-11-07 15:26:02,890][04584] Num frames 4900... +[2024-11-07 15:26:03,241][04584] Num frames 5000... +[2024-11-07 15:26:03,366][04584] Avg episode rewards: #0: 4.172, true rewards: #0: 3.865 +[2024-11-07 15:26:03,368][04584] Avg episode reward: 4.172, avg true_objective: 3.865 +[2024-11-07 15:26:03,607][04584] Num frames 5100... +[2024-11-07 15:26:03,918][04584] Num frames 5200... +[2024-11-07 15:26:04,195][04584] Num frames 5300... +[2024-11-07 15:26:04,502][04584] Num frames 5400... 
+[2024-11-07 15:26:04,799][04584] Avg episode rewards: #0: 4.266, true rewards: #0: 3.909 +[2024-11-07 15:26:04,803][04584] Avg episode reward: 4.266, avg true_objective: 3.909 +[2024-11-07 15:26:04,915][04584] Num frames 5500... +[2024-11-07 15:26:05,256][04584] Num frames 5600... +[2024-11-07 15:26:05,613][04584] Num frames 5700... +[2024-11-07 15:26:05,979][04584] Num frames 5800... +[2024-11-07 15:26:06,209][04584] Avg episode rewards: #0: 4.237, true rewards: #0: 3.904 +[2024-11-07 15:26:06,213][04584] Avg episode reward: 4.237, avg true_objective: 3.904 +[2024-11-07 15:26:06,396][04584] Num frames 5900... +[2024-11-07 15:26:06,733][04584] Num frames 6000... +[2024-11-07 15:26:07,059][04584] Num frames 6100... +[2024-11-07 15:26:07,383][04584] Num frames 6200... +[2024-11-07 15:26:07,575][04584] Avg episode rewards: #0: 4.213, true rewards: #0: 3.900 +[2024-11-07 15:26:07,579][04584] Avg episode reward: 4.213, avg true_objective: 3.900 +[2024-11-07 15:26:07,773][04584] Num frames 6300... +[2024-11-07 15:26:08,122][04584] Num frames 6400... +[2024-11-07 15:26:08,510][04584] Avg episode rewards: #0: 4.115, true rewards: #0: 3.821 +[2024-11-07 15:26:08,513][04584] Avg episode reward: 4.115, avg true_objective: 3.821 +[2024-11-07 15:26:08,534][04584] Num frames 6500... +[2024-11-07 15:26:08,840][04584] Num frames 6600... +[2024-11-07 15:26:09,118][04584] Num frames 6700... +[2024-11-07 15:26:09,426][04584] Num frames 6800... +[2024-11-07 15:26:09,749][04584] Avg episode rewards: #0: 4.100, true rewards: #0: 3.822 +[2024-11-07 15:26:09,751][04584] Avg episode reward: 4.100, avg true_objective: 3.822 +[2024-11-07 15:26:09,827][04584] Num frames 6900... +[2024-11-07 15:26:10,154][04584] Num frames 7000... +[2024-11-07 15:26:10,468][04584] Num frames 7100... +[2024-11-07 15:26:10,765][04584] Num frames 7200... 
+[2024-11-07 15:26:11,015][04584] Avg episode rewards: #0: 4.086, true rewards: #0: 3.823 +[2024-11-07 15:26:11,018][04584] Avg episode reward: 4.086, avg true_objective: 3.823 +[2024-11-07 15:26:11,145][04584] Num frames 7300... +[2024-11-07 15:26:11,448][04584] Num frames 7400... +[2024-11-07 15:26:11,817][04584] Num frames 7500... +[2024-11-07 15:26:12,169][04584] Num frames 7600... +[2024-11-07 15:26:12,493][04584] Avg episode rewards: #0: 4.140, true rewards: #0: 3.840 +[2024-11-07 15:26:12,498][04584] Avg episode reward: 4.140, avg true_objective: 3.840 +[2024-11-07 15:26:39,781][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 15:26:41,145][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 15:26:41,146][04584] Overriding arg 'num_workers' with value 4 passed from command line +[2024-11-07 15:26:41,148][04584] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 15:26:41,149][04584] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 15:26:41,152][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 15:26:41,154][04584] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 15:26:41,156][04584] Adding new argument 'max_num_frames'=150000 that is not in the saved config file! +[2024-11-07 15:26:41,157][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 15:26:41,160][04584] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-07 15:26:41,163][04584] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! 
+[2024-11-07 15:26:41,169][04584] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 15:26:41,171][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 15:26:41,176][04584] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 15:26:41,179][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 15:26:41,182][04584] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 15:26:41,222][04584] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 15:26:41,224][04584] RunningMeanStd input shape: (1,) +[2024-11-07 15:26:41,260][04584] ConvEncoder: input_channels=3 +[2024-11-07 15:26:41,321][04584] Conv encoder output size: 512 +[2024-11-07 15:26:41,323][04584] Policy head output size: 512 +[2024-11-07 15:26:41,356][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth... +[2024-11-07 15:26:42,080][04584] Num frames 100... +[2024-11-07 15:26:42,204][04584] Avg episode rewards: #0: 1.280, true rewards: #0: 1.280 +[2024-11-07 15:26:42,206][04584] Avg episode reward: 1.280, avg true_objective: 1.280 +[2024-11-07 15:26:42,386][04584] Num frames 200... +[2024-11-07 15:26:42,603][04584] Num frames 300... +[2024-11-07 15:26:42,828][04584] Num frames 400... +[2024-11-07 15:26:43,054][04584] Num frames 500... +[2024-11-07 15:26:43,135][04584] Avg episode rewards: #0: 2.560, true rewards: #0: 2.560 +[2024-11-07 15:26:43,137][04584] Avg episode reward: 2.560, avg true_objective: 2.560 +[2024-11-07 15:26:43,338][04584] Num frames 600... +[2024-11-07 15:26:43,556][04584] Num frames 700... +[2024-11-07 15:26:43,790][04584] Num frames 800... +[2024-11-07 15:26:44,021][04584] Num frames 900... 
+[2024-11-07 15:26:44,275][04584] Avg episode rewards: #0: 3.973, true rewards: #0: 3.307 +[2024-11-07 15:26:44,280][04584] Avg episode reward: 3.973, avg true_objective: 3.307 +[2024-11-07 15:26:44,313][04584] Num frames 1000... +[2024-11-07 15:26:44,514][04584] Num frames 1100... +[2024-11-07 15:26:44,732][04584] Num frames 1200... +[2024-11-07 15:26:44,965][04584] Num frames 1300... +[2024-11-07 15:26:45,194][04584] Avg episode rewards: #0: 3.940, true rewards: #0: 3.440 +[2024-11-07 15:26:45,199][04584] Avg episode reward: 3.940, avg true_objective: 3.440 +[2024-11-07 15:26:45,259][04584] Num frames 1400... +[2024-11-07 15:26:45,491][04584] Num frames 1500... +[2024-11-07 15:26:45,708][04584] Num frames 1600... +[2024-11-07 15:26:45,933][04584] Num frames 1700... +[2024-11-07 15:26:46,133][04584] Avg episode rewards: #0: 3.920, true rewards: #0: 3.520 +[2024-11-07 15:26:46,138][04584] Avg episode reward: 3.920, avg true_objective: 3.520 +[2024-11-07 15:26:46,227][04584] Num frames 1800... +[2024-11-07 15:26:46,458][04584] Num frames 1900... +[2024-11-07 15:26:46,696][04584] Num frames 2000... +[2024-11-07 15:26:46,930][04584] Num frames 2100... +[2024-11-07 15:26:47,112][04584] Avg episode rewards: #0: 3.907, true rewards: #0: 3.573 +[2024-11-07 15:26:47,115][04584] Avg episode reward: 3.907, avg true_objective: 3.573 +[2024-11-07 15:26:47,281][04584] Num frames 2200... +[2024-11-07 15:26:47,506][04584] Num frames 2300... +[2024-11-07 15:26:47,747][04584] Num frames 2400... +[2024-11-07 15:26:48,011][04584] Num frames 2500... +[2024-11-07 15:26:48,134][04584] Avg episode rewards: #0: 3.897, true rewards: #0: 3.611 +[2024-11-07 15:26:48,136][04584] Avg episode reward: 3.897, avg true_objective: 3.611 +[2024-11-07 15:26:48,329][04584] Num frames 2600... +[2024-11-07 15:26:48,555][04584] Num frames 2700... +[2024-11-07 15:26:48,784][04584] Num frames 2800... +[2024-11-07 15:26:48,992][04584] Num frames 2900... 
+[2024-11-07 15:26:49,076][04584] Avg episode rewards: #0: 3.890, true rewards: #0: 3.640 +[2024-11-07 15:26:49,079][04584] Avg episode reward: 3.890, avg true_objective: 3.640 +[2024-11-07 15:26:49,278][04584] Num frames 3000... +[2024-11-07 15:26:49,523][04584] Num frames 3100... +[2024-11-07 15:26:49,752][04584] Num frames 3200... +[2024-11-07 15:26:49,974][04584] Num frames 3300... +[2024-11-07 15:26:50,159][04584] Avg episode rewards: #0: 4.067, true rewards: #0: 3.733 +[2024-11-07 15:26:50,161][04584] Avg episode reward: 4.067, avg true_objective: 3.733 +[2024-11-07 15:26:50,253][04584] Num frames 3400... +[2024-11-07 15:26:50,471][04584] Num frames 3500... +[2024-11-07 15:26:50,695][04584] Num frames 3600... +[2024-11-07 15:26:50,913][04584] Num frames 3700... +[2024-11-07 15:26:51,056][04584] Avg episode rewards: #0: 4.044, true rewards: #0: 3.744 +[2024-11-07 15:26:51,059][04584] Avg episode reward: 4.044, avg true_objective: 3.744 +[2024-11-07 15:27:00,168][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 15:27:15,613][04584] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 15:28:10,876][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 15:28:10,878][04584] Overriding arg 'num_workers' with value 4 passed from command line +[2024-11-07 15:28:10,879][04584] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 15:28:10,880][04584] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 15:28:10,883][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 15:28:10,884][04584] Adding new argument 'video_name'=None that is not in the saved config file! 
+[2024-11-07 15:28:10,885][04584] Adding new argument 'max_num_frames'=150000 that is not in the saved config file! +[2024-11-07 15:28:10,886][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 15:28:10,887][04584] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-07 15:28:10,890][04584] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2024-11-07 15:28:10,891][04584] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 15:28:10,893][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 15:28:10,894][04584] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 15:28:10,896][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 15:28:10,898][04584] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 15:28:10,928][04584] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 15:28:10,931][04584] RunningMeanStd input shape: (1,) +[2024-11-07 15:28:10,949][04584] ConvEncoder: input_channels=3 +[2024-11-07 15:28:11,051][04584] Conv encoder output size: 512 +[2024-11-07 15:28:11,053][04584] Policy head output size: 512 +[2024-11-07 15:28:11,095][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth... +[2024-11-07 15:28:11,668][04584] Num frames 100... +[2024-11-07 15:28:11,899][04584] Num frames 200... +[2024-11-07 15:28:12,095][04584] Avg episode rewards: #0: 2.560, true rewards: #0: 2.560 +[2024-11-07 15:28:12,099][04584] Avg episode reward: 2.560, avg true_objective: 2.560 +[2024-11-07 15:28:12,197][04584] Num frames 300... +[2024-11-07 15:28:12,417][04584] Num frames 400... +[2024-11-07 15:28:12,623][04584] Num frames 500... 
+[2024-11-07 15:28:12,833][04584] Num frames 600... +[2024-11-07 15:28:12,965][04584] Avg episode rewards: #0: 3.200, true rewards: #0: 3.200 +[2024-11-07 15:28:12,969][04584] Avg episode reward: 3.200, avg true_objective: 3.200 +[2024-11-07 15:28:13,114][04584] Num frames 700... +[2024-11-07 15:28:13,354][04584] Num frames 800... +[2024-11-07 15:28:13,583][04584] Num frames 900... +[2024-11-07 15:28:13,801][04584] Num frames 1000... +[2024-11-07 15:28:13,909][04584] Avg episode rewards: #0: 3.413, true rewards: #0: 3.413 +[2024-11-07 15:28:13,910][04584] Avg episode reward: 3.413, avg true_objective: 3.413 +[2024-11-07 15:28:14,079][04584] Num frames 1100... +[2024-11-07 15:28:14,288][04584] Num frames 1200... +[2024-11-07 15:28:14,513][04584] Num frames 1300... +[2024-11-07 15:28:14,726][04584] Num frames 1400... +[2024-11-07 15:28:14,806][04584] Avg episode rewards: #0: 3.520, true rewards: #0: 3.520 +[2024-11-07 15:28:14,809][04584] Avg episode reward: 3.520, avg true_objective: 3.520 +[2024-11-07 15:28:15,035][04584] Num frames 1500... +[2024-11-07 15:28:15,244][04584] Num frames 1600... +[2024-11-07 15:28:17,568][04584] Num frames 1700... +[2024-11-07 15:28:17,830][04584] Avg episode rewards: #0: 3.584, true rewards: #0: 3.584 +[2024-11-07 15:28:17,835][04584] Avg episode reward: 3.584, avg true_objective: 3.584 +[2024-11-07 15:28:17,873][04584] Num frames 1800... +[2024-11-07 15:28:18,082][04584] Num frames 1900... +[2024-11-07 15:28:18,295][04584] Num frames 2000... +[2024-11-07 15:28:18,511][04584] Num frames 2100... +[2024-11-07 15:28:18,737][04584] Num frames 2200... +[2024-11-07 15:28:18,880][04584] Avg episode rewards: #0: 3.900, true rewards: #0: 3.733 +[2024-11-07 15:28:18,886][04584] Avg episode reward: 3.900, avg true_objective: 3.733 +[2024-11-07 15:28:19,045][04584] Num frames 2300... +[2024-11-07 15:28:19,258][04584] Num frames 2400... +[2024-11-07 15:28:19,459][04584] Num frames 2500... +[2024-11-07 15:28:19,684][04584] Num frames 2600... 
+[2024-11-07 15:28:19,802][04584] Avg episode rewards: #0: 3.891, true rewards: #0: 3.749 +[2024-11-07 15:28:19,803][04584] Avg episode reward: 3.891, avg true_objective: 3.749 +[2024-11-07 15:28:19,962][04584] Num frames 2700... +[2024-11-07 15:28:20,182][04584] Num frames 2800... +[2024-11-07 15:28:20,404][04584] Num frames 2900... +[2024-11-07 15:28:20,622][04584] Num frames 3000... +[2024-11-07 15:28:20,830][04584] Avg episode rewards: #0: 4.090, true rewards: #0: 3.840 +[2024-11-07 15:28:20,835][04584] Avg episode reward: 4.090, avg true_objective: 3.840 +[2024-11-07 15:28:20,924][04584] Num frames 3100... +[2024-11-07 15:28:21,144][04584] Num frames 3200... +[2024-11-07 15:28:21,345][04584] Num frames 3300... +[2024-11-07 15:28:21,589][04584] Num frames 3400... +[2024-11-07 15:28:21,784][04584] Avg episode rewards: #0: 4.062, true rewards: #0: 3.840 +[2024-11-07 15:28:21,789][04584] Avg episode reward: 4.062, avg true_objective: 3.840 +[2024-11-07 15:28:21,910][04584] Num frames 3500... +[2024-11-07 15:28:22,124][04584] Num frames 3600... +[2024-11-07 15:28:22,341][04584] Num frames 3700... +[2024-11-07 15:28:22,426][04584] Avg episode rewards: #0: 3.912, true rewards: #0: 3.712 +[2024-11-07 15:28:22,431][04584] Avg episode reward: 3.912, avg true_objective: 3.712 +[2024-11-07 15:28:31,547][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! 
+[2024-11-07 15:28:36,824][04584] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 15:35:39,627][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 15:35:39,629][04584] Overriding arg 'num_envs_per_worker' with value 4 passed from command line +[2024-11-07 15:35:39,631][04584] Overriding arg 'learning_rate' with value 0.0003 passed from command line +[2024-11-07 15:35:39,633][04584] Overriding arg 'train_for_env_steps' with value 4000000 passed from command line +[2024-11-07 15:35:39,766][04584] Experiment dir /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment already exists! +[2024-11-07 15:35:39,767][04584] Resuming existing experiment from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment... +[2024-11-07 15:35:39,768][04584] Weights and Biases integration disabled +[2024-11-07 15:35:39,773][04584] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2024-11-07 15:35:53,100][04584] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/root/hfRL/ml/LunarLander-v2/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=10 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 
+max_grad_norm=4.0 +learning_rate=0.0003 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=4000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 
+command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2024-11-07 15:35:53,102][04584] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-07 15:35:53,105][04584] Rollout worker 0 uses device cpu +[2024-11-07 15:35:53,106][04584] Rollout worker 1 uses device cpu +[2024-11-07 15:35:53,108][04584] Rollout worker 2 uses device cpu +[2024-11-07 15:35:53,111][04584] Rollout worker 3 uses device cpu +[2024-11-07 15:35:53,113][04584] Rollout worker 4 uses device cpu +[2024-11-07 15:35:53,115][04584] Rollout worker 5 uses device cpu +[2024-11-07 15:35:53,117][04584] Rollout worker 6 uses device cpu +[2024-11-07 15:35:53,118][04584] Rollout worker 7 uses device cpu +[2024-11-07 15:35:53,121][04584] Rollout worker 8 uses device cpu +[2024-11-07 15:35:53,124][04584] Rollout worker 9 uses device cpu +[2024-11-07 15:35:53,218][04584] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 15:35:53,219][04584] InferenceWorker_p0-w0: min num requests: 3 +[2024-11-07 15:35:53,265][04584] Starting all processes... +[2024-11-07 15:35:53,267][04584] Starting process learner_proc0 +[2024-11-07 15:35:53,321][04584] Starting all processes... 
+[2024-11-07 15:35:53,334][04584] Starting process inference_proc0-0 +[2024-11-07 15:35:53,338][04584] Starting process rollout_proc0 +[2024-11-07 15:35:53,338][04584] Starting process rollout_proc1 +[2024-11-07 15:35:53,340][04584] Starting process rollout_proc2 +[2024-11-07 15:35:53,341][04584] Starting process rollout_proc3 +[2024-11-07 15:35:53,343][04584] Starting process rollout_proc4 +[2024-11-07 15:35:53,344][04584] Starting process rollout_proc5 +[2024-11-07 15:35:53,346][04584] Starting process rollout_proc6 +[2024-11-07 15:35:53,348][04584] Starting process rollout_proc7 +[2024-11-07 15:35:53,350][04584] Starting process rollout_proc8 +[2024-11-07 15:35:53,351][04584] Starting process rollout_proc9 +[2024-11-07 15:36:01,323][12380] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 15:36:01,324][12380] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 15:36:01,855][12406] Worker 8 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 15:36:01,933][12395] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 15:36:01,934][12395] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 15:36:02,006][12380] Num visible devices: 1 +[2024-11-07 15:36:02,044][12395] Num visible devices: 1 +[2024-11-07 15:36:02,084][12398] Worker 2 uses CPU cores [2] +[2024-11-07 15:36:02,105][12380] Starting seed is not provided +[2024-11-07 15:36:02,106][12380] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 15:36:02,106][12380] Initializing actor-critic model on device cuda:0 +[2024-11-07 15:36:02,107][12380] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 15:36:02,125][12380] RunningMeanStd input shape: (1,) +[2024-11-07 15:36:02,413][12380] ConvEncoder: input_channels=3 +[2024-11-07 15:36:02,565][12407] Worker 4 uses CPU cores [4] +[2024-11-07 15:36:02,968][12408] Worker 6 uses CPU cores [6] +[2024-11-07 
15:36:03,069][12397] Worker 0 uses CPU cores [0] +[2024-11-07 15:36:03,316][12380] Conv encoder output size: 512 +[2024-11-07 15:36:03,322][12380] Policy head output size: 512 +[2024-11-07 15:36:03,383][12396] Worker 1 uses CPU cores [1] +[2024-11-07 15:36:03,390][12380] Created Actor Critic model with architecture: +[2024-11-07 15:36:03,390][12380] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 15:36:03,484][12409] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 15:36:03,503][12399] Worker 3 uses CPU cores [3] +[2024-11-07 15:36:03,843][12410] Worker 5 uses CPU cores [5] +[2024-11-07 15:36:03,920][12411] Worker 9 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 15:36:04,363][12380] Using optimizer +[2024-11-07 15:36:07,491][12380] Loading state 
from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth... +[2024-11-07 15:36:07,594][12380] Loading model from checkpoint +[2024-11-07 15:36:07,596][12380] Loaded experiment state at self.train_step=3908, self.env_steps=16007168 +[2024-11-07 15:36:07,597][12380] Initialized policy 0 weights for model version 3908 +[2024-11-07 15:36:07,606][12380] LearnerWorker_p0 finished initialization! +[2024-11-07 15:36:07,606][12380] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 15:36:08,060][12395] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 15:36:08,061][12395] RunningMeanStd input shape: (1,) +[2024-11-07 15:36:08,077][12395] ConvEncoder: input_channels=3 +[2024-11-07 15:36:08,262][12395] Conv encoder output size: 512 +[2024-11-07 15:36:08,262][12395] Policy head output size: 512 +[2024-11-07 15:36:08,356][04584] Inference worker 0-0 is ready! +[2024-11-07 15:36:08,358][04584] All inference workers are ready! Signal rollout workers to start! 
+[2024-11-07 15:36:08,479][12399] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:36:08,521][12407] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:36:08,526][12411] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:36:08,532][12409] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:36:08,535][12398] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:36:08,536][12396] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:36:08,545][12397] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:36:08,546][12410] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:36:08,689][12406] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:36:08,695][12408] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:36:09,198][12407] Decorrelating experience for 0 frames... +[2024-11-07 15:36:09,203][12399] Decorrelating experience for 0 frames... +[2024-11-07 15:36:09,236][12398] Decorrelating experience for 0 frames... +[2024-11-07 15:36:09,720][12411] Decorrelating experience for 0 frames... +[2024-11-07 15:36:09,748][12397] Decorrelating experience for 0 frames... +[2024-11-07 15:36:09,774][04584] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 16007168. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 15:36:09,841][12410] Decorrelating experience for 0 frames... +[2024-11-07 15:36:09,917][12407] Decorrelating experience for 32 frames... +[2024-11-07 15:36:09,948][12399] Decorrelating experience for 32 frames... +[2024-11-07 15:36:10,220][12396] Decorrelating experience for 0 frames... +[2024-11-07 15:36:10,305][12408] Decorrelating experience for 0 frames... +[2024-11-07 15:36:10,708][12406] Decorrelating experience for 0 frames... +[2024-11-07 15:36:10,813][12407] Decorrelating experience for 64 frames... 
+[2024-11-07 15:36:10,964][12399] Decorrelating experience for 64 frames... +[2024-11-07 15:36:11,144][12408] Decorrelating experience for 32 frames... +[2024-11-07 15:36:11,150][12396] Decorrelating experience for 32 frames... +[2024-11-07 15:36:11,225][12397] Decorrelating experience for 32 frames... +[2024-11-07 15:36:11,390][12411] Decorrelating experience for 32 frames... +[2024-11-07 15:36:11,571][12407] Decorrelating experience for 96 frames... +[2024-11-07 15:36:11,584][12399] Decorrelating experience for 96 frames... +[2024-11-07 15:36:13,567][04584] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 15:36:13,569][04584] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 15:36:13,575][04584] Heartbeat connected on Batcher_0 +[2024-11-07 15:36:13,580][04584] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 15:36:13,730][12406] Decorrelating experience for 32 frames... +[2024-11-07 15:36:13,871][12410] Decorrelating experience for 32 frames... +[2024-11-07 15:36:13,880][12397] Decorrelating experience for 64 frames... +[2024-11-07 15:36:13,894][12408] Decorrelating experience for 64 frames... +[2024-11-07 15:36:14,131][12396] Decorrelating experience for 64 frames... +[2024-11-07 15:36:14,255][12411] Decorrelating experience for 64 frames... +[2024-11-07 15:36:14,386][12409] Decorrelating experience for 0 frames... +[2024-11-07 15:36:14,449][12408] Decorrelating experience for 96 frames... +[2024-11-07 15:36:14,489][12397] Decorrelating experience for 96 frames... +[2024-11-07 15:36:14,512][04584] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 15:36:14,595][04584] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 15:36:14,604][12396] Decorrelating experience for 96 frames... +[2024-11-07 15:36:14,678][04584] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 15:36:14,774][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 16007168. Throughput: 0: 0.0. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 15:36:14,971][12398] Decorrelating experience for 32 frames... +[2024-11-07 15:36:15,065][12411] Decorrelating experience for 96 frames... +[2024-11-07 15:36:15,200][12409] Decorrelating experience for 32 frames... +[2024-11-07 15:36:15,266][04584] Heartbeat connected on RolloutWorker_w9 +[2024-11-07 15:36:15,504][04584] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 15:36:15,728][12410] Decorrelating experience for 64 frames... +[2024-11-07 15:36:16,300][12398] Decorrelating experience for 64 frames... +[2024-11-07 15:36:16,335][12409] Decorrelating experience for 64 frames... +[2024-11-07 15:36:17,274][12406] Decorrelating experience for 64 frames... +[2024-11-07 15:36:17,565][12410] Decorrelating experience for 96 frames... +[2024-11-07 15:36:17,901][04584] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 15:36:18,870][12398] Decorrelating experience for 96 frames... +[2024-11-07 15:36:19,062][12406] Decorrelating experience for 96 frames... +[2024-11-07 15:36:19,077][04584] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 15:36:19,382][04584] Heartbeat connected on RolloutWorker_w8 +[2024-11-07 15:36:19,785][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 16007168. Throughput: 0: 73.1. Samples: 732. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 15:36:19,789][04584] Avg episode reward: [(0, '1.797')] +[2024-11-07 15:36:20,780][12409] Decorrelating experience for 96 frames... +[2024-11-07 15:36:20,905][12380] Signal inference workers to stop experience collection... +[2024-11-07 15:36:20,925][12395] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 15:36:21,046][04584] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 15:36:24,774][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 16007168. Throughput: 0: 184.4. Samples: 2766. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 15:36:24,775][04584] Avg episode reward: [(0, '2.131')] +[2024-11-07 15:36:29,774][04584] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 16007168. Throughput: 0: 138.3. Samples: 2766. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 15:36:29,776][04584] Avg episode reward: [(0, '2.131')] +[2024-11-07 15:36:31,391][12380] Signal inference workers to resume experience collection... +[2024-11-07 15:36:31,392][12395] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 15:36:31,396][12380] Stopping Batcher_0... +[2024-11-07 15:36:31,396][12380] Loop batcher_evt_loop terminating... +[2024-11-07 15:36:31,433][04584] Component Batcher_0 stopped! +[2024-11-07 15:36:31,892][12395] Weights refcount: 2 0 +[2024-11-07 15:36:31,894][12395] Stopping InferenceWorker_p0-w0... +[2024-11-07 15:36:31,894][12395] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 15:36:31,894][04584] Component InferenceWorker_p0-w0 stopped! +[2024-11-07 15:36:32,099][12380] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003910_16015360.pth... +[2024-11-07 15:36:32,269][12411] Stopping RolloutWorker_w9... +[2024-11-07 15:36:32,270][12411] Loop rollout_proc9_evt_loop terminating... +[2024-11-07 15:36:32,269][04584] Component RolloutWorker_w9 stopped! +[2024-11-07 15:36:32,294][12408] Stopping RolloutWorker_w6... +[2024-11-07 15:36:32,295][12408] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 15:36:32,296][12396] Stopping RolloutWorker_w1... +[2024-11-07 15:36:32,297][12396] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 15:36:32,284][04584] Component RolloutWorker_w8 stopped! +[2024-11-07 15:36:32,468][12406] Stopping RolloutWorker_w8... +[2024-11-07 15:36:32,469][12406] Loop rollout_proc8_evt_loop terminating... +[2024-11-07 15:36:32,467][04584] Component RolloutWorker_w6 stopped! 
+[2024-11-07 15:36:32,470][04584] Component RolloutWorker_w1 stopped! +[2024-11-07 15:36:32,489][04584] Component RolloutWorker_w5 stopped! +[2024-11-07 15:36:32,490][12399] Stopping RolloutWorker_w3... +[2024-11-07 15:36:32,490][04584] Component RolloutWorker_w3 stopped! +[2024-11-07 15:36:32,492][12410] Stopping RolloutWorker_w5... +[2024-11-07 15:36:32,492][12399] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 15:36:32,493][12410] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 15:36:32,644][12409] Stopping RolloutWorker_w7... +[2024-11-07 15:36:32,645][12409] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 15:36:32,645][04584] Component RolloutWorker_w7 stopped! +[2024-11-07 15:36:32,716][12397] Stopping RolloutWorker_w0... +[2024-11-07 15:36:32,717][04584] Component RolloutWorker_w0 stopped! +[2024-11-07 15:36:32,735][12397] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 15:36:32,920][12398] Stopping RolloutWorker_w2... +[2024-11-07 15:36:32,920][12398] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 15:36:32,920][04584] Component RolloutWorker_w2 stopped! +[2024-11-07 15:36:33,007][04584] Component RolloutWorker_w4 stopped! +[2024-11-07 15:36:33,009][12407] Stopping RolloutWorker_w4... +[2024-11-07 15:36:33,011][12407] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 15:36:33,392][12380] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003865_15831040.pth +[2024-11-07 15:36:33,454][12380] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003910_16015360.pth... +[2024-11-07 15:36:34,291][04584] Component LearnerWorker_p0 stopped! +[2024-11-07 15:36:34,290][12380] Stopping LearnerWorker_p0... +[2024-11-07 15:36:34,300][12380] Loop learner_proc0_evt_loop terminating... +[2024-11-07 15:36:34,300][04584] Waiting for process learner_proc0 to stop... 
+[2024-11-07 15:36:36,214][04584] Waiting for process inference_proc0-0 to join... +[2024-11-07 15:36:36,215][04584] Waiting for process rollout_proc0 to join... +[2024-11-07 15:36:36,216][04584] Waiting for process rollout_proc1 to join... +[2024-11-07 15:36:36,218][04584] Waiting for process rollout_proc2 to join... +[2024-11-07 15:36:36,220][04584] Waiting for process rollout_proc3 to join... +[2024-11-07 15:36:36,224][04584] Waiting for process rollout_proc4 to join... +[2024-11-07 15:36:36,225][04584] Waiting for process rollout_proc5 to join... +[2024-11-07 15:36:36,228][04584] Waiting for process rollout_proc6 to join... +[2024-11-07 15:36:36,229][04584] Waiting for process rollout_proc7 to join... +[2024-11-07 15:36:36,231][04584] Waiting for process rollout_proc8 to join... +[2024-11-07 15:36:36,244][04584] Waiting for process rollout_proc9 to join... +[2024-11-07 15:36:36,246][04584] Batcher 0 profile tree view: +batching: 0.1186, releasing_batches: 0.0010 +[2024-11-07 15:36:36,249][04584] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0051 + wait_policy_total: 3.3668 +update_model: 0.4621 + weight_update: 0.4034 +one_step: 0.0533 + handle_policy_step: 8.7830 + deserialize: 0.1270, stack: 0.0172, obs_to_device_normalize: 3.1395, forward: 4.5298, send_messages: 0.1979 + prepare_outputs: 0.5682 + to_cpu: 0.4418 +[2024-11-07 15:36:36,251][04584] Learner 0 profile tree view: +misc: 0.0000, prepare_batch: 3.5253 +train: 9.8063 + epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0012, kl_divergence: 0.1010, after_optimizer: 1.5531 + calculate_losses: 1.1109 + losses_init: 0.0000, forward_head: 0.5490, bptt_initial: 0.2427, tail: 0.0206, advantages_returns: 0.0022, losses: 0.2701 + bptt: 0.0253 + bptt_forward_core: 0.0251 + update: 7.0378 + clip: 0.2975 +[2024-11-07 15:36:36,255][04584] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.0012, enqueue_policy_requests: 0.1829, env_step: 1.6410, overhead: 0.1598, 
complete_rollouts: 0.0046 +save_policy_outputs: 0.0904 + split_output_tensors: 0.0334 +[2024-11-07 15:36:36,260][04584] RolloutWorker_w9 profile tree view: +wait_for_trajectories: 0.0012, enqueue_policy_requests: 0.3210, env_step: 2.2709, overhead: 0.0749, complete_rollouts: 0.0414 +save_policy_outputs: 0.0948 + split_output_tensors: 0.0223 +[2024-11-07 15:36:36,263][04584] Loop Runner_EvtLoop terminating... +[2024-11-07 15:36:36,268][04584] Runner profile tree view: +main_loop: 43.0008 +[2024-11-07 15:36:36,270][04584] Collected {0: 16015360}, FPS: 190.5 +[2024-11-07 15:37:41,648][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 15:37:41,650][04584] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 15:37:41,652][04584] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 15:37:41,654][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 15:37:41,656][04584] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 15:37:41,657][04584] Adding new argument 'max_num_frames'=50000 that is not in the saved config file! +[2024-11-07 15:37:41,659][04584] Adding new argument 'max_num_episodes'=50 that is not in the saved config file! +[2024-11-07 15:37:41,661][04584] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-07 15:37:41,662][04584] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-07 15:37:41,663][04584] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 15:37:41,665][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 15:37:41,666][04584] Adding new argument 'train_script'=None that is not in the saved config file! 
+[2024-11-07 15:37:41,668][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 15:37:41,671][04584] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 15:37:41,960][04584] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 15:37:41,972][04584] RunningMeanStd input shape: (1,) +[2024-11-07 15:37:42,066][04584] ConvEncoder: input_channels=3 +[2024-11-07 15:37:42,267][04584] Conv encoder output size: 512 +[2024-11-07 15:37:42,271][04584] Policy head output size: 512 +[2024-11-07 15:37:42,469][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003910_16015360.pth... +[2024-11-07 15:37:43,334][04584] Num frames 100... +[2024-11-07 15:37:43,616][04584] Num frames 200... +[2024-11-07 15:37:43,874][04584] Num frames 300... +[2024-11-07 15:37:44,131][04584] Num frames 400... +[2024-11-07 15:37:44,308][04584] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 +[2024-11-07 15:37:44,314][04584] Avg episode reward: 5.480, avg true_objective: 4.480 +[2024-11-07 15:37:44,458][04584] Num frames 500... +[2024-11-07 15:37:44,722][04584] Num frames 600... +[2024-11-07 15:37:44,966][04584] Num frames 700... +[2024-11-07 15:37:45,212][04584] Num frames 800... +[2024-11-07 15:37:45,399][04584] Num frames 900... +[2024-11-07 15:37:45,576][04584] Avg episode rewards: #0: 6.300, true rewards: #0: 4.800 +[2024-11-07 15:37:45,579][04584] Avg episode reward: 6.300, avg true_objective: 4.800 +[2024-11-07 15:37:45,668][04584] Num frames 1000... +[2024-11-07 15:37:45,866][04584] Num frames 1100... +[2024-11-07 15:37:46,070][04584] Num frames 1200... +[2024-11-07 15:37:46,219][04584] Avg episode rewards: #0: 5.493, true rewards: #0: 4.160 +[2024-11-07 15:37:46,222][04584] Avg episode reward: 5.493, avg true_objective: 4.160 +[2024-11-07 15:37:46,327][04584] Num frames 1300... +[2024-11-07 15:37:46,558][04584] Num frames 1400... 
+[2024-11-07 15:37:46,796][04584] Num frames 1500... +[2024-11-07 15:37:47,024][04584] Num frames 1600... +[2024-11-07 15:37:47,319][04584] Avg episode rewards: #0: 5.490, true rewards: #0: 4.240 +[2024-11-07 15:37:47,320][04584] Avg episode reward: 5.490, avg true_objective: 4.240 +[2024-11-07 15:37:47,332][04584] Num frames 1700... +[2024-11-07 15:37:47,581][04584] Num frames 1800... +[2024-11-07 15:37:47,849][04584] Num frames 1900... +[2024-11-07 15:37:48,088][04584] Num frames 2000... +[2024-11-07 15:37:48,351][04584] Num frames 2100... +[2024-11-07 15:37:48,524][04584] Avg episode rewards: #0: 5.488, true rewards: #0: 4.288 +[2024-11-07 15:37:48,526][04584] Avg episode reward: 5.488, avg true_objective: 4.288 +[2024-11-07 15:37:48,684][04584] Num frames 2200... +[2024-11-07 15:37:48,922][04584] Num frames 2300... +[2024-11-07 15:37:49,182][04584] Num frames 2400... +[2024-11-07 15:37:49,236][04584] Avg episode rewards: #0: 5.000, true rewards: #0: 4.000 +[2024-11-07 15:37:49,239][04584] Avg episode reward: 5.000, avg true_objective: 4.000 +[2024-11-07 15:37:49,535][04584] Num frames 2500... +[2024-11-07 15:37:49,918][04584] Num frames 2600... +[2024-11-07 15:37:50,190][04584] Num frames 2700... +[2024-11-07 15:37:50,445][04584] Avg episode rewards: #0: 4.834, true rewards: #0: 3.977 +[2024-11-07 15:37:50,448][04584] Avg episode reward: 4.834, avg true_objective: 3.977 +[2024-11-07 15:37:50,499][04584] Num frames 2800... +[2024-11-07 15:37:50,761][04584] Num frames 2900... +[2024-11-07 15:37:51,078][04584] Num frames 3000... +[2024-11-07 15:37:51,427][04584] Num frames 3100... +[2024-11-07 15:37:51,661][04584] Num frames 3200... +[2024-11-07 15:37:51,786][04584] Avg episode rewards: #0: 4.915, true rewards: #0: 4.040 +[2024-11-07 15:37:51,792][04584] Avg episode reward: 4.915, avg true_objective: 4.040 +[2024-11-07 15:37:51,964][04584] Num frames 3300... +[2024-11-07 15:37:52,359][04584] Num frames 3400... +[2024-11-07 15:37:52,697][04584] Num frames 3500... 
+[2024-11-07 15:37:52,980][04584] Avg episode rewards: #0: 4.777, true rewards: #0: 3.999 +[2024-11-07 15:37:52,985][04584] Avg episode reward: 4.777, avg true_objective: 3.999 +[2024-11-07 15:37:53,003][04584] Num frames 3600... +[2024-11-07 15:37:53,277][04584] Num frames 3700... +[2024-11-07 15:37:53,537][04584] Num frames 3800... +[2024-11-07 15:37:55,495][04584] Num frames 3900... +[2024-11-07 15:37:55,889][04584] Num frames 4000... +[2024-11-07 15:37:56,113][04584] Avg episode rewards: #0: 4.847, true rewards: #0: 4.047 +[2024-11-07 15:37:56,114][04584] Avg episode reward: 4.847, avg true_objective: 4.047 +[2024-11-07 15:37:56,302][04584] Num frames 4100... +[2024-11-07 15:37:56,492][04584] Num frames 4200... +[2024-11-07 15:37:56,686][04584] Num frames 4300... +[2024-11-07 15:37:56,874][04584] Num frames 4400... +[2024-11-07 15:37:56,989][04584] Avg episode rewards: #0: 4.755, true rewards: #0: 4.028 +[2024-11-07 15:37:56,995][04584] Avg episode reward: 4.755, avg true_objective: 4.028 +[2024-11-07 15:37:57,143][04584] Num frames 4500... +[2024-11-07 15:37:57,319][04584] Num frames 4600... +[2024-11-07 15:37:57,495][04584] Num frames 4700... +[2024-11-07 15:37:57,661][04584] Num frames 4800... +[2024-11-07 15:37:57,745][04584] Avg episode rewards: #0: 4.679, true rewards: #0: 4.012 +[2024-11-07 15:37:57,749][04584] Avg episode reward: 4.679, avg true_objective: 4.012 +[2024-11-07 15:37:57,913][04584] Num frames 4900... +[2024-11-07 15:37:58,074][04584] Num frames 5000... +[2024-11-07 15:37:58,242][04584] Num frames 5100... +[2024-11-07 15:37:58,405][04584] Num frames 5200... +[2024-11-07 15:37:58,518][04584] Avg episode rewards: #0: 4.639, true rewards: #0: 4.024 +[2024-11-07 15:37:58,524][04584] Avg episode reward: 4.639, avg true_objective: 4.024 +[2024-11-07 15:37:58,671][04584] Num frames 5300... +[2024-11-07 15:37:59,095][04584] Num frames 5400... +[2024-11-07 15:37:59,551][04584] Num frames 5500... +[2024-11-07 15:37:59,996][04584] Num frames 5600... 
+[2024-11-07 15:38:00,123][04584] Avg episode rewards: #0: 4.582, true rewards: #0: 4.011 +[2024-11-07 15:38:00,127][04584] Avg episode reward: 4.582, avg true_objective: 4.011 +[2024-11-07 15:38:00,379][04584] Num frames 5700... +[2024-11-07 15:38:00,650][04584] Num frames 5800... +[2024-11-07 15:38:00,927][04584] Num frames 5900... +[2024-11-07 15:38:01,238][04584] Avg episode rewards: #0: 4.533, true rewards: #0: 3.999 +[2024-11-07 15:38:01,240][04584] Avg episode reward: 4.533, avg true_objective: 3.999 +[2024-11-07 15:38:01,243][04584] Num frames 6000... +[2024-11-07 15:38:01,558][04584] Num frames 6100... +[2024-11-07 15:38:01,882][04584] Num frames 6200... +[2024-11-07 15:38:02,265][04584] Num frames 6300... +[2024-11-07 15:38:02,603][04584] Avg episode rewards: #0: 4.489, true rewards: #0: 3.989 +[2024-11-07 15:38:02,607][04584] Avg episode reward: 4.489, avg true_objective: 3.989 +[2024-11-07 15:38:02,683][04584] Num frames 6400... +[2024-11-07 15:38:03,119][04584] Num frames 6500... +[2024-11-07 15:38:03,370][04584] Num frames 6600... +[2024-11-07 15:38:03,681][04584] Num frames 6700... +[2024-11-07 15:38:03,940][04584] Avg episode rewards: #0: 4.451, true rewards: #0: 3.981 +[2024-11-07 15:38:03,946][04584] Avg episode reward: 4.451, avg true_objective: 3.981 +[2024-11-07 15:38:04,043][04584] Num frames 6800... +[2024-11-07 15:38:04,300][04584] Num frames 6900... +[2024-11-07 15:38:04,526][04584] Num frames 7000... +[2024-11-07 15:38:04,767][04584] Num frames 7100... +[2024-11-07 15:38:05,021][04584] Num frames 7200... +[2024-11-07 15:38:05,132][04584] Avg episode rewards: #0: 4.508, true rewards: #0: 4.008 +[2024-11-07 15:38:05,136][04584] Avg episode reward: 4.508, avg true_objective: 4.008 +[2024-11-07 15:38:05,405][04584] Num frames 7300... +[2024-11-07 15:38:05,721][04584] Num frames 7400... +[2024-11-07 15:38:06,063][04584] Num frames 7500... +[2024-11-07 15:38:06,318][04584] Num frames 7600... +[2024-11-07 15:38:06,580][04584] Num frames 7700... 
+[2024-11-07 15:38:06,845][04584] Num frames 7800... +[2024-11-07 15:38:06,965][04584] Avg episode rewards: #0: 4.749, true rewards: #0: 4.117 +[2024-11-07 15:38:06,971][04584] Avg episode reward: 4.749, avg true_objective: 4.117 +[2024-11-07 15:38:07,186][04584] Num frames 7900... +[2024-11-07 15:38:07,437][04584] Num frames 8000... +[2024-11-07 15:38:07,692][04584] Num frames 8100... +[2024-11-07 15:38:07,937][04584] Num frames 8200... +[2024-11-07 15:38:08,025][04584] Avg episode rewards: #0: 4.704, true rewards: #0: 4.103 +[2024-11-07 15:38:08,029][04584] Avg episode reward: 4.704, avg true_objective: 4.103 +[2024-11-07 15:38:08,317][04584] Num frames 8300... +[2024-11-07 15:38:08,566][04584] Num frames 8400... +[2024-11-07 15:38:08,837][04584] Num frames 8500... +[2024-11-07 15:38:09,107][04584] Num frames 8600... +[2024-11-07 15:38:09,307][04584] Avg episode rewards: #0: 4.740, true rewards: #0: 4.121 +[2024-11-07 15:38:09,308][04584] Avg episode reward: 4.740, avg true_objective: 4.121 +[2024-11-07 15:38:09,453][04584] Num frames 8700... +[2024-11-07 15:38:09,686][04584] Num frames 8800... +[2024-11-07 15:38:09,939][04584] Num frames 8900... +[2024-11-07 15:38:10,177][04584] Num frames 9000... +[2024-11-07 15:38:10,338][04584] Avg episode rewards: #0: 4.700, true rewards: #0: 4.109 +[2024-11-07 15:38:10,342][04584] Avg episode reward: 4.700, avg true_objective: 4.109 +[2024-11-07 15:38:10,508][04584] Num frames 9100... +[2024-11-07 15:38:10,747][04584] Num frames 9200... +[2024-11-07 15:38:11,002][04584] Num frames 9300... +[2024-11-07 15:38:11,253][04584] Num frames 9400... +[2024-11-07 15:38:11,371][04584] Avg episode rewards: #0: 4.662, true rewards: #0: 4.097 +[2024-11-07 15:38:11,373][04584] Avg episode reward: 4.662, avg true_objective: 4.097 +[2024-11-07 15:38:11,558][04584] Num frames 9500... +[2024-11-07 15:38:11,797][04584] Num frames 9600... +[2024-11-07 15:38:12,042][04584] Num frames 9700... +[2024-11-07 15:38:12,321][04584] Num frames 9800... 
+[2024-11-07 15:38:12,400][04584] Avg episode rewards: #0: 4.628, true rewards: #0: 4.086 +[2024-11-07 15:38:12,405][04584] Avg episode reward: 4.628, avg true_objective: 4.086 +[2024-11-07 15:38:12,627][04584] Num frames 9900... +[2024-11-07 15:38:12,861][04584] Num frames 10000... +[2024-11-07 15:38:13,121][04584] Num frames 10100... +[2024-11-07 15:38:13,439][04584] Avg episode rewards: #0: 4.596, true rewards: #0: 4.076 +[2024-11-07 15:38:13,441][04584] Avg episode reward: 4.596, avg true_objective: 4.076 +[2024-11-07 15:38:13,474][04584] Num frames 10200... +[2024-11-07 15:38:13,711][04584] Num frames 10300... +[2024-11-07 15:38:13,959][04584] Num frames 10400... +[2024-11-07 15:38:14,240][04584] Num frames 10500... +[2024-11-07 15:38:14,512][04584] Avg episode rewards: #0: 4.567, true rewards: #0: 4.067 +[2024-11-07 15:38:14,518][04584] Avg episode reward: 4.567, avg true_objective: 4.067 +[2024-11-07 15:38:14,597][04584] Num frames 10600... +[2024-11-07 15:38:14,856][04584] Num frames 10700... +[2024-11-07 15:38:15,109][04584] Num frames 10800... +[2024-11-07 15:38:15,380][04584] Num frames 10900... +[2024-11-07 15:38:15,591][04584] Avg episode rewards: #0: 4.540, true rewards: #0: 4.059 +[2024-11-07 15:38:15,597][04584] Avg episode reward: 4.540, avg true_objective: 4.059 +[2024-11-07 15:38:15,718][04584] Num frames 11000... +[2024-11-07 15:38:15,972][04584] Num frames 11100... +[2024-11-07 15:38:16,222][04584] Num frames 11200... +[2024-11-07 15:38:16,517][04584] Num frames 11300... +[2024-11-07 15:38:16,679][04584] Avg episode rewards: #0: 4.515, true rewards: #0: 4.051 +[2024-11-07 15:38:16,682][04584] Avg episode reward: 4.515, avg true_objective: 4.051 +[2024-11-07 15:38:16,832][04584] Num frames 11400... +[2024-11-07 15:38:17,074][04584] Num frames 11500... +[2024-11-07 15:38:17,327][04584] Num frames 11600... +[2024-11-07 15:38:17,592][04584] Num frames 11700... 
+[2024-11-07 15:38:17,715][04584] Avg episode rewards: #0: 4.492, true rewards: #0: 4.044 +[2024-11-07 15:38:17,716][04584] Avg episode reward: 4.492, avg true_objective: 4.044 +[2024-11-07 15:38:17,918][04584] Num frames 11800... +[2024-11-07 15:38:18,163][04584] Num frames 11900... +[2024-11-07 15:38:18,415][04584] Num frames 12000... +[2024-11-07 15:38:18,667][04584] Num frames 12100... +[2024-11-07 15:38:18,899][04584] Avg episode rewards: #0: 4.525, true rewards: #0: 4.058 +[2024-11-07 15:38:18,901][04584] Avg episode reward: 4.525, avg true_objective: 4.058 +[2024-11-07 15:38:18,974][04584] Num frames 12200... +[2024-11-07 15:38:19,220][04584] Num frames 12300... +[2024-11-07 15:38:19,443][04584] Num frames 12400... +[2024-11-07 15:38:19,702][04584] Num frames 12500... +[2024-11-07 15:38:19,901][04584] Avg episode rewards: #0: 4.503, true rewards: #0: 4.051 +[2024-11-07 15:38:19,907][04584] Avg episode reward: 4.503, avg true_objective: 4.051 +[2024-11-07 15:38:20,027][04584] Num frames 12600... +[2024-11-07 15:38:20,278][04584] Num frames 12700... +[2024-11-07 15:38:20,526][04584] Num frames 12800... +[2024-11-07 15:38:20,752][04584] Num frames 12900... +[2024-11-07 15:38:20,839][04584] Avg episode rewards: #0: 4.503, true rewards: #0: 4.035 +[2024-11-07 15:38:20,846][04584] Avg episode reward: 4.503, avg true_objective: 4.035 +[2024-11-07 15:38:21,054][04584] Num frames 13000... +[2024-11-07 15:38:21,288][04584] Num frames 13100... +[2024-11-07 15:38:21,597][04584] Num frames 13200... +[2024-11-07 15:38:21,876][04584] Avg episode rewards: #0: 4.483, true rewards: #0: 4.029 +[2024-11-07 15:38:21,880][04584] Avg episode reward: 4.483, avg true_objective: 4.029 +[2024-11-07 15:38:21,916][04584] Num frames 13300... +[2024-11-07 15:38:22,192][04584] Num frames 13400... +[2024-11-07 15:38:22,459][04584] Num frames 13500... +[2024-11-07 15:38:22,719][04584] Num frames 13600... 
+[2024-11-07 15:38:22,781][04584] Avg episode rewards: #0: 4.442, true rewards: #0: 4.001 +[2024-11-07 15:38:22,788][04584] Avg episode reward: 4.442, avg true_objective: 4.001 +[2024-11-07 15:38:23,111][04584] Num frames 13700... +[2024-11-07 15:38:23,346][04584] Num frames 13800... +[2024-11-07 15:38:23,604][04584] Num frames 13900... +[2024-11-07 15:38:23,879][04584] Num frames 14000... +[2024-11-07 15:38:23,997][04584] Avg episode rewards: #0: 4.434, true rewards: #0: 4.005 +[2024-11-07 15:38:24,001][04584] Avg episode reward: 4.434, avg true_objective: 4.005 +[2024-11-07 15:38:24,227][04584] Num frames 14100... +[2024-11-07 15:38:24,495][04584] Num frames 14200... +[2024-11-07 15:38:24,742][04584] Num frames 14300... +[2024-11-07 15:38:25,027][04584] Num frames 14400... +[2024-11-07 15:38:25,091][04584] Avg episode rewards: #0: 4.417, true rewards: #0: 4.001 +[2024-11-07 15:38:25,094][04584] Avg episode reward: 4.417, avg true_objective: 4.001 +[2024-11-07 15:38:25,343][04584] Num frames 14500... +[2024-11-07 15:38:25,587][04584] Num frames 14600... +[2024-11-07 15:38:25,844][04584] Num frames 14700... +[2024-11-07 15:38:26,130][04584] Avg episode rewards: #0: 4.402, true rewards: #0: 3.996 +[2024-11-07 15:38:26,132][04584] Avg episode reward: 4.402, avg true_objective: 3.996 +[2024-11-07 15:38:26,174][04584] Num frames 14800... +[2024-11-07 15:38:26,405][04584] Num frames 14900... +[2024-11-07 15:38:26,648][04584] Num frames 15000... +[2024-11-07 15:38:26,901][04584] Num frames 15100... +[2024-11-07 15:38:27,138][04584] Avg episode rewards: #0: 4.387, true rewards: #0: 3.992 +[2024-11-07 15:38:27,141][04584] Avg episode reward: 4.387, avg true_objective: 3.992 +[2024-11-07 15:38:27,236][04584] Num frames 15200... +[2024-11-07 15:38:27,478][04584] Num frames 15300... +[2024-11-07 15:38:29,403][04584] Num frames 15400... +[2024-11-07 15:38:29,697][04584] Num frames 15500... +[2024-11-07 15:38:29,972][04584] Num frames 15600... 
+[2024-11-07 15:38:30,085][04584] Avg episode rewards: #0: 4.415, true rewards: #0: 4.005 +[2024-11-07 15:38:30,089][04584] Avg episode reward: 4.415, avg true_objective: 4.005 +[2024-11-07 15:38:30,347][04584] Num frames 15700... +[2024-11-07 15:38:30,657][04584] Num frames 15800... +[2024-11-07 15:38:30,952][04584] Num frames 15900... +[2024-11-07 15:38:31,268][04584] Num frames 16000... +[2024-11-07 15:38:31,337][04584] Avg episode rewards: #0: 4.401, true rewards: #0: 4.000 +[2024-11-07 15:38:31,342][04584] Avg episode reward: 4.401, avg true_objective: 4.000 +[2024-11-07 15:38:31,684][04584] Num frames 16100... +[2024-11-07 15:38:32,012][04584] Num frames 16200... +[2024-11-07 15:38:32,319][04584] Num frames 16300... +[2024-11-07 15:38:32,645][04584] Avg episode rewards: #0: 4.387, true rewards: #0: 3.997 +[2024-11-07 15:38:32,646][04584] Avg episode reward: 4.387, avg true_objective: 3.997 +[2024-11-07 15:38:32,697][04584] Num frames 16400... +[2024-11-07 15:38:33,003][04584] Num frames 16500... +[2024-11-07 15:38:33,307][04584] Num frames 16600... +[2024-11-07 15:38:33,484][04584] Avg episode rewards: #0: 4.343, true rewards: #0: 3.962 +[2024-11-07 15:38:33,488][04584] Avg episode reward: 4.343, avg true_objective: 3.962 +[2024-11-07 15:38:33,668][04584] Num frames 16700... +[2024-11-07 15:38:34,047][04584] Num frames 16800... +[2024-11-07 15:38:34,523][04584] Num frames 16900... +[2024-11-07 15:38:34,994][04584] Num frames 17000... +[2024-11-07 15:38:35,794][04584] Avg episode rewards: #0: 4.370, true rewards: #0: 3.974 +[2024-11-07 15:38:35,798][04584] Avg episode reward: 4.370, avg true_objective: 3.974 +[2024-11-07 15:38:35,859][04584] Num frames 17100... +[2024-11-07 15:38:36,262][04584] Num frames 17200... +[2024-11-07 15:38:36,655][04584] Num frames 17300... +[2024-11-07 15:38:37,022][04584] Num frames 17400... 
+[2024-11-07 15:38:37,312][04584] Avg episode rewards: #0: 4.358, true rewards: #0: 3.971 +[2024-11-07 15:38:37,315][04584] Avg episode reward: 4.358, avg true_objective: 3.971 +[2024-11-07 15:38:37,421][04584] Num frames 17500... +[2024-11-07 15:38:37,774][04584] Num frames 17600... +[2024-11-07 15:38:38,184][04584] Num frames 17700... +[2024-11-07 15:38:38,526][04584] Num frames 17800... +[2024-11-07 15:38:38,811][04584] Avg episode rewards: #0: 4.346, true rewards: #0: 3.968 +[2024-11-07 15:38:38,813][04584] Avg episode reward: 4.346, avg true_objective: 3.968 +[2024-11-07 15:38:39,007][04584] Num frames 17900... +[2024-11-07 15:38:39,335][04584] Num frames 18000... +[2024-11-07 15:38:39,650][04584] Num frames 18100... +[2024-11-07 15:38:39,961][04584] Num frames 18200... +[2024-11-07 15:38:40,263][04584] Avg episode rewards: #0: 4.342, true rewards: #0: 3.973 +[2024-11-07 15:38:40,266][04584] Avg episode reward: 4.342, avg true_objective: 3.973 +[2024-11-07 15:38:40,341][04584] Num frames 18300... +[2024-11-07 15:38:40,634][04584] Num frames 18400... +[2024-11-07 15:38:40,919][04584] Num frames 18500... +[2024-11-07 15:38:41,232][04584] Num frames 18600... +[2024-11-07 15:38:41,471][04584] Avg episode rewards: #0: 4.331, true rewards: #0: 3.970 +[2024-11-07 15:38:41,474][04584] Avg episode reward: 4.331, avg true_objective: 3.970 +[2024-11-07 15:38:41,617][04584] Num frames 18700... +[2024-11-07 15:38:42,015][04584] Num frames 18800... +[2024-11-07 15:38:42,322][04584] Num frames 18900... +[2024-11-07 15:38:42,631][04584] Num frames 19000... +[2024-11-07 15:38:42,886][04584] Num frames 19100... +[2024-11-07 15:38:42,995][04584] Avg episode rewards: #0: 4.355, true rewards: #0: 3.980 +[2024-11-07 15:38:42,997][04584] Avg episode reward: 4.355, avg true_objective: 3.980 +[2024-11-07 15:38:43,228][04584] Num frames 19200... +[2024-11-07 15:38:43,472][04584] Num frames 19300... +[2024-11-07 15:38:43,715][04584] Num frames 19400... 
+[2024-11-07 15:38:44,003][04584] Avg episode rewards: #0: 4.345, true rewards: #0: 3.978 +[2024-11-07 15:38:44,006][04584] Avg episode reward: 4.345, avg true_objective: 3.978 +[2024-11-07 15:38:44,051][04584] Num frames 19500... +[2024-11-07 15:38:44,300][04584] Num frames 19600... +[2024-11-07 15:38:44,563][04584] Num frames 19700... +[2024-11-07 15:38:44,799][04584] Num frames 19800... +[2024-11-07 15:38:45,032][04584] Avg episode rewards: #0: 4.335, true rewards: #0: 3.975 +[2024-11-07 15:38:45,034][04584] Avg episode reward: 4.335, avg true_objective: 3.975 +[2024-11-07 15:39:53,318][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 15:40:57,409][04584] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 15:40:57,411][04584] Overriding arg 'num_workers' with value 4 passed from command line +[2024-11-07 15:40:57,412][04584] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 15:40:57,414][04584] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 15:40:57,416][04584] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 15:40:57,417][04584] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 15:40:57,418][04584] Adding new argument 'max_num_frames'=50000 that is not in the saved config file! +[2024-11-07 15:40:57,421][04584] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 15:40:57,423][04584] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-07 15:40:57,424][04584] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! 
+[2024-11-07 15:40:57,425][04584] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 15:40:57,427][04584] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 15:40:57,431][04584] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 15:40:57,433][04584] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 15:40:57,436][04584] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 15:40:57,474][04584] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 15:40:57,476][04584] RunningMeanStd input shape: (1,) +[2024-11-07 15:40:57,512][04584] ConvEncoder: input_channels=3 +[2024-11-07 15:40:57,565][04584] Conv encoder output size: 512 +[2024-11-07 15:40:57,567][04584] Policy head output size: 512 +[2024-11-07 15:40:57,598][04584] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003910_16015360.pth... +[2024-11-07 15:40:58,332][04584] Num frames 100... +[2024-11-07 15:40:58,540][04584] Num frames 200... +[2024-11-07 15:40:58,710][04584] Avg episode rewards: #0: 2.560, true rewards: #0: 2.560 +[2024-11-07 15:40:58,711][04584] Avg episode reward: 2.560, avg true_objective: 2.560 +[2024-11-07 15:40:58,803][04584] Num frames 300... +[2024-11-07 15:40:58,973][04584] Num frames 400... +[2024-11-07 15:40:59,162][04584] Num frames 500... +[2024-11-07 15:40:59,354][04584] Num frames 600... +[2024-11-07 15:40:59,547][04584] Avg episode rewards: #0: 3.860, true rewards: #0: 3.360 +[2024-11-07 15:40:59,550][04584] Avg episode reward: 3.860, avg true_objective: 3.360 +[2024-11-07 15:40:59,619][04584] Num frames 700... +[2024-11-07 15:40:59,833][04584] Num frames 800... +[2024-11-07 15:41:00,011][04584] Num frames 900... +[2024-11-07 15:41:00,183][04584] Num frames 1000... 
+[2024-11-07 15:41:00,331][04584] Avg episode rewards: #0: 3.853, true rewards: #0: 3.520 +[2024-11-07 15:41:00,333][04584] Avg episode reward: 3.853, avg true_objective: 3.520 +[2024-11-07 15:41:00,444][04584] Num frames 1100... +[2024-11-07 15:41:00,779][04584] Num frames 1200... +[2024-11-07 15:41:01,066][04584] Num frames 1300... +[2024-11-07 15:41:01,488][04584] Num frames 1400... +[2024-11-07 15:41:01,736][04584] Avg episode rewards: #0: 3.850, true rewards: #0: 3.600 +[2024-11-07 15:41:01,738][04584] Avg episode reward: 3.850, avg true_objective: 3.600 +[2024-11-07 15:41:01,904][04584] Num frames 1500... +[2024-11-07 15:41:02,150][04584] Num frames 1600... +[2024-11-07 15:41:02,426][04584] Num frames 1700... +[2024-11-07 15:41:02,629][04584] Num frames 1800... +[2024-11-07 15:41:02,733][04584] Avg episode rewards: #0: 3.848, true rewards: #0: 3.648 +[2024-11-07 15:41:02,736][04584] Avg episode reward: 3.848, avg true_objective: 3.648 +[2024-11-07 15:41:03,184][04584] Num frames 1900... +[2024-11-07 15:41:03,620][04584] Num frames 2000... +[2024-11-07 15:41:03,844][04584] Num frames 2100... +[2024-11-07 15:41:04,080][04584] Num frames 2200... +[2024-11-07 15:41:04,173][04584] Avg episode rewards: #0: 3.847, true rewards: #0: 3.680 +[2024-11-07 15:41:04,174][04584] Avg episode reward: 3.847, avg true_objective: 3.680 +[2024-11-07 15:41:04,417][04584] Num frames 2300... +[2024-11-07 15:41:04,822][04584] Num frames 2400... +[2024-11-07 15:41:05,110][04584] Num frames 2500... +[2024-11-07 15:41:05,409][04584] Num frames 2600... +[2024-11-07 15:41:05,670][04584] Avg episode rewards: #0: 4.080, true rewards: #0: 3.794 +[2024-11-07 15:41:05,674][04584] Avg episode reward: 4.080, avg true_objective: 3.794 +[2024-11-07 15:41:05,824][04584] Num frames 2700... +[2024-11-07 15:41:06,090][04584] Num frames 2800... +[2024-11-07 15:41:06,377][04584] Num frames 2900... +[2024-11-07 15:41:06,676][04584] Num frames 3000... 
+[2024-11-07 15:41:06,844][04584] Avg episode rewards: #0: 4.050, true rewards: #0: 3.800 +[2024-11-07 15:41:06,846][04584] Avg episode reward: 4.050, avg true_objective: 3.800 +[2024-11-07 15:41:07,023][04584] Num frames 3100... +[2024-11-07 15:41:07,265][04584] Num frames 3200... +[2024-11-07 15:41:07,535][04584] Num frames 3300... +[2024-11-07 15:41:07,812][04584] Num frames 3400... +[2024-11-07 15:41:07,932][04584] Avg episode rewards: #0: 4.027, true rewards: #0: 3.804 +[2024-11-07 15:41:07,935][04584] Avg episode reward: 4.027, avg true_objective: 3.804 +[2024-11-07 15:41:08,163][04584] Num frames 3500... +[2024-11-07 15:41:08,509][04584] Num frames 3600... +[2024-11-07 15:41:08,879][04584] Num frames 3700... +[2024-11-07 15:41:09,205][04584] Num frames 3800... +[2024-11-07 15:41:09,303][04584] Avg episode rewards: #0: 4.008, true rewards: #0: 3.808 +[2024-11-07 15:41:09,305][04584] Avg episode reward: 4.008, avg true_objective: 3.808 +[2024-11-07 15:41:24,061][04584] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 15:41:33,571][04584] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 15:55:04,579][14395] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... 
+[2024-11-07 15:55:04,580][14395] Rollout worker 0 uses device cpu +[2024-11-07 15:55:04,583][14395] Rollout worker 1 uses device cpu +[2024-11-07 15:55:04,586][14395] Rollout worker 2 uses device cpu +[2024-11-07 15:55:04,590][14395] Rollout worker 3 uses device cpu +[2024-11-07 15:55:04,592][14395] Rollout worker 4 uses device cpu +[2024-11-07 15:55:04,594][14395] Rollout worker 5 uses device cpu +[2024-11-07 15:55:04,597][14395] Rollout worker 6 uses device cpu +[2024-11-07 15:55:04,599][14395] Rollout worker 7 uses device cpu +[2024-11-07 15:55:04,705][14395] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 15:55:04,709][14395] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 15:55:04,753][14395] Starting all processes... +[2024-11-07 15:55:04,756][14395] Starting process learner_proc0 +[2024-11-07 15:55:04,934][14395] Starting all processes... +[2024-11-07 15:55:05,018][14395] Starting process inference_proc0-0 +[2024-11-07 15:55:05,019][14395] Starting process rollout_proc0 +[2024-11-07 15:55:05,020][14395] Starting process rollout_proc1 +[2024-11-07 15:55:05,020][14395] Starting process rollout_proc2 +[2024-11-07 15:55:05,021][14395] Starting process rollout_proc3 +[2024-11-07 15:55:05,021][14395] Starting process rollout_proc4 +[2024-11-07 15:55:05,022][14395] Starting process rollout_proc5 +[2024-11-07 15:55:05,022][14395] Starting process rollout_proc6 +[2024-11-07 15:55:05,023][14395] Starting process rollout_proc7 +[2024-11-07 15:55:11,298][14445] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 15:55:11,313][14445] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 15:55:11,489][14477] Worker 4 uses CPU cores [4] +[2024-11-07 15:55:11,561][14466] Worker 2 uses CPU cores [2] +[2024-11-07 15:55:12,164][14445] Num visible devices: 1 +[2024-11-07 15:55:12,223][14445] Starting seed is not provided +[2024-11-07 15:55:12,223][14445] Using GPUs [0] for process 0 
(actually maps to GPUs [0]) +[2024-11-07 15:55:12,223][14445] Initializing actor-critic model on device cuda:0 +[2024-11-07 15:55:12,224][14445] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 15:55:12,226][14445] RunningMeanStd input shape: (1,) +[2024-11-07 15:55:12,331][14445] ConvEncoder: input_channels=3 +[2024-11-07 15:55:12,727][14468] Worker 3 uses CPU cores [3] +[2024-11-07 15:55:12,770][14445] Conv encoder output size: 512 +[2024-11-07 15:55:12,771][14445] Policy head output size: 512 +[2024-11-07 15:55:12,806][14445] Created Actor Critic model with architecture: +[2024-11-07 15:55:12,807][14445] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 15:55:13,345][14469] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 15:55:13,552][14467] 
Worker 1 uses CPU cores [1] +[2024-11-07 15:55:13,584][14462] Worker 0 uses CPU cores [0] +[2024-11-07 15:55:13,653][14461] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 15:55:13,654][14461] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 15:55:13,687][14461] Num visible devices: 1 +[2024-11-07 15:55:14,018][14478] Worker 5 uses CPU cores [5] +[2024-11-07 15:55:14,479][14470] Worker 6 uses CPU cores [6] +[2024-11-07 15:55:14,948][14445] Using optimizer +[2024-11-07 15:55:16,722][14445] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003910_16015360.pth... +[2024-11-07 15:55:16,819][14445] Loading model from checkpoint +[2024-11-07 15:55:16,822][14445] Loaded experiment state at self.train_step=3910, self.env_steps=16015360 +[2024-11-07 15:55:16,822][14445] Initialized policy 0 weights for model version 3910 +[2024-11-07 15:55:16,834][14445] LearnerWorker_p0 finished initialization! +[2024-11-07 15:55:16,834][14445] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 15:55:17,090][14461] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 15:55:17,092][14461] RunningMeanStd input shape: (1,) +[2024-11-07 15:55:17,104][14461] ConvEncoder: input_channels=3 +[2024-11-07 15:55:17,233][14461] Conv encoder output size: 512 +[2024-11-07 15:55:17,234][14461] Policy head output size: 512 +[2024-11-07 15:55:17,302][14395] Inference worker 0-0 is ready! +[2024-11-07 15:55:17,303][14395] All inference workers are ready! Signal rollout workers to start! 
+[2024-11-07 15:55:17,409][14477] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:55:17,415][14468] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:55:17,439][14466] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:55:17,467][14467] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:55:17,478][14462] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:55:17,512][14478] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:55:17,571][14470] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:55:17,576][14469] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 15:55:18,028][14468] Decorrelating experience for 0 frames... +[2024-11-07 15:55:18,032][14477] Decorrelating experience for 0 frames... +[2024-11-07 15:55:18,032][14466] Decorrelating experience for 0 frames... +[2024-11-07 15:55:18,261][14467] Decorrelating experience for 0 frames... +[2024-11-07 15:55:18,314][14462] Decorrelating experience for 0 frames... +[2024-11-07 15:55:18,449][14469] Decorrelating experience for 0 frames... +[2024-11-07 15:55:18,583][14477] Decorrelating experience for 32 frames... +[2024-11-07 15:55:18,825][14468] Decorrelating experience for 32 frames... +[2024-11-07 15:55:18,828][14467] Decorrelating experience for 32 frames... +[2024-11-07 15:55:18,837][14462] Decorrelating experience for 32 frames... +[2024-11-07 15:55:18,977][14478] Decorrelating experience for 0 frames... +[2024-11-07 15:55:19,335][14477] Decorrelating experience for 64 frames... +[2024-11-07 15:55:19,454][14478] Decorrelating experience for 32 frames... +[2024-11-07 15:55:19,456][14466] Decorrelating experience for 32 frames... +[2024-11-07 15:55:19,552][14462] Decorrelating experience for 64 frames... +[2024-11-07 15:55:19,814][14477] Decorrelating experience for 96 frames... +[2024-11-07 15:55:19,889][14470] Decorrelating experience for 0 frames... 
+[2024-11-07 15:55:19,910][14467] Decorrelating experience for 64 frames... +[2024-11-07 15:55:19,927][14395] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 16015360. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 15:55:20,253][14462] Decorrelating experience for 96 frames... +[2024-11-07 15:55:20,461][14478] Decorrelating experience for 64 frames... +[2024-11-07 15:55:20,464][14466] Decorrelating experience for 64 frames... +[2024-11-07 15:55:20,466][14469] Decorrelating experience for 32 frames... +[2024-11-07 15:55:20,646][14470] Decorrelating experience for 32 frames... +[2024-11-07 15:55:20,674][14467] Decorrelating experience for 96 frames... +[2024-11-07 15:55:21,007][14468] Decorrelating experience for 64 frames... +[2024-11-07 15:55:21,032][14466] Decorrelating experience for 96 frames... +[2024-11-07 15:55:21,394][14478] Decorrelating experience for 96 frames... +[2024-11-07 15:55:21,694][14468] Decorrelating experience for 96 frames... +[2024-11-07 15:55:21,701][14469] Decorrelating experience for 64 frames... +[2024-11-07 15:55:21,913][14470] Decorrelating experience for 64 frames... +[2024-11-07 15:55:22,539][14469] Decorrelating experience for 96 frames... +[2024-11-07 15:55:23,773][14445] Signal inference workers to stop experience collection... +[2024-11-07 15:55:23,794][14461] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 15:55:23,856][14470] Decorrelating experience for 96 frames... +[2024-11-07 15:55:25,856][14395] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 16015360. Throughput: 0: 338.7. Samples: 2008. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 15:55:25,860][14395] Avg episode reward: [(0, '2.242')] +[2024-11-07 15:55:25,865][14395] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 15:55:25,866][14395] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 15:55:25,869][14395] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 15:55:25,870][14395] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 15:55:25,872][14395] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 15:55:25,873][14395] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 15:55:25,876][14395] Heartbeat connected on Batcher_0 +[2024-11-07 15:55:25,877][14395] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 15:55:25,883][14395] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 15:55:25,889][14395] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 15:55:29,927][14395] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 16015360. Throughput: 0: 247.8. Samples: 2478. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 15:55:29,929][14395] Avg episode reward: [(0, '2.242')] +[2024-11-07 15:55:30,500][14445] Signal inference workers to resume experience collection... +[2024-11-07 15:55:30,501][14461] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 15:55:31,048][14395] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 15:55:34,927][14395] Fps is (10 sec: 3160.8, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 16044032. Throughput: 0: 313.5. Samples: 4702. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:55:34,933][14395] Avg episode reward: [(0, '3.523')] +[2024-11-07 15:55:36,987][14461] Updated weights for policy 0, policy_version 3920 (0.0055) +[2024-11-07 15:55:39,928][14395] Fps is (10 sec: 5734.1, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 16072704. Throughput: 0: 644.5. Samples: 12890. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 15:55:39,934][14395] Avg episode reward: [(0, '4.344')] +[2024-11-07 15:55:44,929][14395] Fps is (10 sec: 4914.8, 60 sec: 3112.9, 300 sec: 3112.9). Total num frames: 16093184. Throughput: 0: 794.9. Samples: 19872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 15:55:44,931][14395] Avg episode reward: [(0, '4.448')] +[2024-11-07 15:55:45,034][14461] Updated weights for policy 0, policy_version 3930 (0.0028) +[2024-11-07 15:55:49,927][14395] Fps is (10 sec: 4915.4, 60 sec: 3549.9, 300 sec: 3549.9). Total num frames: 16121856. Throughput: 0: 820.3. Samples: 24610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 15:55:49,929][14395] Avg episode reward: [(0, '4.447')] +[2024-11-07 15:55:52,269][14461] Updated weights for policy 0, policy_version 3940 (0.0045) +[2024-11-07 15:55:54,927][14395] Fps is (10 sec: 6554.4, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 16158720. Throughput: 0: 957.8. Samples: 33524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:55:54,930][14395] Avg episode reward: [(0, '4.553')] +[2024-11-07 15:55:58,004][14461] Updated weights for policy 0, policy_version 3950 (0.0031) +[2024-11-07 15:55:59,928][14395] Fps is (10 sec: 5734.2, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 16179200. Throughput: 0: 1050.7. Samples: 42028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:55:59,934][14395] Avg episode reward: [(0, '4.512')] +[2024-11-07 15:56:04,928][14395] Fps is (10 sec: 4915.1, 60 sec: 4278.1, 300 sec: 4278.1). Total num frames: 16207872. Throughput: 0: 1004.8. Samples: 45216. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 15:56:04,930][14395] Avg episode reward: [(0, '4.356')] +[2024-11-07 15:56:06,269][14461] Updated weights for policy 0, policy_version 3960 (0.0029) +[2024-11-07 15:56:09,927][14395] Fps is (10 sec: 6963.4, 60 sec: 4669.5, 300 sec: 4669.5). Total num frames: 16248832. Throughput: 0: 1221.1. 
Samples: 55822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 15:56:09,929][14395] Avg episode reward: [(0, '4.361')] +[2024-11-07 15:56:11,796][14461] Updated weights for policy 0, policy_version 3970 (0.0027) +[2024-11-07 15:56:14,936][14395] Fps is (10 sec: 6548.2, 60 sec: 4691.1, 300 sec: 4691.1). Total num frames: 16273408. Throughput: 0: 1383.7. Samples: 64754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 15:56:14,939][14395] Avg episode reward: [(0, '4.444')] +[2024-11-07 15:56:19,928][14395] Fps is (10 sec: 4095.7, 60 sec: 4573.8, 300 sec: 4573.8). Total num frames: 16289792. Throughput: 0: 1394.7. Samples: 67462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:56:19,931][14395] Avg episode reward: [(0, '4.359')] +[2024-11-07 15:56:21,825][14461] Updated weights for policy 0, policy_version 3980 (0.0043) +[2024-11-07 15:56:24,927][14395] Fps is (10 sec: 4509.4, 60 sec: 5131.2, 300 sec: 4663.2). Total num frames: 16318464. Throughput: 0: 1371.5. Samples: 74606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:56:24,932][14395] Avg episode reward: [(0, '4.307')] +[2024-11-07 15:56:27,554][14461] Updated weights for policy 0, policy_version 3990 (0.0023) +[2024-11-07 15:56:29,928][14395] Fps is (10 sec: 6553.7, 60 sec: 5666.1, 300 sec: 4856.7). Total num frames: 16355328. Throughput: 0: 1448.9. Samples: 85070. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:56:29,931][14395] Avg episode reward: [(0, '4.548')] +[2024-11-07 15:56:34,927][14395] Fps is (10 sec: 5734.3, 60 sec: 5529.6, 300 sec: 4806.0). Total num frames: 16375808. Throughput: 0: 1439.1. Samples: 89368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 15:56:34,929][14395] Avg episode reward: [(0, '4.400')] +[2024-11-07 15:56:35,600][14461] Updated weights for policy 0, policy_version 4000 (0.0043) +[2024-11-07 15:56:39,927][14395] Fps is (10 sec: 6144.3, 60 sec: 5734.4, 300 sec: 5017.6). Total num frames: 16416768. 
Throughput: 0: 1447.8. Samples: 98676. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 15:56:39,929][14395] Avg episode reward: [(0, '4.556')] +[2024-11-07 15:56:40,909][14461] Updated weights for policy 0, policy_version 4010 (0.0028) +[2024-11-07 15:56:44,927][14395] Fps is (10 sec: 6553.6, 60 sec: 5802.8, 300 sec: 5011.6). Total num frames: 16441344. Throughput: 0: 1453.7. Samples: 107446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 15:56:44,940][14395] Avg episode reward: [(0, '4.731')] +[2024-11-07 15:56:49,927][14395] Fps is (10 sec: 4505.5, 60 sec: 5666.1, 300 sec: 4960.7). Total num frames: 16461824. Throughput: 0: 1449.9. Samples: 110462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:56:49,930][14395] Avg episode reward: [(0, '4.555')] +[2024-11-07 15:56:50,788][14461] Updated weights for policy 0, policy_version 4020 (0.0026) +[2024-11-07 15:56:54,928][14395] Fps is (10 sec: 4095.9, 60 sec: 5393.0, 300 sec: 4915.2). Total num frames: 16482304. Throughput: 0: 1338.9. Samples: 116072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:56:54,932][14395] Avg episode reward: [(0, '4.357')] +[2024-11-07 15:56:59,928][14395] Fps is (10 sec: 4095.9, 60 sec: 5393.1, 300 sec: 4874.2). Total num frames: 16502784. Throughput: 0: 1269.0. Samples: 121850. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:56:59,932][14395] Avg episode reward: [(0, '4.413')] +[2024-11-07 15:56:59,977][14445] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004029_16502784.pth... +[2024-11-07 15:57:00,368][14445] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth +[2024-11-07 15:57:00,908][14461] Updated weights for policy 0, policy_version 4030 (0.0067) +[2024-11-07 15:57:04,928][14395] Fps is (10 sec: 4505.4, 60 sec: 5324.7, 300 sec: 4876.2). Total num frames: 16527360. Throughput: 0: 1281.3. Samples: 125120. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 15:57:04,931][14395] Avg episode reward: [(0, '4.501')] +[2024-11-07 15:57:09,928][14395] Fps is (10 sec: 4095.6, 60 sec: 4915.1, 300 sec: 4803.4). Total num frames: 16543744. Throughput: 0: 1247.7. Samples: 130756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:57:09,931][14395] Avg episode reward: [(0, '4.539')] +[2024-11-07 15:57:10,937][14461] Updated weights for policy 0, policy_version 4040 (0.0040) +[2024-11-07 15:57:14,927][14395] Fps is (10 sec: 4096.3, 60 sec: 4915.9, 300 sec: 4808.4). Total num frames: 16568320. Throughput: 0: 1192.9. Samples: 138748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:57:14,930][14395] Avg episode reward: [(0, '4.564')] +[2024-11-07 15:57:19,301][14461] Updated weights for policy 0, policy_version 4050 (0.0059) +[2024-11-07 15:57:19,935][14395] Fps is (10 sec: 4502.5, 60 sec: 4982.8, 300 sec: 4778.3). Total num frames: 16588800. Throughput: 0: 1185.9. Samples: 142742. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 15:57:19,938][14395] Avg episode reward: [(0, '4.485')] +[2024-11-07 15:57:24,940][14395] Fps is (10 sec: 4499.7, 60 sec: 4914.1, 300 sec: 4783.6). Total num frames: 16613376. Throughput: 0: 1116.7. Samples: 148942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:57:24,942][14395] Avg episode reward: [(0, '4.436')] +[2024-11-07 15:57:27,985][14461] Updated weights for policy 0, policy_version 4060 (0.0038) +[2024-11-07 15:57:29,930][14395] Fps is (10 sec: 4917.8, 60 sec: 4710.2, 300 sec: 4789.1). Total num frames: 16637952. Throughput: 0: 1088.2. Samples: 156418. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 15:57:29,933][14395] Avg episode reward: [(0, '4.518')] +[2024-11-07 15:57:34,927][14395] Fps is (10 sec: 5331.8, 60 sec: 4846.9, 300 sec: 4824.2). Total num frames: 16666624. Throughput: 0: 1114.8. Samples: 160630. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:57:34,936][14395] Avg episode reward: [(0, '4.508')] +[2024-11-07 15:57:35,266][14461] Updated weights for policy 0, policy_version 4070 (0.0031) +[2024-11-07 15:57:39,927][14395] Fps is (10 sec: 4916.6, 60 sec: 4505.6, 300 sec: 4798.2). Total num frames: 16687104. Throughput: 0: 1131.8. Samples: 167002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:57:39,931][14395] Avg episode reward: [(0, '4.463')] +[2024-11-07 15:57:44,927][14395] Fps is (10 sec: 3276.8, 60 sec: 4300.8, 300 sec: 4717.5). Total num frames: 16699392. Throughput: 0: 1109.5. Samples: 171778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 15:57:44,931][14395] Avg episode reward: [(0, '4.400')] +[2024-11-07 15:57:46,388][14461] Updated weights for policy 0, policy_version 4080 (0.0058) +[2024-11-07 15:57:49,927][14395] Fps is (10 sec: 4915.3, 60 sec: 4573.9, 300 sec: 4806.0). Total num frames: 16736256. Throughput: 0: 1139.0. Samples: 176372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:57:49,929][14395] Avg episode reward: [(0, '4.383')] +[2024-11-07 15:57:51,689][14461] Updated weights for policy 0, policy_version 4090 (0.0024) +[2024-11-07 15:57:54,927][14395] Fps is (10 sec: 7782.4, 60 sec: 4915.2, 300 sec: 4915.2). Total num frames: 16777216. Throughput: 0: 1275.9. Samples: 188170. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 15:57:54,929][14395] Avg episode reward: [(0, '4.469')] +[2024-11-07 15:57:57,147][14461] Updated weights for policy 0, policy_version 4100 (0.0032) +[2024-11-07 15:57:59,927][14395] Fps is (10 sec: 7372.9, 60 sec: 5120.0, 300 sec: 4966.4). Total num frames: 16809984. Throughput: 0: 1340.7. Samples: 199080. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:57:59,929][14395] Avg episode reward: [(0, '4.689')] +[2024-11-07 15:58:04,528][14461] Updated weights for policy 0, policy_version 4110 (0.0048) +[2024-11-07 15:58:04,927][14395] Fps is (10 sec: 5734.4, 60 sec: 5120.1, 300 sec: 4964.9). Total num frames: 16834560. Throughput: 0: 1332.6. Samples: 202700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:58:04,929][14395] Avg episode reward: [(0, '4.492')] +[2024-11-07 15:58:09,927][14395] Fps is (10 sec: 6144.0, 60 sec: 5461.5, 300 sec: 5035.7). Total num frames: 16871424. Throughput: 0: 1410.6. Samples: 212400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 15:58:09,929][14395] Avg episode reward: [(0, '4.335')] +[2024-11-07 15:58:10,164][14461] Updated weights for policy 0, policy_version 4120 (0.0027) +[2024-11-07 15:58:16,757][14395] Fps is (10 sec: 5193.6, 60 sec: 5299.7, 300 sec: 4980.2). Total num frames: 16896000. Throughput: 0: 1304.1. Samples: 217486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 15:58:16,759][14395] Avg episode reward: [(0, '4.390')] +[2024-11-07 15:58:19,359][14461] Updated weights for policy 0, policy_version 4130 (0.0031) +[2024-11-07 15:58:19,927][14395] Fps is (10 sec: 4505.6, 60 sec: 5462.1, 300 sec: 5006.2). Total num frames: 16916480. Throughput: 0: 1369.6. Samples: 222262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 15:58:19,930][14395] Avg episode reward: [(0, '4.352')] +[2024-11-07 15:58:24,928][14395] Fps is (10 sec: 6517.1, 60 sec: 5599.1, 300 sec: 5048.0). Total num frames: 16949248. Throughput: 0: 1459.8. Samples: 232694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 15:58:24,930][14395] Avg episode reward: [(0, '4.351')] +[2024-11-07 15:58:25,722][14461] Updated weights for policy 0, policy_version 4140 (0.0038) +[2024-11-07 15:58:29,927][14395] Fps is (10 sec: 6553.6, 60 sec: 5734.7, 300 sec: 5087.7). Total num frames: 16982016. 
Throughput: 0: 1550.2. Samples: 241538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 15:58:29,929][14395] Avg episode reward: [(0, '4.556')] +[2024-11-07 15:58:31,857][14461] Updated weights for policy 0, policy_version 4150 (0.0032) +[2024-11-07 15:58:34,927][14395] Fps is (10 sec: 7373.2, 60 sec: 5939.2, 300 sec: 5167.3). Total num frames: 17022976. Throughput: 0: 1578.4. Samples: 247402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:58:34,929][14395] Avg episode reward: [(0, '4.514')] +[2024-11-07 15:58:37,073][14461] Updated weights for policy 0, policy_version 4160 (0.0021) +[2024-11-07 15:58:39,927][14395] Fps is (10 sec: 7782.4, 60 sec: 6212.3, 300 sec: 5222.4). Total num frames: 17059840. Throughput: 0: 1581.1. Samples: 259320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:58:39,928][14395] Avg episode reward: [(0, '4.365')] +[2024-11-07 15:58:42,649][14461] Updated weights for policy 0, policy_version 4170 (0.0027) +[2024-11-07 15:58:44,927][14395] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 5254.9). Total num frames: 17092608. Throughput: 0: 1568.2. Samples: 269650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:58:44,929][14395] Avg episode reward: [(0, '4.379')] +[2024-11-07 15:58:48,952][14461] Updated weights for policy 0, policy_version 4180 (0.0025) +[2024-11-07 15:58:51,218][14395] Fps is (10 sec: 5441.7, 60 sec: 6282.0, 300 sec: 5234.1). Total num frames: 17121280. Throughput: 0: 1553.4. Samples: 274606. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 15:58:51,220][14395] Avg episode reward: [(0, '4.417')] +[2024-11-07 15:58:54,927][14395] Fps is (10 sec: 5324.6, 60 sec: 6144.0, 300 sec: 5258.1). Total num frames: 17145856. Throughput: 0: 1511.7. Samples: 280426. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:58:54,930][14395] Avg episode reward: [(0, '4.625')] +[2024-11-07 15:58:57,073][14461] Updated weights for policy 0, policy_version 4190 (0.0026) +[2024-11-07 15:58:59,927][14395] Fps is (10 sec: 7054.4, 60 sec: 6212.3, 300 sec: 5306.2). Total num frames: 17182720. Throughput: 0: 1721.4. Samples: 291798. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:58:59,929][14395] Avg episode reward: [(0, '4.445')] +[2024-11-07 15:58:59,947][14445] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004195_17182720.pth... +[2024-11-07 15:59:00,087][14445] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000003910_16015360.pth +[2024-11-07 15:59:02,802][14461] Updated weights for policy 0, policy_version 4200 (0.0023) +[2024-11-07 15:59:04,928][14395] Fps is (10 sec: 7372.4, 60 sec: 6417.0, 300 sec: 5352.1). Total num frames: 17219584. Throughput: 0: 1662.1. Samples: 297056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:59:04,931][14395] Avg episode reward: [(0, '4.652')] +[2024-11-07 15:59:08,029][14461] Updated weights for policy 0, policy_version 4210 (0.0024) +[2024-11-07 15:59:09,930][14395] Fps is (10 sec: 7370.7, 60 sec: 6416.8, 300 sec: 5396.0). Total num frames: 17256448. Throughput: 0: 1687.7. Samples: 308644. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 15:59:09,933][14395] Avg episode reward: [(0, '4.373')] +[2024-11-07 15:59:14,459][14461] Updated weights for policy 0, policy_version 4220 (0.0045) +[2024-11-07 15:59:14,927][14395] Fps is (10 sec: 6554.2, 60 sec: 6689.4, 300 sec: 5403.2). Total num frames: 17285120. Throughput: 0: 1698.0. Samples: 317946. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 15:59:14,932][14395] Avg episode reward: [(0, '4.586')] +[2024-11-07 15:59:19,928][14395] Fps is (10 sec: 6145.4, 60 sec: 6690.1, 300 sec: 5427.2). 
Total num frames: 17317888. Throughput: 0: 1677.6. Samples: 322894. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 15:59:19,931][14395] Avg episode reward: [(0, '4.264')] +[2024-11-07 15:59:20,972][14461] Updated weights for policy 0, policy_version 4230 (0.0044) +[2024-11-07 15:59:25,496][14395] Fps is (10 sec: 4650.4, 60 sec: 6356.8, 300 sec: 5370.8). Total num frames: 17334272. Throughput: 0: 1578.1. Samples: 331234. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:59:25,502][14395] Avg episode reward: [(0, '4.232')] +[2024-11-07 15:59:29,927][14395] Fps is (10 sec: 3686.5, 60 sec: 6212.2, 300 sec: 5357.6). Total num frames: 17354752. Throughput: 0: 1457.4. Samples: 335232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:59:29,929][14395] Avg episode reward: [(0, '4.326')] +[2024-11-07 15:59:31,611][14461] Updated weights for policy 0, policy_version 4240 (0.0051) +[2024-11-07 15:59:34,927][14395] Fps is (10 sec: 5646.1, 60 sec: 6075.7, 300 sec: 5381.0). Total num frames: 17387520. Throughput: 0: 1491.8. Samples: 339814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 15:59:34,929][14395] Avg episode reward: [(0, '4.443')] +[2024-11-07 15:59:37,263][14461] Updated weights for policy 0, policy_version 4250 (0.0036) +[2024-11-07 15:59:39,927][14395] Fps is (10 sec: 7373.0, 60 sec: 6144.0, 300 sec: 5435.1). Total num frames: 17428480. Throughput: 0: 1568.5. Samples: 351010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 15:59:39,932][14395] Avg episode reward: [(0, '4.296')] +[2024-11-07 15:59:42,885][14461] Updated weights for policy 0, policy_version 4260 (0.0025) +[2024-11-07 15:59:44,927][14395] Fps is (10 sec: 7373.0, 60 sec: 6144.0, 300 sec: 5456.2). Total num frames: 17461248. Throughput: 0: 1548.5. Samples: 361482. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:59:44,930][14395] Avg episode reward: [(0, '4.273')] +[2024-11-07 15:59:48,105][14461] Updated weights for policy 0, policy_version 4270 (0.0021) +[2024-11-07 15:59:49,927][14395] Fps is (10 sec: 7372.7, 60 sec: 6488.3, 300 sec: 5506.8). Total num frames: 17502208. Throughput: 0: 1569.5. Samples: 367682. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:59:49,929][14395] Avg episode reward: [(0, '4.594')] +[2024-11-07 15:59:53,467][14461] Updated weights for policy 0, policy_version 4280 (0.0023) +[2024-11-07 15:59:54,928][14395] Fps is (10 sec: 7782.0, 60 sec: 6553.6, 300 sec: 5540.8). Total num frames: 17539072. Throughput: 0: 1574.4. Samples: 379488. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 15:59:54,932][14395] Avg episode reward: [(0, '4.520')] +[2024-11-07 16:00:00,137][14395] Fps is (10 sec: 5616.5, 60 sec: 6258.6, 300 sec: 5510.8). Total num frames: 17559552. Throughput: 0: 1484.8. Samples: 385072. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:00:00,140][14395] Avg episode reward: [(0, '4.470')] +[2024-11-07 16:00:01,768][14461] Updated weights for policy 0, policy_version 4290 (0.0040) +[2024-11-07 16:00:04,927][14395] Fps is (10 sec: 4505.7, 60 sec: 6075.8, 300 sec: 5504.5). Total num frames: 17584128. Throughput: 0: 1506.7. Samples: 390696. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:00:04,930][14395] Avg episode reward: [(0, '4.333')] +[2024-11-07 16:00:08,597][14461] Updated weights for policy 0, policy_version 4300 (0.0031) +[2024-11-07 16:00:09,927][14395] Fps is (10 sec: 6275.8, 60 sec: 6076.0, 300 sec: 5536.7). Total num frames: 17620992. Throughput: 0: 1537.9. Samples: 399564. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:00:09,929][14395] Avg episode reward: [(0, '4.285')] +[2024-11-07 16:00:14,927][14395] Fps is (10 sec: 6553.6, 60 sec: 6075.7, 300 sec: 5540.0). Total num frames: 17649664. 
Throughput: 0: 1631.6. Samples: 408654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:00:14,929][14395] Avg episode reward: [(0, '4.383')] +[2024-11-07 16:00:15,813][14461] Updated weights for policy 0, policy_version 4310 (0.0032) +[2024-11-07 16:00:19,927][14395] Fps is (10 sec: 5324.8, 60 sec: 5939.3, 300 sec: 5641.1). Total num frames: 17674240. Throughput: 0: 1607.7. Samples: 412162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:00:19,929][14395] Avg episode reward: [(0, '4.445')] +[2024-11-07 16:00:22,655][14461] Updated weights for policy 0, policy_version 4320 (0.0036) +[2024-11-07 16:00:24,927][14395] Fps is (10 sec: 5734.5, 60 sec: 6271.8, 300 sec: 5734.4). Total num frames: 17707008. Throughput: 0: 1570.3. Samples: 421674. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:00:24,932][14395] Avg episode reward: [(0, '4.288')] +[2024-11-07 16:00:28,572][14461] Updated weights for policy 0, policy_version 4330 (0.0029) +[2024-11-07 16:00:29,927][14395] Fps is (10 sec: 6963.0, 60 sec: 6485.3, 300 sec: 5762.2). Total num frames: 17743872. Throughput: 0: 1572.7. Samples: 432254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:00:29,929][14395] Avg episode reward: [(0, '4.498')] +[2024-11-07 16:00:34,927][14395] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 5734.4). Total num frames: 17764352. Throughput: 0: 1546.8. Samples: 437286. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:00:34,929][14395] Avg episode reward: [(0, '4.555')] +[2024-11-07 16:00:37,269][14461] Updated weights for policy 0, policy_version 4340 (0.0042) +[2024-11-07 16:00:39,927][14395] Fps is (10 sec: 4096.1, 60 sec: 5939.2, 300 sec: 5734.4). Total num frames: 17784832. Throughput: 0: 1386.6. Samples: 441886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:00:39,929][14395] Avg episode reward: [(0, '4.577')] +[2024-11-07 16:00:44,927][14395] Fps is (10 sec: 4505.5, 60 sec: 5802.6, 300 sec: 5720.5). 
Total num frames: 17809408. Throughput: 0: 1422.1. Samples: 448766. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:00:44,929][14395] Avg episode reward: [(0, '4.496')] +[2024-11-07 16:00:46,121][14461] Updated weights for policy 0, policy_version 4350 (0.0056) +[2024-11-07 16:00:49,927][14395] Fps is (10 sec: 5734.4, 60 sec: 5666.1, 300 sec: 5706.6). Total num frames: 17842176. Throughput: 0: 1381.6. Samples: 452868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:00:49,929][14395] Avg episode reward: [(0, '4.489')] +[2024-11-07 16:00:52,000][14461] Updated weights for policy 0, policy_version 4360 (0.0030) +[2024-11-07 16:00:54,927][14395] Fps is (10 sec: 6963.4, 60 sec: 5666.2, 300 sec: 5762.2). Total num frames: 17879040. Throughput: 0: 1432.7. Samples: 464034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:00:54,929][14395] Avg episode reward: [(0, '4.361')] +[2024-11-07 16:00:58,239][14461] Updated weights for policy 0, policy_version 4370 (0.0043) +[2024-11-07 16:00:59,931][14395] Fps is (10 sec: 6550.9, 60 sec: 5822.6, 300 sec: 5762.1). Total num frames: 17907712. Throughput: 0: 1439.8. Samples: 473450. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:00:59,937][14395] Avg episode reward: [(0, '4.600')] +[2024-11-07 16:00:59,955][14445] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004372_17907712.pth... +[2024-11-07 16:01:00,145][14445] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004029_16502784.pth +[2024-11-07 16:01:04,927][14395] Fps is (10 sec: 5324.7, 60 sec: 5802.7, 300 sec: 5706.6). Total num frames: 17932288. Throughput: 0: 1424.9. Samples: 476284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:01:04,930][14395] Avg episode reward: [(0, '4.918')] +[2024-11-07 16:01:04,934][14445] Saving new best policy, reward=4.918! 
+[2024-11-07 16:01:06,342][14461] Updated weights for policy 0, policy_version 4380 (0.0053) +[2024-11-07 16:01:09,927][14395] Fps is (10 sec: 3688.0, 60 sec: 5393.1, 300 sec: 5665.1). Total num frames: 17944576. Throughput: 0: 1347.9. Samples: 482328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:01:09,929][14395] Avg episode reward: [(0, '4.684')] +[2024-11-07 16:01:14,927][14395] Fps is (10 sec: 3686.4, 60 sec: 5324.8, 300 sec: 5692.8). Total num frames: 17969152. Throughput: 0: 1255.9. Samples: 488770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:01:14,928][14395] Avg episode reward: [(0, '4.702')] +[2024-11-07 16:01:16,923][14461] Updated weights for policy 0, policy_version 4390 (0.0039) +[2024-11-07 16:01:19,927][14395] Fps is (10 sec: 4915.2, 60 sec: 5324.8, 300 sec: 5678.9). Total num frames: 17993728. Throughput: 0: 1229.5. Samples: 492612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:01:19,929][14395] Avg episode reward: [(0, '4.364')] +[2024-11-07 16:01:24,927][14395] Fps is (10 sec: 4915.3, 60 sec: 5188.3, 300 sec: 5637.2). Total num frames: 18018304. Throughput: 0: 1287.0. Samples: 499802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:01:24,929][14395] Avg episode reward: [(0, '4.314')] +[2024-11-07 16:01:25,040][14461] Updated weights for policy 0, policy_version 4400 (0.0054) +[2024-11-07 16:01:29,928][14395] Fps is (10 sec: 5734.2, 60 sec: 5120.0, 300 sec: 5678.9). Total num frames: 18051072. Throughput: 0: 1352.4. Samples: 509622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:01:29,931][14395] Avg episode reward: [(0, '4.397')] +[2024-11-07 16:01:31,325][14461] Updated weights for policy 0, policy_version 4410 (0.0026) +[2024-11-07 16:01:34,927][14395] Fps is (10 sec: 6553.6, 60 sec: 5324.8, 300 sec: 5651.1). Total num frames: 18083840. Throughput: 0: 1370.8. Samples: 514552. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 16:01:34,931][14395] Avg episode reward: [(0, '4.457')] +[2024-11-07 16:01:37,833][14461] Updated weights for policy 0, policy_version 4420 (0.0036) +[2024-11-07 16:01:39,927][14395] Fps is (10 sec: 6553.7, 60 sec: 5529.6, 300 sec: 5678.9). Total num frames: 18116608. Throughput: 0: 1324.4. Samples: 523632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 16:01:39,929][14395] Avg episode reward: [(0, '4.292')] +[2024-11-07 16:01:44,927][14395] Fps is (10 sec: 4915.2, 60 sec: 5393.1, 300 sec: 5665.0). Total num frames: 18132992. Throughput: 0: 1249.5. Samples: 529674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:01:44,930][14395] Avg episode reward: [(0, '4.513')] +[2024-11-07 16:01:46,301][14461] Updated weights for policy 0, policy_version 4430 (0.0022) +[2024-11-07 16:01:49,927][14395] Fps is (10 sec: 4915.1, 60 sec: 5393.0, 300 sec: 5706.6). Total num frames: 18165760. Throughput: 0: 1301.3. Samples: 534842. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:01:49,929][14395] Avg episode reward: [(0, '4.579')] +[2024-11-07 16:01:53,333][14461] Updated weights for policy 0, policy_version 4440 (0.0042) +[2024-11-07 16:01:54,928][14395] Fps is (10 sec: 5733.9, 60 sec: 5188.2, 300 sec: 5720.5). Total num frames: 18190336. Throughput: 0: 1357.5. Samples: 543416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:01:54,930][14395] Avg episode reward: [(0, '4.285')] +[2024-11-07 16:01:59,270][14461] Updated weights for policy 0, policy_version 4450 (0.0025) +[2024-11-07 16:01:59,927][14395] Fps is (10 sec: 6144.2, 60 sec: 5325.2, 300 sec: 5762.2). Total num frames: 18227200. Throughput: 0: 1444.0. Samples: 553752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:01:59,928][14395] Avg episode reward: [(0, '4.199')] +[2024-11-07 16:02:04,928][14395] Fps is (10 sec: 7373.1, 60 sec: 5529.6, 300 sec: 5831.6). Total num frames: 18264064. 
Throughput: 0: 1459.9. Samples: 558310. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:02:04,929][14395] Avg episode reward: [(0, '4.323')] +[2024-11-07 16:02:05,171][14461] Updated weights for policy 0, policy_version 4460 (0.0035) +[2024-11-07 16:02:09,927][14395] Fps is (10 sec: 6963.0, 60 sec: 5870.9, 300 sec: 5859.4). Total num frames: 18296832. Throughput: 0: 1556.9. Samples: 569862. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:02:09,930][14395] Avg episode reward: [(0, '4.337')] +[2024-11-07 16:02:11,877][14461] Updated weights for policy 0, policy_version 4470 (0.0039) +[2024-11-07 16:02:14,928][14395] Fps is (10 sec: 6553.6, 60 sec: 6007.4, 300 sec: 5901.2). Total num frames: 18329600. Throughput: 0: 1525.9. Samples: 578288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:02:14,929][14395] Avg episode reward: [(0, '4.454')] +[2024-11-07 16:02:19,928][14395] Fps is (10 sec: 4914.7, 60 sec: 5870.8, 300 sec: 5873.5). Total num frames: 18345984. Throughput: 0: 1478.4. Samples: 581084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:02:19,930][14395] Avg episode reward: [(0, '4.500')] +[2024-11-07 16:02:20,550][14461] Updated weights for policy 0, policy_version 4480 (0.0038) +[2024-11-07 16:02:24,927][14395] Fps is (10 sec: 4915.3, 60 sec: 6007.4, 300 sec: 5901.1). Total num frames: 18378752. Throughput: 0: 1451.3. Samples: 588942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:02:24,929][14395] Avg episode reward: [(0, '4.422')] +[2024-11-07 16:02:26,971][14461] Updated weights for policy 0, policy_version 4490 (0.0040) +[2024-11-07 16:02:29,928][14395] Fps is (10 sec: 6144.5, 60 sec: 5939.2, 300 sec: 5901.0). Total num frames: 18407424. Throughput: 0: 1530.0. Samples: 598524. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:02:29,932][14395] Avg episode reward: [(0, '4.494')] +[2024-11-07 16:02:34,478][14461] Updated weights for policy 0, policy_version 4500 (0.0036) +[2024-11-07 16:02:34,927][14395] Fps is (10 sec: 5324.8, 60 sec: 5802.7, 300 sec: 5914.9). Total num frames: 18432000. Throughput: 0: 1493.3. Samples: 602038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 16:02:34,930][14395] Avg episode reward: [(0, '4.464')] +[2024-11-07 16:02:39,928][14395] Fps is (10 sec: 4915.2, 60 sec: 5666.1, 300 sec: 5956.5). Total num frames: 18456576. Throughput: 0: 1452.6. Samples: 608784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 16:02:39,930][14395] Avg episode reward: [(0, '4.351')] +[2024-11-07 16:02:42,419][14461] Updated weights for policy 0, policy_version 4510 (0.0040) +[2024-11-07 16:02:44,927][14395] Fps is (10 sec: 5324.8, 60 sec: 5870.9, 300 sec: 5928.8). Total num frames: 18485248. Throughput: 0: 1429.1. Samples: 618062. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 16:02:44,929][14395] Avg episode reward: [(0, '4.379')] +[2024-11-07 16:02:49,091][14461] Updated weights for policy 0, policy_version 4520 (0.0051) +[2024-11-07 16:02:49,927][14395] Fps is (10 sec: 6144.2, 60 sec: 5870.9, 300 sec: 5901.0). Total num frames: 18518016. Throughput: 0: 1435.6. Samples: 622912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:02:49,929][14395] Avg episode reward: [(0, '4.579')] +[2024-11-07 16:02:54,927][14395] Fps is (10 sec: 4915.2, 60 sec: 5734.5, 300 sec: 5845.5). Total num frames: 18534400. Throughput: 0: 1298.6. Samples: 628298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:02:54,929][14395] Avg episode reward: [(0, '4.503')] +[2024-11-07 16:02:58,299][14461] Updated weights for policy 0, policy_version 4530 (0.0038) +[2024-11-07 16:02:59,932][14395] Fps is (10 sec: 4503.7, 60 sec: 5597.5, 300 sec: 5859.3). Total num frames: 18563072. 
Throughput: 0: 1309.3. Samples: 637212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:02:59,939][14395] Avg episode reward: [(0, '4.345')] +[2024-11-07 16:02:59,951][14445] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004532_18563072.pth... +[2024-11-07 16:03:00,280][14445] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004195_17182720.pth +[2024-11-07 16:03:04,571][14461] Updated weights for policy 0, policy_version 4540 (0.0023) +[2024-11-07 16:03:04,927][14395] Fps is (10 sec: 6144.0, 60 sec: 5529.6, 300 sec: 5845.5). Total num frames: 18595840. Throughput: 0: 1357.9. Samples: 642186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:03:04,940][14395] Avg episode reward: [(0, '4.329')] +[2024-11-07 16:03:09,928][14395] Fps is (10 sec: 6146.5, 60 sec: 5461.3, 300 sec: 5895.9). Total num frames: 18624512. Throughput: 0: 1394.9. Samples: 651714. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:03:09,930][14395] Avg episode reward: [(0, '4.494')] +[2024-11-07 16:03:11,780][14461] Updated weights for policy 0, policy_version 4550 (0.0036) +[2024-11-07 16:03:14,927][14395] Fps is (10 sec: 6143.9, 60 sec: 5461.3, 300 sec: 5901.0). Total num frames: 18657280. Throughput: 0: 1379.3. Samples: 660592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:03:14,930][14395] Avg episode reward: [(0, '4.480')] +[2024-11-07 16:03:17,864][14461] Updated weights for policy 0, policy_version 4560 (0.0031) +[2024-11-07 16:03:19,927][14395] Fps is (10 sec: 6144.1, 60 sec: 5666.2, 300 sec: 5887.1). Total num frames: 18685952. Throughput: 0: 1405.0. Samples: 665262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 16:03:19,930][14395] Avg episode reward: [(0, '4.341')] +[2024-11-07 16:03:27,180][14395] Fps is (10 sec: 4680.1, 60 sec: 5395.3, 300 sec: 5828.7). Total num frames: 18714624. Throughput: 0: 1375.1. Samples: 673762. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:03:27,196][14395] Avg episode reward: [(0, '4.461')] +[2024-11-07 16:03:27,887][14461] Updated weights for policy 0, policy_version 4570 (0.0042) +[2024-11-07 16:03:29,928][14395] Fps is (10 sec: 3686.0, 60 sec: 5256.5, 300 sec: 5762.1). Total num frames: 18722816. Throughput: 0: 1323.3. Samples: 677612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 16:03:29,941][14395] Avg episode reward: [(0, '4.371')] +[2024-11-07 16:03:34,927][14395] Fps is (10 sec: 4229.7, 60 sec: 5256.5, 300 sec: 5720.5). Total num frames: 18747392. Throughput: 0: 1290.2. Samples: 680972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:03:34,929][14395] Avg episode reward: [(0, '4.341')] +[2024-11-07 16:03:36,869][14461] Updated weights for policy 0, policy_version 4580 (0.0057) +[2024-11-07 16:03:39,928][14395] Fps is (10 sec: 4915.4, 60 sec: 5256.5, 300 sec: 5692.7). Total num frames: 18771968. Throughput: 0: 1337.3. Samples: 688476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:03:39,930][14395] Avg episode reward: [(0, '4.376')] +[2024-11-07 16:03:44,774][14461] Updated weights for policy 0, policy_version 4590 (0.0065) +[2024-11-07 16:03:44,928][14395] Fps is (10 sec: 5324.7, 60 sec: 5256.5, 300 sec: 5717.8). Total num frames: 18800640. Throughput: 0: 1316.3. Samples: 696440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:03:44,930][14395] Avg episode reward: [(0, '4.450')] +[2024-11-07 16:03:49,927][14395] Fps is (10 sec: 5325.2, 60 sec: 5120.0, 300 sec: 5692.8). Total num frames: 18825216. Throughput: 0: 1286.0. Samples: 700054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:03:49,929][14395] Avg episode reward: [(0, '4.593')] +[2024-11-07 16:03:52,136][14461] Updated weights for policy 0, policy_version 4600 (0.0042) +[2024-11-07 16:03:54,927][14395] Fps is (10 sec: 5325.0, 60 sec: 5324.8, 300 sec: 5665.0). Total num frames: 18853888. 
Throughput: 0: 1271.2. Samples: 708916. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:03:54,930][14395] Avg episode reward: [(0, '4.477')] +[2024-11-07 16:03:59,332][14461] Updated weights for policy 0, policy_version 4610 (0.0046) +[2024-11-07 16:04:01,643][14395] Fps is (10 sec: 4894.5, 60 sec: 5177.1, 300 sec: 5604.6). Total num frames: 18882560. Throughput: 0: 1220.3. Samples: 717600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:04:01,645][14395] Avg episode reward: [(0, '4.510')] +[2024-11-07 16:04:04,927][14395] Fps is (10 sec: 4505.5, 60 sec: 5051.7, 300 sec: 5567.8). Total num frames: 18898944. Throughput: 0: 1185.0. Samples: 718586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:04:04,935][14395] Avg episode reward: [(0, '4.586')] +[2024-11-07 16:04:09,356][14461] Updated weights for policy 0, policy_version 4620 (0.0029) +[2024-11-07 16:04:09,930][14395] Fps is (10 sec: 4942.8, 60 sec: 4983.3, 300 sec: 5553.8). Total num frames: 18923520. Throughput: 0: 1227.2. Samples: 726224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:04:09,932][14395] Avg episode reward: [(0, '4.688')] +[2024-11-07 16:04:14,927][14395] Fps is (10 sec: 5734.6, 60 sec: 4983.5, 300 sec: 5553.9). Total num frames: 18956288. Throughput: 0: 1282.8. Samples: 735336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:04:14,928][14395] Avg episode reward: [(0, '4.624')] +[2024-11-07 16:04:16,116][14461] Updated weights for policy 0, policy_version 4630 (0.0026) +[2024-11-07 16:04:19,927][14395] Fps is (10 sec: 6145.6, 60 sec: 4983.5, 300 sec: 5606.4). Total num frames: 18984960. Throughput: 0: 1312.2. Samples: 740020. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:04:19,929][14395] Avg episode reward: [(0, '4.610')] +[2024-11-07 16:04:22,497][14461] Updated weights for policy 0, policy_version 4640 (0.0031) +[2024-11-07 16:04:24,927][14395] Fps is (10 sec: 6143.9, 60 sec: 5248.8, 300 sec: 5637.2). 
Total num frames: 19017728. Throughput: 0: 1357.9. Samples: 749580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:04:24,931][14395] Avg episode reward: [(0, '4.613')] +[2024-11-07 16:04:29,009][14461] Updated weights for policy 0, policy_version 4650 (0.0034) +[2024-11-07 16:04:29,932][14395] Fps is (10 sec: 6550.6, 60 sec: 5461.0, 300 sec: 5637.1). Total num frames: 19050496. Throughput: 0: 1386.9. Samples: 758856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:04:29,934][14395] Avg episode reward: [(0, '4.465')] +[2024-11-07 16:04:36,334][14395] Fps is (10 sec: 5027.1, 60 sec: 5336.2, 300 sec: 5555.2). Total num frames: 19075072. Throughput: 0: 1359.5. Samples: 763146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:04:36,336][14395] Avg episode reward: [(0, '4.551')] +[2024-11-07 16:04:38,320][14461] Updated weights for policy 0, policy_version 4660 (0.0038) +[2024-11-07 16:04:39,927][14395] Fps is (10 sec: 4507.7, 60 sec: 5393.1, 300 sec: 5540.0). Total num frames: 19095552. Throughput: 0: 1327.1. Samples: 768636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:04:39,930][14395] Avg episode reward: [(0, '4.531')] +[2024-11-07 16:04:44,927][14395] Fps is (10 sec: 5719.9, 60 sec: 5393.1, 300 sec: 5498.4). Total num frames: 19124224. Throughput: 0: 1389.3. Samples: 777736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 16:04:44,930][14395] Avg episode reward: [(0, '4.333')] +[2024-11-07 16:04:45,079][14461] Updated weights for policy 0, policy_version 4670 (0.0038) +[2024-11-07 16:04:49,927][14395] Fps is (10 sec: 6144.0, 60 sec: 5529.6, 300 sec: 5484.5). Total num frames: 19156992. Throughput: 0: 1415.9. Samples: 782302. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:04:49,929][14395] Avg episode reward: [(0, '4.411')] +[2024-11-07 16:04:52,097][14461] Updated weights for policy 0, policy_version 4680 (0.0032) +[2024-11-07 16:04:54,928][14395] Fps is (10 sec: 5734.2, 60 sec: 5461.3, 300 sec: 5502.3). Total num frames: 19181568. Throughput: 0: 1438.7. Samples: 790964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:04:54,929][14395] Avg episode reward: [(0, '4.371')] +[2024-11-07 16:04:58,806][14461] Updated weights for policy 0, policy_version 4690 (0.0029) +[2024-11-07 16:04:59,929][14395] Fps is (10 sec: 5733.3, 60 sec: 5692.2, 300 sec: 5526.1). Total num frames: 19214336. Throughput: 0: 1440.9. Samples: 800180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:04:59,932][14395] Avg episode reward: [(0, '4.705')] +[2024-11-07 16:04:59,948][14445] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004691_19214336.pth... +[2024-11-07 16:05:00,105][14445] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004372_17907712.pth +[2024-11-07 16:05:04,927][14395] Fps is (10 sec: 6144.2, 60 sec: 5734.4, 300 sec: 5498.4). Total num frames: 19243008. Throughput: 0: 1419.7. Samples: 803908. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:05:04,930][14395] Avg episode reward: [(0, '4.499')] +[2024-11-07 16:05:06,089][14461] Updated weights for policy 0, policy_version 4700 (0.0041) +[2024-11-07 16:05:10,766][14395] Fps is (10 sec: 4535.6, 60 sec: 5588.2, 300 sec: 5455.1). Total num frames: 19263488. Throughput: 0: 1385.6. Samples: 813096. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:05:10,772][14395] Avg episode reward: [(0, '4.577')] +[2024-11-07 16:05:14,927][14395] Fps is (10 sec: 4505.6, 60 sec: 5529.6, 300 sec: 5470.6). Total num frames: 19288064. Throughput: 0: 1329.8. Samples: 818690. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:05:14,930][14395] Avg episode reward: [(0, '4.508')] +[2024-11-07 16:05:15,214][14461] Updated weights for policy 0, policy_version 4710 (0.0036) +[2024-11-07 16:05:19,927][14395] Fps is (10 sec: 5365.2, 60 sec: 5461.3, 300 sec: 5442.8). Total num frames: 19312640. Throughput: 0: 1343.5. Samples: 821712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 16:05:19,937][14395] Avg episode reward: [(0, '4.623')] +[2024-11-07 16:05:23,759][14461] Updated weights for policy 0, policy_version 4720 (0.0068) +[2024-11-07 16:05:24,927][14395] Fps is (10 sec: 4915.2, 60 sec: 5324.8, 300 sec: 5401.2). Total num frames: 19337216. Throughput: 0: 1355.1. Samples: 829614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:05:24,929][14395] Avg episode reward: [(0, '4.623')] +[2024-11-07 16:05:29,927][14395] Fps is (10 sec: 4915.1, 60 sec: 5188.6, 300 sec: 5415.0). Total num frames: 19361792. Throughput: 0: 1304.9. Samples: 836456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:05:29,931][14395] Avg episode reward: [(0, '4.453')] +[2024-11-07 16:05:31,546][14461] Updated weights for policy 0, policy_version 4730 (0.0066) +[2024-11-07 16:05:34,927][14395] Fps is (10 sec: 5734.4, 60 sec: 5452.7, 300 sec: 5456.7). Total num frames: 19394560. Throughput: 0: 1324.0. Samples: 841884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:05:34,930][14395] Avg episode reward: [(0, '4.237')] +[2024-11-07 16:05:37,528][14461] Updated weights for policy 0, policy_version 4740 (0.0035) +[2024-11-07 16:05:39,927][14395] Fps is (10 sec: 6963.4, 60 sec: 5597.9, 300 sec: 5498.4). Total num frames: 19431424. Throughput: 0: 1362.9. Samples: 852294. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:05:39,929][14395] Avg episode reward: [(0, '4.496')] +[2024-11-07 16:05:42,481][14461] Updated weights for policy 0, policy_version 4750 (0.0028) +[2024-11-07 16:05:45,234][14395] Fps is (10 sec: 5961.3, 60 sec: 5501.5, 300 sec: 5464.9). Total num frames: 19456000. Throughput: 0: 1287.3. Samples: 858502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:05:45,236][14395] Avg episode reward: [(0, '4.379')] +[2024-11-07 16:05:49,850][14461] Updated weights for policy 0, policy_version 4760 (0.0026) +[2024-11-07 16:05:49,927][14395] Fps is (10 sec: 6553.6, 60 sec: 5666.1, 300 sec: 5484.5). Total num frames: 19496960. Throughput: 0: 1380.0. Samples: 866006. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:05:49,929][14395] Avg episode reward: [(0, '4.544')] +[2024-11-07 16:05:54,927][14395] Fps is (10 sec: 8028.4, 60 sec: 5871.0, 300 sec: 5512.3). Total num frames: 19533824. Throughput: 0: 1476.0. Samples: 878278. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:05:54,929][14395] Avg episode reward: [(0, '4.481')] +[2024-11-07 16:05:54,982][14461] Updated weights for policy 0, policy_version 4770 (0.0025) +[2024-11-07 16:05:59,923][14461] Updated weights for policy 0, policy_version 4780 (0.0029) +[2024-11-07 16:05:59,927][14395] Fps is (10 sec: 8192.0, 60 sec: 6075.9, 300 sec: 5581.7). Total num frames: 19578880. Throughput: 0: 1594.6. Samples: 890448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:05:59,929][14395] Avg episode reward: [(0, '4.536')] +[2024-11-07 16:06:04,927][14395] Fps is (10 sec: 7782.5, 60 sec: 6144.0, 300 sec: 5651.1). Total num frames: 19611648. Throughput: 0: 1644.9. Samples: 895734. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:06:04,928][14395] Avg episode reward: [(0, '4.282')] +[2024-11-07 16:06:05,438][14461] Updated weights for policy 0, policy_version 4790 (0.0027) +[2024-11-07 16:06:09,927][14395] Fps is (10 sec: 7372.8, 60 sec: 6577.3, 300 sec: 5706.6). Total num frames: 19652608. Throughput: 0: 1738.2. Samples: 907832. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:06:09,930][14395] Avg episode reward: [(0, '4.543')] +[2024-11-07 16:06:10,827][14461] Updated weights for policy 0, policy_version 4800 (0.0028) +[2024-11-07 16:06:14,927][14395] Fps is (10 sec: 7782.3, 60 sec: 6690.1, 300 sec: 5748.3). Total num frames: 19689472. Throughput: 0: 1832.5. Samples: 918920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:06:14,932][14395] Avg episode reward: [(0, '4.297')] +[2024-11-07 16:06:16,033][14461] Updated weights for policy 0, policy_version 4810 (0.0029) +[2024-11-07 16:06:19,927][14395] Fps is (10 sec: 6143.9, 60 sec: 6690.1, 300 sec: 5748.3). Total num frames: 19714048. Throughput: 0: 1850.5. Samples: 925158. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 16:06:19,931][14395] Avg episode reward: [(0, '4.377')] +[2024-11-07 16:06:23,171][14461] Updated weights for policy 0, policy_version 4820 (0.0031) +[2024-11-07 16:06:24,927][14395] Fps is (10 sec: 6553.6, 60 sec: 6963.2, 300 sec: 5776.1). Total num frames: 19755008. Throughput: 0: 1792.7. Samples: 932966. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 16:06:24,929][14395] Avg episode reward: [(0, '4.330')] +[2024-11-07 16:06:28,033][14461] Updated weights for policy 0, policy_version 4830 (0.0030) +[2024-11-07 16:06:29,927][14395] Fps is (10 sec: 8192.1, 60 sec: 7236.3, 300 sec: 5803.8). Total num frames: 19795968. Throughput: 0: 1945.3. Samples: 945444. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 16:06:29,929][14395] Avg episode reward: [(0, '4.467')] +[2024-11-07 16:06:33,086][14461] Updated weights for policy 0, policy_version 4840 (0.0030) +[2024-11-07 16:06:34,927][14395] Fps is (10 sec: 8192.0, 60 sec: 7372.8, 300 sec: 5831.6). Total num frames: 19836928. Throughput: 0: 1899.5. Samples: 951484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:06:34,929][14395] Avg episode reward: [(0, '4.250')] +[2024-11-07 16:06:37,870][14461] Updated weights for policy 0, policy_version 4850 (0.0034) +[2024-11-07 16:06:39,927][14395] Fps is (10 sec: 8192.1, 60 sec: 7441.1, 300 sec: 5914.9). Total num frames: 19877888. Throughput: 0: 1907.3. Samples: 964104. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:06:39,928][14395] Avg episode reward: [(0, '4.500')] +[2024-11-07 16:06:43,599][14461] Updated weights for policy 0, policy_version 4860 (0.0025) +[2024-11-07 16:06:44,927][14395] Fps is (10 sec: 7782.4, 60 sec: 7685.1, 300 sec: 5928.8). Total num frames: 19914752. Throughput: 0: 1879.1. Samples: 975008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 16:06:44,930][14395] Avg episode reward: [(0, '4.327')] +[2024-11-07 16:06:49,088][14461] Updated weights for policy 0, policy_version 4870 (0.0028) +[2024-11-07 16:06:49,927][14395] Fps is (10 sec: 7372.8, 60 sec: 7577.6, 300 sec: 5970.5). Total num frames: 19951616. Throughput: 0: 1880.6. Samples: 980362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:06:49,929][14395] Avg episode reward: [(0, '4.475')] +[2024-11-07 16:06:54,927][14395] Fps is (10 sec: 6144.0, 60 sec: 7372.8, 300 sec: 5928.8). Total num frames: 19976192. Throughput: 0: 1814.1. Samples: 989466. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 16:06:54,930][14395] Avg episode reward: [(0, '4.451')] +[2024-11-07 16:06:56,409][14461] Updated weights for policy 0, policy_version 4880 (0.0033) +[2024-11-07 16:06:58,350][14445] Stopping Batcher_0... +[2024-11-07 16:06:58,350][14395] Component Batcher_0 stopped! +[2024-11-07 16:06:58,352][14445] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 16:06:58,355][14445] Loop batcher_evt_loop terminating... +[2024-11-07 16:06:58,434][14461] Weights refcount: 2 0 +[2024-11-07 16:06:58,440][14461] Stopping InferenceWorker_p0-w0... +[2024-11-07 16:06:58,441][14461] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 16:06:58,440][14395] Component InferenceWorker_p0-w0 stopped! +[2024-11-07 16:06:58,493][14395] Component RolloutWorker_w3 stopped! +[2024-11-07 16:06:58,495][14468] Stopping RolloutWorker_w3... +[2024-11-07 16:06:58,501][14468] Loop rollout_proc3_evt_loop terminating... +[2024-11-07 16:06:58,516][14445] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004532_18563072.pth +[2024-11-07 16:06:58,521][14395] Component RolloutWorker_w4 stopped! +[2024-11-07 16:06:58,526][14445] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 16:06:58,531][14477] Stopping RolloutWorker_w4... +[2024-11-07 16:06:58,535][14477] Loop rollout_proc4_evt_loop terminating... +[2024-11-07 16:06:58,540][14395] Component RolloutWorker_w1 stopped! +[2024-11-07 16:06:58,543][14467] Stopping RolloutWorker_w1... +[2024-11-07 16:06:58,554][14467] Loop rollout_proc1_evt_loop terminating... +[2024-11-07 16:06:58,556][14395] Component RolloutWorker_w0 stopped! +[2024-11-07 16:06:58,556][14462] Stopping RolloutWorker_w0... +[2024-11-07 16:06:58,564][14462] Loop rollout_proc0_evt_loop terminating... 
+[2024-11-07 16:06:58,608][14470] Stopping RolloutWorker_w6... +[2024-11-07 16:06:58,608][14395] Component RolloutWorker_w6 stopped! +[2024-11-07 16:06:58,609][14470] Loop rollout_proc6_evt_loop terminating... +[2024-11-07 16:06:58,728][14445] Stopping LearnerWorker_p0... +[2024-11-07 16:06:58,728][14445] Loop learner_proc0_evt_loop terminating... +[2024-11-07 16:06:58,728][14395] Component LearnerWorker_p0 stopped! +[2024-11-07 16:06:58,738][14395] Component RolloutWorker_w7 stopped! +[2024-11-07 16:06:58,740][14469] Stopping RolloutWorker_w7... +[2024-11-07 16:06:58,825][14469] Loop rollout_proc7_evt_loop terminating... +[2024-11-07 16:06:59,006][14395] Component RolloutWorker_w2 stopped! +[2024-11-07 16:06:59,004][14466] Stopping RolloutWorker_w2... +[2024-11-07 16:06:59,007][14466] Loop rollout_proc2_evt_loop terminating... +[2024-11-07 16:06:59,405][14395] Component RolloutWorker_w5 stopped! +[2024-11-07 16:06:59,412][14395] Waiting for process learner_proc0 to stop... +[2024-11-07 16:06:59,416][14478] Stopping RolloutWorker_w5... +[2024-11-07 16:06:59,424][14478] Loop rollout_proc5_evt_loop terminating... +[2024-11-07 16:07:02,027][14395] Waiting for process inference_proc0-0 to join... +[2024-11-07 16:07:02,029][14395] Waiting for process rollout_proc0 to join... +[2024-11-07 16:07:02,030][14395] Waiting for process rollout_proc1 to join... +[2024-11-07 16:07:02,031][14395] Waiting for process rollout_proc2 to join... +[2024-11-07 16:07:02,033][14395] Waiting for process rollout_proc3 to join... +[2024-11-07 16:07:02,035][14395] Waiting for process rollout_proc4 to join... +[2024-11-07 16:07:02,038][14395] Waiting for process rollout_proc5 to join... +[2024-11-07 16:07:02,040][14395] Waiting for process rollout_proc6 to join... +[2024-11-07 16:07:02,043][14395] Waiting for process rollout_proc7 to join... 
+[2024-11-07 16:07:02,046][14395] Batcher 0 profile tree view: +batching: 32.3104, releasing_batches: 0.0488 +[2024-11-07 16:07:02,048][14395] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0004 + wait_policy_total: 9.8218 +update_model: 10.4168 + weight_update: 0.0027 +one_step: 0.0077 + handle_policy_step: 648.0918 + deserialize: 19.6855, stack: 3.1513, obs_to_device_normalize: 190.8703, forward: 275.0630, send_messages: 49.3024 + prepare_outputs: 88.3335 + to_cpu: 66.1440 +[2024-11-07 16:07:02,050][14395] Learner 0 profile tree view: +misc: 0.0066, prepare_batch: 31.8707 +train: 129.8584 + epoch_init: 0.0102, minibatch_init: 0.0110, losses_postprocess: 3.2725, kl_divergence: 1.3944, after_optimizer: 4.4368 + calculate_losses: 43.1480 + losses_init: 0.0066, forward_head: 4.2252, bptt_initial: 29.1641, tail: 1.5559, advantages_returns: 0.3835, losses: 3.9513 + bptt: 3.5262 + bptt_forward_core: 3.3479 + update: 76.8025 + clip: 1.5711 +[2024-11-07 16:07:02,052][14395] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.2704, enqueue_policy_requests: 17.1022, env_step: 199.2890, overhead: 15.8391, complete_rollouts: 0.7729 +save_policy_outputs: 23.6707 + split_output_tensors: 8.6823 +[2024-11-07 16:07:02,054][14395] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.2330, enqueue_policy_requests: 15.3368, env_step: 303.0272, overhead: 17.1887, complete_rollouts: 0.5677 +save_policy_outputs: 18.8034 + split_output_tensors: 6.3821 +[2024-11-07 16:07:02,056][14395] Loop Runner_EvtLoop terminating... 
+[2024-11-07 16:07:02,058][14395] Runner profile tree view: +main_loop: 717.3057 +[2024-11-07 16:07:02,062][14395] Collected {0: 20004864}, FPS: 5561.8 +[2024-11-07 16:07:04,057][14395] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 16:07:04,058][14395] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 16:07:04,060][14395] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 16:07:04,063][14395] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 16:07:04,064][14395] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:07:04,066][14395] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 16:07:04,068][14395] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:07:04,069][14395] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 16:07:04,071][14395] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-07 16:07:04,073][14395] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-07 16:07:04,076][14395] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 16:07:04,078][14395] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 16:07:04,079][14395] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 16:07:04,080][14395] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2024-11-07 16:07:04,081][14395] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 16:07:04,121][14395] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 16:07:04,125][14395] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 16:07:04,128][14395] RunningMeanStd input shape: (1,) +[2024-11-07 16:07:04,150][14395] ConvEncoder: input_channels=3 +[2024-11-07 16:07:04,283][14395] Conv encoder output size: 512 +[2024-11-07 16:07:04,285][14395] Policy head output size: 512 +[2024-11-07 16:07:05,321][14395] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 16:07:06,188][14395] Num frames 100... +[2024-11-07 16:07:06,422][14395] Num frames 200... +[2024-11-07 16:07:06,617][14395] Num frames 300... +[2024-11-07 16:07:06,844][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:07:06,845][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:07:06,884][14395] Num frames 400... +[2024-11-07 16:07:07,096][14395] Num frames 500... +[2024-11-07 16:07:07,280][14395] Num frames 600... +[2024-11-07 16:07:07,482][14395] Num frames 700... +[2024-11-07 16:07:07,675][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:07:07,677][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:07:07,748][14395] Num frames 800... +[2024-11-07 16:07:07,932][14395] Num frames 900... +[2024-11-07 16:07:08,126][14395] Num frames 1000... +[2024-11-07 16:07:08,328][14395] Num frames 1100... +[2024-11-07 16:07:08,555][14395] Avg episode rewards: #0: 4.280, true rewards: #0: 3.947 +[2024-11-07 16:07:08,556][14395] Avg episode reward: 4.280, avg true_objective: 3.947 +[2024-11-07 16:07:08,592][14395] Num frames 1200... +[2024-11-07 16:07:08,781][14395] Num frames 1300... +[2024-11-07 16:07:08,969][14395] Num frames 1400... 
+[2024-11-07 16:07:09,158][14395] Num frames 1500... +[2024-11-07 16:07:09,362][14395] Avg episode rewards: #0: 4.170, true rewards: #0: 3.920 +[2024-11-07 16:07:09,364][14395] Avg episode reward: 4.170, avg true_objective: 3.920 +[2024-11-07 16:07:09,437][14395] Num frames 1600... +[2024-11-07 16:07:09,623][14395] Num frames 1700... +[2024-11-07 16:07:09,824][14395] Num frames 1800... +[2024-11-07 16:07:10,015][14395] Num frames 1900... +[2024-11-07 16:07:10,204][14395] Num frames 2000... +[2024-11-07 16:07:10,297][14395] Avg episode rewards: #0: 4.432, true rewards: #0: 4.032 +[2024-11-07 16:07:10,298][14395] Avg episode reward: 4.432, avg true_objective: 4.032 +[2024-11-07 16:07:10,613][14395] Num frames 2100... +[2024-11-07 16:07:10,776][14395] Num frames 2200... +[2024-11-07 16:07:10,931][14395] Num frames 2300... +[2024-11-07 16:07:11,083][14395] Num frames 2400... +[2024-11-07 16:07:11,136][14395] Avg episode rewards: #0: 4.333, true rewards: #0: 4.000 +[2024-11-07 16:07:11,141][14395] Avg episode reward: 4.333, avg true_objective: 4.000 +[2024-11-07 16:07:11,310][14395] Num frames 2500... +[2024-11-07 16:07:11,467][14395] Num frames 2600... +[2024-11-07 16:07:11,629][14395] Num frames 2700... +[2024-11-07 16:07:11,821][14395] Avg episode rewards: #0: 4.263, true rewards: #0: 3.977 +[2024-11-07 16:07:11,823][14395] Avg episode reward: 4.263, avg true_objective: 3.977 +[2024-11-07 16:07:11,861][14395] Num frames 2800... +[2024-11-07 16:07:12,057][14395] Num frames 2900... +[2024-11-07 16:07:12,242][14395] Num frames 3000... +[2024-11-07 16:07:12,403][14395] Num frames 3100... +[2024-11-07 16:07:12,565][14395] Avg episode rewards: #0: 4.210, true rewards: #0: 3.960 +[2024-11-07 16:07:12,568][14395] Avg episode reward: 4.210, avg true_objective: 3.960 +[2024-11-07 16:07:12,641][14395] Num frames 3200... +[2024-11-07 16:07:12,801][14395] Num frames 3300... +[2024-11-07 16:07:12,954][14395] Num frames 3400... +[2024-11-07 16:07:13,121][14395] Num frames 3500... 
+[2024-11-07 16:07:13,265][14395] Avg episode rewards: #0: 4.169, true rewards: #0: 3.947 +[2024-11-07 16:07:13,266][14395] Avg episode reward: 4.169, avg true_objective: 3.947 +[2024-11-07 16:07:13,351][14395] Num frames 3600... +[2024-11-07 16:07:13,513][14395] Num frames 3700... +[2024-11-07 16:07:13,670][14395] Num frames 3800... +[2024-11-07 16:07:13,825][14395] Num frames 3900... +[2024-11-07 16:07:13,938][14395] Avg episode rewards: #0: 4.136, true rewards: #0: 3.936 +[2024-11-07 16:07:13,942][14395] Avg episode reward: 4.136, avg true_objective: 3.936 +[2024-11-07 16:07:21,728][14395] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 16:07:24,406][14395] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 16:07:24,407][14395] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 16:07:24,408][14395] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 16:07:24,410][14395] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 16:07:24,410][14395] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:07:24,412][14395] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 16:07:24,414][14395] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-11-07 16:07:24,415][14395] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 16:07:24,417][14395] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-07 16:07:24,419][14395] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! 
+[2024-11-07 16:07:24,420][14395] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 16:07:24,422][14395] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 16:07:24,423][14395] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 16:07:24,424][14395] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 16:07:24,426][14395] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 16:07:24,453][14395] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 16:07:24,455][14395] RunningMeanStd input shape: (1,) +[2024-11-07 16:07:24,467][14395] ConvEncoder: input_channels=3 +[2024-11-07 16:07:24,506][14395] Conv encoder output size: 512 +[2024-11-07 16:07:24,508][14395] Policy head output size: 512 +[2024-11-07 16:07:24,534][14395] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 16:07:25,029][14395] Num frames 100... +[2024-11-07 16:07:25,225][14395] Num frames 200... +[2024-11-07 16:07:25,408][14395] Num frames 300... +[2024-11-07 16:07:25,589][14395] Num frames 400... +[2024-11-07 16:07:25,732][14395] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 +[2024-11-07 16:07:25,735][14395] Avg episode reward: 5.480, avg true_objective: 4.480 +[2024-11-07 16:07:25,849][14395] Num frames 500... +[2024-11-07 16:07:26,042][14395] Num frames 600... +[2024-11-07 16:07:26,234][14395] Num frames 700... +[2024-11-07 16:07:28,651][14395] Num frames 800... +[2024-11-07 16:07:28,777][14395] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-07 16:07:28,780][14395] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-07 16:07:28,970][14395] Num frames 900... +[2024-11-07 16:07:29,166][14395] Num frames 1000... +[2024-11-07 16:07:29,376][14395] Num frames 1100... 
+[2024-11-07 16:07:29,567][14395] Num frames 1200... +[2024-11-07 16:07:29,644][14395] Avg episode rewards: #0: 4.367, true rewards: #0: 4.033 +[2024-11-07 16:07:29,645][14395] Avg episode reward: 4.367, avg true_objective: 4.033 +[2024-11-07 16:07:29,827][14395] Num frames 1300... +[2024-11-07 16:07:30,024][14395] Num frames 1400... +[2024-11-07 16:07:30,223][14395] Num frames 1500... +[2024-11-07 16:07:30,454][14395] Avg episode rewards: #0: 4.235, true rewards: #0: 3.985 +[2024-11-07 16:07:30,458][14395] Avg episode reward: 4.235, avg true_objective: 3.985 +[2024-11-07 16:07:30,487][14395] Num frames 1600... +[2024-11-07 16:07:30,674][14395] Num frames 1700... +[2024-11-07 16:07:30,864][14395] Num frames 1800... +[2024-11-07 16:07:31,053][14395] Num frames 1900... +[2024-11-07 16:07:31,263][14395] Avg episode rewards: #0: 4.156, true rewards: #0: 3.956 +[2024-11-07 16:07:31,266][14395] Avg episode reward: 4.156, avg true_objective: 3.956 +[2024-11-07 16:07:31,328][14395] Num frames 2000... +[2024-11-07 16:07:31,507][14395] Num frames 2100... +[2024-11-07 16:07:31,687][14395] Num frames 2200... +[2024-11-07 16:07:31,875][14395] Num frames 2300... +[2024-11-07 16:07:32,054][14395] Num frames 2400... +[2024-11-07 16:07:32,158][14395] Avg episode rewards: #0: 4.377, true rewards: #0: 4.043 +[2024-11-07 16:07:32,160][14395] Avg episode reward: 4.377, avg true_objective: 4.043 +[2024-11-07 16:07:32,315][14395] Num frames 2500... +[2024-11-07 16:07:32,504][14395] Num frames 2600... +[2024-11-07 16:07:32,692][14395] Num frames 2700... +[2024-11-07 16:07:32,883][14395] Num frames 2800... +[2024-11-07 16:07:32,959][14395] Avg episode rewards: #0: 4.300, true rewards: #0: 4.014 +[2024-11-07 16:07:32,961][14395] Avg episode reward: 4.300, avg true_objective: 4.014 +[2024-11-07 16:07:33,137][14395] Num frames 2900... +[2024-11-07 16:07:33,331][14395] Num frames 3000... +[2024-11-07 16:07:33,528][14395] Num frames 3100... 
+[2024-11-07 16:07:33,770][14395] Avg episode rewards: #0: 4.243, true rewards: #0: 3.992 +[2024-11-07 16:07:33,771][14395] Avg episode reward: 4.243, avg true_objective: 3.992 +[2024-11-07 16:07:33,782][14395] Num frames 3200... +[2024-11-07 16:07:33,993][14395] Num frames 3300... +[2024-11-07 16:07:34,194][14395] Num frames 3400... +[2024-11-07 16:07:34,400][14395] Num frames 3500... +[2024-11-07 16:07:34,611][14395] Avg episode rewards: #0: 4.198, true rewards: #0: 3.976 +[2024-11-07 16:07:34,612][14395] Avg episode reward: 4.198, avg true_objective: 3.976 +[2024-11-07 16:07:34,657][14395] Num frames 3600... +[2024-11-07 16:07:34,862][14395] Num frames 3700... +[2024-11-07 16:07:35,065][14395] Num frames 3800... +[2024-11-07 16:07:35,201][14395] Avg episode rewards: #0: 4.034, true rewards: #0: 3.834 +[2024-11-07 16:07:35,203][14395] Avg episode reward: 4.034, avg true_objective: 3.834 +[2024-11-07 16:07:42,638][14395] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 16:07:53,307][14395] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 16:08:24,254][14395] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 16:08:24,256][14395] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 16:08:24,257][14395] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 16:08:24,258][14395] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 16:08:24,259][14395] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:08:24,262][14395] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 16:08:24,264][14395] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
+[2024-11-07 16:08:24,265][14395] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 16:08:24,268][14395] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-07 16:08:24,269][14395] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2024-11-07 16:08:24,272][14395] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 16:08:24,273][14395] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 16:08:24,274][14395] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 16:08:24,275][14395] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 16:08:24,276][14395] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 16:08:24,308][14395] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 16:08:24,311][14395] RunningMeanStd input shape: (1,) +[2024-11-07 16:08:24,323][14395] ConvEncoder: input_channels=3 +[2024-11-07 16:08:24,379][14395] Conv encoder output size: 512 +[2024-11-07 16:08:24,381][14395] Policy head output size: 512 +[2024-11-07 16:08:24,412][14395] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 16:08:24,953][14395] Num frames 100... +[2024-11-07 16:08:25,158][14395] Num frames 200... +[2024-11-07 16:08:25,344][14395] Num frames 300... +[2024-11-07 16:08:25,563][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:08:25,565][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:08:25,612][14395] Num frames 400... +[2024-11-07 16:08:25,820][14395] Num frames 500... +[2024-11-07 16:08:26,035][14395] Num frames 600... +[2024-11-07 16:08:26,232][14395] Num frames 700... 
+[2024-11-07 16:08:26,429][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:08:26,431][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:08:26,540][14395] Num frames 800... +[2024-11-07 16:08:26,863][14395] Num frames 900... +[2024-11-07 16:08:27,163][14395] Num frames 1000... +[2024-11-07 16:08:27,481][14395] Num frames 1100... +[2024-11-07 16:08:27,680][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:08:27,687][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:08:27,800][14395] Num frames 1200... +[2024-11-07 16:08:28,027][14395] Num frames 1300... +[2024-11-07 16:08:28,439][14395] Num frames 1400... +[2024-11-07 16:08:28,674][14395] Num frames 1500... +[2024-11-07 16:08:28,812][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:08:28,814][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:08:28,979][14395] Num frames 1600... +[2024-11-07 16:08:29,169][14395] Num frames 1700... +[2024-11-07 16:08:29,354][14395] Num frames 1800... +[2024-11-07 16:08:29,558][14395] Num frames 1900... +[2024-11-07 16:08:29,652][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:08:29,655][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:08:29,833][14395] Num frames 2000... +[2024-11-07 16:08:30,030][14395] Num frames 2100... +[2024-11-07 16:08:30,213][14395] Num frames 2200... +[2024-11-07 16:08:30,393][14395] Num frames 2300... +[2024-11-07 16:08:30,585][14395] Num frames 2400... +[2024-11-07 16:08:30,638][14395] Avg episode rewards: #0: 4.167, true rewards: #0: 4.000 +[2024-11-07 16:08:30,641][14395] Avg episode reward: 4.167, avg true_objective: 4.000 +[2024-11-07 16:08:30,847][14395] Num frames 2500... +[2024-11-07 16:08:31,112][14395] Num frames 2600... +[2024-11-07 16:08:31,378][14395] Num frames 2700... +[2024-11-07 16:08:31,689][14395] Num frames 2800... 
+[2024-11-07 16:08:31,913][14395] Avg episode rewards: #0: 4.354, true rewards: #0: 4.069 +[2024-11-07 16:08:31,916][14395] Avg episode reward: 4.354, avg true_objective: 4.069 +[2024-11-07 16:08:32,101][14395] Num frames 2900... +[2024-11-07 16:08:32,362][14395] Num frames 3000... +[2024-11-07 16:08:32,550][14395] Num frames 3100... +[2024-11-07 16:08:32,738][14395] Num frames 3200... +[2024-11-07 16:08:32,921][14395] Num frames 3300... +[2024-11-07 16:08:33,029][14395] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-07 16:08:33,034][14395] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-07 16:08:33,248][14395] Num frames 3400... +[2024-11-07 16:08:33,453][14395] Num frames 3500... +[2024-11-07 16:08:33,645][14395] Num frames 3600... +[2024-11-07 16:08:33,825][14395] Num frames 3700... +[2024-11-07 16:08:33,963][14395] Avg episode rewards: #0: 4.604, true rewards: #0: 4.160 +[2024-11-07 16:08:33,967][14395] Avg episode reward: 4.604, avg true_objective: 4.160 +[2024-11-07 16:08:34,085][14395] Num frames 3800... +[2024-11-07 16:08:34,269][14395] Num frames 3900... +[2024-11-07 16:08:34,459][14395] Num frames 4000... +[2024-11-07 16:08:34,644][14395] Num frames 4100... +[2024-11-07 16:08:34,757][14395] Avg episode rewards: #0: 4.528, true rewards: #0: 4.128 +[2024-11-07 16:08:34,764][14395] Avg episode reward: 4.528, avg true_objective: 4.128 +[2024-11-07 16:08:45,297][14395] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! 
+[2024-11-07 16:08:50,418][14395] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 16:09:26,118][14395] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 16:09:26,119][14395] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 16:09:26,121][14395] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 16:09:26,123][14395] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 16:09:26,125][14395] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:09:26,126][14395] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 16:09:26,127][14395] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-11-07 16:09:26,129][14395] Adding new argument 'max_num_episodes'=100 that is not in the saved config file! +[2024-11-07 16:09:26,129][14395] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-07 16:09:26,131][14395] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2024-11-07 16:09:26,133][14395] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 16:09:26,135][14395] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 16:09:26,137][14395] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 16:09:26,139][14395] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2024-11-07 16:09:26,141][14395] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 16:09:26,167][14395] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 16:09:26,169][14395] RunningMeanStd input shape: (1,) +[2024-11-07 16:09:26,185][14395] ConvEncoder: input_channels=3 +[2024-11-07 16:09:26,249][14395] Conv encoder output size: 512 +[2024-11-07 16:09:26,252][14395] Policy head output size: 512 +[2024-11-07 16:09:26,286][14395] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 16:09:26,835][14395] Num frames 100... +[2024-11-07 16:09:27,044][14395] Num frames 200... +[2024-11-07 16:09:27,277][14395] Num frames 300... +[2024-11-07 16:09:27,507][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:09:27,511][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:09:27,567][14395] Num frames 400... +[2024-11-07 16:09:27,763][14395] Num frames 500... +[2024-11-07 16:09:27,959][14395] Num frames 600... +[2024-11-07 16:09:28,153][14395] Num frames 700... +[2024-11-07 16:09:28,347][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:09:28,351][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:09:28,430][14395] Num frames 800... +[2024-11-07 16:09:28,624][14395] Num frames 900... +[2024-11-07 16:09:28,817][14395] Num frames 1000... +[2024-11-07 16:09:29,018][14395] Num frames 1100... +[2024-11-07 16:09:29,172][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:09:29,173][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:09:29,272][14395] Num frames 1200... +[2024-11-07 16:09:29,467][14395] Num frames 1300... +[2024-11-07 16:09:29,656][14395] Num frames 1400... +[2024-11-07 16:09:29,844][14395] Num frames 1500... 
+[2024-11-07 16:09:29,964][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:09:29,965][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:09:30,090][14395] Num frames 1600... +[2024-11-07 16:09:30,292][14395] Num frames 1700... +[2024-11-07 16:09:30,534][14395] Avg episode rewards: #0: 3.584, true rewards: #0: 3.584 +[2024-11-07 16:09:30,536][14395] Avg episode reward: 3.584, avg true_objective: 3.584 +[2024-11-07 16:09:30,562][14395] Num frames 1800... +[2024-11-07 16:09:30,759][14395] Num frames 1900... +[2024-11-07 16:09:30,968][14395] Num frames 2000... +[2024-11-07 16:09:31,168][14395] Num frames 2100... +[2024-11-07 16:09:31,376][14395] Avg episode rewards: #0: 3.627, true rewards: #0: 3.627 +[2024-11-07 16:09:31,379][14395] Avg episode reward: 3.627, avg true_objective: 3.627 +[2024-11-07 16:09:31,445][14395] Num frames 2200... +[2024-11-07 16:09:31,636][14395] Num frames 2300... +[2024-11-07 16:09:31,850][14395] Num frames 2400... +[2024-11-07 16:09:32,051][14395] Num frames 2500... +[2024-11-07 16:09:32,253][14395] Num frames 2600... +[2024-11-07 16:09:32,354][14395] Avg episode rewards: #0: 3.891, true rewards: #0: 3.749 +[2024-11-07 16:09:32,356][14395] Avg episode reward: 3.891, avg true_objective: 3.749 +[2024-11-07 16:09:32,507][14395] Num frames 2700... +[2024-11-07 16:09:32,694][14395] Num frames 2800... +[2024-11-07 16:09:32,885][14395] Num frames 2900... +[2024-11-07 16:09:33,093][14395] Num frames 3000... +[2024-11-07 16:09:33,228][14395] Avg episode rewards: #0: 4.050, true rewards: #0: 3.800 +[2024-11-07 16:09:33,234][14395] Avg episode reward: 4.050, avg true_objective: 3.800 +[2024-11-07 16:09:33,365][14395] Num frames 3100... +[2024-11-07 16:09:33,555][14395] Num frames 3200... +[2024-11-07 16:09:33,754][14395] Num frames 3300... +[2024-11-07 16:09:33,949][14395] Num frames 3400... 
+[2024-11-07 16:09:34,055][14395] Avg episode rewards: #0: 4.027, true rewards: #0: 3.804 +[2024-11-07 16:09:34,057][14395] Avg episode reward: 4.027, avg true_objective: 3.804 +[2024-11-07 16:09:34,202][14395] Num frames 3500... +[2024-11-07 16:09:34,399][14395] Num frames 3600... +[2024-11-07 16:09:34,662][14395] Num frames 3700... +[2024-11-07 16:09:34,858][14395] Num frames 3800... +[2024-11-07 16:09:34,932][14395] Avg episode rewards: #0: 4.008, true rewards: #0: 3.808 +[2024-11-07 16:09:34,933][14395] Avg episode reward: 4.008, avg true_objective: 3.808 +[2024-11-07 16:09:35,129][14395] Num frames 3900... +[2024-11-07 16:09:35,326][14395] Num frames 4000... +[2024-11-07 16:09:35,511][14395] Num frames 4100... +[2024-11-07 16:09:35,742][14395] Avg episode rewards: #0: 3.993, true rewards: #0: 3.811 +[2024-11-07 16:09:35,744][14395] Avg episode reward: 3.993, avg true_objective: 3.811 +[2024-11-07 16:09:35,766][14395] Num frames 4200... +[2024-11-07 16:09:35,963][14395] Num frames 4300... +[2024-11-07 16:09:36,166][14395] Num frames 4400... +[2024-11-07 16:09:36,354][14395] Num frames 4500... +[2024-11-07 16:09:36,549][14395] Num frames 4600... +[2024-11-07 16:09:36,621][14395] Avg episode rewards: #0: 4.007, true rewards: #0: 3.840 +[2024-11-07 16:09:36,622][14395] Avg episode reward: 4.007, avg true_objective: 3.840 +[2024-11-07 16:09:36,806][14395] Num frames 4700... +[2024-11-07 16:09:36,997][14395] Num frames 4800... +[2024-11-07 16:09:37,178][14395] Num frames 4900... +[2024-11-07 16:09:37,407][14395] Avg episode rewards: #0: 3.994, true rewards: #0: 3.840 +[2024-11-07 16:09:37,411][14395] Avg episode reward: 3.994, avg true_objective: 3.840 +[2024-11-07 16:09:37,447][14395] Num frames 5000... +[2024-11-07 16:09:37,639][14395] Num frames 5100... +[2024-11-07 16:09:37,828][14395] Num frames 5200... +[2024-11-07 16:09:38,040][14395] Num frames 5300... +[2024-11-07 16:09:38,224][14395] Num frames 5400... 
+[2024-11-07 16:09:38,356][14395] Avg episode rewards: #0: 4.100, true rewards: #0: 3.886 +[2024-11-07 16:09:38,361][14395] Avg episode reward: 4.100, avg true_objective: 3.886 +[2024-11-07 16:09:38,506][14395] Num frames 5500... +[2024-11-07 16:09:38,698][14395] Num frames 5600... +[2024-11-07 16:09:38,900][14395] Num frames 5700... +[2024-11-07 16:09:39,096][14395] Num frames 5800... +[2024-11-07 16:09:39,316][14395] Avg episode rewards: #0: 4.192, true rewards: #0: 3.925 +[2024-11-07 16:09:39,317][14395] Avg episode reward: 4.192, avg true_objective: 3.925 +[2024-11-07 16:09:39,345][14395] Num frames 5900... +[2024-11-07 16:09:39,538][14395] Num frames 6000... +[2024-11-07 16:09:39,737][14395] Num frames 6100... +[2024-11-07 16:09:39,941][14395] Num frames 6200... +[2024-11-07 16:09:40,136][14395] Avg episode rewards: #0: 4.170, true rewards: #0: 3.920 +[2024-11-07 16:09:40,138][14395] Avg episode reward: 4.170, avg true_objective: 3.920 +[2024-11-07 16:09:40,212][14395] Num frames 6300... +[2024-11-07 16:09:40,412][14395] Num frames 6400... +[2024-11-07 16:09:40,599][14395] Num frames 6500... +[2024-11-07 16:09:40,779][14395] Num frames 6600... +[2024-11-07 16:09:40,963][14395] Avg episode rewards: #0: 4.151, true rewards: #0: 3.915 +[2024-11-07 16:09:40,968][14395] Avg episode reward: 4.151, avg true_objective: 3.915 +[2024-11-07 16:09:41,071][14395] Num frames 6700... +[2024-11-07 16:09:41,258][14395] Num frames 6800... +[2024-11-07 16:09:41,475][14395] Num frames 6900... +[2024-11-07 16:09:41,672][14395] Num frames 7000... +[2024-11-07 16:09:41,809][14395] Avg episode rewards: #0: 4.133, true rewards: #0: 3.911 +[2024-11-07 16:09:41,810][14395] Avg episode reward: 4.133, avg true_objective: 3.911 +[2024-11-07 16:09:41,946][14395] Num frames 7100... +[2024-11-07 16:09:42,137][14395] Num frames 7200... +[2024-11-07 16:09:42,340][14395] Num frames 7300... +[2024-11-07 16:09:42,527][14395] Num frames 7400... 
+[2024-11-07 16:09:42,627][14395] Avg episode rewards: #0: 4.118, true rewards: #0: 3.907 +[2024-11-07 16:09:42,628][14395] Avg episode reward: 4.118, avg true_objective: 3.907 +[2024-11-07 16:09:42,795][14395] Num frames 7500... +[2024-11-07 16:09:43,002][14395] Num frames 7600... +[2024-11-07 16:09:43,219][14395] Num frames 7700... +[2024-11-07 16:09:43,414][14395] Num frames 7800... +[2024-11-07 16:09:43,486][14395] Avg episode rewards: #0: 4.104, true rewards: #0: 3.904 +[2024-11-07 16:09:43,491][14395] Avg episode reward: 4.104, avg true_objective: 3.904 +[2024-11-07 16:09:43,687][14395] Num frames 7900... +[2024-11-07 16:09:43,877][14395] Num frames 8000... +[2024-11-07 16:09:44,052][14395] Num frames 8100... +[2024-11-07 16:09:46,452][14395] Num frames 8200... +[2024-11-07 16:09:46,609][14395] Avg episode rewards: #0: 4.170, true rewards: #0: 3.931 +[2024-11-07 16:09:46,612][14395] Avg episode reward: 4.170, avg true_objective: 3.931 +[2024-11-07 16:09:46,712][14395] Num frames 8300... +[2024-11-07 16:09:46,891][14395] Num frames 8400... +[2024-11-07 16:09:47,075][14395] Num frames 8500... +[2024-11-07 16:09:47,257][14395] Num frames 8600... +[2024-11-07 16:09:47,386][14395] Avg episode rewards: #0: 4.155, true rewards: #0: 3.927 +[2024-11-07 16:09:47,389][14395] Avg episode reward: 4.155, avg true_objective: 3.927 +[2024-11-07 16:09:47,548][14395] Num frames 8700... +[2024-11-07 16:09:47,730][14395] Num frames 8800... +[2024-11-07 16:09:47,914][14395] Num frames 8900... +[2024-11-07 16:09:48,122][14395] Num frames 9000... +[2024-11-07 16:09:48,225][14395] Avg episode rewards: #0: 4.141, true rewards: #0: 3.923 +[2024-11-07 16:09:48,227][14395] Avg episode reward: 4.141, avg true_objective: 3.923 +[2024-11-07 16:09:48,378][14395] Num frames 9100... +[2024-11-07 16:09:48,574][14395] Num frames 9200... +[2024-11-07 16:09:48,764][14395] Num frames 9300... +[2024-11-07 16:09:48,960][14395] Num frames 9400... 
+[2024-11-07 16:09:49,031][14395] Avg episode rewards: #0: 4.128, true rewards: #0: 3.920 +[2024-11-07 16:09:49,032][14395] Avg episode reward: 4.128, avg true_objective: 3.920 +[2024-11-07 16:09:49,218][14395] Num frames 9500... +[2024-11-07 16:09:49,410][14395] Num frames 9600... +[2024-11-07 16:09:49,594][14395] Num frames 9700... +[2024-11-07 16:09:49,813][14395] Avg episode rewards: #0: 4.117, true rewards: #0: 3.917 +[2024-11-07 16:09:49,818][14395] Avg episode reward: 4.117, avg true_objective: 3.917 +[2024-11-07 16:09:49,855][14395] Num frames 9800... +[2024-11-07 16:09:50,048][14395] Num frames 9900... +[2024-11-07 16:09:50,240][14395] Num frames 10000... +[2024-11-07 16:09:50,477][14395] Num frames 10100... +[2024-11-07 16:09:50,667][14395] Num frames 10200... +[2024-11-07 16:09:50,855][14395] Num frames 10300... +[2024-11-07 16:09:50,991][14395] Avg episode rewards: #0: 4.245, true rewards: #0: 3.975 +[2024-11-07 16:09:50,993][14395] Avg episode reward: 4.245, avg true_objective: 3.975 +[2024-11-07 16:09:51,117][14395] Num frames 10400... +[2024-11-07 16:09:51,305][14395] Num frames 10500... +[2024-11-07 16:09:51,496][14395] Num frames 10600... +[2024-11-07 16:09:51,683][14395] Num frames 10700... +[2024-11-07 16:09:51,898][14395] Avg episode rewards: #0: 4.290, true rewards: #0: 3.994 +[2024-11-07 16:09:51,900][14395] Avg episode reward: 4.290, avg true_objective: 3.994 +[2024-11-07 16:09:51,945][14395] Num frames 10800... +[2024-11-07 16:09:52,134][14395] Num frames 10900... +[2024-11-07 16:09:52,322][14395] Num frames 11000... +[2024-11-07 16:09:52,503][14395] Num frames 11100... +[2024-11-07 16:09:52,714][14395] Num frames 11200... +[2024-11-07 16:09:52,835][14395] Avg episode rewards: #0: 4.333, true rewards: #0: 4.011 +[2024-11-07 16:09:52,837][14395] Avg episode reward: 4.333, avg true_objective: 4.011 +[2024-11-07 16:09:53,000][14395] Num frames 11300... +[2024-11-07 16:09:53,185][14395] Num frames 11400... 
+[2024-11-07 16:09:53,377][14395] Num frames 11500... +[2024-11-07 16:09:53,569][14395] Num frames 11600... +[2024-11-07 16:09:53,657][14395] Avg episode rewards: #0: 4.316, true rewards: #0: 4.006 +[2024-11-07 16:09:53,658][14395] Avg episode reward: 4.316, avg true_objective: 4.006 +[2024-11-07 16:09:53,820][14395] Num frames 11700... +[2024-11-07 16:09:54,017][14395] Num frames 11800... +[2024-11-07 16:09:54,209][14395] Num frames 11900... +[2024-11-07 16:09:54,397][14395] Num frames 12000... +[2024-11-07 16:09:54,449][14395] Avg episode rewards: #0: 4.300, true rewards: #0: 4.000 +[2024-11-07 16:09:54,450][14395] Avg episode reward: 4.300, avg true_objective: 4.000 +[2024-11-07 16:09:54,644][14395] Num frames 12100... +[2024-11-07 16:09:54,838][14395] Num frames 12200... +[2024-11-07 16:09:55,030][14395] Num frames 12300... +[2024-11-07 16:09:55,208][14395] Num frames 12400... +[2024-11-07 16:09:55,353][14395] Avg episode rewards: #0: 4.338, true rewards: #0: 4.015 +[2024-11-07 16:09:55,356][14395] Avg episode reward: 4.338, avg true_objective: 4.015 +[2024-11-07 16:09:55,475][14395] Num frames 12500... +[2024-11-07 16:09:55,663][14395] Num frames 12600... +[2024-11-07 16:09:55,844][14395] Num frames 12700... +[2024-11-07 16:09:56,028][14395] Num frames 12800... +[2024-11-07 16:09:56,222][14395] Num frames 12900... +[2024-11-07 16:09:56,398][14395] Num frames 13000... +[2024-11-07 16:09:56,501][14395] Avg episode rewards: #0: 4.476, true rewards: #0: 4.070 +[2024-11-07 16:09:56,505][14395] Avg episode reward: 4.476, avg true_objective: 4.070 +[2024-11-07 16:09:56,666][14395] Num frames 13100... +[2024-11-07 16:09:56,854][14395] Num frames 13200... +[2024-11-07 16:09:57,053][14395] Num frames 13300... +[2024-11-07 16:09:57,243][14395] Num frames 13400... 
+[2024-11-07 16:09:57,313][14395] Avg episode rewards: #0: 4.457, true rewards: #0: 4.063 +[2024-11-07 16:09:57,318][14395] Avg episode reward: 4.457, avg true_objective: 4.063 +[2024-11-07 16:09:57,497][14395] Num frames 13500... +[2024-11-07 16:09:57,678][14395] Num frames 13600... +[2024-11-07 16:09:57,862][14395] Num frames 13700... +[2024-11-07 16:09:58,098][14395] Avg episode rewards: #0: 4.439, true rewards: #0: 4.056 +[2024-11-07 16:09:58,100][14395] Avg episode reward: 4.439, avg true_objective: 4.056 +[2024-11-07 16:09:58,118][14395] Num frames 13800... +[2024-11-07 16:09:58,312][14395] Num frames 13900... +[2024-11-07 16:09:58,501][14395] Num frames 14000... +[2024-11-07 16:09:58,681][14395] Num frames 14100... +[2024-11-07 16:09:58,870][14395] Avg episode rewards: #0: 4.422, true rewards: #0: 4.050 +[2024-11-07 16:09:58,874][14395] Avg episode reward: 4.422, avg true_objective: 4.050 +[2024-11-07 16:09:58,936][14395] Num frames 14200... +[2024-11-07 16:09:59,123][14395] Num frames 14300... +[2024-11-07 16:09:59,301][14395] Num frames 14400... +[2024-11-07 16:09:59,485][14395] Num frames 14500... +[2024-11-07 16:09:59,650][14395] Avg episode rewards: #0: 4.406, true rewards: #0: 4.044 +[2024-11-07 16:09:59,656][14395] Avg episode reward: 4.406, avg true_objective: 4.044 +[2024-11-07 16:09:59,752][14395] Num frames 14600... +[2024-11-07 16:09:59,946][14395] Num frames 14700... +[2024-11-07 16:10:00,136][14395] Num frames 14800... +[2024-11-07 16:10:00,336][14395] Num frames 14900... +[2024-11-07 16:10:00,482][14395] Avg episode rewards: #0: 4.390, true rewards: #0: 4.039 +[2024-11-07 16:10:00,483][14395] Avg episode reward: 4.390, avg true_objective: 4.039 +[2024-11-07 16:10:00,606][14395] Num frames 15000... +[2024-11-07 16:10:00,826][14395] Num frames 15100... +[2024-11-07 16:10:01,119][14395] Num frames 15200... +[2024-11-07 16:10:01,381][14395] Num frames 15300... 
+[2024-11-07 16:10:01,665][14395] Avg episode rewards: #0: 4.419, true rewards: #0: 4.051 +[2024-11-07 16:10:01,670][14395] Avg episode reward: 4.419, avg true_objective: 4.051 +[2024-11-07 16:10:01,702][14395] Num frames 15400... +[2024-11-07 16:10:01,968][14395] Num frames 15500... +[2024-11-07 16:10:02,190][14395] Num frames 15600... +[2024-11-07 16:10:02,413][14395] Num frames 15700... +[2024-11-07 16:10:02,638][14395] Avg episode rewards: #0: 4.404, true rewards: #0: 4.045 +[2024-11-07 16:10:02,643][14395] Avg episode reward: 4.404, avg true_objective: 4.045 +[2024-11-07 16:10:02,707][14395] Num frames 15800... +[2024-11-07 16:10:02,891][14395] Num frames 15900... +[2024-11-07 16:10:03,090][14395] Num frames 16000... +[2024-11-07 16:10:03,276][14395] Num frames 16100... +[2024-11-07 16:10:03,445][14395] Avg episode rewards: #0: 4.390, true rewards: #0: 4.040 +[2024-11-07 16:10:03,447][14395] Avg episode reward: 4.390, avg true_objective: 4.040 +[2024-11-07 16:10:03,532][14395] Num frames 16200... +[2024-11-07 16:10:03,762][14395] Num frames 16300... +[2024-11-07 16:10:03,992][14395] Num frames 16400... +[2024-11-07 16:10:04,228][14395] Num frames 16500... +[2024-11-07 16:10:04,387][14395] Avg episode rewards: #0: 4.377, true rewards: #0: 4.035 +[2024-11-07 16:10:04,388][14395] Avg episode reward: 4.377, avg true_objective: 4.035 +[2024-11-07 16:10:04,519][14395] Num frames 16600... +[2024-11-07 16:10:04,743][14395] Num frames 16700... +[2024-11-07 16:10:04,945][14395] Num frames 16800... +[2024-11-07 16:10:05,128][14395] Num frames 16900... +[2024-11-07 16:10:05,350][14395] Avg episode rewards: #0: 4.403, true rewards: #0: 4.046 +[2024-11-07 16:10:05,354][14395] Avg episode reward: 4.403, avg true_objective: 4.046 +[2024-11-07 16:10:05,389][14395] Num frames 17000... +[2024-11-07 16:10:05,597][14395] Num frames 17100... +[2024-11-07 16:10:05,822][14395] Num frames 17200... +[2024-11-07 16:10:06,051][14395] Num frames 17300... 
+[2024-11-07 16:10:06,274][14395] Avg episode rewards: #0: 4.390, true rewards: #0: 4.041 +[2024-11-07 16:10:06,279][14395] Avg episode reward: 4.390, avg true_objective: 4.041 +[2024-11-07 16:10:06,352][14395] Num frames 17400... +[2024-11-07 16:10:06,566][14395] Num frames 17500... +[2024-11-07 16:10:06,786][14395] Num frames 17600... +[2024-11-07 16:10:07,004][14395] Num frames 17700... +[2024-11-07 16:10:07,218][14395] Num frames 17800... +[2024-11-07 16:10:07,331][14395] Avg episode rewards: #0: 4.415, true rewards: #0: 4.051 +[2024-11-07 16:10:07,335][14395] Avg episode reward: 4.415, avg true_objective: 4.051 +[2024-11-07 16:10:07,519][14395] Num frames 17900... +[2024-11-07 16:10:07,738][14395] Num frames 18000... +[2024-11-07 16:10:07,956][14395] Num frames 18100... +[2024-11-07 16:10:08,165][14395] Num frames 18200... +[2024-11-07 16:10:08,239][14395] Avg episode rewards: #0: 4.402, true rewards: #0: 4.046 +[2024-11-07 16:10:08,242][14395] Avg episode reward: 4.402, avg true_objective: 4.046 +[2024-11-07 16:10:08,475][14395] Num frames 18300... +[2024-11-07 16:10:08,694][14395] Num frames 18400... +[2024-11-07 16:10:08,929][14395] Num frames 18500... +[2024-11-07 16:10:09,146][14395] Num frames 18600... +[2024-11-07 16:10:09,306][14395] Avg episode rewards: #0: 4.425, true rewards: #0: 4.056 +[2024-11-07 16:10:09,307][14395] Avg episode reward: 4.425, avg true_objective: 4.056 +[2024-11-07 16:10:09,399][14395] Num frames 18700... +[2024-11-07 16:10:09,581][14395] Num frames 18800... +[2024-11-07 16:10:09,767][14395] Num frames 18900... +[2024-11-07 16:10:09,958][14395] Num frames 19000... +[2024-11-07 16:10:10,102][14395] Avg episode rewards: #0: 4.413, true rewards: #0: 4.051 +[2024-11-07 16:10:10,104][14395] Avg episode reward: 4.413, avg true_objective: 4.051 +[2024-11-07 16:10:10,218][14395] Num frames 19100... +[2024-11-07 16:10:10,396][14395] Num frames 19200... +[2024-11-07 16:10:10,597][14395] Num frames 19300... 
+[2024-11-07 16:10:10,809][14395] Num frames 19400... +[2024-11-07 16:10:10,910][14395] Avg episode rewards: #0: 4.401, true rewards: #0: 4.047 +[2024-11-07 16:10:10,915][14395] Avg episode reward: 4.401, avg true_objective: 4.047 +[2024-11-07 16:10:11,078][14395] Num frames 19500... +[2024-11-07 16:10:11,327][14395] Num frames 19600... +[2024-11-07 16:10:11,600][14395] Num frames 19700... +[2024-11-07 16:10:11,798][14395] Num frames 19800... +[2024-11-07 16:10:11,869][14395] Avg episode rewards: #0: 4.389, true rewards: #0: 4.042 +[2024-11-07 16:10:11,873][14395] Avg episode reward: 4.389, avg true_objective: 4.042 +[2024-11-07 16:10:12,193][14395] Num frames 19900... +[2024-11-07 16:10:12,583][14395] Num frames 20000... +[2024-11-07 16:10:12,836][14395] Num frames 20100... +[2024-11-07 16:10:13,060][14395] Num frames 20200... +[2024-11-07 16:10:13,248][14395] Avg episode rewards: #0: 4.411, true rewards: #0: 4.051 +[2024-11-07 16:10:13,251][14395] Avg episode reward: 4.411, avg true_objective: 4.051 +[2024-11-07 16:10:13,401][14395] Num frames 20300... +[2024-11-07 16:10:13,785][14395] Num frames 20400... +[2024-11-07 16:10:14,056][14395] Num frames 20500... +[2024-11-07 16:10:14,360][14395] Num frames 20600... +[2024-11-07 16:10:14,506][14395] Avg episode rewards: #0: 4.400, true rewards: #0: 4.047 +[2024-11-07 16:10:14,508][14395] Avg episode reward: 4.400, avg true_objective: 4.047 +[2024-11-07 16:10:14,635][14395] Num frames 20700... +[2024-11-07 16:10:14,849][14395] Num frames 20800... +[2024-11-07 16:10:15,083][14395] Num frames 20900... +[2024-11-07 16:10:15,335][14395] Avg episode rewards: #0: 4.402, true rewards: #0: 4.037 +[2024-11-07 16:10:15,339][14395] Avg episode reward: 4.402, avg true_objective: 4.037 +[2024-11-07 16:10:15,371][14395] Num frames 21000... +[2024-11-07 16:10:15,627][14395] Num frames 21100... +[2024-11-07 16:10:15,848][14395] Num frames 21200... +[2024-11-07 16:10:16,051][14395] Num frames 21300... 
+[2024-11-07 16:10:16,259][14395] Avg episode rewards: #0: 4.392, true rewards: #0: 4.033 +[2024-11-07 16:10:16,260][14395] Avg episode reward: 4.392, avg true_objective: 4.033 +[2024-11-07 16:10:16,328][14395] Num frames 21400... +[2024-11-07 16:10:16,522][14395] Num frames 21500... +[2024-11-07 16:10:16,710][14395] Num frames 21600... +[2024-11-07 16:10:16,911][14395] Num frames 21700... +[2024-11-07 16:10:17,083][14395] Avg episode rewards: #0: 4.381, true rewards: #0: 4.030 +[2024-11-07 16:10:17,089][14395] Avg episode reward: 4.381, avg true_objective: 4.030 +[2024-11-07 16:10:17,191][14395] Num frames 21800... +[2024-11-07 16:10:17,381][14395] Num frames 21900... +[2024-11-07 16:10:17,577][14395] Num frames 22000... +[2024-11-07 16:10:17,776][14395] Num frames 22100... +[2024-11-07 16:10:17,924][14395] Avg episode rewards: #0: 4.372, true rewards: #0: 4.026 +[2024-11-07 16:10:17,929][14395] Avg episode reward: 4.372, avg true_objective: 4.026 +[2024-11-07 16:10:18,049][14395] Num frames 22200... +[2024-11-07 16:10:18,235][14395] Num frames 22300... +[2024-11-07 16:10:18,426][14395] Num frames 22400... +[2024-11-07 16:10:18,610][14395] Num frames 22500... +[2024-11-07 16:10:20,933][14395] Avg episode rewards: #0: 4.362, true rewards: #0: 4.023 +[2024-11-07 16:10:20,937][14395] Avg episode reward: 4.362, avg true_objective: 4.023 +[2024-11-07 16:10:21,096][14395] Num frames 22600... +[2024-11-07 16:10:21,280][14395] Num frames 22700... +[2024-11-07 16:10:21,464][14395] Num frames 22800... +[2024-11-07 16:10:21,652][14395] Num frames 22900... +[2024-11-07 16:10:21,848][14395] Avg episode rewards: #0: 4.382, true rewards: #0: 4.031 +[2024-11-07 16:10:21,850][14395] Avg episode reward: 4.382, avg true_objective: 4.031 +[2024-11-07 16:10:21,920][14395] Num frames 23000... +[2024-11-07 16:10:22,121][14395] Num frames 23100... +[2024-11-07 16:10:22,308][14395] Num frames 23200... +[2024-11-07 16:10:22,494][14395] Num frames 23300... 
+[2024-11-07 16:10:22,662][14395] Avg episode rewards: #0: 4.372, true rewards: #0: 4.028 +[2024-11-07 16:10:22,667][14395] Avg episode reward: 4.372, avg true_objective: 4.028 +[2024-11-07 16:10:22,769][14395] Num frames 23400... +[2024-11-07 16:10:22,958][14395] Num frames 23500... +[2024-11-07 16:10:23,146][14395] Num frames 23600... +[2024-11-07 16:10:23,350][14395] Num frames 23700... +[2024-11-07 16:10:23,500][14395] Avg episode rewards: #0: 4.363, true rewards: #0: 4.024 +[2024-11-07 16:10:23,501][14395] Avg episode reward: 4.363, avg true_objective: 4.024 +[2024-11-07 16:10:23,620][14395] Num frames 23800... +[2024-11-07 16:10:23,829][14395] Num frames 23900... +[2024-11-07 16:10:24,027][14395] Num frames 24000... +[2024-11-07 16:10:24,234][14395] Num frames 24100... +[2024-11-07 16:10:24,488][14395] Num frames 24200... +[2024-11-07 16:10:24,602][14395] Avg episode rewards: #0: 4.404, true rewards: #0: 4.037 +[2024-11-07 16:10:24,603][14395] Avg episode reward: 4.404, avg true_objective: 4.037 +[2024-11-07 16:10:24,766][14395] Num frames 24300... +[2024-11-07 16:10:25,002][14395] Num frames 24400... +[2024-11-07 16:10:25,248][14395] Num frames 24500... +[2024-11-07 16:10:25,449][14395] Num frames 24600... +[2024-11-07 16:10:25,521][14395] Avg episode rewards: #0: 4.395, true rewards: #0: 4.034 +[2024-11-07 16:10:25,526][14395] Avg episode reward: 4.395, avg true_objective: 4.034 +[2024-11-07 16:10:25,720][14395] Num frames 24700... +[2024-11-07 16:10:25,908][14395] Num frames 24800... +[2024-11-07 16:10:26,116][14395] Num frames 24900... +[2024-11-07 16:10:26,375][14395] Avg episode rewards: #0: 4.386, true rewards: #0: 4.031 +[2024-11-07 16:10:26,380][14395] Avg episode reward: 4.386, avg true_objective: 4.031 +[2024-11-07 16:10:26,417][14395] Num frames 25000... +[2024-11-07 16:10:26,617][14395] Num frames 25100... +[2024-11-07 16:10:26,806][14395] Num frames 25200... +[2024-11-07 16:10:27,006][14395] Num frames 25300... 
+[2024-11-07 16:10:27,214][14395] Avg episode rewards: #0: 4.377, true rewards: #0: 4.028 +[2024-11-07 16:10:27,215][14395] Avg episode reward: 4.377, avg true_objective: 4.028 +[2024-11-07 16:10:27,273][14395] Num frames 25400... +[2024-11-07 16:10:27,465][14395] Num frames 25500... +[2024-11-07 16:10:27,657][14395] Num frames 25600... +[2024-11-07 16:10:27,844][14395] Num frames 25700... +[2024-11-07 16:10:28,058][14395] Avg episode rewards: #0: 4.369, true rewards: #0: 4.025 +[2024-11-07 16:10:28,061][14395] Avg episode reward: 4.369, avg true_objective: 4.025 +[2024-11-07 16:10:28,157][14395] Num frames 25800... +[2024-11-07 16:10:28,349][14395] Num frames 25900... +[2024-11-07 16:10:28,540][14395] Num frames 26000... +[2024-11-07 16:10:28,733][14395] Num frames 26100... +[2024-11-07 16:10:28,874][14395] Avg episode rewards: #0: 4.361, true rewards: #0: 4.022 +[2024-11-07 16:10:28,875][14395] Avg episode reward: 4.361, avg true_objective: 4.022 +[2024-11-07 16:10:28,998][14395] Num frames 26200... +[2024-11-07 16:10:29,188][14395] Num frames 26300... +[2024-11-07 16:10:29,384][14395] Num frames 26400... +[2024-11-07 16:10:29,462][14395] Avg episode rewards: #0: 4.335, true rewards: #0: 4.002 +[2024-11-07 16:10:29,466][14395] Avg episode reward: 4.335, avg true_objective: 4.002 +[2024-11-07 16:10:29,656][14395] Num frames 26500... +[2024-11-07 16:10:29,849][14395] Num frames 26600... +[2024-11-07 16:10:30,053][14395] Num frames 26700... +[2024-11-07 16:10:30,247][14395] Num frames 26800... +[2024-11-07 16:10:30,405][14395] Avg episode rewards: #0: 4.333, true rewards: #0: 4.004 +[2024-11-07 16:10:30,407][14395] Avg episode reward: 4.333, avg true_objective: 4.004 +[2024-11-07 16:10:30,577][14395] Num frames 26900... +[2024-11-07 16:10:30,924][14395] Num frames 27000... +[2024-11-07 16:10:31,169][14395] Num frames 27100... +[2024-11-07 16:10:31,548][14395] Num frames 27200... 
+[2024-11-07 16:10:31,643][14395] Avg episode rewards: #0: 4.325, true rewards: #0: 4.002 +[2024-11-07 16:10:31,645][14395] Avg episode reward: 4.325, avg true_objective: 4.002 +[2024-11-07 16:10:31,915][14395] Num frames 27300... +[2024-11-07 16:10:32,421][14395] Num frames 27400... +[2024-11-07 16:10:32,745][14395] Num frames 27500... +[2024-11-07 16:10:33,186][14395] Avg episode rewards: #0: 4.318, true rewards: #0: 3.999 +[2024-11-07 16:10:33,190][14395] Avg episode reward: 4.318, avg true_objective: 3.999 +[2024-11-07 16:10:33,217][14395] Num frames 27600... +[2024-11-07 16:10:33,555][14395] Num frames 27700... +[2024-11-07 16:10:33,797][14395] Num frames 27800... +[2024-11-07 16:10:34,106][14395] Num frames 27900... +[2024-11-07 16:10:34,301][14395] Avg episode rewards: #0: 4.311, true rewards: #0: 3.997 +[2024-11-07 16:10:34,305][14395] Avg episode reward: 4.311, avg true_objective: 3.997 +[2024-11-07 16:10:34,364][14395] Num frames 28000... +[2024-11-07 16:10:34,545][14395] Num frames 28100... +[2024-11-07 16:10:34,783][14395] Num frames 28200... +[2024-11-07 16:10:35,056][14395] Num frames 28300... +[2024-11-07 16:10:35,386][14395] Num frames 28400... +[2024-11-07 16:10:35,516][14395] Avg episode rewards: #0: 4.328, true rewards: #0: 4.004 +[2024-11-07 16:10:35,518][14395] Avg episode reward: 4.328, avg true_objective: 4.004 +[2024-11-07 16:10:35,736][14395] Num frames 28500... +[2024-11-07 16:10:36,008][14395] Num frames 28600... +[2024-11-07 16:10:36,406][14395] Num frames 28700... +[2024-11-07 16:10:36,673][14395] Num frames 28800... +[2024-11-07 16:10:36,848][14395] Avg episode rewards: #0: 4.339, true rewards: #0: 4.006 +[2024-11-07 16:10:36,850][14395] Avg episode reward: 4.339, avg true_objective: 4.006 +[2024-11-07 16:10:36,993][14395] Num frames 28900... +[2024-11-07 16:10:37,205][14395] Num frames 29000... +[2024-11-07 16:10:37,398][14395] Num frames 29100... +[2024-11-07 16:10:37,664][14395] Num frames 29200... 
+[2024-11-07 16:10:37,778][14395] Avg episode rewards: #0: 4.333, true rewards: #0: 4.004 +[2024-11-07 16:10:37,782][14395] Avg episode reward: 4.333, avg true_objective: 4.004 +[2024-11-07 16:10:37,943][14395] Num frames 29300... +[2024-11-07 16:10:38,133][14395] Num frames 29400... +[2024-11-07 16:10:38,306][14395] Num frames 29500... +[2024-11-07 16:10:38,491][14395] Num frames 29600... +[2024-11-07 16:10:38,572][14395] Avg episode rewards: #0: 4.326, true rewards: #0: 4.002 +[2024-11-07 16:10:38,577][14395] Avg episode reward: 4.326, avg true_objective: 4.002 +[2024-11-07 16:10:38,747][14395] Num frames 29700... +[2024-11-07 16:10:38,940][14395] Num frames 29800... +[2024-11-07 16:10:39,114][14395] Num frames 29900... +[2024-11-07 16:10:39,341][14395] Avg episode rewards: #0: 4.319, true rewards: #0: 3.999 +[2024-11-07 16:10:39,344][14395] Avg episode reward: 4.319, avg true_objective: 3.999 +[2024-11-07 16:10:39,374][14395] Num frames 30000... +[2024-11-07 16:10:39,567][14395] Num frames 30100... +[2024-11-07 16:10:39,764][14395] Num frames 30200... +[2024-11-07 16:10:39,961][14395] Num frames 30300... +[2024-11-07 16:10:40,171][14395] Avg episode rewards: #0: 4.313, true rewards: #0: 3.997 +[2024-11-07 16:10:40,174][14395] Avg episode reward: 4.313, avg true_objective: 3.997 +[2024-11-07 16:10:40,237][14395] Num frames 30400... +[2024-11-07 16:10:40,437][14395] Num frames 30500... +[2024-11-07 16:10:40,630][14395] Num frames 30600... +[2024-11-07 16:10:40,828][14395] Num frames 30700... +[2024-11-07 16:10:41,034][14395] Num frames 30800... +[2024-11-07 16:10:41,242][14395] Num frames 30900... +[2024-11-07 16:10:41,465][14395] Avg episode rewards: #0: 4.375, true rewards: #0: 4.024 +[2024-11-07 16:10:41,468][14395] Avg episode reward: 4.375, avg true_objective: 4.024 +[2024-11-07 16:10:41,516][14395] Num frames 31000... +[2024-11-07 16:10:41,718][14395] Num frames 31100... +[2024-11-07 16:10:41,924][14395] Num frames 31200... 
+[2024-11-07 16:10:42,123][14395] Num frames 31300... +[2024-11-07 16:10:42,325][14395] Num frames 31400... +[2024-11-07 16:10:42,391][14395] Avg episode rewards: #0: 4.385, true rewards: #0: 4.026 +[2024-11-07 16:10:42,392][14395] Avg episode reward: 4.385, avg true_objective: 4.026 +[2024-11-07 16:10:42,580][14395] Num frames 31500... +[2024-11-07 16:10:42,767][14395] Num frames 31600... +[2024-11-07 16:10:42,980][14395] Num frames 31700... +[2024-11-07 16:10:43,199][14395] Avg episode rewards: #0: 4.378, true rewards: #0: 4.024 +[2024-11-07 16:10:43,201][14395] Avg episode reward: 4.378, avg true_objective: 4.024 +[2024-11-07 16:10:43,238][14395] Num frames 31800... +[2024-11-07 16:10:43,426][14395] Num frames 31900... +[2024-11-07 16:10:43,625][14395] Num frames 32000... +[2024-11-07 16:10:43,818][14395] Num frames 32100... +[2024-11-07 16:10:44,016][14395] Avg episode rewards: #0: 4.372, true rewards: #0: 4.021 +[2024-11-07 16:10:44,020][14395] Avg episode reward: 4.372, avg true_objective: 4.021 +[2024-11-07 16:10:44,093][14395] Num frames 32200... +[2024-11-07 16:10:44,318][14395] Num frames 32300... +[2024-11-07 16:10:44,503][14395] Num frames 32400... +[2024-11-07 16:10:44,689][14395] Num frames 32500... +[2024-11-07 16:10:44,881][14395] Num frames 32600... +[2024-11-07 16:10:44,972][14395] Avg episode rewards: #0: 4.385, true rewards: #0: 4.027 +[2024-11-07 16:10:44,975][14395] Avg episode reward: 4.385, avg true_objective: 4.027 +[2024-11-07 16:10:45,151][14395] Num frames 32700... +[2024-11-07 16:10:45,331][14395] Num frames 32800... +[2024-11-07 16:10:45,508][14395] Num frames 32900... +[2024-11-07 16:10:45,690][14395] Num frames 33000... +[2024-11-07 16:10:45,873][14395] Avg episode rewards: #0: 4.399, true rewards: #0: 4.033 +[2024-11-07 16:10:45,877][14395] Avg episode reward: 4.399, avg true_objective: 4.033 +[2024-11-07 16:10:45,959][14395] Num frames 33100... +[2024-11-07 16:10:46,150][14395] Num frames 33200... 
+[2024-11-07 16:10:46,332][14395] Num frames 33300... +[2024-11-07 16:10:46,515][14395] Num frames 33400... +[2024-11-07 16:10:46,667][14395] Avg episode rewards: #0: 4.392, true rewards: #0: 4.030 +[2024-11-07 16:10:46,672][14395] Avg episode reward: 4.392, avg true_objective: 4.030 +[2024-11-07 16:10:46,784][14395] Num frames 33500... +[2024-11-07 16:10:46,971][14395] Num frames 33600... +[2024-11-07 16:10:47,153][14395] Num frames 33700... +[2024-11-07 16:10:47,337][14395] Num frames 33800... +[2024-11-07 16:10:47,528][14395] Num frames 33900... +[2024-11-07 16:10:47,580][14395] Avg episode rewards: #0: 4.405, true rewards: #0: 4.036 +[2024-11-07 16:10:47,581][14395] Avg episode reward: 4.405, avg true_objective: 4.036 +[2024-11-07 16:10:47,777][14395] Num frames 34000... +[2024-11-07 16:10:47,962][14395] Num frames 34100... +[2024-11-07 16:10:48,157][14395] Num frames 34200... +[2024-11-07 16:10:48,365][14395] Avg episode rewards: #0: 4.398, true rewards: #0: 4.033 +[2024-11-07 16:10:48,369][14395] Avg episode reward: 4.398, avg true_objective: 4.033 +[2024-11-07 16:10:48,425][14395] Num frames 34300... +[2024-11-07 16:10:48,611][14395] Num frames 34400... +[2024-11-07 16:10:48,802][14395] Num frames 34500... +[2024-11-07 16:10:48,981][14395] Num frames 34600... +[2024-11-07 16:10:49,172][14395] Num frames 34700... +[2024-11-07 16:10:49,346][14395] Avg episode rewards: #0: 4.414, true rewards: #0: 4.042 +[2024-11-07 16:10:49,350][14395] Avg episode reward: 4.414, avg true_objective: 4.042 +[2024-11-07 16:10:49,441][14395] Num frames 34800... +[2024-11-07 16:10:49,630][14395] Num frames 34900... +[2024-11-07 16:10:49,819][14395] Num frames 35000... +[2024-11-07 16:10:49,998][14395] Num frames 35100... +[2024-11-07 16:10:50,144][14395] Avg episode rewards: #0: 4.408, true rewards: #0: 4.040 +[2024-11-07 16:10:50,148][14395] Avg episode reward: 4.408, avg true_objective: 4.040 +[2024-11-07 16:10:50,266][14395] Num frames 35200... 
+[2024-11-07 16:10:50,449][14395] Num frames 35300... +[2024-11-07 16:10:50,640][14395] Num frames 35400... +[2024-11-07 16:10:50,842][14395] Num frames 35500... +[2024-11-07 16:10:50,975][14395] Avg episode rewards: #0: 4.401, true rewards: #0: 4.038 +[2024-11-07 16:10:50,980][14395] Avg episode reward: 4.401, avg true_objective: 4.038 +[2024-11-07 16:10:51,129][14395] Num frames 35600... +[2024-11-07 16:10:51,310][14395] Num frames 35700... +[2024-11-07 16:10:51,495][14395] Num frames 35800... +[2024-11-07 16:10:51,679][14395] Num frames 35900... +[2024-11-07 16:10:51,772][14395] Avg episode rewards: #0: 4.395, true rewards: #0: 4.036 +[2024-11-07 16:10:51,775][14395] Avg episode reward: 4.395, avg true_objective: 4.036 +[2024-11-07 16:10:51,949][14395] Num frames 36000... +[2024-11-07 16:10:52,143][14395] Num frames 36100... +[2024-11-07 16:10:52,322][14395] Num frames 36200... +[2024-11-07 16:10:52,516][14395] Num frames 36300... +[2024-11-07 16:10:52,568][14395] Avg episode rewards: #0: 4.389, true rewards: #0: 4.033 +[2024-11-07 16:10:52,571][14395] Avg episode reward: 4.389, avg true_objective: 4.033 +[2024-11-07 16:10:52,785][14395] Num frames 36400... +[2024-11-07 16:10:52,973][14395] Num frames 36500... +[2024-11-07 16:10:55,429][14395] Num frames 36600... +[2024-11-07 16:10:55,637][14395] Avg episode rewards: #0: 4.383, true rewards: #0: 4.031 +[2024-11-07 16:10:55,640][14395] Avg episode reward: 4.383, avg true_objective: 4.031 +[2024-11-07 16:10:55,697][14395] Num frames 36700... +[2024-11-07 16:10:55,888][14395] Num frames 36800... +[2024-11-07 16:10:56,084][14395] Num frames 36900... +[2024-11-07 16:10:56,346][14395] Num frames 37000... +[2024-11-07 16:10:56,534][14395] Avg episode rewards: #0: 4.377, true rewards: #0: 4.029 +[2024-11-07 16:10:56,539][14395] Avg episode reward: 4.377, avg true_objective: 4.029 +[2024-11-07 16:10:56,614][14395] Num frames 37100... +[2024-11-07 16:10:56,794][14395] Num frames 37200... 
+[2024-11-07 16:10:56,980][14395] Num frames 37300... +[2024-11-07 16:10:57,176][14395] Num frames 37400... +[2024-11-07 16:10:57,327][14395] Avg episode rewards: #0: 4.371, true rewards: #0: 4.027 +[2024-11-07 16:10:57,329][14395] Avg episode reward: 4.371, avg true_objective: 4.027 +[2024-11-07 16:10:57,438][14395] Num frames 37500... +[2024-11-07 16:10:57,622][14395] Num frames 37600... +[2024-11-07 16:10:57,802][14395] Num frames 37700... +[2024-11-07 16:10:57,988][14395] Num frames 37800... +[2024-11-07 16:10:58,112][14395] Avg episode rewards: #0: 4.366, true rewards: #0: 4.025 +[2024-11-07 16:10:58,115][14395] Avg episode reward: 4.366, avg true_objective: 4.025 +[2024-11-07 16:10:58,237][14395] Num frames 37900... +[2024-11-07 16:10:58,417][14395] Num frames 38000... +[2024-11-07 16:10:58,605][14395] Num frames 38100... +[2024-11-07 16:10:58,788][14395] Num frames 38200... +[2024-11-07 16:10:58,883][14395] Avg episode rewards: #0: 4.360, true rewards: #0: 4.023 +[2024-11-07 16:10:58,887][14395] Avg episode reward: 4.360, avg true_objective: 4.023 +[2024-11-07 16:10:59,069][14395] Num frames 38300... +[2024-11-07 16:10:59,257][14395] Num frames 38400... +[2024-11-07 16:10:59,446][14395] Num frames 38500... +[2024-11-07 16:10:59,629][14395] Num frames 38600... +[2024-11-07 16:10:59,691][14395] Avg episode rewards: #0: 4.355, true rewards: #0: 4.021 +[2024-11-07 16:10:59,694][14395] Avg episode reward: 4.355, avg true_objective: 4.021 +[2024-11-07 16:10:59,921][14395] Num frames 38700... +[2024-11-07 16:11:00,134][14395] Num frames 38800... +[2024-11-07 16:11:00,321][14395] Num frames 38900... +[2024-11-07 16:11:00,511][14395] Num frames 39000... +[2024-11-07 16:11:00,663][14395] Avg episode rewards: #0: 4.366, true rewards: #0: 4.026 +[2024-11-07 16:11:00,668][14395] Avg episode reward: 4.366, avg true_objective: 4.026 +[2024-11-07 16:11:00,778][14395] Num frames 39100... +[2024-11-07 16:11:00,989][14395] Num frames 39200... 
+[2024-11-07 16:11:01,187][14395] Num frames 39300... +[2024-11-07 16:11:01,398][14395] Num frames 39400... +[2024-11-07 16:11:01,531][14395] Avg episode rewards: #0: 4.361, true rewards: #0: 4.024 +[2024-11-07 16:11:01,534][14395] Avg episode reward: 4.361, avg true_objective: 4.024 +[2024-11-07 16:11:01,701][14395] Num frames 39500... +[2024-11-07 16:11:01,912][14395] Num frames 39600... +[2024-11-07 16:11:02,139][14395] Num frames 39700... +[2024-11-07 16:11:02,366][14395] Num frames 39800... +[2024-11-07 16:11:02,468][14395] Avg episode rewards: #0: 4.356, true rewards: #0: 4.022 +[2024-11-07 16:11:02,472][14395] Avg episode reward: 4.356, avg true_objective: 4.022 +[2024-11-07 16:11:02,655][14395] Num frames 39900... +[2024-11-07 16:11:02,836][14395] Num frames 40000... +[2024-11-07 16:11:03,028][14395] Num frames 40100... +[2024-11-07 16:11:03,218][14395] Num frames 40200... +[2024-11-07 16:11:03,281][14395] Avg episode rewards: #0: 4.350, true rewards: #0: 4.020 +[2024-11-07 16:11:03,283][14395] Avg episode reward: 4.350, avg true_objective: 4.020 +[2024-11-07 16:12:27,856][14395] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 16:12:42,572][14395] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 16:13:23,833][14395] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 16:13:23,835][14395] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 16:13:23,836][14395] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 16:13:23,838][14395] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 16:13:23,839][14395] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! 
+[2024-11-07 16:13:23,842][14395] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 16:13:23,843][14395] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-11-07 16:13:23,845][14395] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2024-11-07 16:13:23,847][14395] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-07 16:13:23,848][14395] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2024-11-07 16:13:23,850][14395] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 16:13:23,852][14395] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 16:13:23,854][14395] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 16:13:23,855][14395] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 16:13:23,856][14395] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 16:13:23,891][14395] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 16:13:23,894][14395] RunningMeanStd input shape: (1,) +[2024-11-07 16:13:23,922][14395] ConvEncoder: input_channels=3 +[2024-11-07 16:13:24,017][14395] Conv encoder output size: 512 +[2024-11-07 16:13:24,019][14395] Policy head output size: 512 +[2024-11-07 16:13:24,047][14395] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 16:13:24,945][14395] Num frames 100... +[2024-11-07 16:13:25,137][14395] Num frames 200... 
+[2024-11-07 16:13:25,307][14395] Avg episode rewards: #0: 2.560, true rewards: #0: 2.560 +[2024-11-07 16:13:25,310][14395] Avg episode reward: 2.560, avg true_objective: 2.560 +[2024-11-07 16:13:25,407][14395] Num frames 300... +[2024-11-07 16:13:25,618][14395] Num frames 400... +[2024-11-07 16:13:25,850][14395] Num frames 500... +[2024-11-07 16:13:26,097][14395] Num frames 600... +[2024-11-07 16:13:26,329][14395] Avg episode rewards: #0: 3.860, true rewards: #0: 3.360 +[2024-11-07 16:13:26,334][14395] Avg episode reward: 3.860, avg true_objective: 3.360 +[2024-11-07 16:13:26,417][14395] Num frames 700... +[2024-11-07 16:13:26,647][14395] Num frames 800... +[2024-11-07 16:13:26,854][14395] Num frames 900... +[2024-11-07 16:13:27,073][14395] Num frames 1000... +[2024-11-07 16:13:27,307][14395] Num frames 1100... +[2024-11-07 16:13:27,411][14395] Avg episode rewards: #0: 4.400, true rewards: #0: 3.733 +[2024-11-07 16:13:27,412][14395] Avg episode reward: 4.400, avg true_objective: 3.733 +[2024-11-07 16:13:27,600][14395] Num frames 1200... +[2024-11-07 16:13:27,827][14395] Num frames 1300... +[2024-11-07 16:13:28,050][14395] Num frames 1400... +[2024-11-07 16:13:28,267][14395] Num frames 1500... +[2024-11-07 16:13:28,335][14395] Avg episode rewards: #0: 4.260, true rewards: #0: 3.760 +[2024-11-07 16:13:28,339][14395] Avg episode reward: 4.260, avg true_objective: 3.760 +[2024-11-07 16:13:28,557][14395] Num frames 1600... +[2024-11-07 16:13:28,771][14395] Num frames 1700... +[2024-11-07 16:13:28,975][14395] Num frames 1800... +[2024-11-07 16:13:29,235][14395] Avg episode rewards: #0: 4.176, true rewards: #0: 3.776 +[2024-11-07 16:13:29,239][14395] Avg episode reward: 4.176, avg true_objective: 3.776 +[2024-11-07 16:13:29,285][14395] Num frames 1900... +[2024-11-07 16:13:29,504][14395] Num frames 2000... +[2024-11-07 16:13:29,728][14395] Num frames 2100... +[2024-11-07 16:13:29,968][14395] Num frames 2200... 
+[2024-11-07 16:13:30,182][14395] Avg episode rewards: #0: 4.120, true rewards: #0: 3.787 +[2024-11-07 16:13:30,185][14395] Avg episode reward: 4.120, avg true_objective: 3.787 +[2024-11-07 16:13:30,269][14395] Num frames 2300... +[2024-11-07 16:13:30,512][14395] Num frames 2400... +[2024-11-07 16:13:30,740][14395] Num frames 2500... +[2024-11-07 16:13:30,981][14395] Num frames 2600... +[2024-11-07 16:13:31,180][14395] Avg episode rewards: #0: 4.080, true rewards: #0: 3.794 +[2024-11-07 16:13:31,183][14395] Avg episode reward: 4.080, avg true_objective: 3.794 +[2024-11-07 16:13:31,302][14395] Num frames 2700... +[2024-11-07 16:13:31,556][14395] Num frames 2800... +[2024-11-07 16:13:31,799][14395] Num frames 2900... +[2024-11-07 16:13:32,100][14395] Num frames 3000... +[2024-11-07 16:13:32,253][14395] Avg episode rewards: #0: 4.050, true rewards: #0: 3.800 +[2024-11-07 16:13:32,254][14395] Avg episode reward: 4.050, avg true_objective: 3.800 +[2024-11-07 16:13:32,451][14395] Num frames 3100... +[2024-11-07 16:13:32,657][14395] Num frames 3200... +[2024-11-07 16:13:32,876][14395] Num frames 3300... +[2024-11-07 16:13:33,090][14395] Num frames 3400... +[2024-11-07 16:13:33,331][14395] Avg episode rewards: #0: 4.209, true rewards: #0: 3.876 +[2024-11-07 16:13:33,334][14395] Avg episode reward: 4.209, avg true_objective: 3.876 +[2024-11-07 16:13:33,376][14395] Num frames 3500... +[2024-11-07 16:13:33,582][14395] Num frames 3600... +[2024-11-07 16:13:33,808][14395] Num frames 3700... +[2024-11-07 16:13:34,019][14395] Num frames 3800... +[2024-11-07 16:13:34,233][14395] Avg episode rewards: #0: 4.172, true rewards: #0: 3.872 +[2024-11-07 16:13:34,237][14395] Avg episode reward: 4.172, avg true_objective: 3.872 +[2024-11-07 16:13:42,153][14395] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! 
+[2024-11-07 16:13:49,817][14395] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 16:15:32,672][14395] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 16:15:32,674][14395] Overriding arg 'num_workers' with value 4 passed from command line +[2024-11-07 16:15:32,675][14395] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 16:15:32,676][14395] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 16:15:32,679][14395] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:15:32,681][14395] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 16:15:32,682][14395] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-11-07 16:15:32,683][14395] Adding new argument 'max_num_episodes'=100 that is not in the saved config file! +[2024-11-07 16:15:32,685][14395] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-07 16:15:32,686][14395] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2024-11-07 16:15:32,688][14395] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 16:15:32,690][14395] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 16:15:32,691][14395] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 16:15:32,694][14395] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2024-11-07 16:15:32,696][14395] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 16:15:32,726][14395] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 16:15:32,728][14395] RunningMeanStd input shape: (1,) +[2024-11-07 16:15:32,740][14395] ConvEncoder: input_channels=3 +[2024-11-07 16:15:32,787][14395] Conv encoder output size: 512 +[2024-11-07 16:15:32,788][14395] Policy head output size: 512 +[2024-11-07 16:15:32,813][14395] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 16:15:33,417][14395] Num frames 100... +[2024-11-07 16:15:33,598][14395] Num frames 200... +[2024-11-07 16:15:33,773][14395] Num frames 300... +[2024-11-07 16:15:33,978][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:15:33,982][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:15:34,036][14395] Num frames 400... +[2024-11-07 16:15:34,218][14395] Num frames 500... +[2024-11-07 16:15:34,404][14395] Num frames 600... +[2024-11-07 16:15:34,581][14395] Num frames 700... +[2024-11-07 16:15:34,762][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:15:34,768][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:15:34,866][14395] Num frames 800... +[2024-11-07 16:15:35,118][14395] Num frames 900... +[2024-11-07 16:15:35,368][14395] Num frames 1000... +[2024-11-07 16:15:35,592][14395] Num frames 1100... +[2024-11-07 16:15:35,830][14395] Avg episode rewards: #0: 4.280, true rewards: #0: 3.947 +[2024-11-07 16:15:35,831][14395] Avg episode reward: 4.280, avg true_objective: 3.947 +[2024-11-07 16:15:35,868][14395] Num frames 1200... +[2024-11-07 16:15:36,073][14395] Num frames 1300... +[2024-11-07 16:15:36,277][14395] Num frames 1400... +[2024-11-07 16:15:36,528][14395] Num frames 1500... 
+[2024-11-07 16:15:36,730][14395] Avg episode rewards: #0: 4.170, true rewards: #0: 3.920 +[2024-11-07 16:15:36,733][14395] Avg episode reward: 4.170, avg true_objective: 3.920 +[2024-11-07 16:15:36,810][14395] Num frames 1600... +[2024-11-07 16:15:37,034][14395] Num frames 1700... +[2024-11-07 16:15:37,261][14395] Num frames 1800... +[2024-11-07 16:15:37,488][14395] Num frames 1900... +[2024-11-07 16:15:37,706][14395] Num frames 2000... +[2024-11-07 16:15:37,802][14395] Avg episode rewards: #0: 4.432, true rewards: #0: 4.032 +[2024-11-07 16:15:37,805][14395] Avg episode reward: 4.432, avg true_objective: 4.032 +[2024-11-07 16:15:38,014][14395] Num frames 2100... +[2024-11-07 16:15:38,227][14395] Num frames 2200... +[2024-11-07 16:15:38,437][14395] Num frames 2300... +[2024-11-07 16:15:38,666][14395] Num frames 2400... +[2024-11-07 16:15:38,719][14395] Avg episode rewards: #0: 4.333, true rewards: #0: 4.000 +[2024-11-07 16:15:38,723][14395] Avg episode reward: 4.333, avg true_objective: 4.000 +[2024-11-07 16:15:38,967][14395] Num frames 2500... +[2024-11-07 16:15:39,184][14395] Num frames 2600... +[2024-11-07 16:15:39,401][14395] Num frames 2700... +[2024-11-07 16:15:39,613][14395] Num frames 2800... +[2024-11-07 16:15:39,781][14395] Avg episode rewards: #0: 4.497, true rewards: #0: 4.069 +[2024-11-07 16:15:39,784][14395] Avg episode reward: 4.497, avg true_objective: 4.069 +[2024-11-07 16:15:39,907][14395] Num frames 2900... +[2024-11-07 16:15:40,136][14395] Num frames 3000... +[2024-11-07 16:15:40,355][14395] Num frames 3100... +[2024-11-07 16:15:40,580][14395] Num frames 3200... +[2024-11-07 16:15:40,708][14395] Avg episode rewards: #0: 4.415, true rewards: #0: 4.040 +[2024-11-07 16:15:40,712][14395] Avg episode reward: 4.415, avg true_objective: 4.040 +[2024-11-07 16:15:40,886][14395] Num frames 3300... +[2024-11-07 16:15:41,095][14395] Num frames 3400... +[2024-11-07 16:15:41,308][14395] Num frames 3500... +[2024-11-07 16:15:41,529][14395] Num frames 3600... 
+[2024-11-07 16:15:41,623][14395] Avg episode rewards: #0: 4.351, true rewards: #0: 4.018 +[2024-11-07 16:15:41,628][14395] Avg episode reward: 4.351, avg true_objective: 4.018 +[2024-11-07 16:15:41,873][14395] Num frames 3700... +[2024-11-07 16:15:42,076][14395] Num frames 3800... +[2024-11-07 16:15:42,297][14395] Num frames 3900... +[2024-11-07 16:15:42,489][14395] Num frames 4000... +[2024-11-07 16:15:42,541][14395] Avg episode rewards: #0: 4.300, true rewards: #0: 4.000 +[2024-11-07 16:15:42,545][14395] Avg episode reward: 4.300, avg true_objective: 4.000 +[2024-11-07 16:15:42,749][14395] Num frames 4100... +[2024-11-07 16:15:42,942][14395] Num frames 4200... +[2024-11-07 16:15:43,118][14395] Avg episode rewards: #0: 4.142, true rewards: #0: 3.869 +[2024-11-07 16:15:43,120][14395] Avg episode reward: 4.142, avg true_objective: 3.869 +[2024-11-07 16:15:43,227][14395] Num frames 4300... +[2024-11-07 16:15:43,442][14395] Num frames 4400... +[2024-11-07 16:15:43,654][14395] Num frames 4500... +[2024-11-07 16:15:43,869][14395] Num frames 4600... +[2024-11-07 16:15:44,008][14395] Avg episode rewards: #0: 4.117, true rewards: #0: 3.867 +[2024-11-07 16:15:44,012][14395] Avg episode reward: 4.117, avg true_objective: 3.867 +[2024-11-07 16:15:44,145][14395] Num frames 4700... +[2024-11-07 16:15:44,332][14395] Num frames 4800... +[2024-11-07 16:15:44,519][14395] Num frames 4900... +[2024-11-07 16:15:44,708][14395] Num frames 5000... +[2024-11-07 16:15:44,806][14395] Avg episode rewards: #0: 4.095, true rewards: #0: 3.865 +[2024-11-07 16:15:44,810][14395] Avg episode reward: 4.095, avg true_objective: 3.865 +[2024-11-07 16:15:44,979][14395] Num frames 5100... +[2024-11-07 16:15:45,193][14395] Num frames 5200... +[2024-11-07 16:15:45,406][14395] Num frames 5300... +[2024-11-07 16:15:45,632][14395] Num frames 5400... 
+[2024-11-07 16:15:45,834][14395] Avg episode rewards: #0: 4.194, true rewards: #0: 3.909 +[2024-11-07 16:15:45,836][14395] Avg episode reward: 4.194, avg true_objective: 3.909 +[2024-11-07 16:15:45,911][14395] Num frames 5500... +[2024-11-07 16:15:46,133][14395] Num frames 5600... +[2024-11-07 16:15:46,350][14395] Num frames 5700... +[2024-11-07 16:15:46,578][14395] Num frames 5800... +[2024-11-07 16:15:46,757][14395] Avg episode rewards: #0: 4.171, true rewards: #0: 3.904 +[2024-11-07 16:15:46,761][14395] Avg episode reward: 4.171, avg true_objective: 3.904 +[2024-11-07 16:15:46,888][14395] Num frames 5900... +[2024-11-07 16:15:47,102][14395] Num frames 6000... +[2024-11-07 16:15:47,336][14395] Num frames 6100... +[2024-11-07 16:15:47,547][14395] Num frames 6200... +[2024-11-07 16:15:47,690][14395] Avg episode rewards: #0: 4.150, true rewards: #0: 3.900 +[2024-11-07 16:15:47,694][14395] Avg episode reward: 4.150, avg true_objective: 3.900 +[2024-11-07 16:15:47,851][14395] Num frames 6300... +[2024-11-07 16:15:48,085][14395] Num frames 6400... +[2024-11-07 16:15:48,321][14395] Num frames 6500... +[2024-11-07 16:15:48,543][14395] Num frames 6600... +[2024-11-07 16:15:48,657][14395] Avg episode rewards: #0: 4.132, true rewards: #0: 3.896 +[2024-11-07 16:15:48,661][14395] Avg episode reward: 4.132, avg true_objective: 3.896 +[2024-11-07 16:15:48,938][14395] Num frames 6700... +[2024-11-07 16:15:49,170][14395] Num frames 6800... +[2024-11-07 16:15:49,406][14395] Num frames 6900... +[2024-11-07 16:15:49,637][14395] Num frames 7000... +[2024-11-07 16:15:49,869][14395] Num frames 7100... +[2024-11-07 16:15:50,082][14395] Avg episode rewards: #0: 4.316, true rewards: #0: 3.982 +[2024-11-07 16:15:50,085][14395] Avg episode reward: 4.316, avg true_objective: 3.982 +[2024-11-07 16:15:50,179][14395] Num frames 7200... +[2024-11-07 16:15:50,412][14395] Num frames 7300... +[2024-11-07 16:15:50,623][14395] Num frames 7400... +[2024-11-07 16:15:50,849][14395] Num frames 7500... 
+[2024-11-07 16:15:51,016][14395] Avg episode rewards: #0: 4.291, true rewards: #0: 3.975 +[2024-11-07 16:15:51,021][14395] Avg episode reward: 4.291, avg true_objective: 3.975 +[2024-11-07 16:15:51,143][14395] Num frames 7600... +[2024-11-07 16:15:51,368][14395] Num frames 7700... +[2024-11-07 16:15:51,594][14395] Num frames 7800... +[2024-11-07 16:15:51,830][14395] Num frames 7900... +[2024-11-07 16:15:52,061][14395] Num frames 8000... +[2024-11-07 16:15:52,113][14395] Avg episode rewards: #0: 4.350, true rewards: #0: 4.000 +[2024-11-07 16:15:52,116][14395] Avg episode reward: 4.350, avg true_objective: 4.000 +[2024-11-07 16:15:52,348][14395] Num frames 8100... +[2024-11-07 16:15:52,575][14395] Num frames 8200... +[2024-11-07 16:15:52,797][14395] Num frames 8300... +[2024-11-07 16:15:53,028][14395] Num frames 8400... +[2024-11-07 16:15:53,158][14395] Avg episode rewards: #0: 4.341, true rewards: #0: 4.008 +[2024-11-07 16:15:53,160][14395] Avg episode reward: 4.341, avg true_objective: 4.008 +[2024-11-07 16:15:53,357][14395] Num frames 8500... +[2024-11-07 16:15:53,574][14395] Num frames 8600... +[2024-11-07 16:15:53,801][14395] Num frames 8700... +[2024-11-07 16:15:54,023][14395] Num frames 8800... +[2024-11-07 16:15:54,220][14395] Avg episode rewards: #0: 4.393, true rewards: #0: 4.029 +[2024-11-07 16:15:54,224][14395] Avg episode reward: 4.393, avg true_objective: 4.029 +[2024-11-07 16:15:54,321][14395] Num frames 8900... +[2024-11-07 16:15:54,543][14395] Num frames 9000... +[2024-11-07 16:15:54,770][14395] Num frames 9100... +[2024-11-07 16:15:55,001][14395] Num frames 9200... +[2024-11-07 16:15:55,168][14395] Avg episode rewards: #0: 4.369, true rewards: #0: 4.021 +[2024-11-07 16:15:55,172][14395] Avg episode reward: 4.369, avg true_objective: 4.021 +[2024-11-07 16:15:55,312][14395] Num frames 9300... +[2024-11-07 16:15:55,536][14395] Num frames 9400... +[2024-11-07 16:15:55,760][14395] Num frames 9500... +[2024-11-07 16:15:55,993][14395] Num frames 9600... 
+[2024-11-07 16:15:56,124][14395] Avg episode rewards: #0: 4.347, true rewards: #0: 4.013 +[2024-11-07 16:15:56,129][14395] Avg episode reward: 4.347, avg true_objective: 4.013 +[2024-11-07 16:15:56,304][14395] Num frames 9700... +[2024-11-07 16:15:56,522][14395] Num frames 9800... +[2024-11-07 16:15:56,758][14395] Num frames 9900... +[2024-11-07 16:15:56,981][14395] Num frames 10000... +[2024-11-07 16:15:57,184][14395] Num frames 10100... +[2024-11-07 16:15:57,398][14395] Avg episode rewards: #0: 4.470, true rewards: #0: 4.070 +[2024-11-07 16:15:57,402][14395] Avg episode reward: 4.470, avg true_objective: 4.070 +[2024-11-07 16:15:57,472][14395] Num frames 10200... +[2024-11-07 16:15:57,701][14395] Num frames 10300... +[2024-11-07 16:15:57,940][14395] Num frames 10400... +[2024-11-07 16:15:58,144][14395] Num frames 10500... +[2024-11-07 16:15:58,371][14395] Num frames 10600... +[2024-11-07 16:15:58,478][14395] Avg episode rewards: #0: 4.509, true rewards: #0: 4.086 +[2024-11-07 16:15:58,482][14395] Avg episode reward: 4.509, avg true_objective: 4.086 +[2024-11-07 16:15:58,668][14395] Num frames 10700... +[2024-11-07 16:15:58,892][14395] Num frames 10800... +[2024-11-07 16:15:59,101][14395] Num frames 10900... +[2024-11-07 16:15:59,306][14395] Num frames 11000... +[2024-11-07 16:15:59,546][14395] Avg episode rewards: #0: 4.545, true rewards: #0: 4.101 +[2024-11-07 16:15:59,547][14395] Avg episode reward: 4.545, avg true_objective: 4.101 +[2024-11-07 16:15:59,626][14395] Num frames 11100... +[2024-11-07 16:15:59,872][14395] Num frames 11200... +[2024-11-07 16:16:00,140][14395] Num frames 11300... +[2024-11-07 16:16:00,376][14395] Num frames 11400... +[2024-11-07 16:16:00,616][14395] Avg episode rewards: #0: 4.520, true rewards: #0: 4.091 +[2024-11-07 16:16:00,617][14395] Avg episode reward: 4.520, avg true_objective: 4.091 +[2024-11-07 16:16:00,739][14395] Num frames 11500... +[2024-11-07 16:16:01,012][14395] Num frames 11600... 
+[2024-11-07 16:16:01,253][14395] Num frames 11700... +[2024-11-07 16:16:01,487][14395] Num frames 11800... +[2024-11-07 16:16:01,632][14395] Avg episode rewards: #0: 4.497, true rewards: #0: 4.083 +[2024-11-07 16:16:01,634][14395] Avg episode reward: 4.497, avg true_objective: 4.083 +[2024-11-07 16:16:01,780][14395] Num frames 11900... +[2024-11-07 16:16:02,000][14395] Num frames 12000... +[2024-11-07 16:16:02,201][14395] Num frames 12100... +[2024-11-07 16:16:02,405][14395] Num frames 12200... +[2024-11-07 16:16:02,514][14395] Avg episode rewards: #0: 4.475, true rewards: #0: 4.075 +[2024-11-07 16:16:02,516][14395] Avg episode reward: 4.475, avg true_objective: 4.075 +[2024-11-07 16:16:02,673][14395] Num frames 12300... +[2024-11-07 16:16:02,885][14395] Num frames 12400... +[2024-11-07 16:16:03,091][14395] Num frames 12500... +[2024-11-07 16:16:05,525][14395] Num frames 12600... +[2024-11-07 16:16:05,601][14395] Avg episode rewards: #0: 4.454, true rewards: #0: 4.067 +[2024-11-07 16:16:05,604][14395] Avg episode reward: 4.454, avg true_objective: 4.067 +[2024-11-07 16:16:05,844][14395] Num frames 12700... +[2024-11-07 16:16:06,081][14395] Num frames 12800... +[2024-11-07 16:16:06,311][14395] Num frames 12900... +[2024-11-07 16:16:06,546][14395] Num frames 13000... +[2024-11-07 16:16:06,739][14395] Avg episode rewards: #0: 4.486, true rewards: #0: 4.080 +[2024-11-07 16:16:06,741][14395] Avg episode reward: 4.486, avg true_objective: 4.080 +[2024-11-07 16:16:06,852][14395] Num frames 13100... +[2024-11-07 16:16:07,077][14395] Num frames 13200... +[2024-11-07 16:16:07,302][14395] Num frames 13300... +[2024-11-07 16:16:07,516][14395] Num frames 13400... +[2024-11-07 16:16:07,738][14395] Avg episode rewards: #0: 4.507, true rewards: #0: 4.082 +[2024-11-07 16:16:07,743][14395] Avg episode reward: 4.507, avg true_objective: 4.082 +[2024-11-07 16:16:07,827][14395] Num frames 13500... +[2024-11-07 16:16:08,061][14395] Num frames 13600... 
+[2024-11-07 16:16:08,279][14395] Num frames 13700... +[2024-11-07 16:16:08,499][14395] Num frames 13800... +[2024-11-07 16:16:08,669][14395] Avg episode rewards: #0: 4.487, true rewards: #0: 4.075 +[2024-11-07 16:16:08,674][14395] Avg episode reward: 4.487, avg true_objective: 4.075 +[2024-11-07 16:16:08,792][14395] Num frames 13900... +[2024-11-07 16:16:09,022][14395] Num frames 14000... +[2024-11-07 16:16:09,264][14395] Num frames 14100... +[2024-11-07 16:16:09,498][14395] Num frames 14200... +[2024-11-07 16:16:09,647][14395] Avg episode rewards: #0: 4.469, true rewards: #0: 4.069 +[2024-11-07 16:16:09,649][14395] Avg episode reward: 4.469, avg true_objective: 4.069 +[2024-11-07 16:16:09,793][14395] Num frames 14300... +[2024-11-07 16:16:10,039][14395] Num frames 14400... +[2024-11-07 16:16:10,421][14395] Num frames 14500... +[2024-11-07 16:16:10,664][14395] Num frames 14600... +[2024-11-07 16:16:10,842][14395] Avg episode rewards: #0: 4.488, true rewards: #0: 4.071 +[2024-11-07 16:16:10,843][14395] Avg episode reward: 4.488, avg true_objective: 4.071 +[2024-11-07 16:16:10,956][14395] Num frames 14700... +[2024-11-07 16:16:11,161][14395] Num frames 14800... +[2024-11-07 16:16:11,354][14395] Num frames 14900... +[2024-11-07 16:16:11,542][14395] Num frames 15000... +[2024-11-07 16:16:11,674][14395] Avg episode rewards: #0: 4.470, true rewards: #0: 4.065 +[2024-11-07 16:16:11,678][14395] Avg episode reward: 4.470, avg true_objective: 4.065 +[2024-11-07 16:16:11,805][14395] Num frames 15100... +[2024-11-07 16:16:12,010][14395] Num frames 15200... +[2024-11-07 16:16:12,227][14395] Num frames 15300... +[2024-11-07 16:16:12,451][14395] Num frames 15400... +[2024-11-07 16:16:12,663][14395] Num frames 15500... +[2024-11-07 16:16:12,897][14395] Num frames 15600... 
+[2024-11-07 16:16:13,052][14395] Avg episode rewards: #0: 4.592, true rewards: #0: 4.118 +[2024-11-07 16:16:13,056][14395] Avg episode reward: 4.592, avg true_objective: 4.118 +[2024-11-07 16:16:13,176][14395] Num frames 15700... +[2024-11-07 16:16:13,370][14395] Num frames 15800... +[2024-11-07 16:16:13,571][14395] Num frames 15900... +[2024-11-07 16:16:13,764][14395] Num frames 16000... +[2024-11-07 16:16:14,013][14395] Avg episode rewards: #0: 4.614, true rewards: #0: 4.127 +[2024-11-07 16:16:14,015][14395] Avg episode reward: 4.614, avg true_objective: 4.127 +[2024-11-07 16:16:14,043][14395] Num frames 16100... +[2024-11-07 16:16:14,248][14395] Num frames 16200... +[2024-11-07 16:16:14,508][14395] Num frames 16300... +[2024-11-07 16:16:14,701][14395] Num frames 16400... +[2024-11-07 16:16:14,899][14395] Num frames 16500... +[2024-11-07 16:16:15,054][14395] Avg episode rewards: #0: 4.636, true rewards: #0: 4.136 +[2024-11-07 16:16:15,057][14395] Avg episode reward: 4.636, avg true_objective: 4.136 +[2024-11-07 16:16:15,186][14395] Num frames 16600... +[2024-11-07 16:16:15,379][14395] Num frames 16700... +[2024-11-07 16:16:15,570][14395] Num frames 16800... +[2024-11-07 16:16:15,762][14395] Num frames 16900... +[2024-11-07 16:16:15,879][14395] Avg episode rewards: #0: 4.617, true rewards: #0: 4.129 +[2024-11-07 16:16:15,880][14395] Avg episode reward: 4.617, avg true_objective: 4.129 +[2024-11-07 16:16:16,023][14395] Num frames 17000... +[2024-11-07 16:16:16,209][14395] Num frames 17100... +[2024-11-07 16:16:16,401][14395] Num frames 17200... +[2024-11-07 16:16:16,588][14395] Num frames 17300... +[2024-11-07 16:16:16,667][14395] Avg episode rewards: #0: 4.598, true rewards: #0: 4.122 +[2024-11-07 16:16:16,671][14395] Avg episode reward: 4.598, avg true_objective: 4.122 +[2024-11-07 16:16:16,860][14395] Num frames 17400... +[2024-11-07 16:16:17,053][14395] Num frames 17500... +[2024-11-07 16:16:17,253][14395] Num frames 17600... 
+[2024-11-07 16:16:17,467][14395] Num frames 17700... +[2024-11-07 16:16:17,584][14395] Avg episode rewards: #0: 4.611, true rewards: #0: 4.123 +[2024-11-07 16:16:17,588][14395] Avg episode reward: 4.611, avg true_objective: 4.123 +[2024-11-07 16:16:17,772][14395] Num frames 17800... +[2024-11-07 16:16:18,049][14395] Num frames 17900... +[2024-11-07 16:16:18,285][14395] Num frames 18000... +[2024-11-07 16:16:18,769][14395] Num frames 18100... +[2024-11-07 16:16:18,932][14395] Avg episode rewards: #0: 4.594, true rewards: #0: 4.116 +[2024-11-07 16:16:18,935][14395] Avg episode reward: 4.594, avg true_objective: 4.116 +[2024-11-07 16:16:19,204][14395] Num frames 18200... +[2024-11-07 16:16:19,406][14395] Num frames 18300... +[2024-11-07 16:16:19,962][14395] Num frames 18400... +[2024-11-07 16:16:20,227][14395] Avg episode rewards: #0: 4.577, true rewards: #0: 4.110 +[2024-11-07 16:16:20,230][14395] Avg episode reward: 4.577, avg true_objective: 4.110 +[2024-11-07 16:16:20,255][14395] Num frames 18500... +[2024-11-07 16:16:20,430][14395] Num frames 18600... +[2024-11-07 16:16:20,605][14395] Num frames 18700... +[2024-11-07 16:16:20,784][14395] Num frames 18800... +[2024-11-07 16:16:20,977][14395] Avg episode rewards: #0: 4.561, true rewards: #0: 4.104 +[2024-11-07 16:16:20,980][14395] Avg episode reward: 4.561, avg true_objective: 4.104 +[2024-11-07 16:16:21,037][14395] Num frames 18900... +[2024-11-07 16:16:21,207][14395] Num frames 19000... +[2024-11-07 16:16:21,358][14395] Num frames 19100... +[2024-11-07 16:16:21,512][14395] Num frames 19200... +[2024-11-07 16:16:21,662][14395] Avg episode rewards: #0: 4.546, true rewards: #0: 4.099 +[2024-11-07 16:16:21,667][14395] Avg episode reward: 4.546, avg true_objective: 4.099 +[2024-11-07 16:16:21,753][14395] Num frames 19300... +[2024-11-07 16:16:21,915][14395] Num frames 19400... +[2024-11-07 16:16:22,064][14395] Num frames 19500... +[2024-11-07 16:16:22,256][14395] Num frames 19600... 
+[2024-11-07 16:16:22,402][14395] Avg episode rewards: #0: 4.531, true rewards: #0: 4.093 +[2024-11-07 16:16:22,407][14395] Avg episode reward: 4.531, avg true_objective: 4.093 +[2024-11-07 16:16:22,518][14395] Num frames 19700... +[2024-11-07 16:16:22,697][14395] Num frames 19800... +[2024-11-07 16:16:22,892][14395] Num frames 19900... +[2024-11-07 16:16:23,073][14395] Num frames 20000... +[2024-11-07 16:16:23,189][14395] Avg episode rewards: #0: 4.517, true rewards: #0: 4.088 +[2024-11-07 16:16:23,194][14395] Avg episode reward: 4.517, avg true_objective: 4.088 +[2024-11-07 16:16:23,328][14395] Num frames 20100... +[2024-11-07 16:16:23,493][14395] Num frames 20200... +[2024-11-07 16:16:23,659][14395] Num frames 20300... +[2024-11-07 16:16:23,825][14395] Num frames 20400... +[2024-11-07 16:16:23,916][14395] Avg episode rewards: #0: 4.503, true rewards: #0: 4.083 +[2024-11-07 16:16:23,920][14395] Avg episode reward: 4.503, avg true_objective: 4.083 +[2024-11-07 16:16:24,072][14395] Num frames 20500... +[2024-11-07 16:16:24,229][14395] Num frames 20600... +[2024-11-07 16:16:24,406][14395] Num frames 20700... +[2024-11-07 16:16:24,569][14395] Num frames 20800... +[2024-11-07 16:16:24,728][14395] Avg episode rewards: #0: 4.522, true rewards: #0: 4.091 +[2024-11-07 16:16:24,733][14395] Avg episode reward: 4.522, avg true_objective: 4.091 +[2024-11-07 16:16:24,804][14395] Num frames 20900... +[2024-11-07 16:16:24,971][14395] Num frames 21000... +[2024-11-07 16:16:25,140][14395] Num frames 21100... +[2024-11-07 16:16:25,309][14395] Num frames 21200... +[2024-11-07 16:16:25,500][14395] Avg episode rewards: #0: 4.535, true rewards: #0: 4.092 +[2024-11-07 16:16:25,504][14395] Avg episode reward: 4.535, avg true_objective: 4.092 +[2024-11-07 16:16:25,559][14395] Num frames 21300... +[2024-11-07 16:16:25,718][14395] Num frames 21400... +[2024-11-07 16:16:25,884][14395] Num frames 21500... +[2024-11-07 16:16:26,046][14395] Num frames 21600... 
+[2024-11-07 16:16:26,211][14395] Num frames 21700... +[2024-11-07 16:16:26,318][14395] Avg episode rewards: #0: 4.552, true rewards: #0: 4.100 +[2024-11-07 16:16:26,322][14395] Avg episode reward: 4.552, avg true_objective: 4.100 +[2024-11-07 16:16:26,462][14395] Num frames 21800... +[2024-11-07 16:16:26,619][14395] Num frames 21900... +[2024-11-07 16:16:26,775][14395] Num frames 22000... +[2024-11-07 16:16:26,945][14395] Num frames 22100... +[2024-11-07 16:16:27,027][14395] Avg episode rewards: #0: 4.539, true rewards: #0: 4.095 +[2024-11-07 16:16:27,031][14395] Avg episode reward: 4.539, avg true_objective: 4.095 +[2024-11-07 16:16:27,212][14395] Num frames 22200... +[2024-11-07 16:16:27,401][14395] Num frames 22300... +[2024-11-07 16:16:27,584][14395] Num frames 22400... +[2024-11-07 16:16:27,819][14395] Avg episode rewards: #0: 4.527, true rewards: #0: 4.090 +[2024-11-07 16:16:27,820][14395] Avg episode reward: 4.527, avg true_objective: 4.090 +[2024-11-07 16:16:27,829][14395] Num frames 22500... +[2024-11-07 16:16:28,016][14395] Num frames 22600... +[2024-11-07 16:16:28,189][14395] Num frames 22700... +[2024-11-07 16:16:28,379][14395] Num frames 22800... +[2024-11-07 16:16:28,570][14395] Avg episode rewards: #0: 4.514, true rewards: #0: 4.086 +[2024-11-07 16:16:28,574][14395] Avg episode reward: 4.514, avg true_objective: 4.086 +[2024-11-07 16:16:28,631][14395] Num frames 22900... +[2024-11-07 16:16:28,819][14395] Num frames 23000... +[2024-11-07 16:16:28,999][14395] Num frames 23100... +[2024-11-07 16:16:29,181][14395] Num frames 23200... +[2024-11-07 16:16:29,356][14395] Avg episode rewards: #0: 4.502, true rewards: #0: 4.081 +[2024-11-07 16:16:29,360][14395] Avg episode reward: 4.502, avg true_objective: 4.081 +[2024-11-07 16:16:29,447][14395] Num frames 23300... +[2024-11-07 16:16:29,632][14395] Num frames 23400... +[2024-11-07 16:16:29,844][14395] Num frames 23500... +[2024-11-07 16:16:30,066][14395] Num frames 23600... 
+[2024-11-07 16:16:30,273][14395] Num frames 23700... +[2024-11-07 16:16:30,359][14395] Avg episode rewards: #0: 4.519, true rewards: #0: 4.088 +[2024-11-07 16:16:30,362][14395] Avg episode reward: 4.519, avg true_objective: 4.088 +[2024-11-07 16:16:30,563][14395] Num frames 23800... +[2024-11-07 16:16:30,779][14395] Num frames 23900... +[2024-11-07 16:16:30,990][14395] Num frames 24000... +[2024-11-07 16:16:31,239][14395] Avg episode rewards: #0: 4.508, true rewards: #0: 4.084 +[2024-11-07 16:16:31,240][14395] Avg episode reward: 4.508, avg true_objective: 4.084 +[2024-11-07 16:16:31,248][14395] Num frames 24100... +[2024-11-07 16:16:31,473][14395] Num frames 24200... +[2024-11-07 16:16:31,691][14395] Num frames 24300... +[2024-11-07 16:16:31,914][14395] Num frames 24400... +[2024-11-07 16:16:32,144][14395] Num frames 24500... +[2024-11-07 16:16:32,362][14395] Avg episode rewards: #0: 4.546, true rewards: #0: 4.096 +[2024-11-07 16:16:32,363][14395] Avg episode reward: 4.546, avg true_objective: 4.096 +[2024-11-07 16:16:32,443][14395] Num frames 24600... +[2024-11-07 16:16:32,656][14395] Num frames 24700... +[2024-11-07 16:16:32,880][14395] Num frames 24800... +[2024-11-07 16:16:33,089][14395] Num frames 24900... +[2024-11-07 16:16:33,309][14395] Num frames 25000... +[2024-11-07 16:16:33,420][14395] Avg episode rewards: #0: 4.561, true rewards: #0: 4.102 +[2024-11-07 16:16:33,423][14395] Avg episode reward: 4.561, avg true_objective: 4.102 +[2024-11-07 16:16:33,606][14395] Num frames 25100... +[2024-11-07 16:16:33,821][14395] Num frames 25200... +[2024-11-07 16:16:34,043][14395] Num frames 25300... +[2024-11-07 16:16:34,244][14395] Num frames 25400... +[2024-11-07 16:16:34,316][14395] Avg episode rewards: #0: 4.550, true rewards: #0: 4.098 +[2024-11-07 16:16:34,320][14395] Avg episode reward: 4.550, avg true_objective: 4.098 +[2024-11-07 16:16:34,545][14395] Num frames 25500... +[2024-11-07 16:16:34,812][14395] Num frames 25600... 
+[2024-11-07 16:16:35,039][14395] Num frames 25700... +[2024-11-07 16:16:35,288][14395] Avg episode rewards: #0: 4.538, true rewards: #0: 4.094 +[2024-11-07 16:16:35,290][14395] Avg episode reward: 4.538, avg true_objective: 4.094 +[2024-11-07 16:16:35,328][14395] Num frames 25800... +[2024-11-07 16:16:35,545][14395] Num frames 25900... +[2024-11-07 16:16:35,751][14395] Num frames 26000... +[2024-11-07 16:16:35,968][14395] Num frames 26100... +[2024-11-07 16:16:36,179][14395] Num frames 26200... +[2024-11-07 16:16:36,389][14395] Avg episode rewards: #0: 4.574, true rewards: #0: 4.105 +[2024-11-07 16:16:36,392][14395] Avg episode reward: 4.574, avg true_objective: 4.105 +[2024-11-07 16:16:36,478][14395] Num frames 26300... +[2024-11-07 16:16:36,692][14395] Num frames 26400... +[2024-11-07 16:16:36,917][14395] Num frames 26500... +[2024-11-07 16:16:37,134][14395] Num frames 26600... +[2024-11-07 16:16:37,336][14395] Num frames 26700... +[2024-11-07 16:16:37,434][14395] Avg episode rewards: #0: 4.588, true rewards: #0: 4.111 +[2024-11-07 16:16:37,435][14395] Avg episode reward: 4.588, avg true_objective: 4.111 +[2024-11-07 16:16:37,620][14395] Num frames 26800... +[2024-11-07 16:16:40,046][14395] Num frames 26900... +[2024-11-07 16:16:40,259][14395] Num frames 27000... +[2024-11-07 16:16:40,485][14395] Num frames 27100... +[2024-11-07 16:16:40,548][14395] Avg episode rewards: #0: 4.576, true rewards: #0: 4.107 +[2024-11-07 16:16:40,553][14395] Avg episode reward: 4.576, avg true_objective: 4.107 +[2024-11-07 16:16:40,770][14395] Num frames 27200... +[2024-11-07 16:16:40,981][14395] Num frames 27300... +[2024-11-07 16:16:41,200][14395] Num frames 27400... +[2024-11-07 16:16:41,412][14395] Num frames 27500... +[2024-11-07 16:16:41,514][14395] Avg episode rewards: #0: 4.585, true rewards: #0: 4.107 +[2024-11-07 16:16:41,518][14395] Avg episode reward: 4.585, avg true_objective: 4.107 +[2024-11-07 16:16:41,704][14395] Num frames 27600... 
+[2024-11-07 16:16:41,924][14395] Num frames 27700... +[2024-11-07 16:16:42,142][14395] Num frames 27800... +[2024-11-07 16:16:42,350][14395] Num frames 27900... +[2024-11-07 16:16:42,482][14395] Avg episode rewards: #0: 4.594, true rewards: #0: 4.108 +[2024-11-07 16:16:42,486][14395] Avg episode reward: 4.594, avg true_objective: 4.108 +[2024-11-07 16:16:42,652][14395] Num frames 28000... +[2024-11-07 16:16:42,843][14395] Num frames 28100... +[2024-11-07 16:16:43,060][14395] Num frames 28200... +[2024-11-07 16:16:43,274][14395] Num frames 28300... +[2024-11-07 16:16:43,491][14395] Num frames 28400... +[2024-11-07 16:16:43,585][14395] Avg episode rewards: #0: 4.626, true rewards: #0: 4.118 +[2024-11-07 16:16:43,590][14395] Avg episode reward: 4.626, avg true_objective: 4.118 +[2024-11-07 16:16:43,785][14395] Num frames 28500... +[2024-11-07 16:16:44,027][14395] Num frames 28600... +[2024-11-07 16:16:44,242][14395] Num frames 28700... +[2024-11-07 16:16:44,455][14395] Num frames 28800... +[2024-11-07 16:16:44,650][14395] Num frames 28900... +[2024-11-07 16:16:44,818][14395] Avg episode rewards: #0: 4.666, true rewards: #0: 4.137 +[2024-11-07 16:16:44,823][14395] Avg episode reward: 4.666, avg true_objective: 4.137 +[2024-11-07 16:16:44,916][14395] Num frames 29000... +[2024-11-07 16:16:45,109][14395] Num frames 29100... +[2024-11-07 16:16:45,293][14395] Num frames 29200... +[2024-11-07 16:16:45,472][14395] Num frames 29300... +[2024-11-07 16:16:45,652][14395] Num frames 29400... +[2024-11-07 16:16:45,725][14395] Avg episode rewards: #0: 4.677, true rewards: #0: 4.142 +[2024-11-07 16:16:45,728][14395] Avg episode reward: 4.677, avg true_objective: 4.142 +[2024-11-07 16:16:45,910][14395] Num frames 29500... +[2024-11-07 16:16:46,097][14395] Num frames 29600... +[2024-11-07 16:16:46,275][14395] Num frames 29700... +[2024-11-07 16:16:46,452][14395] Num frames 29800... 
+[2024-11-07 16:16:46,610][14395] Avg episode rewards: #0: 4.688, true rewards: #0: 4.147 +[2024-11-07 16:16:46,612][14395] Avg episode reward: 4.688, avg true_objective: 4.147 +[2024-11-07 16:16:46,707][14395] Num frames 29900... +[2024-11-07 16:16:46,889][14395] Num frames 30000... +[2024-11-07 16:16:47,080][14395] Num frames 30100... +[2024-11-07 16:16:47,258][14395] Num frames 30200... +[2024-11-07 16:16:47,443][14395] Avg episode rewards: #0: 4.695, true rewards: #0: 4.147 +[2024-11-07 16:16:47,444][14395] Avg episode reward: 4.695, avg true_objective: 4.147 +[2024-11-07 16:16:47,502][14395] Num frames 30300... +[2024-11-07 16:16:47,688][14395] Num frames 30400... +[2024-11-07 16:16:47,876][14395] Num frames 30500... +[2024-11-07 16:16:48,076][14395] Num frames 30600... +[2024-11-07 16:16:48,293][14395] Avg episode rewards: #0: 4.701, true rewards: #0: 4.147 +[2024-11-07 16:16:48,297][14395] Avg episode reward: 4.701, avg true_objective: 4.147 +[2024-11-07 16:16:48,336][14395] Num frames 30700... +[2024-11-07 16:16:48,527][14395] Num frames 30800... +[2024-11-07 16:16:48,719][14395] Num frames 30900... +[2024-11-07 16:16:48,901][14395] Num frames 31000... +[2024-11-07 16:16:49,090][14395] Avg episode rewards: #0: 4.690, true rewards: #0: 4.143 +[2024-11-07 16:16:49,094][14395] Avg episode reward: 4.690, avg true_objective: 4.143 +[2024-11-07 16:16:49,178][14395] Num frames 31100... +[2024-11-07 16:16:49,371][14395] Num frames 31200... +[2024-11-07 16:16:49,566][14395] Num frames 31300... +[2024-11-07 16:16:49,748][14395] Num frames 31400... +[2024-11-07 16:16:49,904][14395] Avg episode rewards: #0: 4.678, true rewards: #0: 4.139 +[2024-11-07 16:16:49,906][14395] Avg episode reward: 4.678, avg true_objective: 4.139 +[2024-11-07 16:16:49,989][14395] Num frames 31500... +[2024-11-07 16:16:50,166][14395] Num frames 31600... +[2024-11-07 16:16:50,347][14395] Num frames 31700... +[2024-11-07 16:16:50,523][14395] Num frames 31800... 
+[2024-11-07 16:16:50,651][14395] Avg episode rewards: #0: 4.668, true rewards: #0: 4.135 +[2024-11-07 16:16:50,652][14395] Avg episode reward: 4.668, avg true_objective: 4.135 +[2024-11-07 16:16:50,773][14395] Num frames 31900... +[2024-11-07 16:16:50,956][14395] Num frames 32000... +[2024-11-07 16:16:51,140][14395] Num frames 32100... +[2024-11-07 16:16:51,338][14395] Num frames 32200... +[2024-11-07 16:16:51,444][14395] Avg episode rewards: #0: 4.657, true rewards: #0: 4.131 +[2024-11-07 16:16:51,446][14395] Avg episode reward: 4.657, avg true_objective: 4.131 +[2024-11-07 16:16:51,598][14395] Num frames 32300... +[2024-11-07 16:16:51,772][14395] Num frames 32400... +[2024-11-07 16:16:51,954][14395] Num frames 32500... +[2024-11-07 16:16:52,128][14395] Num frames 32600... +[2024-11-07 16:16:52,198][14395] Avg episode rewards: #0: 4.647, true rewards: #0: 4.128 +[2024-11-07 16:16:52,199][14395] Avg episode reward: 4.647, avg true_objective: 4.128 +[2024-11-07 16:16:52,362][14395] Num frames 32700... +[2024-11-07 16:16:52,584][14395] Num frames 32800... +[2024-11-07 16:16:52,764][14395] Num frames 32900... +[2024-11-07 16:16:52,950][14395] Num frames 33000... +[2024-11-07 16:16:53,107][14395] Avg episode rewards: #0: 4.657, true rewards: #0: 4.132 +[2024-11-07 16:16:53,113][14395] Avg episode reward: 4.657, avg true_objective: 4.132 +[2024-11-07 16:16:53,212][14395] Num frames 33100... +[2024-11-07 16:16:53,395][14395] Num frames 33200... +[2024-11-07 16:16:53,564][14395] Num frames 33300... +[2024-11-07 16:16:53,732][14395] Num frames 33400... +[2024-11-07 16:16:53,913][14395] Num frames 33500... +[2024-11-07 16:16:54,091][14395] Num frames 33600... +[2024-11-07 16:16:54,144][14395] Avg episode rewards: #0: 4.691, true rewards: #0: 4.148 +[2024-11-07 16:16:54,148][14395] Avg episode reward: 4.691, avg true_objective: 4.148 +[2024-11-07 16:16:54,334][14395] Num frames 33700... +[2024-11-07 16:16:54,507][14395] Num frames 33800... 
+[2024-11-07 16:16:54,675][14395] Num frames 33900... +[2024-11-07 16:16:54,880][14395] Avg episode rewards: #0: 4.681, true rewards: #0: 4.144 +[2024-11-07 16:16:54,884][14395] Avg episode reward: 4.681, avg true_objective: 4.144 +[2024-11-07 16:16:54,941][14395] Num frames 34000... +[2024-11-07 16:16:55,132][14395] Num frames 34100... +[2024-11-07 16:16:55,311][14395] Num frames 34200... +[2024-11-07 16:16:55,441][14395] Avg episode rewards: #0: 4.655, true rewards: #0: 4.125 +[2024-11-07 16:16:55,446][14395] Avg episode reward: 4.655, avg true_objective: 4.125 +[2024-11-07 16:16:55,563][14395] Num frames 34300... +[2024-11-07 16:16:55,743][14395] Num frames 34400... +[2024-11-07 16:16:55,915][14395] Num frames 34500... +[2024-11-07 16:16:56,094][14395] Num frames 34600... +[2024-11-07 16:16:56,212][14395] Avg episode rewards: #0: 4.646, true rewards: #0: 4.122 +[2024-11-07 16:16:56,215][14395] Avg episode reward: 4.646, avg true_objective: 4.122 +[2024-11-07 16:16:56,363][14395] Num frames 34700... +[2024-11-07 16:16:56,537][14395] Num frames 34800... +[2024-11-07 16:16:56,734][14395] Avg episode rewards: #0: 4.621, true rewards: #0: 4.104 +[2024-11-07 16:16:56,738][14395] Avg episode reward: 4.621, avg true_objective: 4.104 +[2024-11-07 16:16:56,803][14395] Num frames 34900... +[2024-11-07 16:16:56,985][14395] Num frames 35000... +[2024-11-07 16:16:57,179][14395] Num frames 35100... +[2024-11-07 16:16:57,301][14395] Avg episode rewards: #0: 4.597, true rewards: #0: 4.086 +[2024-11-07 16:16:57,304][14395] Avg episode reward: 4.597, avg true_objective: 4.086 +[2024-11-07 16:16:57,443][14395] Num frames 35200... +[2024-11-07 16:16:57,617][14395] Num frames 35300... +[2024-11-07 16:16:57,814][14395] Num frames 35400... +[2024-11-07 16:16:57,994][14395] Num frames 35500... 
+[2024-11-07 16:16:58,089][14395] Avg episode rewards: #0: 4.589, true rewards: #0: 4.083 +[2024-11-07 16:16:58,093][14395] Avg episode reward: 4.589, avg true_objective: 4.083 +[2024-11-07 16:16:58,255][14395] Num frames 35600... +[2024-11-07 16:16:58,479][14395] Avg episode rewards: #0: 4.556, true rewards: #0: 4.056 +[2024-11-07 16:16:58,482][14395] Avg episode reward: 4.556, avg true_objective: 4.056 +[2024-11-07 16:16:58,512][14395] Num frames 35700... +[2024-11-07 16:16:58,720][14395] Num frames 35800... +[2024-11-07 16:16:58,938][14395] Num frames 35900... +[2024-11-07 16:16:59,165][14395] Num frames 36000... +[2024-11-07 16:16:59,382][14395] Avg episode rewards: #0: 4.548, true rewards: #0: 4.054 +[2024-11-07 16:16:59,387][14395] Avg episode reward: 4.548, avg true_objective: 4.054 +[2024-11-07 16:16:59,445][14395] Num frames 36100... +[2024-11-07 16:16:59,626][14395] Num frames 36200... +[2024-11-07 16:16:59,819][14395] Num frames 36300... +[2024-11-07 16:17:00,011][14395] Num frames 36400... +[2024-11-07 16:17:00,180][14395] Avg episode rewards: #0: 4.540, true rewards: #0: 4.051 +[2024-11-07 16:17:00,181][14395] Avg episode reward: 4.540, avg true_objective: 4.051 +[2024-11-07 16:17:00,255][14395] Num frames 36500... +[2024-11-07 16:17:00,503][14395] Num frames 36600... +[2024-11-07 16:17:00,720][14395] Num frames 36700... +[2024-11-07 16:17:00,828][14395] Avg episode rewards: #0: 4.519, true rewards: #0: 4.035 +[2024-11-07 16:17:00,833][14395] Avg episode reward: 4.519, avg true_objective: 4.035 +[2024-11-07 16:17:01,011][14395] Num frames 36800... +[2024-11-07 16:17:01,210][14395] Num frames 36900... +[2024-11-07 16:17:01,422][14395] Num frames 37000... +[2024-11-07 16:17:01,575][14395] Num frames 37100... +[2024-11-07 16:17:01,644][14395] Avg episode rewards: #0: 4.512, true rewards: #0: 4.033 +[2024-11-07 16:17:01,646][14395] Avg episode reward: 4.512, avg true_objective: 4.033 +[2024-11-07 16:17:01,816][14395] Num frames 37200... 
+[2024-11-07 16:17:01,969][14395] Num frames 37300... +[2024-11-07 16:17:02,120][14395] Num frames 37400... +[2024-11-07 16:17:02,312][14395] Avg episode rewards: #0: 4.504, true rewards: #0: 4.031 +[2024-11-07 16:17:02,316][14395] Avg episode reward: 4.504, avg true_objective: 4.031 +[2024-11-07 16:17:02,354][14395] Num frames 37500... +[2024-11-07 16:17:02,515][14395] Num frames 37600... +[2024-11-07 16:17:02,693][14395] Num frames 37700... +[2024-11-07 16:17:02,849][14395] Num frames 37800... +[2024-11-07 16:17:03,024][14395] Avg episode rewards: #0: 4.497, true rewards: #0: 4.029 +[2024-11-07 16:17:03,027][14395] Avg episode reward: 4.497, avg true_objective: 4.029 +[2024-11-07 16:17:03,101][14395] Num frames 37900... +[2024-11-07 16:17:03,262][14395] Num frames 38000... +[2024-11-07 16:17:03,419][14395] Num frames 38100... +[2024-11-07 16:17:03,567][14395] Num frames 38200... +[2024-11-07 16:17:03,730][14395] Avg episode rewards: #0: 4.490, true rewards: #0: 4.027 +[2024-11-07 16:17:03,733][14395] Avg episode reward: 4.490, avg true_objective: 4.027 +[2024-11-07 16:17:03,815][14395] Num frames 38300... +[2024-11-07 16:17:03,996][14395] Num frames 38400... +[2024-11-07 16:17:04,149][14395] Num frames 38500... +[2024-11-07 16:17:04,311][14395] Num frames 38600... +[2024-11-07 16:17:04,439][14395] Avg episode rewards: #0: 4.484, true rewards: #0: 4.025 +[2024-11-07 16:17:04,445][14395] Avg episode reward: 4.484, avg true_objective: 4.025 +[2024-11-07 16:17:04,544][14395] Num frames 38700... +[2024-11-07 16:17:04,711][14395] Num frames 38800... +[2024-11-07 16:17:04,872][14395] Num frames 38900... +[2024-11-07 16:17:05,038][14395] Num frames 39000... +[2024-11-07 16:17:05,209][14395] Avg episode rewards: #0: 4.491, true rewards: #0: 4.027 +[2024-11-07 16:17:05,214][14395] Avg episode reward: 4.491, avg true_objective: 4.027 +[2024-11-07 16:17:05,300][14395] Num frames 39100... +[2024-11-07 16:17:05,472][14395] Num frames 39200... 
+[2024-11-07 16:17:05,664][14395] Num frames 39300... +[2024-11-07 16:17:05,873][14395] Num frames 39400... +[2024-11-07 16:17:06,000][14395] Avg episode rewards: #0: 4.484, true rewards: #0: 4.025 +[2024-11-07 16:17:06,005][14395] Avg episode reward: 4.484, avg true_objective: 4.025 +[2024-11-07 16:17:06,116][14395] Num frames 39500... +[2024-11-07 16:17:06,277][14395] Num frames 39600... +[2024-11-07 16:17:06,436][14395] Num frames 39700... +[2024-11-07 16:17:06,601][14395] Num frames 39800... +[2024-11-07 16:17:06,705][14395] Avg episode rewards: #0: 4.477, true rewards: #0: 4.023 +[2024-11-07 16:17:06,710][14395] Avg episode reward: 4.477, avg true_objective: 4.023 +[2024-11-07 16:17:06,847][14395] Num frames 39900... +[2024-11-07 16:17:07,004][14395] Num frames 40000... +[2024-11-07 16:17:07,165][14395] Num frames 40100... +[2024-11-07 16:17:07,368][14395] Num frames 40200... +[2024-11-07 16:17:07,542][14395] Avg episode rewards: #0: 4.488, true rewards: #0: 4.027 +[2024-11-07 16:17:07,548][14395] Avg episode reward: 4.488, avg true_objective: 4.027 +[2024-11-07 16:18:34,581][14395] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 16:18:45,124][14395] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 16:21:18,089][14395] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 16:21:18,091][14395] Overriding arg 'num_workers' with value 4 passed from command line +[2024-11-07 16:21:18,093][14395] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 16:21:18,094][14395] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 16:21:18,096][14395] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! 
+[2024-11-07 16:21:18,097][14395] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 16:21:18,098][14395] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-11-07 16:21:18,099][14395] Adding new argument 'max_num_episodes'=10000 that is not in the saved config file! +[2024-11-07 16:21:18,101][14395] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-07 16:21:18,102][14395] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2024-11-07 16:21:18,104][14395] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 16:21:18,106][14395] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 16:21:18,109][14395] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 16:21:18,111][14395] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 16:21:18,113][14395] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 16:21:18,147][14395] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 16:21:18,151][14395] RunningMeanStd input shape: (1,) +[2024-11-07 16:21:18,170][14395] ConvEncoder: input_channels=3 +[2024-11-07 16:21:18,225][14395] Conv encoder output size: 512 +[2024-11-07 16:21:18,227][14395] Policy head output size: 512 +[2024-11-07 16:21:18,260][14395] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 16:21:18,986][14395] Num frames 100... +[2024-11-07 16:21:19,226][14395] Num frames 200... +[2024-11-07 16:21:19,387][14395] Num frames 300... 
+[2024-11-07 16:21:19,580][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:21:19,582][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:21:19,620][14395] Num frames 400... +[2024-11-07 16:21:19,981][14395] Num frames 500... +[2024-11-07 16:21:20,217][14395] Num frames 600... +[2024-11-07 16:21:20,467][14395] Num frames 700... +[2024-11-07 16:21:20,687][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:21:20,691][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:21:20,796][14395] Num frames 800... +[2024-11-07 16:21:20,983][14395] Num frames 900... +[2024-11-07 16:21:21,161][14395] Num frames 1000... +[2024-11-07 16:21:21,332][14395] Num frames 1100... +[2024-11-07 16:21:21,478][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:21:21,484][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:21:21,596][14395] Num frames 1200... +[2024-11-07 16:21:22,088][14395] Num frames 1300... +[2024-11-07 16:21:22,434][14395] Num frames 1400... +[2024-11-07 16:21:22,752][14395] Num frames 1500... +[2024-11-07 16:21:22,962][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:21:22,966][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:21:23,335][14395] Num frames 1600... +[2024-11-07 16:21:23,532][14395] Num frames 1700... +[2024-11-07 16:21:23,723][14395] Num frames 1800... +[2024-11-07 16:21:23,910][14395] Num frames 1900... +[2024-11-07 16:21:24,291][14395] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968 +[2024-11-07 16:21:24,293][14395] Avg episode reward: 4.168, avg true_objective: 3.968 +[2024-11-07 16:21:24,332][14395] Num frames 2000... +[2024-11-07 16:21:24,577][14395] Num frames 2100... +[2024-11-07 16:21:24,737][14395] Num frames 2200... +[2024-11-07 16:21:24,900][14395] Num frames 2300... 
+[2024-11-07 16:21:25,077][14395] Avg episode rewards: #0: 4.113, true rewards: #0: 3.947 +[2024-11-07 16:21:25,080][14395] Avg episode reward: 4.113, avg true_objective: 3.947 +[2024-11-07 16:21:25,201][14395] Num frames 2400... +[2024-11-07 16:21:25,480][14395] Num frames 2500... +[2024-11-07 16:21:25,642][14395] Num frames 2600... +[2024-11-07 16:21:25,937][14395] Num frames 2700... +[2024-11-07 16:21:26,134][14395] Avg episode rewards: #0: 4.074, true rewards: #0: 3.931 +[2024-11-07 16:21:26,136][14395] Avg episode reward: 4.074, avg true_objective: 3.931 +[2024-11-07 16:21:26,293][14395] Num frames 2800... +[2024-11-07 16:21:26,625][14395] Num frames 2900... +[2024-11-07 16:21:27,006][14395] Num frames 3000... +[2024-11-07 16:21:27,263][14395] Num frames 3100... +[2024-11-07 16:21:27,504][14395] Num frames 3200... +[2024-11-07 16:21:27,558][14395] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000 +[2024-11-07 16:21:27,562][14395] Avg episode reward: 4.250, avg true_objective: 4.000 +[2024-11-07 16:21:27,789][14395] Num frames 3300... +[2024-11-07 16:21:28,013][14395] Num frames 3400... +[2024-11-07 16:21:28,274][14395] Num frames 3500... +[2024-11-07 16:21:28,527][14395] Avg episode rewards: #0: 4.204, true rewards: #0: 3.982 +[2024-11-07 16:21:28,531][14395] Avg episode reward: 4.204, avg true_objective: 3.982 +[2024-11-07 16:21:28,584][14395] Num frames 3600... +[2024-11-07 16:21:28,772][14395] Num frames 3700... +[2024-11-07 16:21:28,953][14395] Num frames 3800... +[2024-11-07 16:21:29,084][14395] Avg episode rewards: #0: 4.040, true rewards: #0: 3.840 +[2024-11-07 16:21:29,087][14395] Avg episode reward: 4.040, avg true_objective: 3.840 +[2024-11-07 16:21:29,232][14395] Num frames 3900... +[2024-11-07 16:21:29,414][14395] Num frames 4000... +[2024-11-07 16:21:29,596][14395] Num frames 4100... +[2024-11-07 16:21:29,770][14395] Num frames 4200... 
+[2024-11-07 16:21:29,878][14395] Avg episode rewards: #0: 4.022, true rewards: #0: 3.840 +[2024-11-07 16:21:29,880][14395] Avg episode reward: 4.022, avg true_objective: 3.840 +[2024-11-07 16:21:30,067][14395] Num frames 4300... +[2024-11-07 16:21:30,283][14395] Num frames 4400... +[2024-11-07 16:21:30,497][14395] Num frames 4500... +[2024-11-07 16:21:30,730][14395] Num frames 4600... +[2024-11-07 16:21:30,810][14395] Avg episode rewards: #0: 4.007, true rewards: #0: 3.840 +[2024-11-07 16:21:30,814][14395] Avg episode reward: 4.007, avg true_objective: 3.840 +[2024-11-07 16:21:30,999][14395] Num frames 4700... +[2024-11-07 16:21:31,203][14395] Num frames 4800... +[2024-11-07 16:21:31,520][14395] Num frames 4900... +[2024-11-07 16:21:31,825][14395] Avg episode rewards: #0: 3.994, true rewards: #0: 3.840 +[2024-11-07 16:21:31,828][14395] Avg episode reward: 3.994, avg true_objective: 3.840 +[2024-11-07 16:21:31,858][14395] Num frames 5000... +[2024-11-07 16:21:32,076][14395] Num frames 5100... +[2024-11-07 16:21:32,299][14395] Num frames 5200... +[2024-11-07 16:21:32,584][14395] Num frames 5300... +[2024-11-07 16:21:32,894][14395] Num frames 5400... +[2024-11-07 16:21:33,060][14395] Avg episode rewards: #0: 4.100, true rewards: #0: 3.886 +[2024-11-07 16:21:33,064][14395] Avg episode reward: 4.100, avg true_objective: 3.886 +[2024-11-07 16:21:33,244][14395] Num frames 5500... +[2024-11-07 16:21:33,675][14395] Num frames 5600... +[2024-11-07 16:21:34,050][14395] Num frames 5700... +[2024-11-07 16:21:34,432][14395] Num frames 5800... +[2024-11-07 16:21:34,724][14395] Avg episode rewards: #0: 4.192, true rewards: #0: 3.925 +[2024-11-07 16:21:34,726][14395] Avg episode reward: 4.192, avg true_objective: 3.925 +[2024-11-07 16:21:34,760][14395] Num frames 5900... +[2024-11-07 16:21:34,996][14395] Num frames 6000... +[2024-11-07 16:21:35,225][14395] Num frames 6100... +[2024-11-07 16:21:35,421][14395] Num frames 6200... 
+[2024-11-07 16:21:35,628][14395] Avg episode rewards: #0: 4.170, true rewards: #0: 3.920 +[2024-11-07 16:21:35,631][14395] Avg episode reward: 4.170, avg true_objective: 3.920 +[2024-11-07 16:21:35,693][14395] Num frames 6300... +[2024-11-07 16:21:35,934][14395] Num frames 6400... +[2024-11-07 16:21:36,144][14395] Num frames 6500... +[2024-11-07 16:21:36,316][14395] Num frames 6600... +[2024-11-07 16:21:36,505][14395] Num frames 6700... +[2024-11-07 16:21:36,600][14395] Avg episode rewards: #0: 4.247, true rewards: #0: 3.953 +[2024-11-07 16:21:36,603][14395] Avg episode reward: 4.247, avg true_objective: 3.953 +[2024-11-07 16:21:36,793][14395] Num frames 6800... +[2024-11-07 16:21:37,059][14395] Num frames 6900... +[2024-11-07 16:21:37,355][14395] Num frames 7000... +[2024-11-07 16:21:37,597][14395] Num frames 7100... +[2024-11-07 16:21:37,661][14395] Avg episode rewards: #0: 4.224, true rewards: #0: 3.947 +[2024-11-07 16:21:37,667][14395] Avg episode reward: 4.224, avg true_objective: 3.947 +[2024-11-07 16:21:37,893][14395] Num frames 7200... +[2024-11-07 16:21:38,099][14395] Num frames 7300... +[2024-11-07 16:21:38,314][14395] Num frames 7400... +[2024-11-07 16:21:38,656][14395] Num frames 7500... +[2024-11-07 16:21:38,803][14395] Avg episode rewards: #0: 4.274, true rewards: #0: 3.958 +[2024-11-07 16:21:38,805][14395] Avg episode reward: 4.274, avg true_objective: 3.958 +[2024-11-07 16:21:39,060][14395] Num frames 7600... +[2024-11-07 16:21:39,246][14395] Num frames 7700... +[2024-11-07 16:21:39,450][14395] Num frames 7800... +[2024-11-07 16:21:39,676][14395] Num frames 7900... +[2024-11-07 16:21:39,896][14395] Avg episode rewards: #0: 4.334, true rewards: #0: 3.984 +[2024-11-07 16:21:39,900][14395] Avg episode reward: 4.334, avg true_objective: 3.984 +[2024-11-07 16:21:39,975][14395] Num frames 8000... +[2024-11-07 16:21:40,183][14395] Num frames 8100... +[2024-11-07 16:21:40,458][14395] Num frames 8200... +[2024-11-07 16:21:40,780][14395] Num frames 8300... 
+[2024-11-07 16:21:40,980][14395] Avg episode rewards: #0: 4.310, true rewards: #0: 3.977 +[2024-11-07 16:21:40,987][14395] Avg episode reward: 4.310, avg true_objective: 3.977 +[2024-11-07 16:21:41,089][14395] Num frames 8400... +[2024-11-07 16:21:41,321][14395] Num frames 8500... +[2024-11-07 16:21:41,492][14395] Num frames 8600... +[2024-11-07 16:21:41,672][14395] Num frames 8700... +[2024-11-07 16:21:41,858][14395] Num frames 8800... +[2024-11-07 16:21:42,027][14395] Num frames 8900... +[2024-11-07 16:21:42,184][14395] Avg episode rewards: #0: 4.527, true rewards: #0: 4.073 +[2024-11-07 16:21:42,190][14395] Avg episode reward: 4.527, avg true_objective: 4.073 +[2024-11-07 16:21:42,274][14395] Num frames 9000... +[2024-11-07 16:21:42,459][14395] Num frames 9100... +[2024-11-07 16:21:42,630][14395] Num frames 9200... +[2024-11-07 16:21:42,822][14395] Num frames 9300... +[2024-11-07 16:21:42,958][14395] Avg episode rewards: #0: 4.497, true rewards: #0: 4.063 +[2024-11-07 16:21:42,959][14395] Avg episode reward: 4.497, avg true_objective: 4.063 +[2024-11-07 16:21:43,057][14395] Num frames 9400... +[2024-11-07 16:21:43,288][14395] Num frames 9500... +[2024-11-07 16:21:43,494][14395] Num frames 9600... +[2024-11-07 16:21:43,547][14395] Avg episode rewards: #0: 4.417, true rewards: #0: 4.000 +[2024-11-07 16:21:43,549][14395] Avg episode reward: 4.417, avg true_objective: 4.000 +[2024-11-07 16:21:43,791][14395] Num frames 9700... +[2024-11-07 16:21:44,062][14395] Num frames 9800... +[2024-11-07 16:21:44,308][14395] Num frames 9900... +[2024-11-07 16:21:44,561][14395] Avg episode rewards: #0: 4.394, true rewards: #0: 3.994 +[2024-11-07 16:21:44,566][14395] Avg episode reward: 4.394, avg true_objective: 3.994 +[2024-11-07 16:21:44,615][14395] Num frames 10000... +[2024-11-07 16:21:44,831][14395] Num frames 10100... +[2024-11-07 16:21:45,066][14395] Num frames 10200... +[2024-11-07 16:21:45,283][14395] Num frames 10300... 
+[2024-11-07 16:21:45,523][14395] Avg episode rewards: #0: 4.372, true rewards: #0: 3.988 +[2024-11-07 16:21:45,527][14395] Avg episode reward: 4.372, avg true_objective: 3.988 +[2024-11-07 16:21:45,608][14395] Num frames 10400... +[2024-11-07 16:21:45,848][14395] Num frames 10500... +[2024-11-07 16:21:46,086][14395] Num frames 10600... +[2024-11-07 16:21:46,336][14395] Num frames 10700... +[2024-11-07 16:21:46,506][14395] Avg episode rewards: #0: 4.353, true rewards: #0: 3.982 +[2024-11-07 16:21:46,510][14395] Avg episode reward: 4.353, avg true_objective: 3.982 +[2024-11-07 16:21:46,676][14395] Num frames 10800... +[2024-11-07 16:21:46,931][14395] Num frames 10900... +[2024-11-07 16:21:47,152][14395] Num frames 11000... +[2024-11-07 16:21:47,227][14395] Avg episode rewards: #0: 4.289, true rewards: #0: 3.931 +[2024-11-07 16:21:47,232][14395] Avg episode reward: 4.289, avg true_objective: 3.931 +[2024-11-07 16:21:47,454][14395] Num frames 11100... +[2024-11-07 16:21:47,745][14395] Num frames 11200... +[2024-11-07 16:21:48,150][14395] Num frames 11300... +[2024-11-07 16:21:50,820][14395] Avg episode rewards: #0: 4.273, true rewards: #0: 3.928 +[2024-11-07 16:21:50,822][14395] Avg episode reward: 4.273, avg true_objective: 3.928 +[2024-11-07 16:21:50,849][14395] Num frames 11400... +[2024-11-07 16:21:51,170][14395] Num frames 11500... +[2024-11-07 16:21:51,408][14395] Num frames 11600... +[2024-11-07 16:21:51,886][14395] Num frames 11700... +[2024-11-07 16:21:52,124][14395] Avg episode rewards: #0: 4.259, true rewards: #0: 3.925 +[2024-11-07 16:21:52,128][14395] Avg episode reward: 4.259, avg true_objective: 3.925 +[2024-11-07 16:21:52,201][14395] Num frames 11800... +[2024-11-07 16:21:52,431][14395] Num frames 11900... +[2024-11-07 16:21:52,687][14395] Num frames 12000... +[2024-11-07 16:21:52,916][14395] Num frames 12100... +[2024-11-07 16:21:53,138][14395] Num frames 12200... 
+[2024-11-07 16:21:53,251][14395] Avg episode rewards: #0: 4.298, true rewards: #0: 3.943 +[2024-11-07 16:21:53,256][14395] Avg episode reward: 4.298, avg true_objective: 3.943 +[2024-11-07 16:21:53,456][14395] Num frames 12300... +[2024-11-07 16:21:53,690][14395] Num frames 12400... +[2024-11-07 16:21:53,922][14395] Num frames 12500... +[2024-11-07 16:21:54,152][14395] Num frames 12600... +[2024-11-07 16:21:54,227][14395] Avg episode rewards: #0: 4.284, true rewards: #0: 3.940 +[2024-11-07 16:21:54,234][14395] Avg episode reward: 4.284, avg true_objective: 3.940 +[2024-11-07 16:21:54,470][14395] Num frames 12700... +[2024-11-07 16:21:54,762][14395] Num frames 12800... +[2024-11-07 16:21:55,064][14395] Num frames 12900... +[2024-11-07 16:21:55,422][14395] Avg episode rewards: #0: 4.270, true rewards: #0: 3.937 +[2024-11-07 16:21:55,426][14395] Avg episode reward: 4.270, avg true_objective: 3.937 +[2024-11-07 16:21:55,459][14395] Num frames 13000... +[2024-11-07 16:21:55,787][14395] Num frames 13100... +[2024-11-07 16:21:56,134][14395] Num frames 13200... +[2024-11-07 16:21:56,454][14395] Num frames 13300... +[2024-11-07 16:21:56,833][14395] Avg episode rewards: #0: 4.258, true rewards: #0: 3.934 +[2024-11-07 16:21:56,834][14395] Avg episode reward: 4.258, avg true_objective: 3.934 +[2024-11-07 16:21:56,903][14395] Num frames 13400... +[2024-11-07 16:21:57,152][14395] Num frames 13500... +[2024-11-07 16:21:57,385][14395] Num frames 13600... +[2024-11-07 16:21:57,756][14395] Num frames 13700... +[2024-11-07 16:21:57,955][14395] Avg episode rewards: #0: 4.246, true rewards: #0: 3.931 +[2024-11-07 16:21:57,957][14395] Avg episode reward: 4.246, avg true_objective: 3.931 +[2024-11-07 16:21:58,068][14395] Num frames 13800... +[2024-11-07 16:21:58,337][14395] Num frames 13900... +[2024-11-07 16:21:58,595][14395] Num frames 14000... +[2024-11-07 16:21:58,875][14395] Num frames 14100... 
+[2024-11-07 16:21:59,064][14395] Avg episode rewards: #0: 4.234, true rewards: #0: 3.929 +[2024-11-07 16:21:59,067][14395] Avg episode reward: 4.234, avg true_objective: 3.929 +[2024-11-07 16:21:59,218][14395] Num frames 14200... +[2024-11-07 16:21:59,520][14395] Num frames 14300... +[2024-11-07 16:21:59,833][14395] Num frames 14400... +[2024-11-07 16:22:00,066][14395] Num frames 14500... +[2024-11-07 16:22:00,180][14395] Avg episode rewards: #0: 4.224, true rewards: #0: 3.926 +[2024-11-07 16:22:00,184][14395] Avg episode reward: 4.224, avg true_objective: 3.926 +[2024-11-07 16:22:00,348][14395] Num frames 14600... +[2024-11-07 16:22:00,643][14395] Num frames 14700... +[2024-11-07 16:22:00,975][14395] Num frames 14800... +[2024-11-07 16:22:01,473][14395] Num frames 14900... +[2024-11-07 16:22:01,826][14395] Num frames 15000... +[2024-11-07 16:22:01,906][14395] Avg episode rewards: #0: 4.265, true rewards: #0: 3.949 +[2024-11-07 16:22:01,909][14395] Avg episode reward: 4.265, avg true_objective: 3.949 +[2024-11-07 16:22:02,182][14395] Num frames 15100... +[2024-11-07 16:22:02,413][14395] Num frames 15200... +[2024-11-07 16:22:02,620][14395] Num frames 15300... +[2024-11-07 16:22:02,861][14395] Num frames 15400... +[2024-11-07 16:22:03,102][14395] Num frames 15500... +[2024-11-07 16:22:03,256][14395] Avg episode rewards: #0: 4.347, true rewards: #0: 3.988 +[2024-11-07 16:22:03,257][14395] Avg episode reward: 4.347, avg true_objective: 3.988 +[2024-11-07 16:22:03,348][14395] Num frames 15600... +[2024-11-07 16:22:03,533][14395] Num frames 15700... +[2024-11-07 16:22:03,725][14395] Num frames 15800... +[2024-11-07 16:22:03,919][14395] Num frames 15900... +[2024-11-07 16:22:04,046][14395] Avg episode rewards: #0: 4.334, true rewards: #0: 3.984 +[2024-11-07 16:22:04,050][14395] Avg episode reward: 4.334, avg true_objective: 3.984 +[2024-11-07 16:22:04,203][14395] Num frames 16000... +[2024-11-07 16:22:04,382][14395] Num frames 16100... 
+[2024-11-07 16:22:04,559][14395] Num frames 16200... +[2024-11-07 16:22:04,755][14395] Num frames 16300... +[2024-11-07 16:22:04,816][14395] Avg episode rewards: #0: 4.318, true rewards: #0: 3.976 +[2024-11-07 16:22:04,818][14395] Avg episode reward: 4.318, avg true_objective: 3.976 +[2024-11-07 16:22:04,990][14395] Num frames 16400... +[2024-11-07 16:22:05,162][14395] Num frames 16500... +[2024-11-07 16:22:05,326][14395] Num frames 16600... +[2024-11-07 16:22:05,529][14395] Avg episode rewards: #0: 4.306, true rewards: #0: 3.973 +[2024-11-07 16:22:05,533][14395] Avg episode reward: 4.306, avg true_objective: 3.973 +[2024-11-07 16:22:05,576][14395] Num frames 16700... +[2024-11-07 16:22:05,745][14395] Num frames 16800... +[2024-11-07 16:22:05,967][14395] Num frames 16900... +[2024-11-07 16:22:06,334][14395] Num frames 17000... +[2024-11-07 16:22:06,618][14395] Avg episode rewards: #0: 4.342, true rewards: #0: 3.970 +[2024-11-07 16:22:06,620][14395] Avg episode reward: 4.342, avg true_objective: 3.970 +[2024-11-07 16:22:06,700][14395] Num frames 17100... +[2024-11-07 16:22:06,957][14395] Num frames 17200... +[2024-11-07 16:22:07,191][14395] Num frames 17300... +[2024-11-07 16:22:07,361][14395] Num frames 17400... +[2024-11-07 16:22:07,525][14395] Avg episode rewards: #0: 4.331, true rewards: #0: 3.967 +[2024-11-07 16:22:07,528][14395] Avg episode reward: 4.331, avg true_objective: 3.967 +[2024-11-07 16:22:07,618][14395] Num frames 17500... +[2024-11-07 16:22:07,841][14395] Num frames 17600... +[2024-11-07 16:22:08,037][14395] Num frames 17700... +[2024-11-07 16:22:08,206][14395] Num frames 17800... +[2024-11-07 16:22:08,376][14395] Avg episode rewards: #0: 4.327, true rewards: #0: 3.971 +[2024-11-07 16:22:08,377][14395] Avg episode reward: 4.327, avg true_objective: 3.971 +[2024-11-07 16:22:08,438][14395] Num frames 17900... +[2024-11-07 16:22:08,593][14395] Num frames 18000... +[2024-11-07 16:22:08,804][14395] Num frames 18100... 
+[2024-11-07 16:22:08,926][14395] Avg episode rewards: #0: 4.288, true rewards: #0: 3.941 +[2024-11-07 16:22:08,930][14395] Avg episode reward: 4.288, avg true_objective: 3.941 +[2024-11-07 16:22:09,084][14395] Num frames 18200... +[2024-11-07 16:22:09,261][14395] Num frames 18300... +[2024-11-07 16:22:09,435][14395] Num frames 18400... +[2024-11-07 16:22:09,615][14395] Num frames 18500... +[2024-11-07 16:22:09,694][14395] Avg episode rewards: #0: 4.279, true rewards: #0: 3.939 +[2024-11-07 16:22:09,698][14395] Avg episode reward: 4.279, avg true_objective: 3.939 +[2024-11-07 16:22:09,881][14395] Num frames 18600... +[2024-11-07 16:22:10,059][14395] Num frames 18700... +[2024-11-07 16:22:10,298][14395] Num frames 18800... +[2024-11-07 16:22:10,540][14395] Avg episode rewards: #0: 4.270, true rewards: #0: 3.936 +[2024-11-07 16:22:10,543][14395] Avg episode reward: 4.270, avg true_objective: 3.936 +[2024-11-07 16:22:10,559][14395] Num frames 18900... +[2024-11-07 16:22:10,765][14395] Num frames 19000... +[2024-11-07 16:22:10,954][14395] Num frames 19100... +[2024-11-07 16:22:11,156][14395] Num frames 19200... +[2024-11-07 16:22:11,354][14395] Num frames 19300... +[2024-11-07 16:22:11,495][14395] Avg episode rewards: #0: 4.294, true rewards: #0: 3.948 +[2024-11-07 16:22:11,499][14395] Avg episode reward: 4.294, avg true_objective: 3.948 +[2024-11-07 16:22:11,645][14395] Num frames 19400... +[2024-11-07 16:22:11,855][14395] Num frames 19500... +[2024-11-07 16:22:12,073][14395] Num frames 19600... +[2024-11-07 16:22:12,314][14395] Num frames 19700... +[2024-11-07 16:22:12,642][14395] Avg episode rewards: #0: 4.318, true rewards: #0: 3.958 +[2024-11-07 16:22:12,645][14395] Avg episode reward: 4.318, avg true_objective: 3.958 +[2024-11-07 16:22:12,679][14395] Num frames 19800... +[2024-11-07 16:22:12,914][14395] Num frames 19900... +[2024-11-07 16:22:13,122][14395] Num frames 20000... +[2024-11-07 16:22:13,345][14395] Num frames 20100... 
+[2024-11-07 16:22:13,592][14395] Avg episode rewards: #0: 4.309, true rewards: #0: 3.956 +[2024-11-07 16:22:13,598][14395] Avg episode reward: 4.309, avg true_objective: 3.956 +[2024-11-07 16:22:13,669][14395] Num frames 20200... +[2024-11-07 16:22:13,899][14395] Num frames 20300... +[2024-11-07 16:22:14,111][14395] Num frames 20400... +[2024-11-07 16:22:14,256][14395] Avg episode rewards: #0: 4.275, true rewards: #0: 3.929 +[2024-11-07 16:22:14,258][14395] Avg episode reward: 4.275, avg true_objective: 3.929 +[2024-11-07 16:22:14,473][14395] Num frames 20500... +[2024-11-07 16:22:14,760][14395] Num frames 20600... +[2024-11-07 16:22:14,988][14395] Num frames 20700... +[2024-11-07 16:22:15,236][14395] Num frames 20800... +[2024-11-07 16:22:15,329][14395] Avg episode rewards: #0: 4.267, true rewards: #0: 3.927 +[2024-11-07 16:22:15,331][14395] Avg episode reward: 4.267, avg true_objective: 3.927 +[2024-11-07 16:22:15,540][14395] Num frames 20900... +[2024-11-07 16:22:15,759][14395] Num frames 21000... +[2024-11-07 16:22:15,977][14395] Num frames 21100... +[2024-11-07 16:22:16,221][14395] Num frames 21200... +[2024-11-07 16:22:16,424][14395] Avg episode rewards: #0: 4.289, true rewards: #0: 3.938 +[2024-11-07 16:22:16,425][14395] Avg episode reward: 4.289, avg true_objective: 3.938 +[2024-11-07 16:22:16,527][14395] Num frames 21300... +[2024-11-07 16:22:16,825][14395] Num frames 21400... +[2024-11-07 16:22:17,109][14395] Num frames 21500... +[2024-11-07 16:22:17,865][14395] Num frames 21600... +[2024-11-07 16:22:18,210][14395] Avg episode rewards: #0: 4.281, true rewards: #0: 3.936 +[2024-11-07 16:22:18,211][14395] Avg episode reward: 4.281, avg true_objective: 3.936 +[2024-11-07 16:22:18,452][14395] Num frames 21700... +[2024-11-07 16:22:19,000][14395] Num frames 21800... +[2024-11-07 16:22:19,294][14395] Num frames 21900... +[2024-11-07 16:22:19,621][14395] Num frames 22000... 
+[2024-11-07 16:22:20,104][14395] Avg episode rewards: #0: 4.303, true rewards: #0: 3.946 +[2024-11-07 16:22:20,106][14395] Avg episode reward: 4.303, avg true_objective: 3.946 +[2024-11-07 16:22:20,123][14395] Num frames 22100... +[2024-11-07 16:22:20,487][14395] Num frames 22200... +[2024-11-07 16:22:20,818][14395] Num frames 22300... +[2024-11-07 16:22:21,249][14395] Num frames 22400... +[2024-11-07 16:22:21,489][14395] Num frames 22500... +[2024-11-07 16:22:21,759][14395] Avg episode rewards: #0: 4.346, true rewards: #0: 3.961 +[2024-11-07 16:22:21,761][14395] Avg episode reward: 4.346, avg true_objective: 3.961 +[2024-11-07 16:22:21,834][14395] Num frames 22600... +[2024-11-07 16:22:22,090][14395] Num frames 22700... +[2024-11-07 16:22:22,316][14395] Num frames 22800... +[2024-11-07 16:22:22,574][14395] Num frames 22900... +[2024-11-07 16:22:22,790][14395] Avg episode rewards: #0: 4.338, true rewards: #0: 3.958 +[2024-11-07 16:22:22,796][14395] Avg episode reward: 4.338, avg true_objective: 3.958 +[2024-11-07 16:22:22,913][14395] Num frames 23000... +[2024-11-07 16:22:25,276][14395] Num frames 23100... +[2024-11-07 16:22:25,543][14395] Num frames 23200... +[2024-11-07 16:22:25,862][14395] Num frames 23300... +[2024-11-07 16:22:26,099][14395] Num frames 23400... +[2024-11-07 16:22:26,170][14395] Avg episode rewards: #0: 4.357, true rewards: #0: 3.967 +[2024-11-07 16:22:26,172][14395] Avg episode reward: 4.357, avg true_objective: 3.967 +[2024-11-07 16:22:26,430][14395] Num frames 23500... +[2024-11-07 16:22:26,739][14395] Num frames 23600... +[2024-11-07 16:22:27,071][14395] Num frames 23700... +[2024-11-07 16:22:27,323][14395] Num frames 23800... +[2024-11-07 16:22:27,443][14395] Avg episode rewards: #0: 4.371, true rewards: #0: 3.970 +[2024-11-07 16:22:27,446][14395] Avg episode reward: 4.371, avg true_objective: 3.970 +[2024-11-07 16:22:27,667][14395] Num frames 23900... +[2024-11-07 16:22:27,909][14395] Num frames 24000... 
+[2024-11-07 16:22:28,238][14395] Num frames 24100... +[2024-11-07 16:22:28,516][14395] Num frames 24200... +[2024-11-07 16:22:28,601][14395] Avg episode rewards: #0: 4.362, true rewards: #0: 3.968 +[2024-11-07 16:22:28,604][14395] Avg episode reward: 4.362, avg true_objective: 3.968 +[2024-11-07 16:22:28,837][14395] Num frames 24300... +[2024-11-07 16:22:29,075][14395] Num frames 24400... +[2024-11-07 16:22:29,309][14395] Num frames 24500... +[2024-11-07 16:22:29,530][14395] Num frames 24600... +[2024-11-07 16:22:29,703][14395] Avg episode rewards: #0: 4.380, true rewards: #0: 3.977 +[2024-11-07 16:22:29,705][14395] Avg episode reward: 4.380, avg true_objective: 3.977 +[2024-11-07 16:22:29,853][14395] Num frames 24700... +[2024-11-07 16:22:30,082][14395] Num frames 24800... +[2024-11-07 16:22:30,270][14395] Num frames 24900... +[2024-11-07 16:22:30,381][14395] Avg episode rewards: #0: 4.354, true rewards: #0: 3.957 +[2024-11-07 16:22:30,383][14395] Avg episode reward: 4.354, avg true_objective: 3.957 +[2024-11-07 16:22:30,586][14395] Num frames 25000... +[2024-11-07 16:22:30,837][14395] Num frames 25100... +[2024-11-07 16:22:31,151][14395] Num frames 25200... +[2024-11-07 16:22:31,403][14395] Num frames 25300... +[2024-11-07 16:22:31,487][14395] Avg episode rewards: #0: 4.345, true rewards: #0: 3.955 +[2024-11-07 16:22:31,488][14395] Avg episode reward: 4.345, avg true_objective: 3.955 +[2024-11-07 16:22:31,680][14395] Num frames 25400... +[2024-11-07 16:22:31,895][14395] Num frames 25500... +[2024-11-07 16:22:32,120][14395] Num frames 25600... +[2024-11-07 16:22:32,340][14395] Num frames 25700... +[2024-11-07 16:22:32,511][14395] Avg episode rewards: #0: 4.363, true rewards: #0: 3.963 +[2024-11-07 16:22:32,517][14395] Avg episode reward: 4.363, avg true_objective: 3.963 +[2024-11-07 16:22:32,613][14395] Num frames 25800... +[2024-11-07 16:22:32,802][14395] Num frames 25900... +[2024-11-07 16:22:33,007][14395] Num frames 26000... 
+[2024-11-07 16:22:33,222][14395] Num frames 26100... +[2024-11-07 16:22:33,369][14395] Avg episode rewards: #0: 4.355, true rewards: #0: 3.961 +[2024-11-07 16:22:33,370][14395] Avg episode reward: 4.355, avg true_objective: 3.961 +[2024-11-07 16:22:33,482][14395] Num frames 26200... +[2024-11-07 16:22:33,715][14395] Num frames 26300... +[2024-11-07 16:22:33,986][14395] Num frames 26400... +[2024-11-07 16:22:34,264][14395] Num frames 26500... +[2024-11-07 16:22:34,386][14395] Avg episode rewards: #0: 4.347, true rewards: #0: 3.959 +[2024-11-07 16:22:34,389][14395] Avg episode reward: 4.347, avg true_objective: 3.959 +[2024-11-07 16:22:34,572][14395] Num frames 26600... +[2024-11-07 16:22:34,809][14395] Num frames 26700... +[2024-11-07 16:22:35,043][14395] Num frames 26800... +[2024-11-07 16:22:35,243][14395] Num frames 26900... +[2024-11-07 16:22:35,323][14395] Avg episode rewards: #0: 4.340, true rewards: #0: 3.957 +[2024-11-07 16:22:35,330][14395] Avg episode reward: 4.340, avg true_objective: 3.957 +[2024-11-07 16:22:35,603][14395] Num frames 27000... +[2024-11-07 16:22:35,855][14395] Num frames 27100... +[2024-11-07 16:22:36,064][14395] Num frames 27200... +[2024-11-07 16:22:36,252][14395] Num frames 27300... +[2024-11-07 16:22:36,407][14395] Avg episode rewards: #0: 4.352, true rewards: #0: 3.960 +[2024-11-07 16:22:36,409][14395] Avg episode reward: 4.352, avg true_objective: 3.960 +[2024-11-07 16:22:36,554][14395] Num frames 27400... +[2024-11-07 16:22:36,744][14395] Num frames 27500... +[2024-11-07 16:22:36,926][14395] Num frames 27600... +[2024-11-07 16:22:37,146][14395] Num frames 27700... +[2024-11-07 16:22:37,306][14395] Avg episode rewards: #0: 4.363, true rewards: #0: 3.963 +[2024-11-07 16:22:37,308][14395] Avg episode reward: 4.363, avg true_objective: 3.963 +[2024-11-07 16:22:37,434][14395] Num frames 27800... +[2024-11-07 16:22:37,643][14395] Num frames 27900... +[2024-11-07 16:22:37,860][14395] Num frames 28000... 
+[2024-11-07 16:22:38,048][14395] Num frames 28100... +[2024-11-07 16:22:38,182][14395] Avg episode rewards: #0: 4.356, true rewards: #0: 3.962 +[2024-11-07 16:22:38,183][14395] Avg episode reward: 4.356, avg true_objective: 3.962 +[2024-11-07 16:22:38,373][14395] Num frames 28200... +[2024-11-07 16:22:38,581][14395] Num frames 28300... +[2024-11-07 16:22:38,800][14395] Num frames 28400... +[2024-11-07 16:22:39,040][14395] Num frames 28500... +[2024-11-07 16:22:39,257][14395] Avg episode rewards: #0: 4.372, true rewards: #0: 3.969 +[2024-11-07 16:22:39,261][14395] Avg episode reward: 4.372, avg true_objective: 3.969 +[2024-11-07 16:22:39,330][14395] Num frames 28600... +[2024-11-07 16:22:39,543][14395] Num frames 28700... +[2024-11-07 16:22:39,773][14395] Num frames 28800... +[2024-11-07 16:22:40,181][14395] Num frames 28900... +[2024-11-07 16:22:40,358][14395] Avg episode rewards: #0: 4.364, true rewards: #0: 3.967 +[2024-11-07 16:22:40,361][14395] Avg episode reward: 4.364, avg true_objective: 3.967 +[2024-11-07 16:22:40,460][14395] Num frames 29000... +[2024-11-07 16:22:40,656][14395] Num frames 29100... +[2024-11-07 16:22:40,862][14395] Num frames 29200... +[2024-11-07 16:22:41,035][14395] Num frames 29300... +[2024-11-07 16:22:41,212][14395] Num frames 29400... +[2024-11-07 16:22:41,396][14395] Num frames 29500... +[2024-11-07 16:22:41,458][14395] Avg episode rewards: #0: 4.406, true rewards: #0: 3.987 +[2024-11-07 16:22:41,465][14395] Avg episode reward: 4.406, avg true_objective: 3.987 +[2024-11-07 16:22:41,683][14395] Num frames 29600... +[2024-11-07 16:22:41,886][14395] Num frames 29700... +[2024-11-07 16:22:42,061][14395] Num frames 29800... +[2024-11-07 16:22:42,255][14395] Num frames 29900... +[2024-11-07 16:22:42,402][14395] Avg episode rewards: #0: 4.420, true rewards: #0: 3.993 +[2024-11-07 16:22:42,405][14395] Avg episode reward: 4.420, avg true_objective: 3.993 +[2024-11-07 16:22:42,503][14395] Num frames 30000... 
+[2024-11-07 16:22:42,714][14395] Num frames 30100... +[2024-11-07 16:22:42,893][14395] Num frames 30200... +[2024-11-07 16:22:43,068][14395] Num frames 30300... +[2024-11-07 16:22:43,253][14395] Num frames 30400... +[2024-11-07 16:22:43,473][14395] Avg episode rewards: #0: 4.460, true rewards: #0: 4.012 +[2024-11-07 16:22:43,474][14395] Avg episode reward: 4.460, avg true_objective: 4.012 +[2024-11-07 16:22:43,484][14395] Num frames 30500... +[2024-11-07 16:22:43,670][14395] Num frames 30600... +[2024-11-07 16:22:43,852][14395] Num frames 30700... +[2024-11-07 16:22:44,032][14395] Num frames 30800... +[2024-11-07 16:22:44,241][14395] Avg episode rewards: #0: 4.452, true rewards: #0: 4.010 +[2024-11-07 16:22:44,243][14395] Avg episode reward: 4.452, avg true_objective: 4.010 +[2024-11-07 16:22:44,298][14395] Num frames 30900... +[2024-11-07 16:22:44,492][14395] Num frames 31000... +[2024-11-07 16:22:44,694][14395] Num frames 31100... +[2024-11-07 16:22:44,885][14395] Num frames 31200... +[2024-11-07 16:22:45,053][14395] Num frames 31300... +[2024-11-07 16:22:45,171][14395] Avg episode rewards: #0: 4.465, true rewards: #0: 4.016 +[2024-11-07 16:22:45,172][14395] Avg episode reward: 4.465, avg true_objective: 4.016 +[2024-11-07 16:22:45,315][14395] Num frames 31400... +[2024-11-07 16:22:45,508][14395] Num frames 31500... +[2024-11-07 16:22:45,704][14395] Num frames 31600... +[2024-11-07 16:22:46,015][14395] Num frames 31700... +[2024-11-07 16:22:46,128][14395] Avg episode rewards: #0: 4.457, true rewards: #0: 4.014 +[2024-11-07 16:22:46,132][14395] Avg episode reward: 4.457, avg true_objective: 4.014 +[2024-11-07 16:22:46,343][14395] Num frames 31800... +[2024-11-07 16:22:46,516][14395] Num frames 31900... +[2024-11-07 16:22:46,730][14395] Num frames 32000... +[2024-11-07 16:22:46,913][14395] Num frames 32100... 
+[2024-11-07 16:22:47,145][14395] Avg episode rewards: #0: 4.486, true rewards: #0: 4.024 +[2024-11-07 16:22:47,146][14395] Avg episode reward: 4.486, avg true_objective: 4.024 +[2024-11-07 16:22:47,167][14395] Num frames 32200... +[2024-11-07 16:22:47,363][14395] Num frames 32300... +[2024-11-07 16:22:47,570][14395] Num frames 32400... +[2024-11-07 16:22:47,812][14395] Num frames 32500... +[2024-11-07 16:22:48,010][14395] Num frames 32600... +[2024-11-07 16:22:48,114][14395] Avg episode rewards: #0: 4.497, true rewards: #0: 4.028 +[2024-11-07 16:22:48,117][14395] Avg episode reward: 4.497, avg true_objective: 4.028 +[2024-11-07 16:22:48,275][14395] Num frames 32700... +[2024-11-07 16:22:48,462][14395] Num frames 32800... +[2024-11-07 16:22:48,653][14395] Num frames 32900... +[2024-11-07 16:22:48,849][14395] Num frames 33000... +[2024-11-07 16:22:49,042][14395] Num frames 33100... +[2024-11-07 16:22:49,106][14395] Avg episode rewards: #0: 4.513, true rewards: #0: 4.037 +[2024-11-07 16:22:49,111][14395] Avg episode reward: 4.513, avg true_objective: 4.037 +[2024-11-07 16:22:49,337][14395] Num frames 33200... +[2024-11-07 16:22:49,590][14395] Num frames 33300... +[2024-11-07 16:22:49,850][14395] Num frames 33400... +[2024-11-07 16:22:50,113][14395] Avg episode rewards: #0: 4.504, true rewards: #0: 4.035 +[2024-11-07 16:22:50,116][14395] Avg episode reward: 4.504, avg true_objective: 4.035 +[2024-11-07 16:22:50,151][14395] Num frames 33500... +[2024-11-07 16:22:50,397][14395] Num frames 33600... +[2024-11-07 16:22:50,682][14395] Num frames 33700... +[2024-11-07 16:22:50,931][14395] Num frames 33800... +[2024-11-07 16:22:51,158][14395] Avg episode rewards: #0: 4.497, true rewards: #0: 4.032 +[2024-11-07 16:22:51,161][14395] Avg episode reward: 4.497, avg true_objective: 4.032 +[2024-11-07 16:22:51,234][14395] Num frames 33900... +[2024-11-07 16:22:51,458][14395] Num frames 34000... +[2024-11-07 16:22:51,706][14395] Num frames 34100... 
+[2024-11-07 16:22:52,084][14395] Num frames 34200... +[2024-11-07 16:22:52,283][14395] Avg episode rewards: #0: 4.503, true rewards: #0: 4.032 +[2024-11-07 16:22:52,291][14395] Avg episode reward: 4.503, avg true_objective: 4.032 +[2024-11-07 16:22:52,357][14395] Num frames 34300... +[2024-11-07 16:22:52,563][14395] Num frames 34400... +[2024-11-07 16:22:52,763][14395] Num frames 34500... +[2024-11-07 16:22:52,953][14395] Num frames 34600... +[2024-11-07 16:22:53,141][14395] Avg episode rewards: #0: 4.495, true rewards: #0: 4.030 +[2024-11-07 16:22:53,145][14395] Avg episode reward: 4.495, avg true_objective: 4.030 +[2024-11-07 16:22:53,244][14395] Num frames 34700... +[2024-11-07 16:22:53,508][14395] Num frames 34800... +[2024-11-07 16:22:53,755][14395] Num frames 34900... +[2024-11-07 16:22:53,997][14395] Num frames 35000... +[2024-11-07 16:22:54,164][14395] Avg episode rewards: #0: 4.488, true rewards: #0: 4.028 +[2024-11-07 16:22:54,167][14395] Avg episode reward: 4.488, avg true_objective: 4.028 +[2024-11-07 16:22:54,329][14395] Num frames 35100... +[2024-11-07 16:22:54,564][14395] Num frames 35200... +[2024-11-07 16:22:54,826][14395] Num frames 35300... +[2024-11-07 16:22:55,060][14395] Num frames 35400... +[2024-11-07 16:22:55,285][14395] Num frames 35500... +[2024-11-07 16:22:55,398][14395] Avg episode rewards: #0: 4.503, true rewards: #0: 4.037 +[2024-11-07 16:22:55,405][14395] Avg episode reward: 4.503, avg true_objective: 4.037 +[2024-11-07 16:22:55,684][14395] Num frames 35600... +[2024-11-07 16:22:56,003][14395] Num frames 35700... +[2024-11-07 16:22:56,264][14395] Num frames 35800... +[2024-11-07 16:22:56,497][14395] Num frames 35900... +[2024-11-07 16:22:56,696][14395] Avg episode rewards: #0: 4.513, true rewards: #0: 4.042 +[2024-11-07 16:22:56,701][14395] Avg episode reward: 4.513, avg true_objective: 4.042 +[2024-11-07 16:22:56,782][14395] Num frames 36000... +[2024-11-07 16:22:56,987][14395] Num frames 36100... 
+[2024-11-07 16:22:57,191][14395] Num frames 36200... +[2024-11-07 16:22:59,796][14395] Num frames 36300... +[2024-11-07 16:22:59,969][14395] Avg episode rewards: #0: 4.506, true rewards: #0: 4.039 +[2024-11-07 16:22:59,973][14395] Avg episode reward: 4.506, avg true_objective: 4.039 +[2024-11-07 16:23:00,074][14395] Num frames 36400... +[2024-11-07 16:23:00,294][14395] Num frames 36500... +[2024-11-07 16:23:00,476][14395] Num frames 36600... +[2024-11-07 16:23:00,676][14395] Num frames 36700... +[2024-11-07 16:23:00,904][14395] Num frames 36800... +[2024-11-07 16:23:01,147][14395] Avg episode rewards: #0: 4.538, true rewards: #0: 4.055 +[2024-11-07 16:23:01,151][14395] Avg episode reward: 4.538, avg true_objective: 4.055 +[2024-11-07 16:23:01,158][14395] Num frames 36900... +[2024-11-07 16:23:01,443][14395] Num frames 37000... +[2024-11-07 16:23:01,694][14395] Num frames 37100... +[2024-11-07 16:23:01,899][14395] Num frames 37200... +[2024-11-07 16:23:02,182][14395] Avg episode rewards: #0: 4.531, true rewards: #0: 4.052 +[2024-11-07 16:23:02,184][14395] Avg episode reward: 4.531, avg true_objective: 4.052 +[2024-11-07 16:23:02,222][14395] Num frames 37300... +[2024-11-07 16:23:02,586][14395] Num frames 37400... +[2024-11-07 16:23:02,838][14395] Num frames 37500... +[2024-11-07 16:23:03,096][14395] Num frames 37600... +[2024-11-07 16:23:03,285][14395] Avg episode rewards: #0: 4.523, true rewards: #0: 4.050 +[2024-11-07 16:23:03,287][14395] Avg episode reward: 4.523, avg true_objective: 4.050 +[2024-11-07 16:23:03,350][14395] Num frames 37700... +[2024-11-07 16:23:03,546][14395] Num frames 37800... +[2024-11-07 16:23:03,724][14395] Num frames 37900... +[2024-11-07 16:23:03,923][14395] Num frames 38000... +[2024-11-07 16:23:04,089][14395] Avg episode rewards: #0: 4.516, true rewards: #0: 4.048 +[2024-11-07 16:23:04,093][14395] Avg episode reward: 4.516, avg true_objective: 4.048 +[2024-11-07 16:23:04,198][14395] Num frames 38100... 
+[2024-11-07 16:23:04,384][14395] Num frames 38200... +[2024-11-07 16:23:04,589][14395] Num frames 38300... +[2024-11-07 16:23:04,789][14395] Num frames 38400... +[2024-11-07 16:23:04,912][14395] Avg episode rewards: #0: 4.509, true rewards: #0: 4.046 +[2024-11-07 16:23:04,918][14395] Avg episode reward: 4.509, avg true_objective: 4.046 +[2024-11-07 16:23:05,061][14395] Num frames 38500... +[2024-11-07 16:23:05,241][14395] Num frames 38600... +[2024-11-07 16:23:05,443][14395] Num frames 38700... +[2024-11-07 16:23:05,644][14395] Num frames 38800... +[2024-11-07 16:23:05,871][14395] Avg episode rewards: #0: 4.519, true rewards: #0: 4.050 +[2024-11-07 16:23:05,875][14395] Avg episode reward: 4.519, avg true_objective: 4.050 +[2024-11-07 16:23:05,921][14395] Num frames 38900... +[2024-11-07 16:23:06,165][14395] Num frames 39000... +[2024-11-07 16:23:06,496][14395] Num frames 39100... +[2024-11-07 16:23:06,738][14395] Num frames 39200... +[2024-11-07 16:23:06,889][14395] Avg episode rewards: #0: 4.510, true rewards: #0: 4.046 +[2024-11-07 16:23:06,891][14395] Avg episode reward: 4.510, avg true_objective: 4.046 +[2024-11-07 16:23:07,101][14395] Num frames 39300... +[2024-11-07 16:23:07,346][14395] Num frames 39400... +[2024-11-07 16:23:07,570][14395] Num frames 39500... +[2024-11-07 16:23:07,772][14395] Num frames 39600... +[2024-11-07 16:23:07,921][14395] Avg episode rewards: #0: 4.503, true rewards: #0: 4.044 +[2024-11-07 16:23:07,922][14395] Avg episode reward: 4.503, avg true_objective: 4.044 +[2024-11-07 16:23:08,096][14395] Num frames 39700... +[2024-11-07 16:23:08,344][14395] Num frames 39800... +[2024-11-07 16:23:08,557][14395] Num frames 39900... +[2024-11-07 16:23:08,765][14395] Num frames 40000... +[2024-11-07 16:23:08,854][14395] Avg episode rewards: #0: 4.497, true rewards: #0: 4.042 +[2024-11-07 16:23:08,860][14395] Avg episode reward: 4.497, avg true_objective: 4.042 +[2024-11-07 16:23:09,040][14395] Num frames 40100... 
+[2024-11-07 16:23:09,274][14395] Num frames 40200... +[2024-11-07 16:23:09,467][14395] Num frames 40300... +[2024-11-07 16:23:09,699][14395] Num frames 40400... +[2024-11-07 16:23:09,752][14395] Avg episode rewards: #0: 4.490, true rewards: #0: 4.040 +[2024-11-07 16:23:09,755][14395] Avg episode reward: 4.490, avg true_objective: 4.040 +[2024-11-07 16:23:09,959][14395] Num frames 40500... +[2024-11-07 16:23:10,137][14395] Num frames 40600... +[2024-11-07 16:23:10,309][14395] Num frames 40700... +[2024-11-07 16:23:10,511][14395] Avg episode rewards: #0: 4.490, true rewards: #0: 4.040 +[2024-11-07 16:23:10,512][14395] Avg episode reward: 4.490, avg true_objective: 4.040 +[2024-11-07 16:23:10,554][14395] Num frames 40800... +[2024-11-07 16:23:10,754][14395] Num frames 40900... +[2024-11-07 16:23:10,940][14395] Num frames 41000... +[2024-11-07 16:23:11,155][14395] Num frames 41100... +[2024-11-07 16:23:11,327][14395] Num frames 41200... +[2024-11-07 16:23:11,442][14395] Avg episode rewards: #0: 4.506, true rewards: #0: 4.046 +[2024-11-07 16:23:11,443][14395] Avg episode reward: 4.506, avg true_objective: 4.046 +[2024-11-07 16:23:11,590][14395] Num frames 41300... +[2024-11-07 16:23:11,789][14395] Num frames 41400... +[2024-11-07 16:23:12,021][14395] Num frames 41500... +[2024-11-07 16:23:12,275][14395] Num frames 41600... +[2024-11-07 16:23:12,370][14395] Avg episode rewards: #0: 4.506, true rewards: #0: 4.046 +[2024-11-07 16:23:12,374][14395] Avg episode reward: 4.506, avg true_objective: 4.046 +[2024-11-07 16:23:12,565][14395] Num frames 41700... +[2024-11-07 16:23:12,770][14395] Num frames 41800... +[2024-11-07 16:23:12,956][14395] Num frames 41900... +[2024-11-07 16:23:13,163][14395] Avg episode rewards: #0: 4.503, true rewards: #0: 4.043 +[2024-11-07 16:23:13,168][14395] Avg episode reward: 4.503, avg true_objective: 4.043 +[2024-11-07 16:23:13,244][14395] Num frames 42000... +[2024-11-07 16:23:13,445][14395] Num frames 42100... 
+[2024-11-07 16:23:13,644][14395] Num frames 42200... +[2024-11-07 16:23:13,842][14395] Num frames 42300... +[2024-11-07 16:23:13,996][14395] Avg episode rewards: #0: 4.487, true rewards: #0: 4.037 +[2024-11-07 16:23:14,000][14395] Avg episode reward: 4.487, avg true_objective: 4.037 +[2024-11-07 16:23:14,103][14395] Num frames 42400... +[2024-11-07 16:23:14,278][14395] Num frames 42500... +[2024-11-07 16:23:14,469][14395] Num frames 42600... +[2024-11-07 16:23:14,652][14395] Num frames 42700... +[2024-11-07 16:23:14,771][14395] Avg episode rewards: #0: 4.487, true rewards: #0: 4.037 +[2024-11-07 16:23:14,775][14395] Avg episode reward: 4.487, avg true_objective: 4.037 +[2024-11-07 16:23:14,908][14395] Num frames 42800... +[2024-11-07 16:23:15,098][14395] Num frames 42900... +[2024-11-07 16:23:15,254][14395] Num frames 43000... +[2024-11-07 16:23:15,433][14395] Num frames 43100... +[2024-11-07 16:23:15,537][14395] Avg episode rewards: #0: 4.487, true rewards: #0: 4.037 +[2024-11-07 16:23:15,538][14395] Avg episode reward: 4.487, avg true_objective: 4.037 +[2024-11-07 16:23:15,691][14395] Num frames 43200... +[2024-11-07 16:23:15,868][14395] Num frames 43300... +[2024-11-07 16:23:16,037][14395] Num frames 43400... +[2024-11-07 16:23:16,201][14395] Num frames 43500... +[2024-11-07 16:23:16,263][14395] Avg episode rewards: #0: 4.470, true rewards: #0: 4.030 +[2024-11-07 16:23:16,268][14395] Avg episode reward: 4.470, avg true_objective: 4.030 +[2024-11-07 16:23:16,456][14395] Num frames 43600... +[2024-11-07 16:23:16,621][14395] Num frames 43700... +[2024-11-07 16:23:16,785][14395] Num frames 43800... +[2024-11-07 16:23:16,975][14395] Num frames 43900... +[2024-11-07 16:23:17,062][14395] Avg episode rewards: #0: 4.484, true rewards: #0: 4.033 +[2024-11-07 16:23:17,066][14395] Avg episode reward: 4.484, avg true_objective: 4.033 +[2024-11-07 16:23:17,218][14395] Num frames 44000... +[2024-11-07 16:23:17,405][14395] Num frames 44100... 
+[2024-11-07 16:23:17,570][14395] Num frames 44200... +[2024-11-07 16:23:17,744][14395] Num frames 44300... +[2024-11-07 16:23:17,808][14395] Avg episode rewards: #0: 4.496, true rewards: #0: 4.046 +[2024-11-07 16:23:17,812][14395] Avg episode reward: 4.496, avg true_objective: 4.046 +[2024-11-07 16:23:17,991][14395] Num frames 44400... +[2024-11-07 16:23:18,174][14395] Num frames 44500... +[2024-11-07 16:23:18,344][14395] Num frames 44600... +[2024-11-07 16:23:18,573][14395] Avg episode rewards: #0: 4.496, true rewards: #0: 4.046 +[2024-11-07 16:23:18,579][14395] Avg episode reward: 4.496, avg true_objective: 4.046 +[2024-11-07 16:23:18,615][14395] Num frames 44700... +[2024-11-07 16:23:18,788][14395] Num frames 44800... +[2024-11-07 16:23:18,959][14395] Num frames 44900... +[2024-11-07 16:23:19,139][14395] Num frames 45000... +[2024-11-07 16:23:19,327][14395] Avg episode rewards: #0: 4.496, true rewards: #0: 4.046 +[2024-11-07 16:23:19,331][14395] Avg episode reward: 4.496, avg true_objective: 4.046 +[2024-11-07 16:23:19,401][14395] Num frames 45100... +[2024-11-07 16:23:19,568][14395] Num frames 45200... +[2024-11-07 16:23:19,746][14395] Num frames 45300... +[2024-11-07 16:23:19,920][14395] Num frames 45400... +[2024-11-07 16:23:20,088][14395] Num frames 45500... +[2024-11-07 16:23:20,179][14395] Avg episode rewards: #0: 4.513, true rewards: #0: 4.053 +[2024-11-07 16:23:20,182][14395] Avg episode reward: 4.513, avg true_objective: 4.053 +[2024-11-07 16:23:20,321][14395] Num frames 45600... +[2024-11-07 16:23:20,501][14395] Num frames 45700... +[2024-11-07 16:23:20,667][14395] Num frames 45800... +[2024-11-07 16:23:20,853][14395] Num frames 45900... +[2024-11-07 16:23:20,973][14395] Avg episode rewards: #0: 4.500, true rewards: #0: 4.049 +[2024-11-07 16:23:20,981][14395] Avg episode reward: 4.500, avg true_objective: 4.049 +[2024-11-07 16:23:21,118][14395] Num frames 46000... +[2024-11-07 16:23:21,287][14395] Num frames 46100... 
+[2024-11-07 16:23:21,466][14395] Num frames 46200... +[2024-11-07 16:23:21,637][14395] Num frames 46300... +[2024-11-07 16:23:21,742][14395] Avg episode rewards: #0: 4.483, true rewards: #0: 4.043 +[2024-11-07 16:23:21,744][14395] Avg episode reward: 4.483, avg true_objective: 4.043 +[2024-11-07 16:23:21,893][14395] Num frames 46400... +[2024-11-07 16:23:22,167][14395] Num frames 46500... +[2024-11-07 16:23:22,339][14395] Num frames 46600... +[2024-11-07 16:23:22,496][14395] Num frames 46700... +[2024-11-07 16:23:22,671][14395] Avg episode rewards: #0: 4.500, true rewards: #0: 4.049 +[2024-11-07 16:23:22,675][14395] Avg episode reward: 4.500, avg true_objective: 4.049 +[2024-11-07 16:23:22,759][14395] Num frames 46800... +[2024-11-07 16:23:22,931][14395] Num frames 46900... +[2024-11-07 16:23:23,103][14395] Num frames 47000... +[2024-11-07 16:23:23,302][14395] Num frames 47100... +[2024-11-07 16:23:23,447][14395] Avg episode rewards: #0: 4.483, true rewards: #0: 4.043 +[2024-11-07 16:23:23,448][14395] Avg episode reward: 4.483, avg true_objective: 4.043 +[2024-11-07 16:23:23,533][14395] Num frames 47200... +[2024-11-07 16:23:23,716][14395] Num frames 47300... +[2024-11-07 16:23:24,014][14395] Num frames 47400... +[2024-11-07 16:23:24,285][14395] Num frames 47500... +[2024-11-07 16:23:24,407][14395] Avg episode rewards: #0: 4.483, true rewards: #0: 4.043 +[2024-11-07 16:23:24,409][14395] Avg episode reward: 4.483, avg true_objective: 4.043 +[2024-11-07 16:23:24,584][14395] Num frames 47600... +[2024-11-07 16:23:24,774][14395] Num frames 47700... +[2024-11-07 16:23:24,952][14395] Num frames 47800... +[2024-11-07 16:23:25,149][14395] Num frames 47900... +[2024-11-07 16:23:25,242][14395] Avg episode rewards: #0: 4.470, true rewards: #0: 4.040 +[2024-11-07 16:23:25,246][14395] Avg episode reward: 4.470, avg true_objective: 4.040 +[2024-11-07 16:23:25,423][14395] Num frames 48000... +[2024-11-07 16:23:25,610][14395] Num frames 48100... 
+[2024-11-07 16:23:25,795][14395] Num frames 48200... +[2024-11-07 16:23:25,975][14395] Num frames 48300... +[2024-11-07 16:23:26,037][14395] Avg episode rewards: #0: 4.454, true rewards: #0: 4.033 +[2024-11-07 16:23:26,039][14395] Avg episode reward: 4.454, avg true_objective: 4.033 +[2024-11-07 16:23:26,227][14395] Num frames 48400... +[2024-11-07 16:23:26,416][14395] Num frames 48500... +[2024-11-07 16:23:26,591][14395] Num frames 48600... +[2024-11-07 16:23:26,804][14395] Avg episode rewards: #0: 4.454, true rewards: #0: 4.033 +[2024-11-07 16:23:26,810][14395] Avg episode reward: 4.454, avg true_objective: 4.033 +[2024-11-07 16:23:26,849][14395] Num frames 48700... +[2024-11-07 16:23:27,037][14395] Num frames 48800... +[2024-11-07 16:23:27,251][14395] Num frames 48900... +[2024-11-07 16:23:27,502][14395] Num frames 49000... +[2024-11-07 16:23:27,697][14395] Avg episode rewards: #0: 4.401, true rewards: #0: 4.011 +[2024-11-07 16:23:27,699][14395] Avg episode reward: 4.401, avg true_objective: 4.011 +[2024-11-07 16:23:27,774][14395] Num frames 49100... +[2024-11-07 16:23:27,944][14395] Num frames 49200... +[2024-11-07 16:23:28,135][14395] Num frames 49300... +[2024-11-07 16:23:28,263][14395] Avg episode rewards: #0: 4.388, true rewards: #0: 3.998 +[2024-11-07 16:23:28,266][14395] Avg episode reward: 4.388, avg true_objective: 3.998 +[2024-11-07 16:23:28,420][14395] Num frames 49400... +[2024-11-07 16:23:28,622][14395] Num frames 49500... +[2024-11-07 16:23:28,844][14395] Avg episode rewards: #0: 4.388, true rewards: #0: 3.998 +[2024-11-07 16:23:28,848][14395] Avg episode reward: 4.388, avg true_objective: 3.998 +[2024-11-07 16:23:28,901][14395] Num frames 49600... +[2024-11-07 16:23:29,110][14395] Num frames 49700... +[2024-11-07 16:23:29,366][14395] Num frames 49800... +[2024-11-07 16:23:29,553][14395] Num frames 49900... +[2024-11-07 16:23:29,738][14395] Num frames 50000... 
+[2024-11-07 16:23:29,848][14395] Avg episode rewards: #0: 4.405, true rewards: #0: 4.005 +[2024-11-07 16:23:29,852][14395] Avg episode reward: 4.405, avg true_objective: 4.005 +[2024-11-07 16:23:30,008][14395] Num frames 50100... +[2024-11-07 16:23:30,224][14395] Num frames 50200... +[2024-11-07 16:23:30,400][14395] Num frames 50300... +[2024-11-07 16:23:30,582][14395] Num frames 50400... +[2024-11-07 16:23:30,731][14395] Avg episode rewards: #0: 4.408, true rewards: #0: 4.008 +[2024-11-07 16:23:30,734][14395] Avg episode reward: 4.408, avg true_objective: 4.008 +[2024-11-07 16:23:30,833][14395] Num frames 50500... +[2024-11-07 16:23:31,037][14395] Num frames 50600... +[2024-11-07 16:23:31,236][14395] Num frames 50700... +[2024-11-07 16:23:31,436][14395] Num frames 50800... +[2024-11-07 16:23:31,549][14395] Avg episode rewards: #0: 4.408, true rewards: #0: 4.008 +[2024-11-07 16:23:31,551][14395] Avg episode reward: 4.408, avg true_objective: 4.008 +[2024-11-07 16:23:31,716][14395] Num frames 50900... +[2024-11-07 16:23:31,899][14395] Num frames 51000... +[2024-11-07 16:23:34,271][14395] Num frames 51100... +[2024-11-07 16:23:34,505][14395] Num frames 51200... +[2024-11-07 16:23:34,602][14395] Avg episode rewards: #0: 4.421, true rewards: #0: 4.021 +[2024-11-07 16:23:34,607][14395] Avg episode reward: 4.421, avg true_objective: 4.021 +[2024-11-07 16:23:34,798][14395] Num frames 51300... +[2024-11-07 16:23:34,986][14395] Num frames 51400... +[2024-11-07 16:23:35,166][14395] Num frames 51500... +[2024-11-07 16:23:35,405][14395] Avg episode rewards: #0: 4.421, true rewards: #0: 4.021 +[2024-11-07 16:23:35,409][14395] Avg episode reward: 4.421, avg true_objective: 4.021 +[2024-11-07 16:23:35,415][14395] Num frames 51600... +[2024-11-07 16:23:35,610][14395] Num frames 51700... +[2024-11-07 16:23:35,794][14395] Num frames 51800... +[2024-11-07 16:23:35,986][14395] Num frames 51900... 
+[2024-11-07 16:23:36,197][14395] Avg episode rewards: #0: 4.421, true rewards: #0: 4.021 +[2024-11-07 16:23:36,199][14395] Avg episode reward: 4.421, avg true_objective: 4.021 +[2024-11-07 16:23:36,245][14395] Num frames 52000... +[2024-11-07 16:23:36,435][14395] Num frames 52100... +[2024-11-07 16:23:36,644][14395] Num frames 52200... +[2024-11-07 16:23:36,851][14395] Num frames 52300... +[2024-11-07 16:23:37,043][14395] Avg episode rewards: #0: 4.404, true rewards: #0: 4.014 +[2024-11-07 16:23:37,047][14395] Avg episode reward: 4.404, avg true_objective: 4.014 +[2024-11-07 16:23:37,127][14395] Num frames 52400... +[2024-11-07 16:23:37,315][14395] Num frames 52500... +[2024-11-07 16:23:37,527][14395] Num frames 52600... +[2024-11-07 16:23:37,718][14395] Num frames 52700... +[2024-11-07 16:23:37,879][14395] Avg episode rewards: #0: 4.404, true rewards: #0: 4.014 +[2024-11-07 16:23:37,882][14395] Avg episode reward: 4.404, avg true_objective: 4.014 +[2024-11-07 16:23:37,980][14395] Num frames 52800... +[2024-11-07 16:23:38,172][14395] Num frames 52900... +[2024-11-07 16:23:38,356][14395] Num frames 53000... +[2024-11-07 16:23:38,577][14395] Num frames 53100... +[2024-11-07 16:23:38,710][14395] Avg episode rewards: #0: 4.404, true rewards: #0: 4.014 +[2024-11-07 16:23:38,719][14395] Avg episode reward: 4.404, avg true_objective: 4.014 +[2024-11-07 16:23:38,856][14395] Num frames 53200... +[2024-11-07 16:23:39,054][14395] Num frames 53300... +[2024-11-07 16:23:39,237][14395] Num frames 53400... +[2024-11-07 16:23:39,418][14395] Num frames 53500... +[2024-11-07 16:23:39,627][14395] Num frames 53600... +[2024-11-07 16:23:39,835][14395] Avg episode rewards: #0: 4.440, true rewards: #0: 4.030 +[2024-11-07 16:23:39,841][14395] Avg episode reward: 4.440, avg true_objective: 4.030 +[2024-11-07 16:23:39,905][14395] Num frames 53700... +[2024-11-07 16:23:40,087][14395] Num frames 53800... +[2024-11-07 16:23:40,271][14395] Num frames 53900... 
+[2024-11-07 16:23:40,484][14395] Num frames 54000... +[2024-11-07 16:23:40,687][14395] Num frames 54100... +[2024-11-07 16:23:40,800][14395] Avg episode rewards: #0: 4.457, true rewards: #0: 4.037 +[2024-11-07 16:23:40,803][14395] Avg episode reward: 4.457, avg true_objective: 4.037 +[2024-11-07 16:23:40,951][14395] Num frames 54200... +[2024-11-07 16:23:41,162][14395] Num frames 54300... +[2024-11-07 16:23:41,360][14395] Num frames 54400... +[2024-11-07 16:23:41,537][14395] Num frames 54500... +[2024-11-07 16:23:41,623][14395] Avg episode rewards: #0: 4.457, true rewards: #0: 4.037 +[2024-11-07 16:23:41,627][14395] Avg episode reward: 4.457, avg true_objective: 4.037 +[2024-11-07 16:23:41,804][14395] Num frames 54600... +[2024-11-07 16:23:41,977][14395] Num frames 54700... +[2024-11-07 16:23:42,147][14395] Num frames 54800... +[2024-11-07 16:23:42,392][14395] Avg episode rewards: #0: 4.457, true rewards: #0: 4.037 +[2024-11-07 16:23:42,395][14395] Avg episode reward: 4.457, avg true_objective: 4.037 +[2024-11-07 16:23:42,421][14395] Num frames 54900... +[2024-11-07 16:23:42,675][14395] Num frames 55000... +[2024-11-07 16:23:42,849][14395] Num frames 55100... +[2024-11-07 16:23:43,034][14395] Num frames 55200... +[2024-11-07 16:23:43,217][14395] Num frames 55300... +[2024-11-07 16:23:43,370][14395] Avg episode rewards: #0: 4.454, true rewards: #0: 4.033 +[2024-11-07 16:23:43,374][14395] Avg episode reward: 4.454, avg true_objective: 4.033 +[2024-11-07 16:23:43,479][14395] Num frames 55400... +[2024-11-07 16:23:43,655][14395] Num frames 55500... +[2024-11-07 16:23:43,821][14395] Num frames 55600... +[2024-11-07 16:23:44,003][14395] Num frames 55700... +[2024-11-07 16:23:44,228][14395] Avg episode rewards: #0: 4.434, true rewards: #0: 4.024 +[2024-11-07 16:23:44,231][14395] Avg episode reward: 4.434, avg true_objective: 4.024 +[2024-11-07 16:23:44,261][14395] Num frames 55800... +[2024-11-07 16:23:44,439][14395] Num frames 55900... 
+[2024-11-07 16:23:44,619][14395] Num frames 56000... +[2024-11-07 16:23:44,809][14395] Num frames 56100... +[2024-11-07 16:23:44,991][14395] Avg episode rewards: #0: 4.434, true rewards: #0: 4.024 +[2024-11-07 16:23:44,995][14395] Avg episode reward: 4.434, avg true_objective: 4.024 +[2024-11-07 16:23:45,055][14395] Num frames 56200... +[2024-11-07 16:23:45,223][14395] Num frames 56300... +[2024-11-07 16:23:45,396][14395] Num frames 56400... +[2024-11-07 16:23:45,623][14395] Num frames 56500... +[2024-11-07 16:23:45,790][14395] Avg episode rewards: #0: 4.436, true rewards: #0: 4.026 +[2024-11-07 16:23:45,794][14395] Avg episode reward: 4.436, avg true_objective: 4.026 +[2024-11-07 16:23:45,897][14395] Num frames 56600... +[2024-11-07 16:23:46,097][14395] Num frames 56700... +[2024-11-07 16:23:46,300][14395] Num frames 56800... +[2024-11-07 16:23:46,496][14395] Num frames 56900... +[2024-11-07 16:23:46,687][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:23:46,690][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:23:46,752][14395] Num frames 57000... +[2024-11-07 16:23:46,991][14395] Num frames 57100... +[2024-11-07 16:23:47,162][14395] Num frames 57200... +[2024-11-07 16:23:47,348][14395] Num frames 57300... +[2024-11-07 16:23:47,512][14395] Avg episode rewards: #0: 4.429, true rewards: #0: 4.029 +[2024-11-07 16:23:47,515][14395] Avg episode reward: 4.429, avg true_objective: 4.029 +[2024-11-07 16:23:47,603][14395] Num frames 57400... +[2024-11-07 16:23:47,790][14395] Num frames 57500... +[2024-11-07 16:23:48,015][14395] Num frames 57600... +[2024-11-07 16:23:48,220][14395] Num frames 57700... +[2024-11-07 16:23:48,400][14395] Avg episode rewards: #0: 4.429, true rewards: #0: 4.029 +[2024-11-07 16:23:48,403][14395] Avg episode reward: 4.429, avg true_objective: 4.029 +[2024-11-07 16:23:48,536][14395] Num frames 57800... +[2024-11-07 16:23:48,761][14395] Num frames 57900... 
+[2024-11-07 16:23:48,977][14395] Num frames 58000... +[2024-11-07 16:23:49,186][14395] Num frames 58100... +[2024-11-07 16:23:49,319][14395] Avg episode rewards: #0: 4.426, true rewards: #0: 4.026 +[2024-11-07 16:23:49,323][14395] Avg episode reward: 4.426, avg true_objective: 4.026 +[2024-11-07 16:23:49,486][14395] Num frames 58200... +[2024-11-07 16:23:49,717][14395] Num frames 58300... +[2024-11-07 16:23:49,927][14395] Num frames 58400... +[2024-11-07 16:23:50,148][14395] Num frames 58500... +[2024-11-07 16:23:50,230][14395] Avg episode rewards: #0: 4.438, true rewards: #0: 4.038 +[2024-11-07 16:23:50,231][14395] Avg episode reward: 4.438, avg true_objective: 4.038 +[2024-11-07 16:23:50,417][14395] Num frames 58600... +[2024-11-07 16:23:50,664][14395] Num frames 58700... +[2024-11-07 16:23:50,870][14395] Num frames 58800... +[2024-11-07 16:23:51,082][14395] Num frames 58900... +[2024-11-07 16:23:51,285][14395] Avg episode rewards: #0: 4.455, true rewards: #0: 4.045 +[2024-11-07 16:23:51,287][14395] Avg episode reward: 4.455, avg true_objective: 4.045 +[2024-11-07 16:23:51,386][14395] Num frames 59000... +[2024-11-07 16:23:51,617][14395] Num frames 59100... +[2024-11-07 16:23:51,835][14395] Num frames 59200... +[2024-11-07 16:23:52,045][14395] Num frames 59300... +[2024-11-07 16:23:52,279][14395] Avg episode rewards: #0: 4.458, true rewards: #0: 4.048 +[2024-11-07 16:23:52,282][14395] Avg episode reward: 4.458, avg true_objective: 4.048 +[2024-11-07 16:23:52,359][14395] Num frames 59400... +[2024-11-07 16:23:52,584][14395] Num frames 59500... +[2024-11-07 16:23:52,813][14395] Num frames 59600... +[2024-11-07 16:23:53,049][14395] Num frames 59700... +[2024-11-07 16:23:53,254][14395] Avg episode rewards: #0: 4.442, true rewards: #0: 4.042 +[2024-11-07 16:23:53,255][14395] Avg episode reward: 4.442, avg true_objective: 4.042 +[2024-11-07 16:23:53,359][14395] Num frames 59800... +[2024-11-07 16:23:53,620][14395] Num frames 59900... 
+[2024-11-07 16:23:53,825][14395] Num frames 60000... +[2024-11-07 16:23:54,058][14395] Num frames 60100... +[2024-11-07 16:23:54,341][14395] Num frames 60200... +[2024-11-07 16:23:54,424][14395] Avg episode rewards: #0: 4.442, true rewards: #0: 4.042 +[2024-11-07 16:23:54,425][14395] Avg episode reward: 4.442, avg true_objective: 4.042 +[2024-11-07 16:23:54,619][14395] Num frames 60300... +[2024-11-07 16:23:54,812][14395] Num frames 60400... +[2024-11-07 16:23:55,035][14395] Num frames 60500... +[2024-11-07 16:23:55,296][14395] Avg episode rewards: #0: 4.442, true rewards: #0: 4.042 +[2024-11-07 16:23:55,299][14395] Avg episode reward: 4.442, avg true_objective: 4.042 +[2024-11-07 16:23:55,335][14395] Num frames 60600... +[2024-11-07 16:23:55,561][14395] Num frames 60700... +[2024-11-07 16:23:55,766][14395] Num frames 60800... +[2024-11-07 16:23:55,988][14395] Num frames 60900... +[2024-11-07 16:23:56,193][14395] Num frames 61000... +[2024-11-07 16:23:56,420][14395] Num frames 61100... +[2024-11-07 16:23:56,481][14395] Avg episode rewards: #0: 4.487, true rewards: #0: 4.067 +[2024-11-07 16:23:56,484][14395] Avg episode reward: 4.487, avg true_objective: 4.067 +[2024-11-07 16:23:56,690][14395] Num frames 61200... +[2024-11-07 16:23:56,915][14395] Num frames 61300... +[2024-11-07 16:23:57,138][14395] Num frames 61400... +[2024-11-07 16:23:57,358][14395] Num frames 61500... +[2024-11-07 16:23:57,534][14395] Avg episode rewards: #0: 4.504, true rewards: #0: 4.074 +[2024-11-07 16:23:57,539][14395] Avg episode reward: 4.504, avg true_objective: 4.074 +[2024-11-07 16:23:57,675][14395] Num frames 61600... +[2024-11-07 16:23:57,916][14395] Num frames 61700... +[2024-11-07 16:23:58,159][14395] Num frames 61800... +[2024-11-07 16:23:58,421][14395] Num frames 61900... 
+[2024-11-07 16:23:58,557][14395] Avg episode rewards: #0: 4.487, true rewards: #0: 4.067 +[2024-11-07 16:23:58,561][14395] Avg episode reward: 4.487, avg true_objective: 4.067 +[2024-11-07 16:23:58,724][14395] Num frames 62000... +[2024-11-07 16:23:58,979][14395] Num frames 62100... +[2024-11-07 16:23:59,338][14395] Num frames 62200... +[2024-11-07 16:23:59,586][14395] Num frames 62300... +[2024-11-07 16:23:59,849][14395] Avg episode rewards: #0: 4.504, true rewards: #0: 4.074 +[2024-11-07 16:23:59,851][14395] Avg episode reward: 4.504, avg true_objective: 4.074 +[2024-11-07 16:23:59,943][14395] Num frames 62400... +[2024-11-07 16:24:00,280][14395] Num frames 62500... +[2024-11-07 16:24:00,490][14395] Num frames 62600... +[2024-11-07 16:24:00,703][14395] Num frames 62700... +[2024-11-07 16:24:00,933][14395] Num frames 62800... +[2024-11-07 16:24:01,117][14395] Num frames 62900... +[2024-11-07 16:24:01,228][14395] Avg episode rewards: #0: 4.523, true rewards: #0: 4.083 +[2024-11-07 16:24:01,233][14395] Avg episode reward: 4.523, avg true_objective: 4.083 +[2024-11-07 16:24:01,380][14395] Num frames 63000... +[2024-11-07 16:24:01,567][14395] Num frames 63100... +[2024-11-07 16:24:01,762][14395] Num frames 63200... +[2024-11-07 16:24:01,962][14395] Num frames 63300... +[2024-11-07 16:24:02,042][14395] Avg episode rewards: #0: 4.494, true rewards: #0: 4.074 +[2024-11-07 16:24:02,049][14395] Avg episode reward: 4.494, avg true_objective: 4.074 +[2024-11-07 16:24:02,239][14395] Num frames 63400... +[2024-11-07 16:24:02,428][14395] Num frames 63500... +[2024-11-07 16:24:02,614][14395] Num frames 63600... +[2024-11-07 16:24:02,810][14395] Num frames 63700... +[2024-11-07 16:24:02,994][14395] Avg episode rewards: #0: 4.510, true rewards: #0: 4.080 +[2024-11-07 16:24:02,999][14395] Avg episode reward: 4.510, avg true_objective: 4.080 +[2024-11-07 16:24:03,092][14395] Num frames 63800... +[2024-11-07 16:24:03,273][14395] Num frames 63900... 
+[2024-11-07 16:24:03,488][14395] Num frames 64000... +[2024-11-07 16:24:03,695][14395] Num frames 64100... +[2024-11-07 16:24:03,836][14395] Avg episode rewards: #0: 4.494, true rewards: #0: 4.074 +[2024-11-07 16:24:03,840][14395] Avg episode reward: 4.494, avg true_objective: 4.074 +[2024-11-07 16:24:03,978][14395] Num frames 64200... +[2024-11-07 16:24:04,149][14395] Num frames 64300... +[2024-11-07 16:24:04,335][14395] Num frames 64400... +[2024-11-07 16:24:04,519][14395] Num frames 64500... +[2024-11-07 16:24:04,711][14395] Num frames 64600... +[2024-11-07 16:24:04,952][14395] Avg episode rewards: #0: 4.516, true rewards: #0: 4.086 +[2024-11-07 16:24:04,954][14395] Avg episode reward: 4.516, avg true_objective: 4.086 +[2024-11-07 16:24:04,985][14395] Num frames 64700... +[2024-11-07 16:24:05,194][14395] Num frames 64800... +[2024-11-07 16:24:05,394][14395] Num frames 64900... +[2024-11-07 16:24:05,537][14395] Avg episode rewards: #0: 4.504, true rewards: #0: 4.074 +[2024-11-07 16:24:05,542][14395] Avg episode reward: 4.504, avg true_objective: 4.074 +[2024-11-07 16:24:05,667][14395] Num frames 65000... +[2024-11-07 16:24:05,862][14395] Num frames 65100... +[2024-11-07 16:24:06,054][14395] Num frames 65200... +[2024-11-07 16:24:06,239][14395] Num frames 65300... +[2024-11-07 16:24:06,353][14395] Avg episode rewards: #0: 4.487, true rewards: #0: 4.067 +[2024-11-07 16:24:06,356][14395] Avg episode reward: 4.487, avg true_objective: 4.067 +[2024-11-07 16:24:08,865][14395] Num frames 65400... +[2024-11-07 16:24:09,056][14395] Num frames 65500... +[2024-11-07 16:24:09,250][14395] Num frames 65600... +[2024-11-07 16:24:09,454][14395] Num frames 65700... +[2024-11-07 16:24:09,536][14395] Avg episode rewards: #0: 4.498, true rewards: #0: 4.078 +[2024-11-07 16:24:09,540][14395] Avg episode reward: 4.498, avg true_objective: 4.078 +[2024-11-07 16:24:09,733][14395] Num frames 65800... +[2024-11-07 16:24:09,914][14395] Num frames 65900... 
+[2024-11-07 16:24:10,106][14395] Num frames 66000... +[2024-11-07 16:24:10,332][14395] Avg episode rewards: #0: 4.498, true rewards: #0: 4.078 +[2024-11-07 16:24:10,338][14395] Avg episode reward: 4.498, avg true_objective: 4.078 +[2024-11-07 16:24:10,364][14395] Num frames 66100... +[2024-11-07 16:24:10,565][14395] Num frames 66200... +[2024-11-07 16:24:10,754][14395] Num frames 66300... +[2024-11-07 16:24:11,066][14395] Num frames 66400... +[2024-11-07 16:24:11,267][14395] Avg episode rewards: #0: 4.482, true rewards: #0: 4.072 +[2024-11-07 16:24:11,272][14395] Avg episode reward: 4.482, avg true_objective: 4.072 +[2024-11-07 16:24:11,329][14395] Num frames 66500... +[2024-11-07 16:24:11,514][14395] Num frames 66600... +[2024-11-07 16:24:11,706][14395] Num frames 66700... +[2024-11-07 16:24:11,904][14395] Num frames 66800... +[2024-11-07 16:24:12,063][14395] Avg episode rewards: #0: 4.481, true rewards: #0: 4.071 +[2024-11-07 16:24:12,067][14395] Avg episode reward: 4.481, avg true_objective: 4.071 +[2024-11-07 16:24:12,162][14395] Num frames 66900... +[2024-11-07 16:24:12,345][14395] Num frames 67000... +[2024-11-07 16:24:12,530][14395] Num frames 67100... +[2024-11-07 16:24:12,715][14395] Num frames 67200... +[2024-11-07 16:24:12,880][14395] Num frames 67300... +[2024-11-07 16:24:12,942][14395] Avg episode rewards: #0: 4.498, true rewards: #0: 4.078 +[2024-11-07 16:24:12,945][14395] Avg episode reward: 4.498, avg true_objective: 4.078 +[2024-11-07 16:24:13,127][14395] Num frames 67400... +[2024-11-07 16:24:13,299][14395] Num frames 67500... +[2024-11-07 16:24:13,472][14395] Num frames 67600... +[2024-11-07 16:24:13,663][14395] Num frames 67700... +[2024-11-07 16:24:13,758][14395] Avg episode rewards: #0: 4.511, true rewards: #0: 4.081 +[2024-11-07 16:24:13,761][14395] Avg episode reward: 4.511, avg true_objective: 4.081 +[2024-11-07 16:24:13,929][14395] Num frames 67800... +[2024-11-07 16:24:14,124][14395] Num frames 67900... 
+[2024-11-07 16:24:14,310][14395] Num frames 68000... +[2024-11-07 16:24:14,495][14395] Num frames 68100... +[2024-11-07 16:24:14,623][14395] Avg episode rewards: #0: 4.511, true rewards: #0: 4.081 +[2024-11-07 16:24:14,626][14395] Avg episode reward: 4.511, avg true_objective: 4.081 +[2024-11-07 16:24:14,744][14395] Num frames 68200... +[2024-11-07 16:24:14,947][14395] Num frames 68300... +[2024-11-07 16:24:15,159][14395] Num frames 68400... +[2024-11-07 16:24:15,360][14395] Num frames 68500... +[2024-11-07 16:24:15,570][14395] Avg episode rewards: #0: 4.514, true rewards: #0: 4.084 +[2024-11-07 16:24:15,572][14395] Avg episode reward: 4.514, avg true_objective: 4.084 +[2024-11-07 16:24:15,619][14395] Num frames 68600... +[2024-11-07 16:24:15,797][14395] Num frames 68700... +[2024-11-07 16:24:15,985][14395] Num frames 68800... +[2024-11-07 16:24:16,184][14395] Num frames 68900... +[2024-11-07 16:24:16,350][14395] Avg episode rewards: #0: 4.513, true rewards: #0: 4.083 +[2024-11-07 16:24:16,352][14395] Avg episode reward: 4.513, avg true_objective: 4.083 +[2024-11-07 16:24:16,445][14395] Num frames 69000... +[2024-11-07 16:24:16,630][14395] Num frames 69100... +[2024-11-07 16:24:16,821][14395] Num frames 69200... +[2024-11-07 16:24:16,993][14395] Num frames 69300... +[2024-11-07 16:24:17,128][14395] Avg episode rewards: #0: 4.497, true rewards: #0: 4.077 +[2024-11-07 16:24:17,134][14395] Avg episode reward: 4.497, avg true_objective: 4.077 +[2024-11-07 16:24:17,287][14395] Num frames 69400... +[2024-11-07 16:24:17,453][14395] Num frames 69500... +[2024-11-07 16:24:17,632][14395] Num frames 69600... +[2024-11-07 16:24:17,840][14395] Num frames 69700... +[2024-11-07 16:24:17,945][14395] Avg episode rewards: #0: 4.497, true rewards: #0: 4.077 +[2024-11-07 16:24:17,947][14395] Avg episode reward: 4.497, avg true_objective: 4.077 +[2024-11-07 16:24:18,083][14395] Num frames 69800... +[2024-11-07 16:24:18,256][14395] Num frames 69900... 
+[2024-11-07 16:24:18,432][14395] Num frames 70000... +[2024-11-07 16:24:18,611][14395] Num frames 70100... +[2024-11-07 16:24:18,693][14395] Avg episode rewards: #0: 4.461, true rewards: #0: 4.061 +[2024-11-07 16:24:18,695][14395] Avg episode reward: 4.461, avg true_objective: 4.061 +[2024-11-07 16:24:18,858][14395] Num frames 70200... +[2024-11-07 16:24:19,032][14395] Num frames 70300... +[2024-11-07 16:24:19,207][14395] Num frames 70400... +[2024-11-07 16:24:19,368][14395] Num frames 70500... +[2024-11-07 16:24:19,590][14395] Avg episode rewards: #0: 4.474, true rewards: #0: 4.064 +[2024-11-07 16:24:19,596][14395] Avg episode reward: 4.474, avg true_objective: 4.064 +[2024-11-07 16:24:19,626][14395] Num frames 70600... +[2024-11-07 16:24:19,794][14395] Num frames 70700... +[2024-11-07 16:24:19,970][14395] Num frames 70800... +[2024-11-07 16:24:20,145][14395] Num frames 70900... +[2024-11-07 16:24:20,329][14395] Avg episode rewards: #0: 4.438, true rewards: #0: 4.048 +[2024-11-07 16:24:20,330][14395] Avg episode reward: 4.438, avg true_objective: 4.048 +[2024-11-07 16:24:20,381][14395] Num frames 71000... +[2024-11-07 16:24:20,576][14395] Num frames 71100... +[2024-11-07 16:24:20,757][14395] Num frames 71200... +[2024-11-07 16:24:20,946][14395] Num frames 71300... +[2024-11-07 16:24:21,105][14395] Avg episode rewards: #0: 4.438, true rewards: #0: 4.048 +[2024-11-07 16:24:21,108][14395] Avg episode reward: 4.438, avg true_objective: 4.048 +[2024-11-07 16:24:21,188][14395] Num frames 71400... +[2024-11-07 16:24:21,367][14395] Num frames 71500... +[2024-11-07 16:24:21,537][14395] Num frames 71600... +[2024-11-07 16:24:21,721][14395] Num frames 71700... +[2024-11-07 16:24:21,874][14395] Avg episode rewards: #0: 4.422, true rewards: #0: 4.042 +[2024-11-07 16:24:21,879][14395] Avg episode reward: 4.422, avg true_objective: 4.042 +[2024-11-07 16:24:22,002][14395] Num frames 71800... +[2024-11-07 16:24:22,169][14395] Num frames 71900... 
+[2024-11-07 16:24:22,350][14395] Num frames 72000... +[2024-11-07 16:24:22,519][14395] Num frames 72100... +[2024-11-07 16:24:22,622][14395] Avg episode rewards: #0: 4.422, true rewards: #0: 4.042 +[2024-11-07 16:24:22,627][14395] Avg episode reward: 4.422, avg true_objective: 4.042 +[2024-11-07 16:24:22,777][14395] Num frames 72200... +[2024-11-07 16:24:22,960][14395] Num frames 72300... +[2024-11-07 16:24:23,130][14395] Num frames 72400... +[2024-11-07 16:24:23,302][14395] Num frames 72500... +[2024-11-07 16:24:23,478][14395] Avg episode rewards: #0: 4.409, true rewards: #0: 4.038 +[2024-11-07 16:24:23,483][14395] Avg episode reward: 4.409, avg true_objective: 4.038 +[2024-11-07 16:24:23,539][14395] Num frames 72600... +[2024-11-07 16:24:23,701][14395] Num frames 72700... +[2024-11-07 16:24:23,871][14395] Num frames 72800... +[2024-11-07 16:24:24,052][14395] Num frames 72900... +[2024-11-07 16:24:24,284][14395] Avg episode rewards: #0: 4.407, true rewards: #0: 4.037 +[2024-11-07 16:24:24,286][14395] Avg episode reward: 4.407, avg true_objective: 4.037 +[2024-11-07 16:24:24,313][14395] Num frames 73000... +[2024-11-07 16:24:24,486][14395] Num frames 73100... +[2024-11-07 16:24:24,660][14395] Num frames 73200... +[2024-11-07 16:24:24,827][14395] Num frames 73300... +[2024-11-07 16:24:25,018][14395] Avg episode rewards: #0: 4.387, true rewards: #0: 4.027 +[2024-11-07 16:24:25,022][14395] Avg episode reward: 4.387, avg true_objective: 4.027 +[2024-11-07 16:24:25,081][14395] Num frames 73400... +[2024-11-07 16:24:25,252][14395] Num frames 73500... +[2024-11-07 16:24:25,414][14395] Num frames 73600... +[2024-11-07 16:24:25,590][14395] Num frames 73700... +[2024-11-07 16:24:25,756][14395] Num frames 73800... +[2024-11-07 16:24:25,864][14395] Avg episode rewards: #0: 4.404, true rewards: #0: 4.034 +[2024-11-07 16:24:25,867][14395] Avg episode reward: 4.404, avg true_objective: 4.034 +[2024-11-07 16:24:26,014][14395] Num frames 73900... 
+[2024-11-07 16:24:26,190][14395] Num frames 74000... +[2024-11-07 16:24:26,354][14395] Num frames 74100... +[2024-11-07 16:24:26,620][14395] Num frames 74200... +[2024-11-07 16:24:26,706][14395] Avg episode rewards: #0: 4.404, true rewards: #0: 4.034 +[2024-11-07 16:24:26,707][14395] Avg episode reward: 4.404, avg true_objective: 4.034 +[2024-11-07 16:24:26,874][14395] Num frames 74300... +[2024-11-07 16:24:27,331][14395] Num frames 74400... +[2024-11-07 16:24:27,685][14395] Num frames 74500... +[2024-11-07 16:24:28,021][14395] Avg episode rewards: #0: 4.392, true rewards: #0: 4.032 +[2024-11-07 16:24:28,022][14395] Avg episode reward: 4.392, avg true_objective: 4.032 +[2024-11-07 16:24:28,048][14395] Num frames 74600... +[2024-11-07 16:24:28,299][14395] Num frames 74700... +[2024-11-07 16:24:28,526][14395] Num frames 74800... +[2024-11-07 16:24:28,751][14395] Num frames 74900... +[2024-11-07 16:24:29,065][14395] Num frames 75000... +[2024-11-07 16:24:29,264][14395] Avg episode rewards: #0: 4.408, true rewards: #0: 4.038 +[2024-11-07 16:24:29,267][14395] Avg episode reward: 4.408, avg true_objective: 4.038 +[2024-11-07 16:24:29,564][14395] Num frames 75100... +[2024-11-07 16:24:29,922][14395] Num frames 75200... +[2024-11-07 16:24:30,179][14395] Num frames 75300... +[2024-11-07 16:24:30,430][14395] Num frames 75400... +[2024-11-07 16:24:30,705][14395] Num frames 75500... +[2024-11-07 16:24:30,925][14395] Avg episode rewards: #0: 4.444, true rewards: #0: 4.054 +[2024-11-07 16:24:30,930][14395] Avg episode reward: 4.444, avg true_objective: 4.054 +[2024-11-07 16:24:30,988][14395] Num frames 75600... +[2024-11-07 16:24:31,168][14395] Num frames 75700... +[2024-11-07 16:24:31,354][14395] Num frames 75800... +[2024-11-07 16:24:31,535][14395] Num frames 75900... 
+[2024-11-07 16:24:31,717][14395] Avg episode rewards: #0: 4.425, true rewards: #0: 4.045 +[2024-11-07 16:24:31,722][14395] Avg episode reward: 4.425, avg true_objective: 4.045 +[2024-11-07 16:24:31,824][14395] Num frames 76000... +[2024-11-07 16:24:32,050][14395] Num frames 76100... +[2024-11-07 16:24:32,351][14395] Num frames 76200... +[2024-11-07 16:24:32,570][14395] Num frames 76300... +[2024-11-07 16:24:32,754][14395] Num frames 76400... +[2024-11-07 16:24:32,949][14395] Num frames 76500... +[2024-11-07 16:24:33,028][14395] Avg episode rewards: #0: 4.444, true rewards: #0: 4.054 +[2024-11-07 16:24:33,033][14395] Avg episode reward: 4.444, avg true_objective: 4.054 +[2024-11-07 16:24:33,211][14395] Num frames 76600... +[2024-11-07 16:24:33,434][14395] Num frames 76700... +[2024-11-07 16:24:33,625][14395] Num frames 76800... +[2024-11-07 16:24:33,855][14395] Avg episode rewards: #0: 4.444, true rewards: #0: 4.054 +[2024-11-07 16:24:33,858][14395] Avg episode reward: 4.444, avg true_objective: 4.054 +[2024-11-07 16:24:33,880][14395] Num frames 76900... +[2024-11-07 16:24:34,074][14395] Num frames 77000... +[2024-11-07 16:24:34,308][14395] Num frames 77100... +[2024-11-07 16:24:34,519][14395] Num frames 77200... +[2024-11-07 16:24:34,724][14395] Avg episode rewards: #0: 4.408, true rewards: #0: 4.038 +[2024-11-07 16:24:34,733][14395] Avg episode reward: 4.408, avg true_objective: 4.038 +[2024-11-07 16:24:34,790][14395] Num frames 77300... +[2024-11-07 16:24:34,974][14395] Num frames 77400... +[2024-11-07 16:24:35,170][14395] Num frames 77500... +[2024-11-07 16:24:35,410][14395] Num frames 77600... +[2024-11-07 16:24:35,618][14395] Avg episode rewards: #0: 4.408, true rewards: #0: 4.038 +[2024-11-07 16:24:35,621][14395] Avg episode reward: 4.408, avg true_objective: 4.038 +[2024-11-07 16:24:35,729][14395] Num frames 77700... +[2024-11-07 16:24:35,972][14395] Num frames 77800... +[2024-11-07 16:24:36,263][14395] Num frames 77900... 
+[2024-11-07 16:24:36,492][14395] Num frames 78000... +[2024-11-07 16:24:36,710][14395] Num frames 78100... +[2024-11-07 16:24:36,790][14395] Avg episode rewards: #0: 4.425, true rewards: #0: 4.045 +[2024-11-07 16:24:36,794][14395] Avg episode reward: 4.425, avg true_objective: 4.045 +[2024-11-07 16:24:36,998][14395] Num frames 78200... +[2024-11-07 16:24:37,225][14395] Num frames 78300... +[2024-11-07 16:24:37,473][14395] Num frames 78400... +[2024-11-07 16:24:37,746][14395] Avg episode rewards: #0: 4.425, true rewards: #0: 4.045 +[2024-11-07 16:24:37,751][14395] Avg episode reward: 4.425, avg true_objective: 4.045 +[2024-11-07 16:24:37,778][14395] Num frames 78500... +[2024-11-07 16:24:38,038][14395] Num frames 78600... +[2024-11-07 16:24:38,379][14395] Num frames 78700... +[2024-11-07 16:24:38,702][14395] Num frames 78800... +[2024-11-07 16:24:39,005][14395] Num frames 78900... +[2024-11-07 16:24:39,552][14395] Num frames 79000... +[2024-11-07 16:24:39,773][14395] Avg episode rewards: #0: 4.461, true rewards: #0: 4.061 +[2024-11-07 16:24:39,776][14395] Avg episode reward: 4.461, avg true_objective: 4.061 +[2024-11-07 16:24:39,974][14395] Num frames 79100... +[2024-11-07 16:24:40,620][14395] Num frames 79200... +[2024-11-07 16:24:40,958][14395] Num frames 79300... +[2024-11-07 16:24:43,538][14395] Num frames 79400... +[2024-11-07 16:24:43,704][14395] Avg episode rewards: #0: 4.444, true rewards: #0: 4.054 +[2024-11-07 16:24:43,707][14395] Avg episode reward: 4.444, avg true_objective: 4.054 +[2024-11-07 16:24:44,083][14395] Num frames 79500... +[2024-11-07 16:24:44,461][14395] Num frames 79600... +[2024-11-07 16:24:44,814][14395] Num frames 79700... +[2024-11-07 16:24:45,074][14395] Num frames 79800... +[2024-11-07 16:24:45,167][14395] Avg episode rewards: #0: 4.446, true rewards: #0: 4.056 +[2024-11-07 16:24:45,168][14395] Avg episode reward: 4.446, avg true_objective: 4.056 +[2024-11-07 16:24:45,379][14395] Num frames 79900... 
+[2024-11-07 16:24:45,668][14395] Num frames 80000... +[2024-11-07 16:24:46,019][14395] Num frames 80100... +[2024-11-07 16:24:46,273][14395] Avg episode rewards: #0: 4.446, true rewards: #0: 4.056 +[2024-11-07 16:24:46,275][14395] Avg episode reward: 4.446, avg true_objective: 4.056 +[2024-11-07 16:24:46,306][14395] Num frames 80200... +[2024-11-07 16:24:46,537][14395] Num frames 80300... +[2024-11-07 16:24:46,774][14395] Num frames 80400... +[2024-11-07 16:24:46,981][14395] Num frames 80500... +[2024-11-07 16:24:47,217][14395] Num frames 80600... +[2024-11-07 16:24:47,292][14395] Avg episode rewards: #0: 4.459, true rewards: #0: 4.059 +[2024-11-07 16:24:47,298][14395] Avg episode reward: 4.459, avg true_objective: 4.059 +[2024-11-07 16:24:47,510][14395] Num frames 80700... +[2024-11-07 16:24:47,712][14395] Num frames 80800... +[2024-11-07 16:24:47,925][14395] Num frames 80900... +[2024-11-07 16:24:48,175][14395] Avg episode rewards: #0: 4.459, true rewards: #0: 4.059 +[2024-11-07 16:24:48,178][14395] Avg episode reward: 4.459, avg true_objective: 4.059 +[2024-11-07 16:24:48,216][14395] Num frames 81000... +[2024-11-07 16:24:48,425][14395] Num frames 81100... +[2024-11-07 16:24:48,647][14395] Num frames 81200... +[2024-11-07 16:24:48,865][14395] Num frames 81300... +[2024-11-07 16:24:49,094][14395] Avg episode rewards: #0: 4.459, true rewards: #0: 4.059 +[2024-11-07 16:24:49,095][14395] Avg episode reward: 4.459, avg true_objective: 4.059 +[2024-11-07 16:24:49,168][14395] Num frames 81400... +[2024-11-07 16:24:49,395][14395] Num frames 81500... +[2024-11-07 16:24:49,636][14395] Num frames 81600... +[2024-11-07 16:24:49,874][14395] Num frames 81700... +[2024-11-07 16:24:50,064][14395] Avg episode rewards: #0: 4.443, true rewards: #0: 4.053 +[2024-11-07 16:24:50,065][14395] Avg episode reward: 4.443, avg true_objective: 4.053 +[2024-11-07 16:24:50,184][14395] Num frames 81800... +[2024-11-07 16:24:50,449][14395] Num frames 81900... 
+[2024-11-07 16:24:50,744][14395] Num frames 82000... +[2024-11-07 16:24:50,980][14395] Num frames 82100... +[2024-11-07 16:24:51,125][14395] Avg episode rewards: #0: 4.443, true rewards: #0: 4.053 +[2024-11-07 16:24:51,126][14395] Avg episode reward: 4.443, avg true_objective: 4.053 +[2024-11-07 16:24:51,320][14395] Num frames 82200... +[2024-11-07 16:24:51,667][14395] Num frames 82300... +[2024-11-07 16:24:51,967][14395] Num frames 82400... +[2024-11-07 16:24:52,265][14395] Num frames 82500... +[2024-11-07 16:24:52,382][14395] Avg episode rewards: #0: 4.446, true rewards: #0: 4.056 +[2024-11-07 16:24:52,386][14395] Avg episode reward: 4.446, avg true_objective: 4.056 +[2024-11-07 16:24:52,542][14395] Num frames 82600... +[2024-11-07 16:24:52,754][14395] Num frames 82700... +[2024-11-07 16:24:52,950][14395] Num frames 82800... +[2024-11-07 16:24:53,239][14395] Num frames 82900... +[2024-11-07 16:24:53,438][14395] Num frames 83000... +[2024-11-07 16:24:53,637][14395] Avg episode rewards: #0: 4.472, true rewards: #0: 4.072 +[2024-11-07 16:24:53,641][14395] Avg episode reward: 4.472, avg true_objective: 4.072 +[2024-11-07 16:24:53,714][14395] Num frames 83100... +[2024-11-07 16:24:53,937][14395] Num frames 83200... +[2024-11-07 16:24:54,183][14395] Num frames 83300... +[2024-11-07 16:24:54,366][14395] Num frames 83400... +[2024-11-07 16:24:54,571][14395] Avg episode rewards: #0: 4.472, true rewards: #0: 4.072 +[2024-11-07 16:24:54,575][14395] Avg episode reward: 4.472, avg true_objective: 4.072 +[2024-11-07 16:24:54,674][14395] Num frames 83500... +[2024-11-07 16:24:54,880][14395] Num frames 83600... +[2024-11-07 16:24:55,098][14395] Num frames 83700... +[2024-11-07 16:24:55,461][14395] Num frames 83800... +[2024-11-07 16:24:55,667][14395] Avg episode rewards: #0: 4.485, true rewards: #0: 4.075 +[2024-11-07 16:24:55,670][14395] Avg episode reward: 4.485, avg true_objective: 4.075 +[2024-11-07 16:24:55,770][14395] Num frames 83900... 
+[2024-11-07 16:24:55,962][14395] Num frames 84000... +[2024-11-07 16:24:56,169][14395] Num frames 84100... +[2024-11-07 16:24:56,406][14395] Num frames 84200... +[2024-11-07 16:24:56,577][14395] Avg episode rewards: #0: 4.485, true rewards: #0: 4.075 +[2024-11-07 16:24:56,583][14395] Avg episode reward: 4.485, avg true_objective: 4.075 +[2024-11-07 16:24:56,695][14395] Num frames 84300... +[2024-11-07 16:24:56,891][14395] Num frames 84400... +[2024-11-07 16:24:57,083][14395] Num frames 84500... +[2024-11-07 16:24:57,297][14395] Num frames 84600... +[2024-11-07 16:24:57,488][14395] Num frames 84700... +[2024-11-07 16:24:57,551][14395] Avg episode rewards: #0: 4.489, true rewards: #0: 4.078 +[2024-11-07 16:24:57,555][14395] Avg episode reward: 4.489, avg true_objective: 4.078 +[2024-11-07 16:24:57,766][14395] Num frames 84800... +[2024-11-07 16:24:57,975][14395] Num frames 84900... +[2024-11-07 16:24:58,176][14395] Num frames 85000... +[2024-11-07 16:24:58,447][14395] Avg episode rewards: #0: 4.489, true rewards: #0: 4.078 +[2024-11-07 16:24:58,456][14395] Avg episode reward: 4.489, avg true_objective: 4.078 +[2024-11-07 16:24:58,498][14395] Num frames 85100... +[2024-11-07 16:24:59,089][14395] Num frames 85200... +[2024-11-07 16:24:59,383][14395] Num frames 85300... +[2024-11-07 16:24:59,597][14395] Num frames 85400... +[2024-11-07 16:24:59,897][14395] Avg episode rewards: #0: 4.489, true rewards: #0: 4.078 +[2024-11-07 16:24:59,898][14395] Avg episode reward: 4.489, avg true_objective: 4.078 +[2024-11-07 16:25:00,027][14395] Num frames 85500... +[2024-11-07 16:25:00,392][14395] Num frames 85600... +[2024-11-07 16:25:00,654][14395] Num frames 85700... +[2024-11-07 16:25:01,034][14395] Num frames 85800... +[2024-11-07 16:25:01,253][14395] Avg episode rewards: #0: 4.489, true rewards: #0: 4.078 +[2024-11-07 16:25:01,255][14395] Avg episode reward: 4.489, avg true_objective: 4.078 +[2024-11-07 16:25:01,479][14395] Num frames 85900... 
+[2024-11-07 16:25:01,901][14395] Num frames 86000... +[2024-11-07 16:25:02,342][14395] Num frames 86100... +[2024-11-07 16:25:02,632][14395] Num frames 86200... +[2024-11-07 16:25:02,905][14395] Num frames 86300... +[2024-11-07 16:25:02,976][14395] Avg episode rewards: #0: 4.489, true rewards: #0: 4.078 +[2024-11-07 16:25:02,978][14395] Avg episode reward: 4.489, avg true_objective: 4.078 +[2024-11-07 16:25:03,520][14395] Num frames 86400... +[2024-11-07 16:25:03,854][14395] Num frames 86500... +[2024-11-07 16:25:04,118][14395] Num frames 86600... +[2024-11-07 16:25:04,400][14395] Avg episode rewards: #0: 4.495, true rewards: #0: 4.075 +[2024-11-07 16:25:04,403][14395] Avg episode reward: 4.495, avg true_objective: 4.075 +[2024-11-07 16:25:04,451][14395] Num frames 86700... +[2024-11-07 16:25:04,798][14395] Num frames 86800... +[2024-11-07 16:25:05,188][14395] Num frames 86900... +[2024-11-07 16:25:05,542][14395] Num frames 87000... +[2024-11-07 16:25:05,804][14395] Avg episode rewards: #0: 4.495, true rewards: #0: 4.075 +[2024-11-07 16:25:05,806][14395] Avg episode reward: 4.495, avg true_objective: 4.075 +[2024-11-07 16:25:05,957][14395] Num frames 87100... +[2024-11-07 16:25:06,264][14395] Num frames 87200... +[2024-11-07 16:25:06,819][14395] Num frames 87300... +[2024-11-07 16:25:07,170][14395] Num frames 87400... +[2024-11-07 16:25:07,550][14395] Avg episode rewards: #0: 4.479, true rewards: #0: 4.069 +[2024-11-07 16:25:07,551][14395] Avg episode reward: 4.479, avg true_objective: 4.069 +[2024-11-07 16:25:07,755][14395] Num frames 87500... +[2024-11-07 16:25:08,200][14395] Num frames 87600... +[2024-11-07 16:25:08,519][14395] Num frames 87700... +[2024-11-07 16:25:09,025][14395] Num frames 87800... +[2024-11-07 16:25:09,210][14395] Avg episode rewards: #0: 4.479, true rewards: #0: 4.069 +[2024-11-07 16:25:09,211][14395] Avg episode reward: 4.479, avg true_objective: 4.069 +[2024-11-07 16:25:09,353][14395] Num frames 87900... 
+[2024-11-07 16:25:09,919][14395] Num frames 88000... +[2024-11-07 16:25:10,241][14395] Num frames 88100... +[2024-11-07 16:25:10,692][14395] Num frames 88200... +[2024-11-07 16:25:10,811][14395] Avg episode rewards: #0: 4.479, true rewards: #0: 4.069 +[2024-11-07 16:25:10,813][14395] Avg episode reward: 4.479, avg true_objective: 4.069 +[2024-11-07 16:25:11,296][14395] Num frames 88300... +[2024-11-07 16:25:11,985][14395] Num frames 88400... +[2024-11-07 16:25:12,672][14395] Num frames 88500... +[2024-11-07 16:25:13,448][14395] Num frames 88600... +[2024-11-07 16:25:13,974][14395] Avg episode rewards: #0: 4.495, true rewards: #0: 4.075 +[2024-11-07 16:25:13,977][14395] Avg episode reward: 4.495, avg true_objective: 4.075 +[2024-11-07 16:25:14,152][14395] Num frames 88700... +[2024-11-07 16:25:14,595][14395] Num frames 88800... +[2024-11-07 16:25:15,060][14395] Num frames 88900... +[2024-11-07 16:25:17,503][14395] Num frames 89000... +[2024-11-07 16:25:18,022][14395] Avg episode rewards: #0: 4.495, true rewards: #0: 4.075 +[2024-11-07 16:25:18,024][14395] Avg episode reward: 4.495, avg true_objective: 4.075 +[2024-11-07 16:25:18,571][14395] Num frames 89100... +[2024-11-07 16:25:19,417][14395] Num frames 89200... +[2024-11-07 16:25:19,805][14395] Num frames 89300... +[2024-11-07 16:25:20,653][14395] Num frames 89400... +[2024-11-07 16:25:21,277][14395] Num frames 89500... +[2024-11-07 16:25:21,355][14395] Avg episode rewards: #0: 4.512, true rewards: #0: 4.082 +[2024-11-07 16:25:21,359][14395] Avg episode reward: 4.512, avg true_objective: 4.082 +[2024-11-07 16:25:22,093][14395] Num frames 89600... +[2024-11-07 16:25:22,598][14395] Num frames 89700... +[2024-11-07 16:25:22,826][14395] Avg episode rewards: #0: 4.499, true rewards: #0: 4.069 +[2024-11-07 16:25:22,828][14395] Avg episode reward: 4.499, avg true_objective: 4.069 +[2024-11-07 16:25:22,946][14395] Num frames 89800... +[2024-11-07 16:25:23,205][14395] Num frames 89900... 
+[2024-11-07 16:25:23,557][14395] Num frames 90000... +[2024-11-07 16:25:23,936][14395] Num frames 90100... +[2024-11-07 16:25:24,207][14395] Num frames 90200... +[2024-11-07 16:25:24,290][14395] Avg episode rewards: #0: 4.528, true rewards: #0: 4.088 +[2024-11-07 16:25:24,292][14395] Avg episode reward: 4.528, avg true_objective: 4.088 +[2024-11-07 16:25:24,643][14395] Num frames 90300... +[2024-11-07 16:25:24,923][14395] Num frames 90400... +[2024-11-07 16:25:25,212][14395] Num frames 90500... +[2024-11-07 16:25:25,509][14395] Avg episode rewards: #0: 4.541, true rewards: #0: 4.101 +[2024-11-07 16:25:25,511][14395] Avg episode reward: 4.541, avg true_objective: 4.101 +[2024-11-07 16:25:25,540][14395] Num frames 90600... +[2024-11-07 16:25:25,899][14395] Num frames 90700... +[2024-11-07 16:25:26,183][14395] Num frames 90800... +[2024-11-07 16:25:26,427][14395] Num frames 90900... +[2024-11-07 16:25:26,660][14395] Avg episode rewards: #0: 4.525, true rewards: #0: 4.094 +[2024-11-07 16:25:26,661][14395] Avg episode reward: 4.525, avg true_objective: 4.094 +[2024-11-07 16:25:26,737][14395] Num frames 91000... +[2024-11-07 16:25:27,045][14395] Num frames 91100... +[2024-11-07 16:25:27,368][14395] Num frames 91200... +[2024-11-07 16:25:27,717][14395] Num frames 91300... +[2024-11-07 16:25:28,018][14395] Avg episode rewards: #0: 4.535, true rewards: #0: 4.094 +[2024-11-07 16:25:28,024][14395] Avg episode reward: 4.535, avg true_objective: 4.094 +[2024-11-07 16:25:28,068][14395] Num frames 91400... +[2024-11-07 16:25:28,492][14395] Num frames 91500... +[2024-11-07 16:25:28,785][14395] Num frames 91600... +[2024-11-07 16:25:29,085][14395] Num frames 91700... +[2024-11-07 16:25:29,384][14395] Avg episode rewards: #0: 4.535, true rewards: #0: 4.094 +[2024-11-07 16:25:29,387][14395] Avg episode reward: 4.535, avg true_objective: 4.094 +[2024-11-07 16:25:29,469][14395] Num frames 91800... +[2024-11-07 16:25:29,731][14395] Num frames 91900... 
+[2024-11-07 16:25:30,140][14395] Num frames 92000... +[2024-11-07 16:25:30,479][14395] Num frames 92100... +[2024-11-07 16:25:30,776][14395] Avg episode rewards: #0: 4.535, true rewards: #0: 4.094 +[2024-11-07 16:25:30,778][14395] Avg episode reward: 4.535, avg true_objective: 4.094 +[2024-11-07 16:25:30,950][14395] Num frames 92200... +[2024-11-07 16:25:31,285][14395] Num frames 92300... +[2024-11-07 16:25:31,714][14395] Num frames 92400... +[2024-11-07 16:25:32,141][14395] Num frames 92500... +[2024-11-07 16:25:32,748][14395] Num frames 92600... +[2024-11-07 16:25:32,833][14395] Avg episode rewards: #0: 4.551, true rewards: #0: 4.101 +[2024-11-07 16:25:32,835][14395] Avg episode reward: 4.551, avg true_objective: 4.101 +[2024-11-07 16:25:33,272][14395] Num frames 92700... +[2024-11-07 16:25:33,680][14395] Num frames 92800... +[2024-11-07 16:25:33,987][14395] Num frames 92900... +[2024-11-07 16:25:34,413][14395] Avg episode rewards: #0: 4.551, true rewards: #0: 4.101 +[2024-11-07 16:25:34,414][14395] Avg episode reward: 4.551, avg true_objective: 4.101 +[2024-11-07 16:25:34,448][14395] Num frames 93000... +[2024-11-07 16:25:34,786][14395] Num frames 93100... +[2024-11-07 16:25:35,037][14395] Num frames 93200... +[2024-11-07 16:25:35,370][14395] Num frames 93300... +[2024-11-07 16:25:35,630][14395] Num frames 93400... +[2024-11-07 16:25:35,787][14395] Avg episode rewards: #0: 4.567, true rewards: #0: 4.107 +[2024-11-07 16:25:35,788][14395] Avg episode reward: 4.567, avg true_objective: 4.107 +[2024-11-07 16:25:35,958][14395] Num frames 93500... +[2024-11-07 16:25:36,266][14395] Num frames 93600... +[2024-11-07 16:25:36,577][14395] Num frames 93700... +[2024-11-07 16:25:36,825][14395] Num frames 93800... +[2024-11-07 16:25:36,944][14395] Avg episode rewards: #0: 4.567, true rewards: #0: 4.107 +[2024-11-07 16:25:36,952][14395] Avg episode reward: 4.567, avg true_objective: 4.107 +[2024-11-07 16:25:37,157][14395] Num frames 93900... 
+[2024-11-07 16:25:37,558][14395] Num frames 94000... +[2024-11-07 16:25:37,925][14395] Num frames 94100... +[2024-11-07 16:25:38,187][14395] Avg episode rewards: #0: 4.573, true rewards: #0: 4.103 +[2024-11-07 16:25:38,191][14395] Avg episode reward: 4.573, avg true_objective: 4.103 +[2024-11-07 16:25:38,284][14395] Num frames 94200... +[2024-11-07 16:25:38,571][14395] Num frames 94300... +[2024-11-07 16:25:38,858][14395] Num frames 94400... +[2024-11-07 16:25:39,173][14395] Num frames 94500... +[2024-11-07 16:25:39,494][14395] Avg episode rewards: #0: 4.537, true rewards: #0: 4.087 +[2024-11-07 16:25:39,496][14395] Avg episode reward: 4.537, avg true_objective: 4.087 +[2024-11-07 16:25:39,746][14395] Num frames 94600... +[2024-11-07 16:25:40,134][14395] Num frames 94700... +[2024-11-07 16:25:40,516][14395] Num frames 94800... +[2024-11-07 16:25:40,802][14395] Num frames 94900... +[2024-11-07 16:25:40,960][14395] Avg episode rewards: #0: 4.521, true rewards: #0: 4.081 +[2024-11-07 16:25:40,963][14395] Avg episode reward: 4.521, avg true_objective: 4.081 +[2024-11-07 16:25:41,169][14395] Num frames 95000... +[2024-11-07 16:25:41,478][14395] Num frames 95100... +[2024-11-07 16:25:41,772][14395] Num frames 95200... +[2024-11-07 16:25:42,036][14395] Num frames 95300... +[2024-11-07 16:25:42,171][14395] Avg episode rewards: #0: 4.521, true rewards: #0: 4.081 +[2024-11-07 16:25:42,174][14395] Avg episode reward: 4.521, avg true_objective: 4.081 +[2024-11-07 16:25:42,494][14395] Num frames 95400... +[2024-11-07 16:25:42,824][14395] Num frames 95500... +[2024-11-07 16:25:43,092][14395] Num frames 95600... +[2024-11-07 16:25:43,387][14395] Num frames 95700... +[2024-11-07 16:25:43,447][14395] Avg episode rewards: #0: 4.521, true rewards: #0: 4.081 +[2024-11-07 16:25:43,448][14395] Avg episode reward: 4.521, avg true_objective: 4.081 +[2024-11-07 16:25:43,708][14395] Num frames 95800... +[2024-11-07 16:25:43,962][14395] Num frames 95900... 
+[2024-11-07 16:25:44,262][14395] Avg episode rewards: #0: 4.491, true rewards: #0: 4.061 +[2024-11-07 16:25:44,263][14395] Avg episode reward: 4.491, avg true_objective: 4.061 +[2024-11-07 16:25:44,373][14395] Num frames 96000... +[2024-11-07 16:25:44,619][14395] Num frames 96100... +[2024-11-07 16:25:45,012][14395] Num frames 96200... +[2024-11-07 16:25:45,255][14395] Num frames 96300... +[2024-11-07 16:25:45,407][14395] Avg episode rewards: #0: 4.475, true rewards: #0: 4.055 +[2024-11-07 16:25:45,409][14395] Avg episode reward: 4.475, avg true_objective: 4.055 +[2024-11-07 16:25:45,575][14395] Num frames 96400... +[2024-11-07 16:25:45,807][14395] Num frames 96500... +[2024-11-07 16:25:46,094][14395] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 16:25:46,096][14395] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 16:25:46,102][14395] Num frames 96600... +[2024-11-07 16:25:46,318][14395] Num frames 96700... +[2024-11-07 16:25:46,558][14395] Num frames 96800... +[2024-11-07 16:25:46,825][14395] Num frames 96900... +[2024-11-07 16:25:47,112][14395] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 16:25:47,114][14395] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 16:25:47,167][14395] Num frames 97000... +[2024-11-07 16:25:47,443][14395] Num frames 97100... +[2024-11-07 16:25:47,749][14395] Num frames 97200... +[2024-11-07 16:25:48,060][14395] Num frames 97300... +[2024-11-07 16:25:48,351][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.039 +[2024-11-07 16:25:48,353][14395] Avg episode reward: 4.449, avg true_objective: 4.039 +[2024-11-07 16:25:48,553][14395] Num frames 97400... +[2024-11-07 16:25:49,083][14395] Num frames 97500... +[2024-11-07 16:25:49,477][14395] Num frames 97600... +[2024-11-07 16:25:51,993][14395] Num frames 97700... 
+[2024-11-07 16:25:52,239][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.039 +[2024-11-07 16:25:52,241][14395] Avg episode reward: 4.449, avg true_objective: 4.039 +[2024-11-07 16:25:52,419][14395] Num frames 97800... +[2024-11-07 16:25:52,694][14395] Num frames 97900... +[2024-11-07 16:25:53,018][14395] Num frames 98000... +[2024-11-07 16:25:53,286][14395] Num frames 98100... +[2024-11-07 16:25:53,430][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.039 +[2024-11-07 16:25:53,435][14395] Avg episode reward: 4.449, avg true_objective: 4.039 +[2024-11-07 16:25:53,597][14395] Num frames 98200... +[2024-11-07 16:25:53,889][14395] Num frames 98300... +[2024-11-07 16:25:54,121][14395] Num frames 98400... +[2024-11-07 16:25:54,404][14395] Num frames 98500... +[2024-11-07 16:25:54,521][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.039 +[2024-11-07 16:25:54,522][14395] Avg episode reward: 4.449, avg true_objective: 4.039 +[2024-11-07 16:25:54,796][14395] Num frames 98600... +[2024-11-07 16:25:55,069][14395] Num frames 98700... +[2024-11-07 16:25:55,355][14395] Num frames 98800... +[2024-11-07 16:25:55,630][14395] Num frames 98900... +[2024-11-07 16:25:55,691][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.039 +[2024-11-07 16:25:55,694][14395] Avg episode reward: 4.449, avg true_objective: 4.039 +[2024-11-07 16:25:56,061][14395] Num frames 99000... +[2024-11-07 16:25:56,281][14395] Num frames 99100... +[2024-11-07 16:25:56,528][14395] Num frames 99200... +[2024-11-07 16:25:56,740][14395] Num frames 99300... +[2024-11-07 16:25:56,857][14395] Avg episode rewards: #0: 4.447, true rewards: #0: 4.037 +[2024-11-07 16:25:56,859][14395] Avg episode reward: 4.447, avg true_objective: 4.037 +[2024-11-07 16:25:57,008][14395] Num frames 99400... +[2024-11-07 16:25:57,197][14395] Num frames 99500... +[2024-11-07 16:25:57,381][14395] Num frames 99600... +[2024-11-07 16:25:57,575][14395] Num frames 99700... 
+[2024-11-07 16:25:57,851][14395] Avg episode rewards: #0: 4.460, true rewards: #0: 4.040 +[2024-11-07 16:25:57,857][14395] Avg episode reward: 4.460, avg true_objective: 4.040 +[2024-11-07 16:25:57,914][14395] Num frames 99800... +[2024-11-07 16:25:58,090][14395] Num frames 99900... +[2024-11-07 16:25:58,276][14395] Num frames 100000... +[2024-11-07 16:30:56,493][14395] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 16:31:34,770][14395] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 16:38:07,687][14395] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 16:38:07,688][14395] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 16:38:07,689][14395] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 16:38:07,691][14395] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 16:38:07,693][14395] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:38:07,695][14395] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 16:38:07,698][14395] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:38:07,699][14395] Adding new argument 'max_num_episodes'=100 that is not in the saved config file! +[2024-11-07 16:38:07,701][14395] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-07 16:38:07,703][14395] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-07 16:38:07,705][14395] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 16:38:07,708][14395] Adding new argument 'eval_deterministic'=False that is not in the saved config file! 
+[2024-11-07 16:38:07,711][14395] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 16:38:07,714][14395] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 16:38:07,727][14395] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 16:38:08,313][14395] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 16:38:08,688][14395] RunningMeanStd input shape: (1,) +[2024-11-07 16:38:09,099][14395] ConvEncoder: input_channels=3 +[2024-11-07 16:38:10,478][14395] Conv encoder output size: 512 +[2024-11-07 16:38:10,481][14395] Policy head output size: 512 +[2024-11-07 16:38:11,827][14395] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 16:38:17,313][14395] Num frames 100... +[2024-11-07 16:38:17,579][14395] Num frames 200... +[2024-11-07 16:38:17,817][14395] Num frames 300... +[2024-11-07 16:38:18,019][14395] Num frames 400... +[2024-11-07 16:38:18,233][14395] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 +[2024-11-07 16:38:18,235][14395] Avg episode reward: 5.480, avg true_objective: 4.480 +[2024-11-07 16:38:18,359][14395] Num frames 500... +[2024-11-07 16:38:18,594][14395] Num frames 600... +[2024-11-07 16:38:18,786][14395] Num frames 700... +[2024-11-07 16:38:18,987][14395] Num frames 800... +[2024-11-07 16:38:19,105][14395] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-07 16:38:19,110][14395] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-07 16:38:19,277][14395] Num frames 900... +[2024-11-07 16:38:19,499][14395] Num frames 1000... +[2024-11-07 16:38:19,754][14395] Num frames 1100... +[2024-11-07 16:38:20,016][14395] Num frames 1200... 
+[2024-11-07 16:38:20,109][14395] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-07 16:38:20,110][14395] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-07 16:38:20,330][14395] Num frames 1300... +[2024-11-07 16:38:20,550][14395] Num frames 1400... +[2024-11-07 16:38:20,785][14395] Num frames 1500... +[2024-11-07 16:38:21,001][14395] Num frames 1600... +[2024-11-07 16:38:21,239][14395] Num frames 1700... +[2024-11-07 16:38:21,369][14395] Avg episode rewards: #0: 5.070, true rewards: #0: 4.320 +[2024-11-07 16:38:21,373][14395] Avg episode reward: 5.070, avg true_objective: 4.320 +[2024-11-07 16:38:21,533][14395] Num frames 1800... +[2024-11-07 16:38:21,766][14395] Num frames 1900... +[2024-11-07 16:38:21,971][14395] Num frames 2000... +[2024-11-07 16:38:22,151][14395] Num frames 2100... +[2024-11-07 16:38:22,232][14395] Avg episode rewards: #0: 4.824, true rewards: #0: 4.224 +[2024-11-07 16:38:22,238][14395] Avg episode reward: 4.824, avg true_objective: 4.224 +[2024-11-07 16:38:22,409][14395] Num frames 2200... +[2024-11-07 16:38:22,623][14395] Num frames 2300... +[2024-11-07 16:38:22,810][14395] Num frames 2400... +[2024-11-07 16:38:23,045][14395] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-07 16:38:23,047][14395] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-07 16:38:23,058][14395] Num frames 2500... +[2024-11-07 16:38:23,259][14395] Num frames 2600... +[2024-11-07 16:38:23,470][14395] Num frames 2700... +[2024-11-07 16:38:25,810][14395] Num frames 2800... +[2024-11-07 16:38:26,038][14395] Avg episode rewards: #0: 4.543, true rewards: #0: 4.114 +[2024-11-07 16:38:26,042][14395] Avg episode reward: 4.543, avg true_objective: 4.114 +[2024-11-07 16:38:26,098][14395] Num frames 2900... +[2024-11-07 16:38:26,304][14395] Num frames 3000... +[2024-11-07 16:38:26,496][14395] Num frames 3100... +[2024-11-07 16:38:26,731][14395] Num frames 3200... 
+[2024-11-07 16:38:26,948][14395] Avg episode rewards: #0: 4.455, true rewards: #0: 4.080 +[2024-11-07 16:38:26,950][14395] Avg episode reward: 4.455, avg true_objective: 4.080 +[2024-11-07 16:38:27,041][14395] Num frames 3300... +[2024-11-07 16:38:27,321][14395] Num frames 3400... +[2024-11-07 16:38:27,547][14395] Num frames 3500... +[2024-11-07 16:38:27,774][14395] Num frames 3600... +[2024-11-07 16:38:27,944][14395] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-07 16:38:27,945][14395] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-07 16:38:28,090][14395] Num frames 3700... +[2024-11-07 16:38:28,314][14395] Num frames 3800... +[2024-11-07 16:38:28,554][14395] Num frames 3900... +[2024-11-07 16:38:28,706][14395] Avg episode rewards: #0: 4.237, true rewards: #0: 3.937 +[2024-11-07 16:38:28,709][14395] Avg episode reward: 4.237, avg true_objective: 3.937 +[2024-11-07 16:38:28,871][14395] Num frames 4000... +[2024-11-07 16:38:29,088][14395] Num frames 4100... +[2024-11-07 16:38:29,357][14395] Num frames 4200... +[2024-11-07 16:38:29,603][14395] Num frames 4300... +[2024-11-07 16:38:29,710][14395] Avg episode rewards: #0: 4.201, true rewards: #0: 3.928 +[2024-11-07 16:38:29,711][14395] Avg episode reward: 4.201, avg true_objective: 3.928 +[2024-11-07 16:38:29,904][14395] Num frames 4400... +[2024-11-07 16:38:30,104][14395] Num frames 4500... +[2024-11-07 16:38:30,288][14395] Num frames 4600... +[2024-11-07 16:38:30,484][14395] Num frames 4700... +[2024-11-07 16:38:30,681][14395] Avg episode rewards: #0: 4.308, true rewards: #0: 3.974 +[2024-11-07 16:38:30,683][14395] Avg episode reward: 4.308, avg true_objective: 3.974 +[2024-11-07 16:38:30,761][14395] Num frames 4800... +[2024-11-07 16:38:30,968][14395] Num frames 4900... +[2024-11-07 16:38:31,186][14395] Num frames 5000... 
+[2024-11-07 16:38:31,393][14395] Avg episode rewards: #0: 4.198, true rewards: #0: 3.890 +[2024-11-07 16:38:31,394][14395] Avg episode reward: 4.198, avg true_objective: 3.890 +[2024-11-07 16:38:31,514][14395] Num frames 5100... +[2024-11-07 16:38:31,752][14395] Num frames 5200... +[2024-11-07 16:38:31,988][14395] Num frames 5300... +[2024-11-07 16:38:32,219][14395] Num frames 5400... +[2024-11-07 16:38:32,437][14395] Num frames 5500... +[2024-11-07 16:38:32,505][14395] Avg episode rewards: #0: 4.289, true rewards: #0: 3.932 +[2024-11-07 16:38:32,509][14395] Avg episode reward: 4.289, avg true_objective: 3.932 +[2024-11-07 16:38:32,703][14395] Num frames 5600... +[2024-11-07 16:38:32,900][14395] Num frames 5700... +[2024-11-07 16:38:33,102][14395] Num frames 5800... +[2024-11-07 16:38:33,301][14395] Num frames 5900... +[2024-11-07 16:38:33,466][14395] Avg episode rewards: #0: 4.369, true rewards: #0: 3.969 +[2024-11-07 16:38:33,467][14395] Avg episode reward: 4.369, avg true_objective: 3.969 +[2024-11-07 16:38:33,566][14395] Num frames 6000... +[2024-11-07 16:38:33,787][14395] Num frames 6100... +[2024-11-07 16:38:33,997][14395] Num frames 6200... +[2024-11-07 16:38:34,211][14395] Num frames 6300... +[2024-11-07 16:38:34,423][14395] Num frames 6400... +[2024-11-07 16:38:34,755][14395] Avg episode rewards: #0: 4.561, true rewards: #0: 4.061 +[2024-11-07 16:38:34,758][14395] Avg episode reward: 4.561, avg true_objective: 4.061 +[2024-11-07 16:38:34,776][14395] Num frames 6500... +[2024-11-07 16:38:34,998][14395] Num frames 6600... +[2024-11-07 16:38:35,225][14395] Num frames 6700... +[2024-11-07 16:38:35,471][14395] Num frames 6800... +[2024-11-07 16:38:35,693][14395] Avg episode rewards: #0: 4.518, true rewards: #0: 4.048 +[2024-11-07 16:38:35,694][14395] Avg episode reward: 4.518, avg true_objective: 4.048 +[2024-11-07 16:38:35,741][14395] Num frames 6900... +[2024-11-07 16:38:35,952][14395] Num frames 7000... +[2024-11-07 16:38:36,144][14395] Num frames 7100... 
+[2024-11-07 16:38:36,344][14395] Num frames 7200... +[2024-11-07 16:38:36,554][14395] Avg episode rewards: #0: 4.481, true rewards: #0: 4.036 +[2024-11-07 16:38:36,555][14395] Avg episode reward: 4.481, avg true_objective: 4.036 +[2024-11-07 16:38:36,629][14395] Num frames 7300... +[2024-11-07 16:38:36,859][14395] Num frames 7400... +[2024-11-07 16:38:37,154][14395] Num frames 7500... +[2024-11-07 16:38:37,450][14395] Num frames 7600... +[2024-11-07 16:38:37,717][14395] Avg episode rewards: #0: 4.516, true rewards: #0: 4.043 +[2024-11-07 16:38:37,724][14395] Avg episode reward: 4.516, avg true_objective: 4.043 +[2024-11-07 16:38:37,794][14395] Num frames 7700... +[2024-11-07 16:38:38,226][14395] Num frames 7800... +[2024-11-07 16:38:38,748][14395] Num frames 7900... +[2024-11-07 16:38:39,070][14395] Num frames 8000... +[2024-11-07 16:38:39,292][14395] Avg episode rewards: #0: 4.483, true rewards: #0: 4.032 +[2024-11-07 16:38:39,294][14395] Avg episode reward: 4.483, avg true_objective: 4.032 +[2024-11-07 16:38:39,414][14395] Num frames 8100... +[2024-11-07 16:38:39,656][14395] Num frames 8200... +[2024-11-07 16:38:39,915][14395] Num frames 8300... +[2024-11-07 16:38:40,181][14395] Num frames 8400... +[2024-11-07 16:38:40,399][14395] Avg episode rewards: #0: 4.452, true rewards: #0: 4.023 +[2024-11-07 16:38:40,400][14395] Avg episode reward: 4.452, avg true_objective: 4.023 +[2024-11-07 16:38:40,628][14395] Num frames 8500... +[2024-11-07 16:38:40,952][14395] Num frames 8600... +[2024-11-07 16:38:41,486][14395] Num frames 8700... +[2024-11-07 16:38:41,806][14395] Num frames 8800... +[2024-11-07 16:38:42,060][14395] Avg episode rewards: #0: 4.424, true rewards: #0: 4.015 +[2024-11-07 16:38:42,062][14395] Avg episode reward: 4.424, avg true_objective: 4.015 +[2024-11-07 16:38:42,353][14395] Num frames 8900... +[2024-11-07 16:38:42,807][14395] Num frames 9000... +[2024-11-07 16:38:43,151][14395] Num frames 9100... +[2024-11-07 16:38:43,507][14395] Num frames 9200... 
+[2024-11-07 16:38:43,881][14395] Avg episode rewards: #0: 4.470, true rewards: #0: 4.035 +[2024-11-07 16:38:43,884][14395] Avg episode reward: 4.470, avg true_objective: 4.035 +[2024-11-07 16:38:43,986][14395] Num frames 9300... +[2024-11-07 16:38:44,440][14395] Num frames 9400... +[2024-11-07 16:38:45,267][14395] Num frames 9500... +[2024-11-07 16:38:45,942][14395] Num frames 9600... +[2024-11-07 16:38:46,496][14395] Num frames 9700... +[2024-11-07 16:38:46,676][14395] Avg episode rewards: #0: 4.512, true rewards: #0: 4.054 +[2024-11-07 16:38:46,678][14395] Avg episode reward: 4.512, avg true_objective: 4.054 +[2024-11-07 16:38:47,074][14395] Num frames 9800... +[2024-11-07 16:38:47,433][14395] Num frames 9900... +[2024-11-07 16:38:47,817][14395] Num frames 10000... +[2024-11-07 16:38:48,101][14395] Num frames 10100... +[2024-11-07 16:38:48,194][14395] Avg episode rewards: #0: 4.485, true rewards: #0: 4.045 +[2024-11-07 16:38:48,195][14395] Avg episode reward: 4.485, avg true_objective: 4.045 +[2024-11-07 16:38:48,421][14395] Num frames 10200... +[2024-11-07 16:38:48,824][14395] Num frames 10300... +[2024-11-07 16:38:49,156][14395] Num frames 10400... +[2024-11-07 16:38:49,549][14395] Num frames 10500... +[2024-11-07 16:38:49,794][14395] Num frames 10600... +[2024-11-07 16:38:49,919][14395] Avg episode rewards: #0: 4.587, true rewards: #0: 4.087 +[2024-11-07 16:38:49,921][14395] Avg episode reward: 4.587, avg true_objective: 4.087 +[2024-11-07 16:38:50,121][14395] Num frames 10700... +[2024-11-07 16:38:50,363][14395] Num frames 10800... +[2024-11-07 16:38:50,651][14395] Num frames 10900... +[2024-11-07 16:38:50,885][14395] Num frames 11000... +[2024-11-07 16:38:50,965][14395] Avg episode rewards: #0: 4.559, true rewards: #0: 4.077 +[2024-11-07 16:38:50,968][14395] Avg episode reward: 4.559, avg true_objective: 4.077 +[2024-11-07 16:38:51,222][14395] Num frames 11100... +[2024-11-07 16:38:51,467][14395] Num frames 11200... 
+[2024-11-07 16:38:51,697][14395] Num frames 11300... +[2024-11-07 16:38:51,939][14395] Avg episode rewards: #0: 4.533, true rewards: #0: 4.069 +[2024-11-07 16:38:51,941][14395] Avg episode reward: 4.533, avg true_objective: 4.069 +[2024-11-07 16:38:51,959][14395] Num frames 11400... +[2024-11-07 16:38:52,172][14395] Num frames 11500... +[2024-11-07 16:38:52,358][14395] Num frames 11600... +[2024-11-07 16:38:52,520][14395] Avg episode rewards: #0: 4.465, true rewards: #0: 4.017 +[2024-11-07 16:38:52,521][14395] Avg episode reward: 4.465, avg true_objective: 4.017 +[2024-11-07 16:38:52,660][14395] Num frames 11700... +[2024-11-07 16:38:52,869][14395] Num frames 11800... +[2024-11-07 16:38:53,174][14395] Num frames 11900... +[2024-11-07 16:38:53,549][14395] Num frames 12000... +[2024-11-07 16:38:53,779][14395] Avg episode rewards: #0: 4.444, true rewards: #0: 4.011 +[2024-11-07 16:38:53,781][14395] Avg episode reward: 4.444, avg true_objective: 4.011 +[2024-11-07 16:38:54,073][14395] Num frames 12100... +[2024-11-07 16:38:54,379][14395] Num frames 12200... +[2024-11-07 16:38:54,660][14395] Num frames 12300... +[2024-11-07 16:38:55,132][14395] Num frames 12400... +[2024-11-07 16:38:55,326][14395] Avg episode rewards: #0: 4.425, true rewards: #0: 4.005 +[2024-11-07 16:38:55,329][14395] Avg episode reward: 4.425, avg true_objective: 4.005 +[2024-11-07 16:38:55,582][14395] Num frames 12500... +[2024-11-07 16:38:55,975][14395] Num frames 12600... +[2024-11-07 16:38:56,460][14395] Num frames 12700... +[2024-11-07 16:38:56,794][14395] Num frames 12800... +[2024-11-07 16:38:56,855][14395] Avg episode rewards: #0: 4.407, true rewards: #0: 4.000 +[2024-11-07 16:38:56,856][14395] Avg episode reward: 4.407, avg true_objective: 4.000 +[2024-11-07 16:38:57,121][14395] Num frames 12900... +[2024-11-07 16:38:57,339][14395] Num frames 13000... +[2024-11-07 16:38:57,593][14395] Num frames 13100... +[2024-11-07 16:38:57,814][14395] Num frames 13200... 
+[2024-11-07 16:39:00,296][14395] Avg episode rewards: #0: 4.439, true rewards: #0: 4.015 +[2024-11-07 16:39:00,300][14395] Avg episode reward: 4.439, avg true_objective: 4.015 +[2024-11-07 16:39:00,433][14395] Num frames 13300... +[2024-11-07 16:39:00,680][14395] Num frames 13400... +[2024-11-07 16:39:00,916][14395] Num frames 13500... +[2024-11-07 16:39:01,141][14395] Num frames 13600... +[2024-11-07 16:39:01,294][14395] Avg episode rewards: #0: 4.421, true rewards: #0: 4.010 +[2024-11-07 16:39:01,298][14395] Avg episode reward: 4.421, avg true_objective: 4.010 +[2024-11-07 16:39:01,469][14395] Num frames 13700... +[2024-11-07 16:39:01,728][14395] Num frames 13800... +[2024-11-07 16:39:02,047][14395] Num frames 13900... +[2024-11-07 16:39:02,317][14395] Num frames 14000... +[2024-11-07 16:39:02,433][14395] Avg episode rewards: #0: 4.405, true rewards: #0: 4.005 +[2024-11-07 16:39:02,438][14395] Avg episode reward: 4.405, avg true_objective: 4.005 +[2024-11-07 16:39:02,675][14395] Num frames 14100... +[2024-11-07 16:39:02,915][14395] Num frames 14200... +[2024-11-07 16:39:03,101][14395] Num frames 14300... +[2024-11-07 16:39:03,280][14395] Num frames 14400... +[2024-11-07 16:39:03,341][14395] Avg episode rewards: #0: 4.389, true rewards: #0: 4.000 +[2024-11-07 16:39:03,345][14395] Avg episode reward: 4.389, avg true_objective: 4.000 +[2024-11-07 16:39:03,567][14395] Num frames 14500... +[2024-11-07 16:39:03,933][14395] Num frames 14600... +[2024-11-07 16:39:04,176][14395] Num frames 14700... +[2024-11-07 16:39:04,535][14395] Num frames 14800... +[2024-11-07 16:39:04,912][14395] Avg episode rewards: #0: 4.419, true rewards: #0: 4.013 +[2024-11-07 16:39:04,916][14395] Avg episode reward: 4.419, avg true_objective: 4.013 +[2024-11-07 16:39:05,060][14395] Num frames 14900... +[2024-11-07 16:39:05,399][14395] Num frames 15000... +[2024-11-07 16:39:05,686][14395] Num frames 15100... +[2024-11-07 16:39:05,937][14395] Num frames 15200... 
+[2024-11-07 16:39:06,198][14395] Num frames 15300... +[2024-11-07 16:39:06,315][14395] Avg episode rewards: #0: 4.481, true rewards: #0: 4.034 +[2024-11-07 16:39:06,318][14395] Avg episode reward: 4.481, avg true_objective: 4.034 +[2024-11-07 16:39:06,589][14395] Num frames 15400... +[2024-11-07 16:39:07,037][14395] Num frames 15500... +[2024-11-07 16:39:07,414][14395] Num frames 15600... +[2024-11-07 16:39:07,686][14395] Num frames 15700... +[2024-11-07 16:39:07,784][14395] Avg episode rewards: #0: 4.465, true rewards: #0: 4.029 +[2024-11-07 16:39:07,785][14395] Avg episode reward: 4.465, avg true_objective: 4.029 +[2024-11-07 16:39:08,002][14395] Num frames 15800... +[2024-11-07 16:39:08,364][14395] Num frames 15900... +[2024-11-07 16:39:08,577][14395] Num frames 16000... +[2024-11-07 16:39:08,843][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.024 +[2024-11-07 16:39:08,844][14395] Avg episode reward: 4.449, avg true_objective: 4.024 +[2024-11-07 16:39:08,850][14395] Num frames 16100... +[2024-11-07 16:39:09,086][14395] Num frames 16200... +[2024-11-07 16:39:09,331][14395] Num frames 16300... +[2024-11-07 16:39:09,547][14395] Num frames 16400... +[2024-11-07 16:39:09,812][14395] Avg episode rewards: #0: 4.434, true rewards: #0: 4.020 +[2024-11-07 16:39:09,813][14395] Avg episode reward: 4.434, avg true_objective: 4.020 +[2024-11-07 16:39:09,865][14395] Num frames 16500... +[2024-11-07 16:39:10,164][14395] Num frames 16600... +[2024-11-07 16:39:10,397][14395] Num frames 16700... +[2024-11-07 16:39:10,635][14395] Num frames 16800... +[2024-11-07 16:39:10,850][14395] Avg episode rewards: #0: 4.420, true rewards: #0: 4.015 +[2024-11-07 16:39:10,852][14395] Avg episode reward: 4.420, avg true_objective: 4.015 +[2024-11-07 16:39:10,932][14395] Num frames 16900... +[2024-11-07 16:39:11,120][14395] Num frames 17000... +[2024-11-07 16:39:11,327][14395] Num frames 17100... +[2024-11-07 16:39:11,536][14395] Num frames 17200... 
+[2024-11-07 16:39:11,706][14395] Avg episode rewards: #0: 4.407, true rewards: #0: 4.011 +[2024-11-07 16:39:11,710][14395] Avg episode reward: 4.407, avg true_objective: 4.011 +[2024-11-07 16:39:11,828][14395] Num frames 17300... +[2024-11-07 16:39:12,118][14395] Num frames 17400... +[2024-11-07 16:39:12,362][14395] Num frames 17500... +[2024-11-07 16:39:12,625][14395] Num frames 17600... +[2024-11-07 16:39:12,782][14395] Avg episode rewards: #0: 4.394, true rewards: #0: 4.007 +[2024-11-07 16:39:12,783][14395] Avg episode reward: 4.394, avg true_objective: 4.007 +[2024-11-07 16:39:12,967][14395] Num frames 17700... +[2024-11-07 16:39:13,262][14395] Num frames 17800... +[2024-11-07 16:39:13,527][14395] Num frames 17900... +[2024-11-07 16:39:13,757][14395] Num frames 18000... +[2024-11-07 16:39:13,857][14395] Avg episode rewards: #0: 4.382, true rewards: #0: 4.004 +[2024-11-07 16:39:13,858][14395] Avg episode reward: 4.382, avg true_objective: 4.004 +[2024-11-07 16:39:14,068][14395] Num frames 18100... +[2024-11-07 16:39:14,367][14395] Num frames 18200... +[2024-11-07 16:39:14,748][14395] Num frames 18300... +[2024-11-07 16:39:15,078][14395] Num frames 18400... +[2024-11-07 16:39:15,375][14395] Avg episode rewards: #0: 4.405, true rewards: #0: 4.014 +[2024-11-07 16:39:15,376][14395] Avg episode reward: 4.405, avg true_objective: 4.014 +[2024-11-07 16:39:15,488][14395] Num frames 18500... +[2024-11-07 16:39:15,784][14395] Num frames 18600... +[2024-11-07 16:39:16,189][14395] Num frames 18700... +[2024-11-07 16:39:16,627][14395] Num frames 18800... +[2024-11-07 16:39:16,901][14395] Avg episode rewards: #0: 4.393, true rewards: #0: 4.010 +[2024-11-07 16:39:16,906][14395] Avg episode reward: 4.393, avg true_objective: 4.010 +[2024-11-07 16:39:17,078][14395] Num frames 18900... +[2024-11-07 16:39:17,411][14395] Num frames 19000... +[2024-11-07 16:39:17,647][14395] Num frames 19100... +[2024-11-07 16:39:17,911][14395] Num frames 19200... 
+[2024-11-07 16:39:18,070][14395] Avg episode rewards: #0: 4.382, true rewards: #0: 4.007 +[2024-11-07 16:39:18,076][14395] Avg episode reward: 4.382, avg true_objective: 4.007 +[2024-11-07 16:39:18,259][14395] Num frames 19300... +[2024-11-07 16:39:18,513][14395] Num frames 19400... +[2024-11-07 16:39:18,735][14395] Num frames 19500... +[2024-11-07 16:39:18,987][14395] Num frames 19600... +[2024-11-07 16:39:19,085][14395] Avg episode rewards: #0: 4.371, true rewards: #0: 4.003 +[2024-11-07 16:39:19,089][14395] Avg episode reward: 4.371, avg true_objective: 4.003 +[2024-11-07 16:39:19,305][14395] Num frames 19700... +[2024-11-07 16:39:19,719][14395] Num frames 19800... +[2024-11-07 16:39:20,099][14395] Num frames 19900... +[2024-11-07 16:39:20,664][14395] Num frames 20000... +[2024-11-07 16:39:20,730][14395] Avg episode rewards: #0: 4.360, true rewards: #0: 4.000 +[2024-11-07 16:39:20,732][14395] Avg episode reward: 4.360, avg true_objective: 4.000 +[2024-11-07 16:39:21,128][14395] Num frames 20100... +[2024-11-07 16:39:21,371][14395] Num frames 20200... +[2024-11-07 16:39:21,611][14395] Num frames 20300... +[2024-11-07 16:39:21,862][14395] Avg episode rewards: #0: 4.350, true rewards: #0: 3.997 +[2024-11-07 16:39:21,866][14395] Avg episode reward: 4.350, avg true_objective: 3.997 +[2024-11-07 16:39:21,916][14395] Num frames 20400... +[2024-11-07 16:39:22,240][14395] Num frames 20500... +[2024-11-07 16:39:22,541][14395] Num frames 20600... +[2024-11-07 16:39:23,043][14395] Num frames 20700... +[2024-11-07 16:39:23,315][14395] Avg episode rewards: #0: 4.340, true rewards: #0: 3.994 +[2024-11-07 16:39:23,318][14395] Avg episode reward: 4.340, avg true_objective: 3.994 +[2024-11-07 16:39:23,427][14395] Num frames 20800... +[2024-11-07 16:39:23,736][14395] Num frames 20900... +[2024-11-07 16:39:24,070][14395] Num frames 21000... +[2024-11-07 16:39:24,348][14395] Num frames 21100... 
+[2024-11-07 16:39:24,598][14395] Avg episode rewards: #0: 4.331, true rewards: #0: 3.991 +[2024-11-07 16:39:24,602][14395] Avg episode reward: 4.331, avg true_objective: 3.991 +[2024-11-07 16:39:24,779][14395] Num frames 21200... +[2024-11-07 16:39:25,208][14395] Num frames 21300... +[2024-11-07 16:39:25,518][14395] Num frames 21400... +[2024-11-07 16:39:25,784][14395] Num frames 21500... +[2024-11-07 16:39:25,942][14395] Avg episode rewards: #0: 4.322, true rewards: #0: 3.988 +[2024-11-07 16:39:25,946][14395] Avg episode reward: 4.322, avg true_objective: 3.988 +[2024-11-07 16:39:26,106][14395] Num frames 21600... +[2024-11-07 16:39:26,340][14395] Num frames 21700... +[2024-11-07 16:39:26,591][14395] Num frames 21800... +[2024-11-07 16:39:26,827][14395] Num frames 21900... +[2024-11-07 16:39:26,926][14395] Avg episode rewards: #0: 4.313, true rewards: #0: 3.986 +[2024-11-07 16:39:26,929][14395] Avg episode reward: 4.313, avg true_objective: 3.986 +[2024-11-07 16:39:27,107][14395] Num frames 22000... +[2024-11-07 16:39:27,377][14395] Num frames 22100... +[2024-11-07 16:39:27,590][14395] Num frames 22200... +[2024-11-07 16:39:27,791][14395] Num frames 22300... +[2024-11-07 16:39:27,863][14395] Avg episode rewards: #0: 4.304, true rewards: #0: 3.983 +[2024-11-07 16:39:27,867][14395] Avg episode reward: 4.304, avg true_objective: 3.983 +[2024-11-07 16:39:28,125][14395] Num frames 22400... +[2024-11-07 16:39:28,399][14395] Num frames 22500... +[2024-11-07 16:39:28,632][14395] Num frames 22600... +[2024-11-07 16:39:28,893][14395] Num frames 22700... +[2024-11-07 16:39:29,126][14395] Num frames 22800... +[2024-11-07 16:39:29,221][14395] Avg episode rewards: #0: 4.354, true rewards: #0: 4.003 +[2024-11-07 16:39:29,223][14395] Avg episode reward: 4.354, avg true_objective: 4.003 +[2024-11-07 16:39:29,412][14395] Num frames 22900... +[2024-11-07 16:39:29,687][14395] Num frames 23000... +[2024-11-07 16:39:29,913][14395] Num frames 23100... 
+[2024-11-07 16:39:30,127][14395] Num frames 23200... +[2024-11-07 16:39:30,187][14395] Avg episode rewards: #0: 4.345, true rewards: #0: 4.000 +[2024-11-07 16:39:30,190][14395] Avg episode reward: 4.345, avg true_objective: 4.000 +[2024-11-07 16:39:30,418][14395] Num frames 23300... +[2024-11-07 16:39:30,678][14395] Num frames 23400... +[2024-11-07 16:39:31,046][14395] Num frames 23500... +[2024-11-07 16:39:31,323][14395] Num frames 23600... +[2024-11-07 16:39:31,595][14395] Avg episode rewards: #0: 4.387, true rewards: #0: 4.014 +[2024-11-07 16:39:31,600][14395] Avg episode reward: 4.387, avg true_objective: 4.014 +[2024-11-07 16:39:31,658][14395] Num frames 23700... +[2024-11-07 16:39:31,882][14395] Num frames 23800... +[2024-11-07 16:39:32,118][14395] Num frames 23900... +[2024-11-07 16:39:34,685][14395] Num frames 24000... +[2024-11-07 16:39:35,123][14395] Num frames 24100... +[2024-11-07 16:39:35,397][14395] Avg episode rewards: #0: 4.405, true rewards: #0: 4.021 +[2024-11-07 16:39:35,398][14395] Avg episode reward: 4.405, avg true_objective: 4.021 +[2024-11-07 16:39:35,593][14395] Num frames 24200... +[2024-11-07 16:39:35,804][14395] Num frames 24300... +[2024-11-07 16:39:36,056][14395] Avg episode rewards: #0: 4.375, true rewards: #0: 3.998 +[2024-11-07 16:39:36,060][14395] Avg episode reward: 4.375, avg true_objective: 3.998 +[2024-11-07 16:39:36,124][14395] Num frames 24400... +[2024-11-07 16:39:36,407][14395] Num frames 24500... +[2024-11-07 16:39:36,665][14395] Num frames 24600... +[2024-11-07 16:39:36,938][14395] Num frames 24700... +[2024-11-07 16:39:37,227][14395] Avg episode rewards: #0: 4.366, true rewards: #0: 3.995 +[2024-11-07 16:39:37,232][14395] Avg episode reward: 4.366, avg true_objective: 3.995 +[2024-11-07 16:39:37,316][14395] Num frames 24800... +[2024-11-07 16:39:37,636][14395] Num frames 24900... +[2024-11-07 16:39:37,890][14395] Num frames 25000... +[2024-11-07 16:39:38,104][14395] Num frames 25100... 
+[2024-11-07 16:39:38,283][14395] Avg episode rewards: #0: 4.358, true rewards: #0: 3.993 +[2024-11-07 16:39:38,288][14395] Avg episode reward: 4.358, avg true_objective: 3.993 +[2024-11-07 16:39:38,414][14395] Num frames 25200... +[2024-11-07 16:39:38,651][14395] Num frames 25300... +[2024-11-07 16:39:38,850][14395] Num frames 25400... +[2024-11-07 16:39:39,267][14395] Num frames 25500... +[2024-11-07 16:39:39,442][14395] Avg episode rewards: #0: 4.350, true rewards: #0: 3.990 +[2024-11-07 16:39:39,444][14395] Avg episode reward: 4.350, avg true_objective: 3.990 +[2024-11-07 16:39:39,601][14395] Num frames 25600... +[2024-11-07 16:39:39,848][14395] Num frames 25700... +[2024-11-07 16:39:40,110][14395] Num frames 25800... +[2024-11-07 16:39:40,372][14395] Num frames 25900... +[2024-11-07 16:39:40,685][14395] Avg episode rewards: #0: 4.367, true rewards: #0: 3.998 +[2024-11-07 16:39:40,692][14395] Avg episode reward: 4.367, avg true_objective: 3.998 +[2024-11-07 16:39:40,739][14395] Num frames 26000... +[2024-11-07 16:39:40,934][14395] Num frames 26100... +[2024-11-07 16:39:41,155][14395] Num frames 26200... +[2024-11-07 16:39:41,343][14395] Num frames 26300... +[2024-11-07 16:39:41,608][14395] Avg episode rewards: #0: 4.359, true rewards: #0: 3.995 +[2024-11-07 16:39:41,609][14395] Avg episode reward: 4.359, avg true_objective: 3.995 +[2024-11-07 16:39:41,674][14395] Num frames 26400... +[2024-11-07 16:39:41,938][14395] Num frames 26500... +[2024-11-07 16:39:42,164][14395] Num frames 26600... +[2024-11-07 16:39:42,367][14395] Num frames 26700... +[2024-11-07 16:39:42,604][14395] Avg episode rewards: #0: 4.371, true rewards: #0: 3.998 +[2024-11-07 16:39:42,606][14395] Avg episode reward: 4.371, avg true_objective: 3.998 +[2024-11-07 16:39:42,637][14395] Num frames 26800... +[2024-11-07 16:39:42,901][14395] Num frames 26900... +[2024-11-07 16:39:43,149][14395] Num frames 27000... +[2024-11-07 16:39:43,543][14395] Num frames 27100... 
+[2024-11-07 16:39:43,856][14395] Avg episode rewards: #0: 4.363, true rewards: #0: 3.995 +[2024-11-07 16:39:43,858][14395] Avg episode reward: 4.363, avg true_objective: 3.995 +[2024-11-07 16:39:43,969][14395] Num frames 27200... +[2024-11-07 16:39:44,161][14395] Num frames 27300... +[2024-11-07 16:39:44,367][14395] Num frames 27400... +[2024-11-07 16:39:44,563][14395] Num frames 27500... +[2024-11-07 16:39:44,719][14395] Avg episode rewards: #0: 4.356, true rewards: #0: 3.993 +[2024-11-07 16:39:44,721][14395] Avg episode reward: 4.356, avg true_objective: 3.993 +[2024-11-07 16:39:44,918][14395] Num frames 27600... +[2024-11-07 16:39:45,159][14395] Num frames 27700... +[2024-11-07 16:39:45,420][14395] Num frames 27800... +[2024-11-07 16:39:45,648][14395] Num frames 27900... +[2024-11-07 16:39:45,787][14395] Avg episode rewards: #0: 4.348, true rewards: #0: 3.991 +[2024-11-07 16:39:45,789][14395] Avg episode reward: 4.348, avg true_objective: 3.991 +[2024-11-07 16:39:45,893][14395] Num frames 28000... +[2024-11-07 16:39:46,089][14395] Num frames 28100... +[2024-11-07 16:39:46,262][14395] Num frames 28200... +[2024-11-07 16:39:46,507][14395] Num frames 28300... +[2024-11-07 16:39:46,614][14395] Avg episode rewards: #0: 4.341, true rewards: #0: 3.989 +[2024-11-07 16:39:46,617][14395] Avg episode reward: 4.341, avg true_objective: 3.989 +[2024-11-07 16:39:46,786][14395] Num frames 28400... +[2024-11-07 16:39:47,018][14395] Num frames 28500... +[2024-11-07 16:39:47,255][14395] Num frames 28600... +[2024-11-07 16:39:47,508][14395] Num frames 28700... +[2024-11-07 16:39:47,653][14395] Avg episode rewards: #0: 4.352, true rewards: #0: 3.991 +[2024-11-07 16:39:47,656][14395] Avg episode reward: 4.352, avg true_objective: 3.991 +[2024-11-07 16:39:47,833][14395] Num frames 28800... +[2024-11-07 16:39:48,046][14395] Num frames 28900... +[2024-11-07 16:39:48,254][14395] Num frames 29000... +[2024-11-07 16:39:48,448][14395] Num frames 29100... 
+[2024-11-07 16:39:48,546][14395] Avg episode rewards: #0: 4.345, true rewards: #0: 3.989 +[2024-11-07 16:39:48,551][14395] Avg episode reward: 4.345, avg true_objective: 3.989 +[2024-11-07 16:39:48,718][14395] Num frames 29200... +[2024-11-07 16:39:48,889][14395] Num frames 29300... +[2024-11-07 16:39:49,149][14395] Num frames 29400... +[2024-11-07 16:39:49,366][14395] Num frames 29500... +[2024-11-07 16:39:49,555][14395] Num frames 29600... +[2024-11-07 16:39:49,806][14395] Avg episode rewards: #0: 4.387, true rewards: #0: 4.009 +[2024-11-07 16:39:49,808][14395] Avg episode reward: 4.387, avg true_objective: 4.009 +[2024-11-07 16:39:49,890][14395] Num frames 29700... +[2024-11-07 16:39:50,069][14395] Num frames 29800... +[2024-11-07 16:39:50,284][14395] Num frames 29900... +[2024-11-07 16:39:50,557][14395] Num frames 30000... +[2024-11-07 16:39:50,739][14395] Avg episode rewards: #0: 4.380, true rewards: #0: 4.007 +[2024-11-07 16:39:50,740][14395] Avg episode reward: 4.380, avg true_objective: 4.007 +[2024-11-07 16:39:50,848][14395] Num frames 30100... +[2024-11-07 16:39:51,022][14395] Num frames 30200... +[2024-11-07 16:39:51,203][14395] Num frames 30300... +[2024-11-07 16:39:51,366][14395] Num frames 30400... +[2024-11-07 16:39:51,481][14395] Avg episode rewards: #0: 4.373, true rewards: #0: 4.004 +[2024-11-07 16:39:51,484][14395] Avg episode reward: 4.373, avg true_objective: 4.004 +[2024-11-07 16:39:51,621][14395] Num frames 30500... +[2024-11-07 16:39:51,804][14395] Num frames 30600... +[2024-11-07 16:39:52,052][14395] Num frames 30700... +[2024-11-07 16:39:52,262][14395] Num frames 30800... +[2024-11-07 16:39:52,357][14395] Avg episode rewards: #0: 4.366, true rewards: #0: 4.002 +[2024-11-07 16:39:52,358][14395] Avg episode reward: 4.366, avg true_objective: 4.002 +[2024-11-07 16:39:52,517][14395] Num frames 30900... +[2024-11-07 16:39:52,741][14395] Num frames 31000... +[2024-11-07 16:39:53,089][14395] Num frames 31100... 
+[2024-11-07 16:39:53,295][14395] Num frames 31200... +[2024-11-07 16:39:53,485][14395] Num frames 31300... +[2024-11-07 16:39:53,775][14395] Avg episode rewards: #0: 4.405, true rewards: #0: 4.021 +[2024-11-07 16:39:53,777][14395] Avg episode reward: 4.405, avg true_objective: 4.021 +[2024-11-07 16:39:53,869][14395] Num frames 31400... +[2024-11-07 16:39:54,091][14395] Num frames 31500... +[2024-11-07 16:39:54,310][14395] Num frames 31600... +[2024-11-07 16:39:54,516][14395] Num frames 31700... +[2024-11-07 16:39:54,660][14395] Avg episode rewards: #0: 4.398, true rewards: #0: 4.018 +[2024-11-07 16:39:54,665][14395] Avg episode reward: 4.398, avg true_objective: 4.018 +[2024-11-07 16:39:54,782][14395] Num frames 31800... +[2024-11-07 16:39:54,990][14395] Num frames 31900... +[2024-11-07 16:39:55,192][14395] Num frames 32000... +[2024-11-07 16:39:55,507][14395] Num frames 32100... +[2024-11-07 16:39:55,636][14395] Avg episode rewards: #0: 4.391, true rewards: #0: 4.016 +[2024-11-07 16:39:55,640][14395] Avg episode reward: 4.391, avg true_objective: 4.016 +[2024-11-07 16:39:55,832][14395] Num frames 32200... +[2024-11-07 16:39:56,036][14395] Num frames 32300... +[2024-11-07 16:39:56,348][14395] Num frames 32400... +[2024-11-07 16:39:56,573][14395] Num frames 32500... +[2024-11-07 16:39:56,777][14395] Avg episode rewards: #0: 4.403, true rewards: #0: 4.021 +[2024-11-07 16:39:56,779][14395] Avg episode reward: 4.403, avg true_objective: 4.021 +[2024-11-07 16:39:56,996][14395] Num frames 32600... +[2024-11-07 16:39:57,670][14395] Num frames 32700... +[2024-11-07 16:39:58,204][14395] Num frames 32800... +[2024-11-07 16:39:58,506][14395] Num frames 32900... +[2024-11-07 16:39:58,702][14395] Avg episode rewards: #0: 4.397, true rewards: #0: 4.019 +[2024-11-07 16:39:58,707][14395] Avg episode reward: 4.397, avg true_objective: 4.019 +[2024-11-07 16:39:58,833][14395] Num frames 33000... +[2024-11-07 16:39:59,141][14395] Num frames 33100... 
+[2024-11-07 16:39:59,354][14395] Num frames 33200... +[2024-11-07 16:39:59,585][14395] Num frames 33300... +[2024-11-07 16:39:59,816][14395] Num frames 33400... +[2024-11-07 16:39:59,869][14395] Avg episode rewards: #0: 4.410, true rewards: #0: 4.024 +[2024-11-07 16:39:59,870][14395] Avg episode reward: 4.410, avg true_objective: 4.024 +[2024-11-07 16:40:00,204][14395] Num frames 33500... +[2024-11-07 16:40:00,654][14395] Num frames 33600... +[2024-11-07 16:40:00,974][14395] Num frames 33700... +[2024-11-07 16:40:01,320][14395] Num frames 33800... +[2024-11-07 16:40:01,422][14395] Avg episode rewards: #0: 4.419, true rewards: #0: 4.026 +[2024-11-07 16:40:01,427][14395] Avg episode reward: 4.419, avg true_objective: 4.026 +[2024-11-07 16:40:01,660][14395] Num frames 33900... +[2024-11-07 16:40:01,907][14395] Num frames 34000... +[2024-11-07 16:40:02,119][14395] Num frames 34100... +[2024-11-07 16:40:02,335][14395] Num frames 34200... +[2024-11-07 16:40:02,540][14395] Avg episode rewards: #0: 4.431, true rewards: #0: 4.031 +[2024-11-07 16:40:02,543][14395] Avg episode reward: 4.431, avg true_objective: 4.031 +[2024-11-07 16:40:02,672][14395] Num frames 34300... +[2024-11-07 16:40:02,934][14395] Num frames 34400... +[2024-11-07 16:40:03,197][14395] Num frames 34500... +[2024-11-07 16:40:03,733][14395] Num frames 34600... +[2024-11-07 16:40:04,095][14395] Num frames 34700... +[2024-11-07 16:40:04,187][14395] Avg episode rewards: #0: 4.455, true rewards: #0: 4.036 +[2024-11-07 16:40:04,188][14395] Avg episode reward: 4.455, avg true_objective: 4.036 +[2024-11-07 16:40:04,493][14395] Num frames 34800... +[2024-11-07 16:40:04,896][14395] Num frames 34900... +[2024-11-07 16:40:05,192][14395] Num frames 35000... +[2024-11-07 16:40:05,461][14395] Avg episode rewards: #0: 4.448, true rewards: #0: 4.034 +[2024-11-07 16:40:05,466][14395] Avg episode reward: 4.448, avg true_objective: 4.034 +[2024-11-07 16:40:05,482][14395] Num frames 35100... 
+[2024-11-07 16:40:05,706][14395] Num frames 35200... +[2024-11-07 16:40:05,892][14395] Num frames 35300... +[2024-11-07 16:40:06,089][14395] Num frames 35400... +[2024-11-07 16:40:06,285][14395] Num frames 35500... +[2024-11-07 16:40:06,427][14395] Avg episode rewards: #0: 4.460, true rewards: #0: 4.039 +[2024-11-07 16:40:06,429][14395] Avg episode reward: 4.460, avg true_objective: 4.039 +[2024-11-07 16:40:06,546][14395] Num frames 35600... +[2024-11-07 16:40:06,759][14395] Num frames 35700... +[2024-11-07 16:40:09,287][14395] Num frames 35800... +[2024-11-07 16:40:09,491][14395] Num frames 35900... +[2024-11-07 16:40:09,723][14395] Avg episode rewards: #0: 4.471, true rewards: #0: 4.044 +[2024-11-07 16:40:09,724][14395] Avg episode reward: 4.471, avg true_objective: 4.044 +[2024-11-07 16:40:09,742][14395] Num frames 36000... +[2024-11-07 16:40:09,955][14395] Num frames 36100... +[2024-11-07 16:40:10,158][14395] Num frames 36200... +[2024-11-07 16:40:10,359][14395] Num frames 36300... +[2024-11-07 16:40:10,557][14395] Avg episode rewards: #0: 4.464, true rewards: #0: 4.042 +[2024-11-07 16:40:10,560][14395] Avg episode reward: 4.464, avg true_objective: 4.042 +[2024-11-07 16:40:10,634][14395] Num frames 36400... +[2024-11-07 16:40:10,829][14395] Num frames 36500... +[2024-11-07 16:40:11,031][14395] Num frames 36600... +[2024-11-07 16:40:11,241][14395] Num frames 36700... +[2024-11-07 16:40:11,436][14395] Avg episode rewards: #0: 4.457, true rewards: #0: 4.040 +[2024-11-07 16:40:11,441][14395] Avg episode reward: 4.457, avg true_objective: 4.040 +[2024-11-07 16:40:11,565][14395] Num frames 36800... +[2024-11-07 16:40:11,797][14395] Num frames 36900... +[2024-11-07 16:40:12,052][14395] Num frames 37000... +[2024-11-07 16:40:12,278][14395] Num frames 37100... 
+[2024-11-07 16:40:12,530][14395] Avg episode rewards: #0: 4.465, true rewards: #0: 4.041 +[2024-11-07 16:40:12,532][14395] Avg episode reward: 4.465, avg true_objective: 4.041 +[2024-11-07 16:40:12,613][14395] Num frames 37200... +[2024-11-07 16:40:12,878][14395] Num frames 37300... +[2024-11-07 16:40:13,126][14395] Num frames 37400... +[2024-11-07 16:40:13,376][14395] Num frames 37500... +[2024-11-07 16:40:13,563][14395] Avg episode rewards: #0: 4.458, true rewards: #0: 4.039 +[2024-11-07 16:40:13,567][14395] Avg episode reward: 4.458, avg true_objective: 4.039 +[2024-11-07 16:40:13,736][14395] Num frames 37600... +[2024-11-07 16:40:14,160][14395] Num frames 37700... +[2024-11-07 16:40:14,618][14395] Num frames 37800... +[2024-11-07 16:40:15,004][14395] Num frames 37900... +[2024-11-07 16:40:15,217][14395] Avg episode rewards: #0: 4.451, true rewards: #0: 4.037 +[2024-11-07 16:40:15,219][14395] Avg episode reward: 4.451, avg true_objective: 4.037 +[2024-11-07 16:40:15,421][14395] Num frames 38000... +[2024-11-07 16:40:15,751][14395] Num frames 38100... +[2024-11-07 16:40:16,085][14395] Num frames 38200... +[2024-11-07 16:40:16,413][14395] Num frames 38300... +[2024-11-07 16:40:16,628][14395] Avg episode rewards: #0: 4.445, true rewards: #0: 4.035 +[2024-11-07 16:40:16,630][14395] Avg episode reward: 4.445, avg true_objective: 4.035 +[2024-11-07 16:40:16,869][14395] Num frames 38400... +[2024-11-07 16:40:17,155][14395] Num frames 38500... +[2024-11-07 16:40:17,590][14395] Num frames 38600... +[2024-11-07 16:40:17,879][14395] Num frames 38700... +[2024-11-07 16:40:18,116][14395] Avg episode rewards: #0: 4.439, true rewards: #0: 4.032 +[2024-11-07 16:40:18,119][14395] Avg episode reward: 4.439, avg true_objective: 4.032 +[2024-11-07 16:40:18,451][14395] Num frames 38800... +[2024-11-07 16:40:18,914][14395] Num frames 38900... +[2024-11-07 16:40:19,263][14395] Num frames 39000... 
+[2024-11-07 16:40:19,348][14395] Avg episode rewards: #0: 4.423, true rewards: #0: 4.021 +[2024-11-07 16:40:19,352][14395] Avg episode reward: 4.423, avg true_objective: 4.021 +[2024-11-07 16:40:19,652][14395] Num frames 39100... +[2024-11-07 16:40:20,207][14395] Num frames 39200... +[2024-11-07 16:40:20,672][14395] Num frames 39300... +[2024-11-07 16:40:21,081][14395] Avg episode rewards: #0: 4.417, true rewards: #0: 4.019 +[2024-11-07 16:40:21,084][14395] Avg episode reward: 4.417, avg true_objective: 4.019 +[2024-11-07 16:40:21,151][14395] Num frames 39400... +[2024-11-07 16:40:21,469][14395] Num frames 39500... +[2024-11-07 16:40:21,902][14395] Num frames 39600... +[2024-11-07 16:40:22,256][14395] Num frames 39700... +[2024-11-07 16:40:22,517][14395] Avg episode rewards: #0: 4.418, true rewards: #0: 4.014 +[2024-11-07 16:40:22,522][14395] Avg episode reward: 4.418, avg true_objective: 4.014 +[2024-11-07 16:40:23,010][14395] Num frames 39800... +[2024-11-07 16:40:23,597][14395] Num frames 39900... +[2024-11-07 16:40:24,040][14395] Num frames 40000... +[2024-11-07 16:40:24,480][14395] Num frames 40100... +[2024-11-07 16:40:24,664][14395] Avg episode rewards: #0: 4.412, true rewards: #0: 4.012 +[2024-11-07 16:40:24,668][14395] Avg episode reward: 4.412, avg true_objective: 4.012 +[2024-11-07 16:42:52,173][14395] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 16:42:52,298][14395] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 16:42:52,301][14395] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 16:42:52,303][14395] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 16:42:52,304][14395] Adding new argument 'save_video'=True that is not in the saved config file! 
+[2024-11-07 16:42:52,307][14395] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:42:52,309][14395] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 16:42:52,311][14395] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-11-07 16:42:52,313][14395] Adding new argument 'max_num_episodes'=100 that is not in the saved config file! +[2024-11-07 16:42:52,313][14395] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-07 16:42:52,316][14395] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2024-11-07 16:42:52,319][14395] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 16:42:52,320][14395] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 16:42:52,321][14395] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 16:42:52,326][14395] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 16:42:52,333][14395] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 16:42:52,377][14395] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 16:42:52,379][14395] RunningMeanStd input shape: (1,) +[2024-11-07 16:42:52,432][14395] ConvEncoder: input_channels=3 +[2024-11-07 16:42:52,500][14395] Conv encoder output size: 512 +[2024-11-07 16:42:52,502][14395] Policy head output size: 512 +[2024-11-07 16:42:52,549][14395] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 16:42:53,432][14395] Num frames 100... +[2024-11-07 16:42:53,681][14395] Num frames 200... +[2024-11-07 16:42:53,926][14395] Num frames 300... +[2024-11-07 16:42:54,198][14395] Num frames 400... 
+[2024-11-07 16:42:54,426][14395] Avg episode rewards: #0: 6.600, true rewards: #0: 4.600 +[2024-11-07 16:42:54,427][14395] Avg episode reward: 6.600, avg true_objective: 4.600 +[2024-11-07 16:42:54,524][14395] Num frames 500... +[2024-11-07 16:42:54,766][14395] Num frames 600... +[2024-11-07 16:42:55,021][14395] Num frames 700... +[2024-11-07 16:42:55,280][14395] Num frames 800... +[2024-11-07 16:42:55,441][14395] Avg episode rewards: #0: 5.220, true rewards: #0: 4.220 +[2024-11-07 16:42:55,444][14395] Avg episode reward: 5.220, avg true_objective: 4.220 +[2024-11-07 16:42:55,579][14395] Num frames 900... +[2024-11-07 16:42:55,815][14395] Num frames 1000... +[2024-11-07 16:42:56,071][14395] Num frames 1100... +[2024-11-07 16:42:56,364][14395] Num frames 1200... +[2024-11-07 16:42:56,569][14395] Avg episode rewards: #0: 4.867, true rewards: #0: 4.200 +[2024-11-07 16:42:56,572][14395] Avg episode reward: 4.867, avg true_objective: 4.200 +[2024-11-07 16:42:56,665][14395] Num frames 1300... +[2024-11-07 16:42:56,904][14395] Num frames 1400... +[2024-11-07 16:42:57,165][14395] Num frames 1500... +[2024-11-07 16:42:57,419][14395] Num frames 1600... +[2024-11-07 16:42:57,572][14395] Avg episode rewards: #0: 4.610, true rewards: #0: 4.110 +[2024-11-07 16:42:57,578][14395] Avg episode reward: 4.610, avg true_objective: 4.110 +[2024-11-07 16:42:57,716][14395] Num frames 1700... +[2024-11-07 16:42:57,969][14395] Num frames 1800... +[2024-11-07 16:42:58,220][14395] Num frames 1900... +[2024-11-07 16:42:58,458][14395] Num frames 2000... +[2024-11-07 16:42:58,585][14395] Avg episode rewards: #0: 4.456, true rewards: #0: 4.056 +[2024-11-07 16:42:58,586][14395] Avg episode reward: 4.456, avg true_objective: 4.056 +[2024-11-07 16:42:58,763][14395] Num frames 2100... +[2024-11-07 16:42:59,029][14395] Num frames 2200... +[2024-11-07 16:42:59,309][14395] Num frames 2300... +[2024-11-07 16:42:59,669][14395] Num frames 2400... 
+[2024-11-07 16:42:59,882][14395] Avg episode rewards: #0: 4.573, true rewards: #0: 4.073 +[2024-11-07 16:42:59,885][14395] Avg episode reward: 4.573, avg true_objective: 4.073 +[2024-11-07 16:43:00,091][14395] Num frames 2500... +[2024-11-07 16:43:03,038][14395] Num frames 2600... +[2024-11-07 16:43:03,323][14395] Num frames 2700... +[2024-11-07 16:43:03,597][14395] Num frames 2800... +[2024-11-07 16:43:03,909][14395] Avg episode rewards: #0: 4.703, true rewards: #0: 4.131 +[2024-11-07 16:43:03,913][14395] Avg episode reward: 4.703, avg true_objective: 4.131 +[2024-11-07 16:43:03,943][14395] Num frames 2900... +[2024-11-07 16:43:04,216][14395] Num frames 3000... +[2024-11-07 16:43:04,504][14395] Num frames 3100... +[2024-11-07 16:43:04,783][14395] Num frames 3200... +[2024-11-07 16:43:05,053][14395] Avg episode rewards: #0: 4.595, true rewards: #0: 4.095 +[2024-11-07 16:43:05,057][14395] Avg episode reward: 4.595, avg true_objective: 4.095 +[2024-11-07 16:43:05,120][14395] Num frames 3300... +[2024-11-07 16:43:05,404][14395] Num frames 3400... +[2024-11-07 16:43:05,702][14395] Num frames 3500... +[2024-11-07 16:43:06,017][14395] Num frames 3600... +[2024-11-07 16:43:06,280][14395] Num frames 3700... +[2024-11-07 16:43:06,396][14395] Avg episode rewards: #0: 4.693, true rewards: #0: 4.138 +[2024-11-07 16:43:06,404][14395] Avg episode reward: 4.693, avg true_objective: 4.138 +[2024-11-07 16:43:06,606][14395] Num frames 3800... +[2024-11-07 16:43:06,884][14395] Num frames 3900... +[2024-11-07 16:43:07,163][14395] Num frames 4000... +[2024-11-07 16:43:07,433][14395] Num frames 4100... +[2024-11-07 16:43:07,514][14395] Avg episode rewards: #0: 4.608, true rewards: #0: 4.108 +[2024-11-07 16:43:07,515][14395] Avg episode reward: 4.608, avg true_objective: 4.108 +[2024-11-07 16:43:07,760][14395] Num frames 4200... +[2024-11-07 16:43:08,040][14395] Num frames 4300... +[2024-11-07 16:43:08,325][14395] Num frames 4400... 
+[2024-11-07 16:43:08,632][14395] Avg episode rewards: #0: 4.538, true rewards: #0: 4.084 +[2024-11-07 16:43:08,634][14395] Avg episode reward: 4.538, avg true_objective: 4.084 +[2024-11-07 16:43:08,668][14395] Num frames 4500... +[2024-11-07 16:43:09,113][14395] Num frames 4600... +[2024-11-07 16:43:09,391][14395] Num frames 4700... +[2024-11-07 16:43:09,686][14395] Num frames 4800... +[2024-11-07 16:43:09,976][14395] Num frames 4900... +[2024-11-07 16:43:10,262][14395] Num frames 5000... +[2024-11-07 16:43:10,587][14395] Num frames 5100... +[2024-11-07 16:43:10,639][14395] Avg episode rewards: #0: 4.917, true rewards: #0: 4.250 +[2024-11-07 16:43:10,642][14395] Avg episode reward: 4.917, avg true_objective: 4.250 +[2024-11-07 16:43:10,931][14395] Num frames 5200... +[2024-11-07 16:43:11,227][14395] Num frames 5300... +[2024-11-07 16:43:11,517][14395] Num frames 5400... +[2024-11-07 16:43:11,783][14395] Avg episode rewards: #0: 4.834, true rewards: #0: 4.218 +[2024-11-07 16:43:11,788][14395] Avg episode reward: 4.834, avg true_objective: 4.218 +[2024-11-07 16:43:11,847][14395] Num frames 5500... +[2024-11-07 16:43:12,083][14395] Num frames 5600... +[2024-11-07 16:43:12,373][14395] Num frames 5700... +[2024-11-07 16:43:12,715][14395] Num frames 5800... +[2024-11-07 16:43:12,965][14395] Avg episode rewards: #0: 4.763, true rewards: #0: 4.191 +[2024-11-07 16:43:12,966][14395] Avg episode reward: 4.763, avg true_objective: 4.191 +[2024-11-07 16:43:13,050][14395] Num frames 5900... +[2024-11-07 16:43:13,375][14395] Num frames 6000... +[2024-11-07 16:43:13,662][14395] Num frames 6100... +[2024-11-07 16:43:13,901][14395] Num frames 6200... +[2024-11-07 16:43:14,078][14395] Avg episode rewards: #0: 4.701, true rewards: #0: 4.168 +[2024-11-07 16:43:14,080][14395] Avg episode reward: 4.701, avg true_objective: 4.168 +[2024-11-07 16:43:14,198][14395] Num frames 6300... +[2024-11-07 16:43:14,440][14395] Num frames 6400... +[2024-11-07 16:43:14,687][14395] Num frames 6500... 
+[2024-11-07 16:43:14,918][14395] Num frames 6600... +[2024-11-07 16:43:15,066][14395] Avg episode rewards: #0: 4.648, true rewards: #0: 4.147 +[2024-11-07 16:43:15,067][14395] Avg episode reward: 4.648, avg true_objective: 4.147 +[2024-11-07 16:43:15,237][14395] Num frames 6700... +[2024-11-07 16:43:15,491][14395] Num frames 6800... +[2024-11-07 16:43:15,713][14395] Num frames 6900... +[2024-11-07 16:43:15,943][14395] Num frames 7000... +[2024-11-07 16:43:16,047][14395] Avg episode rewards: #0: 4.600, true rewards: #0: 4.129 +[2024-11-07 16:43:16,054][14395] Avg episode reward: 4.600, avg true_objective: 4.129 +[2024-11-07 16:43:16,296][14395] Num frames 7100... +[2024-11-07 16:43:16,650][14395] Num frames 7200... +[2024-11-07 16:43:16,964][14395] Num frames 7300... +[2024-11-07 16:43:17,229][14395] Num frames 7400... +[2024-11-07 16:43:17,387][14395] Avg episode rewards: #0: 4.631, true rewards: #0: 4.131 +[2024-11-07 16:43:17,389][14395] Avg episode reward: 4.631, avg true_objective: 4.131 +[2024-11-07 16:43:17,632][14395] Num frames 7500... +[2024-11-07 16:43:17,974][14395] Num frames 7600... +[2024-11-07 16:43:18,230][14395] Num frames 7700... +[2024-11-07 16:43:18,591][14395] Num frames 7800... +[2024-11-07 16:43:18,759][14395] Avg episode rewards: #0: 4.589, true rewards: #0: 4.116 +[2024-11-07 16:43:18,761][14395] Avg episode reward: 4.589, avg true_objective: 4.116 +[2024-11-07 16:43:19,087][14395] Num frames 7900... +[2024-11-07 16:43:19,439][14395] Num frames 8000... +[2024-11-07 16:43:19,872][14395] Num frames 8100... +[2024-11-07 16:43:20,248][14395] Num frames 8200... +[2024-11-07 16:43:20,327][14395] Avg episode rewards: #0: 4.552, true rewards: #0: 4.102 +[2024-11-07 16:43:20,329][14395] Avg episode reward: 4.552, avg true_objective: 4.102 +[2024-11-07 16:43:20,662][14395] Num frames 8300... +[2024-11-07 16:43:21,000][14395] Num frames 8400... +[2024-11-07 16:43:21,438][14395] Num frames 8500... +[2024-11-07 16:43:21,712][14395] Num frames 8600... 
+[2024-11-07 16:43:21,920][14395] Avg episode rewards: #0: 4.596, true rewards: #0: 4.120 +[2024-11-07 16:43:21,923][14395] Avg episode reward: 4.596, avg true_objective: 4.120 +[2024-11-07 16:43:22,187][14395] Num frames 8700... +[2024-11-07 16:43:22,633][14395] Num frames 8800... +[2024-11-07 16:43:22,980][14395] Num frames 8900... +[2024-11-07 16:43:23,540][14395] Num frames 9000... +[2024-11-07 16:43:23,760][14395] Avg episode rewards: #0: 4.562, true rewards: #0: 4.107 +[2024-11-07 16:43:23,764][14395] Avg episode reward: 4.562, avg true_objective: 4.107 +[2024-11-07 16:43:24,011][14395] Num frames 9100... +[2024-11-07 16:43:24,506][14395] Num frames 9200... +[2024-11-07 16:43:24,865][14395] Num frames 9300... +[2024-11-07 16:43:25,201][14395] Num frames 9400... +[2024-11-07 16:43:25,481][14395] Avg episode rewards: #0: 4.544, true rewards: #0: 4.110 +[2024-11-07 16:43:25,482][14395] Avg episode reward: 4.544, avg true_objective: 4.110 +[2024-11-07 16:43:25,653][14395] Num frames 9500... +[2024-11-07 16:43:25,932][14395] Num frames 9600... +[2024-11-07 16:43:26,183][14395] Num frames 9700... +[2024-11-07 16:43:26,425][14395] Num frames 9800... +[2024-11-07 16:43:26,571][14395] Avg episode rewards: #0: 4.515, true rewards: #0: 4.098 +[2024-11-07 16:43:26,574][14395] Avg episode reward: 4.515, avg true_objective: 4.098 +[2024-11-07 16:43:26,816][14395] Num frames 9900... +[2024-11-07 16:43:27,096][14395] Num frames 10000... +[2024-11-07 16:43:27,339][14395] Num frames 10100... +[2024-11-07 16:43:27,661][14395] Num frames 10200... +[2024-11-07 16:43:27,836][14395] Avg episode rewards: #0: 4.488, true rewards: #0: 4.088 +[2024-11-07 16:43:27,839][14395] Avg episode reward: 4.488, avg true_objective: 4.088 +[2024-11-07 16:43:28,120][14395] Num frames 10300... +[2024-11-07 16:43:28,477][14395] Num frames 10400... +[2024-11-07 16:43:28,758][14395] Num frames 10500... +[2024-11-07 16:43:29,031][14395] Num frames 10600... 
+[2024-11-07 16:43:29,183][14395] Avg episode rewards: #0: 4.514, true rewards: #0: 4.091 +[2024-11-07 16:43:29,189][14395] Avg episode reward: 4.514, avg true_objective: 4.091 +[2024-11-07 16:43:29,399][14395] Num frames 10700... +[2024-11-07 16:43:29,793][14395] Num frames 10800... +[2024-11-07 16:43:30,114][14395] Num frames 10900... +[2024-11-07 16:43:30,450][14395] Num frames 11000... +[2024-11-07 16:43:30,750][14395] Avg episode rewards: #0: 4.550, true rewards: #0: 4.105 +[2024-11-07 16:43:30,786][14395] Avg episode reward: 4.550, avg true_objective: 4.105 +[2024-11-07 16:43:30,847][14395] Num frames 11100... +[2024-11-07 16:43:31,099][14395] Num frames 11200... +[2024-11-07 16:43:31,355][14395] Num frames 11300... +[2024-11-07 16:43:31,591][14395] Num frames 11400... +[2024-11-07 16:43:31,830][14395] Avg episode rewards: #0: 4.524, true rewards: #0: 4.096 +[2024-11-07 16:43:31,832][14395] Avg episode reward: 4.524, avg true_objective: 4.096 +[2024-11-07 16:43:31,909][14395] Num frames 11500... +[2024-11-07 16:43:32,177][14395] Num frames 11600... +[2024-11-07 16:43:32,444][14395] Num frames 11700... +[2024-11-07 16:43:32,687][14395] Num frames 11800... +[2024-11-07 16:43:32,945][14395] Num frames 11900... +[2024-11-07 16:43:33,042][14395] Avg episode rewards: #0: 4.592, true rewards: #0: 4.109 +[2024-11-07 16:43:33,050][14395] Avg episode reward: 4.592, avg true_objective: 4.109 +[2024-11-07 16:43:33,303][14395] Num frames 12000... +[2024-11-07 16:43:33,606][14395] Num frames 12100... +[2024-11-07 16:43:33,849][14395] Num frames 12200... +[2024-11-07 16:43:34,120][14395] Num frames 12300... +[2024-11-07 16:43:34,248][14395] Avg episode rewards: #0: 4.577, true rewards: #0: 4.111 +[2024-11-07 16:43:34,251][14395] Avg episode reward: 4.577, avg true_objective: 4.111 +[2024-11-07 16:43:34,442][14395] Num frames 12400... +[2024-11-07 16:43:34,696][14395] Num frames 12500... +[2024-11-07 16:43:34,937][14395] Num frames 12600... 
+[2024-11-07 16:43:37,515][14395] Num frames 12700... +[2024-11-07 16:43:37,627][14395] Avg episode rewards: #0: 4.554, true rewards: #0: 4.102 +[2024-11-07 16:43:37,632][14395] Avg episode reward: 4.554, avg true_objective: 4.102 +[2024-11-07 16:43:37,883][14395] Num frames 12800... +[2024-11-07 16:43:38,153][14395] Num frames 12900... +[2024-11-07 16:43:38,417][14395] Num frames 13000... +[2024-11-07 16:43:38,772][14395] Num frames 13100... +[2024-11-07 16:43:38,825][14395] Avg episode rewards: #0: 4.531, true rewards: #0: 4.094 +[2024-11-07 16:43:38,829][14395] Avg episode reward: 4.531, avg true_objective: 4.094 +[2024-11-07 16:43:39,125][14395] Num frames 13200... +[2024-11-07 16:43:39,466][14395] Num frames 13300... +[2024-11-07 16:43:39,754][14395] Num frames 13400... +[2024-11-07 16:43:40,055][14395] Avg episode rewards: #0: 4.510, true rewards: #0: 4.086 +[2024-11-07 16:43:40,062][14395] Avg episode reward: 4.510, avg true_objective: 4.086 +[2024-11-07 16:43:40,114][14395] Num frames 13500... +[2024-11-07 16:43:40,366][14395] Num frames 13600... +[2024-11-07 16:43:40,631][14395] Num frames 13700... +[2024-11-07 16:43:40,882][14395] Num frames 13800... +[2024-11-07 16:43:41,119][14395] Avg episode rewards: #0: 4.491, true rewards: #0: 4.079 +[2024-11-07 16:43:41,125][14395] Avg episode reward: 4.491, avg true_objective: 4.079 +[2024-11-07 16:43:41,221][14395] Num frames 13900... +[2024-11-07 16:43:41,452][14395] Num frames 14000... +[2024-11-07 16:43:41,783][14395] Num frames 14100... +[2024-11-07 16:43:42,099][14395] Num frames 14200... +[2024-11-07 16:43:42,325][14395] Avg episode rewards: #0: 4.472, true rewards: #0: 4.072 +[2024-11-07 16:43:42,327][14395] Avg episode reward: 4.472, avg true_objective: 4.072 +[2024-11-07 16:43:42,479][14395] Num frames 14300... +[2024-11-07 16:43:42,827][14395] Num frames 14400... +[2024-11-07 16:43:43,136][14395] Num frames 14500... +[2024-11-07 16:43:43,429][14395] Num frames 14600... 
+[2024-11-07 16:43:43,732][14395] Num frames 14700... +[2024-11-07 16:43:44,094][14395] Avg episode rewards: #0: 4.554, true rewards: #0: 4.110 +[2024-11-07 16:43:44,096][14395] Avg episode reward: 4.554, avg true_objective: 4.110 +[2024-11-07 16:43:44,118][14395] Num frames 14800... +[2024-11-07 16:43:44,402][14395] Num frames 14900... +[2024-11-07 16:43:44,664][14395] Num frames 15000... +[2024-11-07 16:43:44,944][14395] Num frames 15100... +[2024-11-07 16:43:45,199][14395] Num frames 15200... +[2024-11-07 16:43:45,394][14395] Avg episode rewards: #0: 4.579, true rewards: #0: 4.120 +[2024-11-07 16:43:45,395][14395] Avg episode reward: 4.579, avg true_objective: 4.120 +[2024-11-07 16:43:45,560][14395] Num frames 15300... +[2024-11-07 16:43:45,872][14395] Num frames 15400... +[2024-11-07 16:43:46,141][14395] Num frames 15500... +[2024-11-07 16:43:46,193][14395] Avg episode rewards: #0: 4.526, true rewards: #0: 4.079 +[2024-11-07 16:43:46,195][14395] Avg episode reward: 4.526, avg true_objective: 4.079 +[2024-11-07 16:43:46,470][14395] Num frames 15600... +[2024-11-07 16:43:46,712][14395] Num frames 15700... +[2024-11-07 16:43:46,968][14395] Num frames 15800... +[2024-11-07 16:43:47,285][14395] Num frames 15900... +[2024-11-07 16:43:47,461][14395] Avg episode rewards: #0: 4.551, true rewards: #0: 4.089 +[2024-11-07 16:43:47,466][14395] Avg episode reward: 4.551, avg true_objective: 4.089 +[2024-11-07 16:43:47,613][14395] Num frames 16000... +[2024-11-07 16:43:47,932][14395] Num frames 16100... +[2024-11-07 16:43:48,189][14395] Num frames 16200... +[2024-11-07 16:43:48,417][14395] Num frames 16300... +[2024-11-07 16:43:48,557][14395] Avg episode rewards: #0: 4.533, true rewards: #0: 4.083 +[2024-11-07 16:43:48,561][14395] Avg episode reward: 4.533, avg true_objective: 4.083 +[2024-11-07 16:43:48,731][14395] Num frames 16400... +[2024-11-07 16:43:48,971][14395] Num frames 16500... +[2024-11-07 16:43:49,208][14395] Num frames 16600... 
+[2024-11-07 16:43:49,448][14395] Num frames 16700... +[2024-11-07 16:43:49,545][14395] Avg episode rewards: #0: 4.516, true rewards: #0: 4.077 +[2024-11-07 16:43:49,547][14395] Avg episode reward: 4.516, avg true_objective: 4.077 +[2024-11-07 16:43:49,738][14395] Num frames 16800... +[2024-11-07 16:43:49,989][14395] Num frames 16900... +[2024-11-07 16:43:50,222][14395] Num frames 17000... +[2024-11-07 16:43:50,479][14395] Num frames 17100... +[2024-11-07 16:43:50,532][14395] Avg episode rewards: #0: 4.500, true rewards: #0: 4.071 +[2024-11-07 16:43:50,533][14395] Avg episode reward: 4.500, avg true_objective: 4.071 +[2024-11-07 16:43:50,799][14395] Num frames 17200... +[2024-11-07 16:43:51,077][14395] Num frames 17300... +[2024-11-07 16:43:51,330][14395] Num frames 17400... +[2024-11-07 16:43:51,589][14395] Avg episode rewards: #0: 4.485, true rewards: #0: 4.066 +[2024-11-07 16:43:51,591][14395] Avg episode reward: 4.485, avg true_objective: 4.066 +[2024-11-07 16:43:51,650][14395] Num frames 17500... +[2024-11-07 16:43:51,906][14395] Num frames 17600... +[2024-11-07 16:43:52,141][14395] Num frames 17700... +[2024-11-07 16:43:52,376][14395] Num frames 17800... +[2024-11-07 16:43:52,604][14395] Avg episode rewards: #0: 4.470, true rewards: #0: 4.061 +[2024-11-07 16:43:52,605][14395] Avg episode reward: 4.470, avg true_objective: 4.061 +[2024-11-07 16:43:52,706][14395] Num frames 17900... +[2024-11-07 16:43:52,985][14395] Num frames 18000... +[2024-11-07 16:43:53,239][14395] Num frames 18100... +[2024-11-07 16:43:53,504][14395] Num frames 18200... +[2024-11-07 16:43:53,718][14395] Avg episode rewards: #0: 4.456, true rewards: #0: 4.056 +[2024-11-07 16:43:53,721][14395] Avg episode reward: 4.456, avg true_objective: 4.056 +[2024-11-07 16:43:53,867][14395] Num frames 18300... +[2024-11-07 16:43:54,129][14395] Num frames 18400... +[2024-11-07 16:43:54,374][14395] Num frames 18500... +[2024-11-07 16:43:54,639][14395] Num frames 18600... 
+[2024-11-07 16:43:54,779][14395] Avg episode rewards: #0: 4.443, true rewards: #0: 4.051 +[2024-11-07 16:43:54,781][14395] Avg episode reward: 4.443, avg true_objective: 4.051 +[2024-11-07 16:43:54,947][14395] Num frames 18700... +[2024-11-07 16:43:55,188][14395] Num frames 18800... +[2024-11-07 16:43:55,428][14395] Num frames 18900... +[2024-11-07 16:43:55,675][14395] Num frames 19000... +[2024-11-07 16:43:55,775][14395] Avg episode rewards: #0: 4.430, true rewards: #0: 4.047 +[2024-11-07 16:43:55,777][14395] Avg episode reward: 4.430, avg true_objective: 4.047 +[2024-11-07 16:43:55,989][14395] Num frames 19100... +[2024-11-07 16:43:56,239][14395] Num frames 19200... +[2024-11-07 16:43:56,477][14395] Num frames 19300... +[2024-11-07 16:43:56,717][14395] Num frames 19400... +[2024-11-07 16:43:56,786][14395] Avg episode rewards: #0: 4.418, true rewards: #0: 4.042 +[2024-11-07 16:43:56,793][14395] Avg episode reward: 4.418, avg true_objective: 4.042 +[2024-11-07 16:43:57,047][14395] Num frames 19500... +[2024-11-07 16:43:57,271][14395] Num frames 19600... +[2024-11-07 16:43:57,515][14395] Num frames 19700... +[2024-11-07 16:43:57,801][14395] Avg episode rewards: #0: 4.406, true rewards: #0: 4.038 +[2024-11-07 16:43:57,805][14395] Avg episode reward: 4.406, avg true_objective: 4.038 +[2024-11-07 16:43:57,845][14395] Num frames 19800... +[2024-11-07 16:43:58,084][14395] Num frames 19900... +[2024-11-07 16:43:58,338][14395] Num frames 20000... +[2024-11-07 16:43:58,516][14395] Avg episode rewards: #0: 4.369, true rewards: #0: 4.009 +[2024-11-07 16:43:58,519][14395] Avg episode reward: 4.369, avg true_objective: 4.009 +[2024-11-07 16:43:58,691][14395] Num frames 20100... +[2024-11-07 16:43:58,939][14395] Num frames 20200... +[2024-11-07 16:43:59,190][14395] Num frames 20300... +[2024-11-07 16:43:59,448][14395] Num frames 20400... +[2024-11-07 16:43:59,712][14395] Num frames 20500... 
+[2024-11-07 16:44:00,013][14395] Avg episode rewards: #0: 4.429, true rewards: #0: 4.037 +[2024-11-07 16:44:00,014][14395] Avg episode reward: 4.429, avg true_objective: 4.037 +[2024-11-07 16:44:00,064][14395] Num frames 20600... +[2024-11-07 16:44:00,491][14395] Num frames 20700... +[2024-11-07 16:44:00,759][14395] Num frames 20800... +[2024-11-07 16:44:01,052][14395] Num frames 20900... +[2024-11-07 16:44:01,278][14395] Num frames 21000... +[2024-11-07 16:44:01,422][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.045 +[2024-11-07 16:44:01,425][14395] Avg episode reward: 4.449, avg true_objective: 4.045 +[2024-11-07 16:44:01,586][14395] Num frames 21100... +[2024-11-07 16:44:01,818][14395] Num frames 21200... +[2024-11-07 16:44:02,029][14395] Num frames 21300... +[2024-11-07 16:44:02,277][14395] Num frames 21400... +[2024-11-07 16:44:02,395][14395] Avg episode rewards: #0: 4.438, true rewards: #0: 4.042 +[2024-11-07 16:44:02,397][14395] Avg episode reward: 4.438, avg true_objective: 4.042 +[2024-11-07 16:44:02,581][14395] Num frames 21500... +[2024-11-07 16:44:02,813][14395] Num frames 21600... +[2024-11-07 16:44:03,051][14395] Num frames 21700... +[2024-11-07 16:44:03,323][14395] Num frames 21800... +[2024-11-07 16:44:03,393][14395] Avg episode rewards: #0: 4.427, true rewards: #0: 4.038 +[2024-11-07 16:44:03,396][14395] Avg episode reward: 4.427, avg true_objective: 4.038 +[2024-11-07 16:44:03,673][14395] Num frames 21900... +[2024-11-07 16:44:03,922][14395] Num frames 22000... +[2024-11-07 16:44:04,176][14395] Num frames 22100... +[2024-11-07 16:44:04,474][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.034 +[2024-11-07 16:44:04,480][14395] Avg episode reward: 4.416, avg true_objective: 4.034 +[2024-11-07 16:44:04,534][14395] Num frames 22200... +[2024-11-07 16:44:04,779][14395] Num frames 22300... +[2024-11-07 16:44:05,020][14395] Num frames 22400... +[2024-11-07 16:44:05,312][14395] Num frames 22500... 
+[2024-11-07 16:44:05,648][14395] Num frames 22600... +[2024-11-07 16:44:05,715][14395] Avg episode rewards: #0: 4.429, true rewards: #0: 4.036 +[2024-11-07 16:44:05,722][14395] Avg episode reward: 4.429, avg true_objective: 4.036 +[2024-11-07 16:44:05,998][14395] Num frames 22700... +[2024-11-07 16:44:06,261][14395] Num frames 22800... +[2024-11-07 16:44:06,520][14395] Num frames 22900... +[2024-11-07 16:44:06,805][14395] Avg episode rewards: #0: 4.419, true rewards: #0: 4.033 +[2024-11-07 16:44:06,806][14395] Avg episode reward: 4.419, avg true_objective: 4.033 +[2024-11-07 16:44:06,842][14395] Num frames 23000... +[2024-11-07 16:44:07,202][14395] Num frames 23100... +[2024-11-07 16:44:07,464][14395] Num frames 23200... +[2024-11-07 16:44:07,718][14395] Avg episode rewards: #0: 4.410, true rewards: #0: 4.013 +[2024-11-07 16:44:07,721][14395] Avg episode reward: 4.410, avg true_objective: 4.013 +[2024-11-07 16:44:07,781][14395] Num frames 23300... +[2024-11-07 16:44:08,040][14395] Num frames 23400... +[2024-11-07 16:44:08,289][14395] Num frames 23500... +[2024-11-07 16:44:08,540][14395] Num frames 23600... +[2024-11-07 16:44:08,767][14395] Avg episode rewards: #0: 4.400, true rewards: #0: 4.010 +[2024-11-07 16:44:08,773][14395] Avg episode reward: 4.400, avg true_objective: 4.010 +[2024-11-07 16:44:08,893][14395] Num frames 23700... +[2024-11-07 16:44:09,156][14395] Num frames 23800... +[2024-11-07 16:44:09,432][14395] Num frames 23900... +[2024-11-07 16:44:09,734][14395] Num frames 24000... +[2024-11-07 16:44:12,442][14395] Avg episode rewards: #0: 4.391, true rewards: #0: 4.007 +[2024-11-07 16:44:12,444][14395] Avg episode reward: 4.391, avg true_objective: 4.007 +[2024-11-07 16:44:12,599][14395] Num frames 24100... +[2024-11-07 16:44:12,987][14395] Num frames 24200... +[2024-11-07 16:44:13,304][14395] Num frames 24300... +[2024-11-07 16:44:13,577][14395] Num frames 24400... 
+[2024-11-07 16:44:13,704][14395] Avg episode rewards: #0: 4.382, true rewards: #0: 4.005 +[2024-11-07 16:44:13,705][14395] Avg episode reward: 4.382, avg true_objective: 4.005 +[2024-11-07 16:44:13,918][14395] Num frames 24500... +[2024-11-07 16:44:14,202][14395] Num frames 24600... +[2024-11-07 16:44:14,478][14395] Num frames 24700... +[2024-11-07 16:44:14,767][14395] Num frames 24800... +[2024-11-07 16:44:14,859][14395] Avg episode rewards: #0: 4.373, true rewards: #0: 4.002 +[2024-11-07 16:44:14,861][14395] Avg episode reward: 4.373, avg true_objective: 4.002 +[2024-11-07 16:44:15,126][14395] Num frames 24900... +[2024-11-07 16:44:15,486][14395] Num frames 25000... +[2024-11-07 16:44:15,818][14395] Num frames 25100... +[2024-11-07 16:44:16,166][14395] Avg episode rewards: #0: 4.364, true rewards: #0: 3.999 +[2024-11-07 16:44:16,167][14395] Avg episode reward: 4.364, avg true_objective: 3.999 +[2024-11-07 16:44:16,178][14395] Num frames 25200... +[2024-11-07 16:44:16,580][14395] Num frames 25300... +[2024-11-07 16:44:16,919][14395] Num frames 25400... +[2024-11-07 16:44:17,364][14395] Num frames 25500... +[2024-11-07 16:44:17,700][14395] Avg episode rewards: #0: 4.356, true rewards: #0: 3.997 +[2024-11-07 16:44:17,702][14395] Avg episode reward: 4.356, avg true_objective: 3.997 +[2024-11-07 16:44:17,798][14395] Num frames 25600... +[2024-11-07 16:44:18,151][14395] Num frames 25700... +[2024-11-07 16:44:18,704][14395] Num frames 25800... +[2024-11-07 16:44:19,117][14395] Num frames 25900... +[2024-11-07 16:44:19,540][14395] Avg episode rewards: #0: 4.348, true rewards: #0: 3.994 +[2024-11-07 16:44:19,543][14395] Avg episode reward: 4.348, avg true_objective: 3.994 +[2024-11-07 16:44:19,699][14395] Num frames 26000... +[2024-11-07 16:44:20,171][14395] Num frames 26100... +[2024-11-07 16:44:20,540][14395] Num frames 26200... +[2024-11-07 16:44:20,908][14395] Num frames 26300... +[2024-11-07 16:44:21,272][14395] Num frames 26400... 
+[2024-11-07 16:44:21,449][14395] Avg episode rewards: #0: 4.370, true rewards: #0: 4.007 +[2024-11-07 16:44:21,451][14395] Avg episode reward: 4.370, avg true_objective: 4.007 +[2024-11-07 16:44:21,678][14395] Num frames 26500... +[2024-11-07 16:44:21,991][14395] Num frames 26600... +[2024-11-07 16:44:22,365][14395] Num frames 26700... +[2024-11-07 16:44:22,667][14395] Num frames 26800... +[2024-11-07 16:44:22,806][14395] Avg episode rewards: #0: 4.362, true rewards: #0: 4.004 +[2024-11-07 16:44:22,810][14395] Avg episode reward: 4.362, avg true_objective: 4.004 +[2024-11-07 16:44:23,012][14395] Num frames 26900... +[2024-11-07 16:44:23,261][14395] Num frames 27000... +[2024-11-07 16:44:23,569][14395] Num frames 27100... +[2024-11-07 16:44:23,896][14395] Num frames 27200... +[2024-11-07 16:44:23,980][14395] Avg episode rewards: #0: 4.355, true rewards: #0: 4.002 +[2024-11-07 16:44:23,982][14395] Avg episode reward: 4.355, avg true_objective: 4.002 +[2024-11-07 16:44:24,216][14395] Num frames 27300... +[2024-11-07 16:44:24,494][14395] Num frames 27400... +[2024-11-07 16:44:24,759][14395] Num frames 27500... +[2024-11-07 16:44:25,063][14395] Avg episode rewards: #0: 4.347, true rewards: #0: 3.999 +[2024-11-07 16:44:25,067][14395] Avg episode reward: 4.347, avg true_objective: 3.999 +[2024-11-07 16:44:25,085][14395] Num frames 27600... +[2024-11-07 16:44:25,412][14395] Num frames 27700... +[2024-11-07 16:44:25,712][14395] Num frames 27800... +[2024-11-07 16:44:26,010][14395] Num frames 27900... +[2024-11-07 16:44:26,296][14395] Num frames 28000... +[2024-11-07 16:44:26,392][14395] Avg episode rewards: #0: 4.345, true rewards: #0: 4.002 +[2024-11-07 16:44:26,398][14395] Avg episode reward: 4.345, avg true_objective: 4.002 +[2024-11-07 16:44:26,641][14395] Num frames 28100... +[2024-11-07 16:44:26,941][14395] Num frames 28200... 
+[2024-11-07 16:44:27,337][14395] Avg episode rewards: #0: 4.319, true rewards: #0: 3.981 +[2024-11-07 16:44:27,340][14395] Avg episode reward: 4.319, avg true_objective: 3.981 +[2024-11-07 16:44:27,445][14395] Num frames 28300... +[2024-11-07 16:44:27,743][14395] Num frames 28400... +[2024-11-07 16:44:28,014][14395] Num frames 28500... +[2024-11-07 16:44:28,293][14395] Num frames 28600... +[2024-11-07 16:44:28,510][14395] Avg episode rewards: #0: 4.313, true rewards: #0: 3.979 +[2024-11-07 16:44:28,512][14395] Avg episode reward: 4.313, avg true_objective: 3.979 +[2024-11-07 16:44:28,668][14395] Num frames 28700... +[2024-11-07 16:44:28,941][14395] Num frames 28800... +[2024-11-07 16:44:29,250][14395] Num frames 28900... +[2024-11-07 16:44:29,540][14395] Num frames 29000... +[2024-11-07 16:44:29,823][14395] Num frames 29100... +[2024-11-07 16:44:29,989][14395] Avg episode rewards: #0: 4.333, true rewards: #0: 3.991 +[2024-11-07 16:44:29,992][14395] Avg episode reward: 4.333, avg true_objective: 3.991 +[2024-11-07 16:44:30,187][14395] Num frames 29200... +[2024-11-07 16:44:30,469][14395] Num frames 29300... +[2024-11-07 16:44:30,745][14395] Num frames 29400... +[2024-11-07 16:44:31,040][14395] Num frames 29500... +[2024-11-07 16:44:31,340][14395] Avg episode rewards: #0: 4.349, true rewards: #0: 3.997 +[2024-11-07 16:44:31,347][14395] Avg episode reward: 4.349, avg true_objective: 3.997 +[2024-11-07 16:44:31,409][14395] Num frames 29600... +[2024-11-07 16:44:31,707][14395] Num frames 29700... +[2024-11-07 16:44:31,972][14395] Num frames 29800... +[2024-11-07 16:44:32,277][14395] Num frames 29900... +[2024-11-07 16:44:32,550][14395] Num frames 30000... +[2024-11-07 16:44:32,694][14395] Avg episode rewards: #0: 4.364, true rewards: #0: 4.004 +[2024-11-07 16:44:32,699][14395] Avg episode reward: 4.364, avg true_objective: 4.004 +[2024-11-07 16:44:32,912][14395] Num frames 30100... +[2024-11-07 16:44:33,168][14395] Num frames 30200... 
+[2024-11-07 16:44:33,449][14395] Num frames 30300... +[2024-11-07 16:44:33,713][14395] Num frames 30400... +[2024-11-07 16:44:33,982][14395] Avg episode rewards: #0: 4.378, true rewards: #0: 4.010 +[2024-11-07 16:44:33,985][14395] Avg episode reward: 4.378, avg true_objective: 4.010 +[2024-11-07 16:44:34,071][14395] Num frames 30500... +[2024-11-07 16:44:34,342][14395] Num frames 30600... +[2024-11-07 16:44:34,647][14395] Num frames 30700... +[2024-11-07 16:44:34,928][14395] Num frames 30800... +[2024-11-07 16:44:35,212][14395] Num frames 30900... +[2024-11-07 16:44:35,341][14395] Avg episode rewards: #0: 4.393, true rewards: #0: 4.016 +[2024-11-07 16:44:35,347][14395] Avg episode reward: 4.393, avg true_objective: 4.016 +[2024-11-07 16:44:35,553][14395] Num frames 31000... +[2024-11-07 16:44:35,821][14395] Num frames 31100... +[2024-11-07 16:44:36,079][14395] Num frames 31200... +[2024-11-07 16:44:36,388][14395] Num frames 31300... +[2024-11-07 16:44:36,467][14395] Avg episode rewards: #0: 4.386, true rewards: #0: 4.014 +[2024-11-07 16:44:36,471][14395] Avg episode reward: 4.386, avg true_objective: 4.014 +[2024-11-07 16:44:36,735][14395] Num frames 31400... +[2024-11-07 16:44:37,030][14395] Num frames 31500... +[2024-11-07 16:44:37,358][14395] Num frames 31600... +[2024-11-07 16:44:37,714][14395] Num frames 31700... +[2024-11-07 16:44:38,059][14395] Avg episode rewards: #0: 4.400, true rewards: #0: 4.020 +[2024-11-07 16:44:38,062][14395] Avg episode reward: 4.400, avg true_objective: 4.020 +[2024-11-07 16:44:38,226][14395] Num frames 31800... +[2024-11-07 16:44:38,478][14395] Num frames 31900... +[2024-11-07 16:44:38,768][14395] Num frames 32000... +[2024-11-07 16:44:39,165][14395] Num frames 32100... +[2024-11-07 16:44:39,378][14395] Avg episode rewards: #0: 4.393, true rewards: #0: 4.017 +[2024-11-07 16:44:39,379][14395] Avg episode reward: 4.393, avg true_objective: 4.017 +[2024-11-07 16:44:39,571][14395] Num frames 32200... 
+[2024-11-07 16:44:39,866][14395] Num frames 32300... +[2024-11-07 16:44:40,201][14395] Num frames 32400... +[2024-11-07 16:44:40,509][14395] Num frames 32500... +[2024-11-07 16:44:40,812][14395] Num frames 32600... +[2024-11-07 16:44:41,190][14395] Avg episode rewards: #0: 4.430, true rewards: #0: 4.035 +[2024-11-07 16:44:41,195][14395] Avg episode reward: 4.430, avg true_objective: 4.035 +[2024-11-07 16:44:41,325][14395] Num frames 32700... +[2024-11-07 16:44:41,967][14395] Num frames 32800... +[2024-11-07 16:44:42,269][14395] Num frames 32900... +[2024-11-07 16:44:42,746][14395] Num frames 33000... +[2024-11-07 16:44:43,030][14395] Avg episode rewards: #0: 4.423, true rewards: #0: 4.033 +[2024-11-07 16:44:43,032][14395] Avg episode reward: 4.423, avg true_objective: 4.033 +[2024-11-07 16:44:43,130][14395] Num frames 33100... +[2024-11-07 16:44:43,516][14395] Num frames 33200... +[2024-11-07 16:44:43,768][14395] Num frames 33300... +[2024-11-07 16:44:44,043][14395] Num frames 33400... +[2024-11-07 16:44:44,361][14395] Num frames 33500... +[2024-11-07 16:44:44,467][14395] Avg episode rewards: #0: 4.436, true rewards: #0: 4.038 +[2024-11-07 16:44:44,469][14395] Avg episode reward: 4.436, avg true_objective: 4.038 +[2024-11-07 16:44:47,173][14395] Num frames 33600... +[2024-11-07 16:44:47,501][14395] Num frames 33700... +[2024-11-07 16:44:47,873][14395] Num frames 33800... +[2024-11-07 16:44:48,210][14395] Num frames 33900... +[2024-11-07 16:44:48,264][14395] Avg episode rewards: #0: 4.429, true rewards: #0: 4.036 +[2024-11-07 16:44:48,265][14395] Avg episode reward: 4.429, avg true_objective: 4.036 +[2024-11-07 16:44:48,594][14395] Num frames 34000... +[2024-11-07 16:44:48,920][14395] Num frames 34100... +[2024-11-07 16:44:49,191][14395] Avg episode rewards: #0: 4.407, true rewards: #0: 4.018 +[2024-11-07 16:44:49,195][14395] Avg episode reward: 4.407, avg true_objective: 4.018 +[2024-11-07 16:44:49,349][14395] Num frames 34200... 
+[2024-11-07 16:44:49,705][14395] Num frames 34300... +[2024-11-07 16:44:50,038][14395] Num frames 34400... +[2024-11-07 16:44:50,382][14395] Num frames 34500... +[2024-11-07 16:44:50,680][14395] Avg episode rewards: #0: 4.415, true rewards: #0: 4.020 +[2024-11-07 16:44:50,683][14395] Avg episode reward: 4.415, avg true_objective: 4.020 +[2024-11-07 16:44:50,784][14395] Num frames 34600... +[2024-11-07 16:44:51,097][14395] Num frames 34700... +[2024-11-07 16:44:51,386][14395] Num frames 34800... +[2024-11-07 16:44:51,634][14395] Num frames 34900... +[2024-11-07 16:44:51,857][14395] Avg episode rewards: #0: 4.409, true rewards: #0: 4.018 +[2024-11-07 16:44:51,860][14395] Avg episode reward: 4.409, avg true_objective: 4.018 +[2024-11-07 16:44:52,012][14395] Num frames 35000... +[2024-11-07 16:44:52,334][14395] Num frames 35100... +[2024-11-07 16:44:52,625][14395] Num frames 35200... +[2024-11-07 16:44:52,923][14395] Num frames 35300... +[2024-11-07 16:44:53,201][14395] Num frames 35400... +[2024-11-07 16:44:53,271][14395] Avg episode rewards: #0: 4.421, true rewards: #0: 4.023 +[2024-11-07 16:44:53,276][14395] Avg episode reward: 4.421, avg true_objective: 4.023 +[2024-11-07 16:44:53,503][14395] Num frames 35500... +[2024-11-07 16:44:53,754][14395] Num frames 35600... +[2024-11-07 16:44:53,966][14395] Num frames 35700... +[2024-11-07 16:44:54,252][14395] Avg episode rewards: #0: 4.414, true rewards: #0: 4.021 +[2024-11-07 16:44:54,257][14395] Avg episode reward: 4.414, avg true_objective: 4.021 +[2024-11-07 16:44:54,295][14395] Num frames 35800... +[2024-11-07 16:44:54,535][14395] Num frames 35900... +[2024-11-07 16:44:54,775][14395] Num frames 36000... +[2024-11-07 16:44:55,023][14395] Num frames 36100... +[2024-11-07 16:44:55,342][14395] Avg episode rewards: #0: 4.408, true rewards: #0: 4.019 +[2024-11-07 16:44:55,348][14395] Avg episode reward: 4.408, avg true_objective: 4.019 +[2024-11-07 16:44:55,434][14395] Num frames 36200... 
+[2024-11-07 16:44:55,655][14395] Num frames 36300... +[2024-11-07 16:44:55,910][14395] Num frames 36400... +[2024-11-07 16:44:56,158][14395] Num frames 36500... +[2024-11-07 16:44:56,335][14395] Avg episode rewards: #0: 4.402, true rewards: #0: 4.017 +[2024-11-07 16:44:56,339][14395] Avg episode reward: 4.402, avg true_objective: 4.017 +[2024-11-07 16:44:56,474][14395] Num frames 36600... +[2024-11-07 16:44:56,741][14395] Num frames 36700... +[2024-11-07 16:44:57,056][14395] Num frames 36800... +[2024-11-07 16:44:57,337][14395] Num frames 36900... +[2024-11-07 16:44:57,490][14395] Avg episode rewards: #0: 4.396, true rewards: #0: 4.015 +[2024-11-07 16:44:57,492][14395] Avg episode reward: 4.396, avg true_objective: 4.015 +[2024-11-07 16:44:57,627][14395] Num frames 37000... +[2024-11-07 16:44:57,823][14395] Num frames 37100... +[2024-11-07 16:44:58,048][14395] Num frames 37200... +[2024-11-07 16:44:58,242][14395] Num frames 37300... +[2024-11-07 16:44:58,361][14395] Avg episode rewards: #0: 4.390, true rewards: #0: 4.013 +[2024-11-07 16:44:58,364][14395] Avg episode reward: 4.390, avg true_objective: 4.013 +[2024-11-07 16:44:58,588][14395] Num frames 37400... +[2024-11-07 16:44:58,811][14395] Num frames 37500... +[2024-11-07 16:44:59,038][14395] Num frames 37600... +[2024-11-07 16:44:59,304][14395] Num frames 37700... +[2024-11-07 16:44:59,453][14395] Avg episode rewards: #0: 4.387, true rewards: #0: 4.015 +[2024-11-07 16:44:59,458][14395] Avg episode reward: 4.387, avg true_objective: 4.015 +[2024-11-07 16:44:59,614][14395] Num frames 37800... +[2024-11-07 16:44:59,838][14395] Num frames 37900... +[2024-11-07 16:45:00,073][14395] Num frames 38000... +[2024-11-07 16:45:00,297][14395] Num frames 38100... +[2024-11-07 16:45:00,413][14395] Avg episode rewards: #0: 4.381, true rewards: #0: 4.013 +[2024-11-07 16:45:00,417][14395] Avg episode reward: 4.381, avg true_objective: 4.013 +[2024-11-07 16:45:00,627][14395] Num frames 38200... 
+[2024-11-07 16:45:00,938][14395] Num frames 38300... +[2024-11-07 16:45:01,301][14395] Num frames 38400... +[2024-11-07 16:45:01,644][14395] Num frames 38500... +[2024-11-07 16:45:01,736][14395] Avg episode rewards: #0: 4.376, true rewards: #0: 4.011 +[2024-11-07 16:45:01,740][14395] Avg episode reward: 4.376, avg true_objective: 4.011 +[2024-11-07 16:45:02,065][14395] Num frames 38600... +[2024-11-07 16:45:02,308][14395] Num frames 38700... +[2024-11-07 16:45:02,639][14395] Num frames 38800... +[2024-11-07 16:45:02,939][14395] Avg episode rewards: #0: 4.370, true rewards: #0: 4.009 +[2024-11-07 16:45:02,942][14395] Avg episode reward: 4.370, avg true_objective: 4.009 +[2024-11-07 16:45:02,970][14395] Num frames 38900... +[2024-11-07 16:45:03,197][14395] Num frames 39000... +[2024-11-07 16:45:03,485][14395] Num frames 39100... +[2024-11-07 16:45:03,729][14395] Num frames 39200... +[2024-11-07 16:45:03,960][14395] Avg episode rewards: #0: 4.365, true rewards: #0: 4.008 +[2024-11-07 16:45:03,965][14395] Avg episode reward: 4.365, avg true_objective: 4.008 +[2024-11-07 16:45:04,025][14395] Num frames 39300... +[2024-11-07 16:45:04,280][14395] Num frames 39400... +[2024-11-07 16:45:04,562][14395] Num frames 39500... +[2024-11-07 16:45:04,828][14395] Num frames 39600... +[2024-11-07 16:45:05,100][14395] Num frames 39700... +[2024-11-07 16:45:05,365][14395] Num frames 39800... +[2024-11-07 16:45:05,471][14395] Avg episode rewards: #0: 4.396, true rewards: #0: 4.022 +[2024-11-07 16:45:05,472][14395] Avg episode reward: 4.396, avg true_objective: 4.022 +[2024-11-07 16:45:05,708][14395] Num frames 39900... +[2024-11-07 16:45:05,981][14395] Num frames 40000... +[2024-11-07 16:45:06,262][14395] Num frames 40100... +[2024-11-07 16:45:06,529][14395] Num frames 40200... 
+[2024-11-07 16:45:06,599][14395] Avg episode rewards: #0: 4.390, true rewards: #0: 4.020 +[2024-11-07 16:45:06,603][14395] Avg episode reward: 4.390, avg true_objective: 4.020 +[2024-11-07 16:47:35,377][14395] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-07 16:47:52,066][14395] The model has been pushed to https://huggingface.co/alidenewade/rl_course_vizdoom_health_gathering_supreme +[2024-11-07 16:48:50,525][14395] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 16:48:50,529][14395] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 16:48:50,531][14395] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 16:48:50,535][14395] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 16:48:50,537][14395] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:48:50,540][14395] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 16:48:50,543][14395] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:48:50,547][14395] Adding new argument 'max_num_episodes'=100000000000 that is not in the saved config file! +[2024-11-07 16:48:50,550][14395] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-07 16:48:50,552][14395] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-07 16:48:50,554][14395] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 16:48:50,558][14395] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 16:48:50,562][14395] Adding new argument 'train_script'=None that is not in the saved config file! 
+[2024-11-07 16:48:50,564][14395] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 16:48:50,568][14395] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 16:48:50,631][14395] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 16:48:50,637][14395] RunningMeanStd input shape: (1,) +[2024-11-07 16:48:50,697][14395] ConvEncoder: input_channels=3 +[2024-11-07 16:48:50,755][14395] Conv encoder output size: 512 +[2024-11-07 16:48:50,756][14395] Policy head output size: 512 +[2024-11-07 16:48:50,787][14395] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 16:48:51,548][14395] Num frames 100... +[2024-11-07 16:48:51,786][14395] Num frames 200... +[2024-11-07 16:48:52,032][14395] Num frames 300... +[2024-11-07 16:48:52,295][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:48:52,298][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:48:52,340][14395] Num frames 400... +[2024-11-07 16:48:52,554][14395] Num frames 500... +[2024-11-07 16:48:52,760][14395] Num frames 600... +[2024-11-07 16:48:52,960][14395] Num frames 700... +[2024-11-07 16:48:53,182][14395] Num frames 800... +[2024-11-07 16:48:53,313][14395] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-07 16:48:53,315][14395] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-07 16:48:53,481][14395] Num frames 900... +[2024-11-07 16:48:53,688][14395] Num frames 1000... +[2024-11-07 16:48:53,910][14395] Num frames 1100... +[2024-11-07 16:48:54,110][14395] Num frames 1200... +[2024-11-07 16:48:54,203][14395] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-07 16:48:54,207][14395] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-07 16:48:54,393][14395] Num frames 1300... +[2024-11-07 16:48:54,590][14395] Num frames 1400... 
+[2024-11-07 16:48:54,793][14395] Num frames 1500... +[2024-11-07 16:48:54,989][14395] Num frames 1600... +[2024-11-07 16:48:55,225][14395] Avg episode rewards: #0: 4.740, true rewards: #0: 4.240 +[2024-11-07 16:48:55,229][14395] Avg episode reward: 4.740, avg true_objective: 4.240 +[2024-11-07 16:48:55,250][14395] Num frames 1700... +[2024-11-07 16:48:55,452][14395] Num frames 1800... +[2024-11-07 16:48:55,652][14395] Num frames 1900... +[2024-11-07 16:48:55,860][14395] Num frames 2000... +[2024-11-07 16:48:56,076][14395] Avg episode rewards: #0: 4.560, true rewards: #0: 4.160 +[2024-11-07 16:48:56,080][14395] Avg episode reward: 4.560, avg true_objective: 4.160 +[2024-11-07 16:48:56,136][14395] Num frames 2100... +[2024-11-07 16:48:56,326][14395] Num frames 2200... +[2024-11-07 16:48:56,524][14395] Num frames 2300... +[2024-11-07 16:48:56,734][14395] Num frames 2400... +[2024-11-07 16:48:56,917][14395] Avg episode rewards: #0: 4.440, true rewards: #0: 4.107 +[2024-11-07 16:48:56,922][14395] Avg episode reward: 4.440, avg true_objective: 4.107 +[2024-11-07 16:48:57,011][14395] Num frames 2500... +[2024-11-07 16:48:57,214][14395] Num frames 2600... +[2024-11-07 16:48:57,407][14395] Num frames 2700... +[2024-11-07 16:48:57,620][14395] Num frames 2800... +[2024-11-07 16:48:57,774][14395] Avg episode rewards: #0: 4.354, true rewards: #0: 4.069 +[2024-11-07 16:48:57,778][14395] Avg episode reward: 4.354, avg true_objective: 4.069 +[2024-11-07 16:48:57,897][14395] Num frames 2900... +[2024-11-07 16:48:58,109][14395] Num frames 3000... +[2024-11-07 16:48:58,323][14395] Num frames 3100... +[2024-11-07 16:48:58,527][14395] Num frames 3200... +[2024-11-07 16:48:58,735][14395] Num frames 3300... +[2024-11-07 16:48:58,944][14395] Num frames 3400... 
+[2024-11-07 16:48:59,133][14395] Avg episode rewards: #0: 4.945, true rewards: #0: 4.320 +[2024-11-07 16:48:59,137][14395] Avg episode reward: 4.945, avg true_objective: 4.320 +[2024-11-07 16:49:23,526][14395] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 16:49:23,528][14395] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-07 16:49:23,529][14395] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 16:49:23,530][14395] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 16:49:23,533][14395] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:49:23,534][14395] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-07 16:49:23,537][14395] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 16:49:23,539][14395] Adding new argument 'max_num_episodes'=100000000000 that is not in the saved config file! +[2024-11-07 16:49:23,540][14395] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-07 16:49:23,541][14395] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-07 16:49:23,543][14395] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 16:49:23,547][14395] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 16:49:23,548][14395] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 16:49:23,551][14395] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2024-11-07 16:49:23,552][14395] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 16:49:23,621][14395] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 16:49:23,624][14395] RunningMeanStd input shape: (1,) +[2024-11-07 16:49:23,660][14395] ConvEncoder: input_channels=3 +[2024-11-07 16:49:23,709][14395] Conv encoder output size: 512 +[2024-11-07 16:49:23,711][14395] Policy head output size: 512 +[2024-11-07 16:49:23,790][14395] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 16:49:24,636][14395] Num frames 100... +[2024-11-07 16:49:24,860][14395] Num frames 200... +[2024-11-07 16:49:25,076][14395] Num frames 300... +[2024-11-07 16:49:25,299][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:49:25,308][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:49:25,375][14395] Num frames 400... +[2024-11-07 16:49:25,573][14395] Num frames 500... +[2024-11-07 16:49:25,842][14395] Num frames 600... +[2024-11-07 16:49:26,042][14395] Num frames 700... +[2024-11-07 16:49:26,264][14395] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 16:49:26,266][14395] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 16:49:26,338][14395] Num frames 800... +[2024-11-07 16:49:26,550][14395] Num frames 900... +[2024-11-07 16:49:26,751][14395] Num frames 1000... +[2024-11-07 16:49:26,946][14395] Num frames 1100... +[2024-11-07 16:49:27,151][14395] Num frames 1200... +[2024-11-07 16:49:27,244][14395] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-07 16:49:27,247][14395] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-07 16:49:27,442][14395] Num frames 1300... +[2024-11-07 16:49:27,696][14395] Num frames 1400... +[2024-11-07 16:49:27,898][14395] Num frames 1500... +[2024-11-07 16:49:28,094][14395] Num frames 1600... 
+[2024-11-07 16:49:28,212][14395] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080 +[2024-11-07 16:49:28,216][14395] Avg episode reward: 4.580, avg true_objective: 4.080 +[2024-11-07 16:49:28,372][14395] Num frames 1700... +[2024-11-07 16:49:28,588][14395] Num frames 1800... +[2024-11-07 16:49:28,810][14395] Num frames 1900... +[2024-11-07 16:49:29,004][14395] Num frames 2000... +[2024-11-07 16:49:29,229][14395] Avg episode rewards: #0: 4.760, true rewards: #0: 4.160 +[2024-11-07 16:49:29,231][14395] Avg episode reward: 4.760, avg true_objective: 4.160 +[2024-11-07 16:49:29,279][14395] Num frames 2100... +[2024-11-07 16:49:29,486][14395] Num frames 2200... +[2024-11-07 16:49:29,741][14395] Num frames 2300... +[2024-11-07 16:49:29,974][14395] Num frames 2400... +[2024-11-07 16:49:30,158][14395] Avg episode rewards: #0: 4.607, true rewards: #0: 4.107 +[2024-11-07 16:49:30,161][14395] Avg episode reward: 4.607, avg true_objective: 4.107 +[2024-11-07 16:49:30,253][14395] Num frames 2500... +[2024-11-07 16:49:30,489][14395] Num frames 2600... +[2024-11-07 16:49:30,752][14395] Num frames 2700... +[2024-11-07 16:49:30,952][14395] Num frames 2800... +[2024-11-07 16:49:31,102][14395] Avg episode rewards: #0: 4.497, true rewards: #0: 4.069 +[2024-11-07 16:49:31,104][14395] Avg episode reward: 4.497, avg true_objective: 4.069 +[2024-11-07 16:49:31,229][14395] Num frames 2900... +[2024-11-07 16:49:31,450][14395] Num frames 3000... +[2024-11-07 16:49:31,654][14395] Num frames 3100... +[2024-11-07 16:49:31,863][14395] Num frames 3200... +[2024-11-07 16:49:31,985][14395] Avg episode rewards: #0: 4.415, true rewards: #0: 4.040 +[2024-11-07 16:49:31,986][14395] Avg episode reward: 4.415, avg true_objective: 4.040 +[2024-11-07 16:49:32,155][14395] Num frames 3300... +[2024-11-07 16:49:32,376][14395] Num frames 3400... +[2024-11-07 16:49:32,598][14395] Num frames 3500... +[2024-11-07 16:49:32,812][14395] Num frames 3600... 
+[2024-11-07 16:49:32,904][14395] Avg episode rewards: #0: 4.351, true rewards: #0: 4.018 +[2024-11-07 16:49:32,910][14395] Avg episode reward: 4.351, avg true_objective: 4.018 +[2024-11-07 16:49:33,134][14395] Num frames 3700... +[2024-11-07 16:49:33,332][14395] Num frames 3800... +[2024-11-07 16:49:33,544][14395] Num frames 3900... +[2024-11-07 16:49:33,750][14395] Num frames 4000... +[2024-11-07 16:49:33,803][14395] Avg episode rewards: #0: 4.300, true rewards: #0: 4.000 +[2024-11-07 16:49:33,804][14395] Avg episode reward: 4.300, avg true_objective: 4.000 +[2024-11-07 16:49:34,010][14395] Num frames 4100... +[2024-11-07 16:49:34,225][14395] Num frames 4200... +[2024-11-07 16:49:34,432][14395] Num frames 4300... +[2024-11-07 16:49:34,654][14395] Avg episode rewards: #0: 4.258, true rewards: #0: 3.985 +[2024-11-07 16:49:34,658][14395] Avg episode reward: 4.258, avg true_objective: 3.985 +[2024-11-07 16:49:34,714][14395] Num frames 4400... +[2024-11-07 16:49:34,967][14395] Num frames 4500... +[2024-11-07 16:49:35,191][14395] Num frames 4600... +[2024-11-07 16:49:35,420][14395] Num frames 4700... +[2024-11-07 16:49:35,619][14395] Avg episode rewards: #0: 4.223, true rewards: #0: 3.973 +[2024-11-07 16:49:35,621][14395] Avg episode reward: 4.223, avg true_objective: 3.973 +[2024-11-07 16:49:35,687][14395] Num frames 4800... +[2024-11-07 16:49:35,890][14395] Num frames 4900... +[2024-11-07 16:49:36,083][14395] Num frames 5000... +[2024-11-07 16:49:36,288][14395] Num frames 5100... +[2024-11-07 16:49:36,445][14395] Avg episode rewards: #0: 4.194, true rewards: #0: 3.963 +[2024-11-07 16:49:36,449][14395] Avg episode reward: 4.194, avg true_objective: 3.963 +[2024-11-07 16:49:36,555][14395] Num frames 5200... +[2024-11-07 16:49:36,751][14395] Num frames 5300... +[2024-11-07 16:49:36,953][14395] Num frames 5400... +[2024-11-07 16:49:37,147][14395] Num frames 5500... 
+[2024-11-07 16:49:37,279][14395] Avg episode rewards: #0: 4.169, true rewards: #0: 3.954 +[2024-11-07 16:49:37,285][14395] Avg episode reward: 4.169, avg true_objective: 3.954 +[2024-11-07 16:49:37,426][14395] Num frames 5600... +[2024-11-07 16:49:37,623][14395] Num frames 5700... +[2024-11-07 16:49:37,808][14395] Num frames 5800... +[2024-11-07 16:49:38,005][14395] Num frames 5900... +[2024-11-07 16:49:38,097][14395] Avg episode rewards: #0: 4.147, true rewards: #0: 3.947 +[2024-11-07 16:49:38,100][14395] Avg episode reward: 4.147, avg true_objective: 3.947 +[2024-11-07 16:49:38,263][14395] Num frames 6000... +[2024-11-07 16:49:38,452][14395] Num frames 6100... +[2024-11-07 16:49:38,635][14395] Num frames 6200... +[2024-11-07 16:49:38,813][14395] Num frames 6300... +[2024-11-07 16:49:38,877][14395] Avg episode rewards: #0: 4.128, true rewards: #0: 3.940 +[2024-11-07 16:49:38,880][14395] Avg episode reward: 4.128, avg true_objective: 3.940 +[2024-11-07 16:49:39,075][14395] Num frames 6400... +[2024-11-07 16:49:39,266][14395] Num frames 6500... +[2024-11-07 16:49:39,447][14395] Num frames 6600... +[2024-11-07 16:49:39,675][14395] Avg episode rewards: #0: 4.111, true rewards: #0: 3.934 +[2024-11-07 16:49:39,677][14395] Avg episode reward: 4.111, avg true_objective: 3.934 +[2024-11-07 16:49:39,704][14395] Num frames 6700... +[2024-11-07 16:49:39,907][14395] Num frames 6800... +[2024-11-07 16:49:40,106][14395] Num frames 6900... +[2024-11-07 16:49:40,306][14395] Num frames 7000... +[2024-11-07 16:49:40,490][14395] Avg episode rewards: #0: 4.096, true rewards: #0: 3.929 +[2024-11-07 16:49:40,496][14395] Avg episode reward: 4.096, avg true_objective: 3.929 +[2024-11-07 16:49:40,568][14395] Num frames 7100... +[2024-11-07 16:49:40,756][14395] Num frames 7200... +[2024-11-07 16:49:40,947][14395] Num frames 7300... +[2024-11-07 16:49:41,129][14395] Num frames 7400... +[2024-11-07 16:49:41,440][14395] Num frames 7500... 
+[2024-11-07 16:49:41,601][14395] Avg episode rewards: #0: 4.238, true rewards: #0: 3.975 +[2024-11-07 16:49:41,603][14395] Avg episode reward: 4.238, avg true_objective: 3.975 +[2024-11-07 16:49:41,701][14395] Num frames 7600... +[2024-11-07 16:49:41,890][14395] Num frames 7700... +[2024-11-07 16:49:42,089][14395] Num frames 7800... +[2024-11-07 16:49:42,295][14395] Num frames 7900... +[2024-11-07 16:49:42,420][14395] Avg episode rewards: #0: 4.218, true rewards: #0: 3.968 +[2024-11-07 16:49:42,421][14395] Avg episode reward: 4.218, avg true_objective: 3.968 +[2024-11-07 16:49:42,554][14395] Num frames 8000... +[2024-11-07 16:49:42,762][14395] Num frames 8100... +[2024-11-07 16:49:42,951][14395] Num frames 8200... +[2024-11-07 16:49:43,144][14395] Num frames 8300... +[2024-11-07 16:49:43,355][14395] Avg episode rewards: #0: 4.278, true rewards: #0: 3.992 +[2024-11-07 16:49:43,358][14395] Avg episode reward: 4.278, avg true_objective: 3.992 +[2024-11-07 16:49:43,403][14395] Num frames 8400... +[2024-11-07 16:49:43,599][14395] Num frames 8500... +[2024-11-07 16:49:43,790][14395] Num frames 8600... +[2024-11-07 16:49:43,979][14395] Num frames 8700... +[2024-11-07 16:49:44,167][14395] Avg episode rewards: #0: 4.258, true rewards: #0: 3.985 +[2024-11-07 16:49:44,171][14395] Avg episode reward: 4.258, avg true_objective: 3.985 +[2024-11-07 16:49:44,251][14395] Num frames 8800... +[2024-11-07 16:49:44,432][14395] Num frames 8900... +[2024-11-07 16:49:44,618][14395] Num frames 9000... +[2024-11-07 16:49:44,799][14395] Num frames 9100... +[2024-11-07 16:49:44,985][14395] Num frames 9200... +[2024-11-07 16:49:45,071][14395] Avg episode rewards: #0: 4.311, true rewards: #0: 4.007 +[2024-11-07 16:49:45,074][14395] Avg episode reward: 4.311, avg true_objective: 4.007 +[2024-11-07 16:49:45,259][14395] Num frames 9300... +[2024-11-07 16:49:45,504][14395] Num frames 9400... +[2024-11-07 16:49:45,699][14395] Num frames 9500... +[2024-11-07 16:49:45,908][14395] Num frames 9600... 
+[2024-11-07 16:49:45,961][14395] Avg episode rewards: #0: 4.292, true rewards: #0: 4.000 +[2024-11-07 16:49:45,963][14395] Avg episode reward: 4.292, avg true_objective: 4.000 +[2024-11-07 16:49:46,158][14395] Num frames 9700... +[2024-11-07 16:49:46,344][14395] Num frames 9800... +[2024-11-07 16:49:46,531][14395] Num frames 9900... +[2024-11-07 16:49:46,745][14395] Avg episode rewards: #0: 4.274, true rewards: #0: 3.994 +[2024-11-07 16:49:46,750][14395] Avg episode reward: 4.274, avg true_objective: 3.994 +[2024-11-07 16:49:46,786][14395] Num frames 10000... +[2024-11-07 16:49:46,979][14395] Num frames 10100... +[2024-11-07 16:49:47,156][14395] Num frames 10200... +[2024-11-07 16:49:47,329][14395] Num frames 10300... +[2024-11-07 16:49:47,506][14395] Avg episode rewards: #0: 4.257, true rewards: #0: 3.988 +[2024-11-07 16:49:47,510][14395] Avg episode reward: 4.257, avg true_objective: 3.988 +[2024-11-07 16:49:47,594][14395] Num frames 10400... +[2024-11-07 16:49:47,774][14395] Num frames 10500... +[2024-11-07 16:49:47,952][14395] Num frames 10600... +[2024-11-07 16:49:48,133][14395] Num frames 10700... +[2024-11-07 16:49:48,332][14395] Avg episode rewards: #0: 4.290, true rewards: #0: 3.994 +[2024-11-07 16:49:48,337][14395] Avg episode reward: 4.290, avg true_objective: 3.994 +[2024-11-07 16:49:48,380][14395] Num frames 10800... +[2024-11-07 16:49:48,563][14395] Num frames 10900... +[2024-11-07 16:49:48,743][14395] Num frames 11000... +[2024-11-07 16:49:48,929][14395] Num frames 11100... +[2024-11-07 16:49:49,257][14395] Avg episode rewards: #0: 4.274, true rewards: #0: 3.989 +[2024-11-07 16:49:49,259][14395] Avg episode reward: 4.274, avg true_objective: 3.989 +[2024-11-07 16:49:49,339][14395] Num frames 11200... +[2024-11-07 16:49:49,526][14395] Num frames 11300... +[2024-11-07 16:49:49,719][14395] Num frames 11400... +[2024-11-07 16:49:49,901][14395] Num frames 11500... +[2024-11-07 16:49:50,084][14395] Num frames 11600... 
+[2024-11-07 16:49:50,270][14395] Num frames 11700... +[2024-11-07 16:49:50,467][14395] Avg episode rewards: #0: 4.440, true rewards: #0: 4.061 +[2024-11-07 16:49:50,472][14395] Avg episode reward: 4.440, avg true_objective: 4.061 +[2024-11-07 16:49:50,542][14395] Num frames 11800... +[2024-11-07 16:49:50,724][14395] Num frames 11900... +[2024-11-07 16:49:50,912][14395] Num frames 12000... +[2024-11-07 16:49:51,103][14395] Num frames 12100... +[2024-11-07 16:49:51,268][14395] Avg episode rewards: #0: 4.420, true rewards: #0: 4.053 +[2024-11-07 16:49:51,271][14395] Avg episode reward: 4.420, avg true_objective: 4.053 +[2024-11-07 16:49:51,363][14395] Num frames 12200... +[2024-11-07 16:49:51,552][14395] Num frames 12300... +[2024-11-07 16:49:51,748][14395] Num frames 12400... +[2024-11-07 16:49:51,934][14395] Num frames 12500... +[2024-11-07 16:49:52,069][14395] Avg episode rewards: #0: 4.401, true rewards: #0: 4.046 +[2024-11-07 16:49:52,074][14395] Avg episode reward: 4.401, avg true_objective: 4.046 +[2024-11-07 16:49:52,194][14395] Num frames 12600... +[2024-11-07 16:49:52,376][14395] Num frames 12700... +[2024-11-07 16:49:52,559][14395] Num frames 12800... +[2024-11-07 16:49:52,762][14395] Num frames 12900... +[2024-11-07 16:49:52,876][14395] Avg episode rewards: #0: 4.384, true rewards: #0: 4.040 +[2024-11-07 16:49:52,877][14395] Avg episode reward: 4.384, avg true_objective: 4.040 +[2024-11-07 16:49:53,036][14395] Num frames 13000... +[2024-11-07 16:49:53,267][14395] Num frames 13100... +[2024-11-07 16:49:53,470][14395] Num frames 13200... +[2024-11-07 16:49:53,706][14395] Num frames 13300... +[2024-11-07 16:49:53,915][14395] Avg episode rewards: #0: 4.417, true rewards: #0: 4.053 +[2024-11-07 16:49:53,916][14395] Avg episode reward: 4.417, avg true_objective: 4.053 +[2024-11-07 16:49:53,979][14395] Num frames 13400... +[2024-11-07 16:49:54,194][14395] Num frames 13500... +[2024-11-07 16:49:54,427][14395] Num frames 13600... 
+[2024-11-07 16:49:54,678][14395] Num frames 13700... +[2024-11-07 16:49:56,906][14395] Avg episode rewards: #0: 4.400, true rewards: #0: 4.047 +[2024-11-07 16:49:56,907][14395] Avg episode reward: 4.400, avg true_objective: 4.047 +[2024-11-07 16:49:56,987][14395] Num frames 13800... +[2024-11-07 16:49:57,191][14395] Num frames 13900... +[2024-11-07 16:49:57,383][14395] Num frames 14000... +[2024-11-07 16:49:57,560][14395] Num frames 14100... +[2024-11-07 16:49:57,724][14395] Avg episode rewards: #0: 4.384, true rewards: #0: 4.041 +[2024-11-07 16:49:57,725][14395] Avg episode reward: 4.384, avg true_objective: 4.041 +[2024-11-07 16:49:57,832][14395] Num frames 14200... +[2024-11-07 16:49:58,066][14395] Num frames 14300... +[2024-11-07 16:49:58,273][14395] Num frames 14400... +[2024-11-07 16:49:58,492][14395] Num frames 14500... +[2024-11-07 16:49:58,671][14395] Avg episode rewards: #0: 4.406, true rewards: #0: 4.044 +[2024-11-07 16:49:58,673][14395] Avg episode reward: 4.406, avg true_objective: 4.044 +[2024-11-07 16:49:58,754][14395] Num frames 14600... +[2024-11-07 16:49:58,957][14395] Num frames 14700... +[2024-11-07 16:49:59,150][14395] Num frames 14800... +[2024-11-07 16:49:59,346][14395] Num frames 14900... +[2024-11-07 16:49:59,541][14395] Num frames 15000... +[2024-11-07 16:49:59,615][14395] Avg episode rewards: #0: 4.435, true rewards: #0: 4.056 +[2024-11-07 16:49:59,619][14395] Avg episode reward: 4.435, avg true_objective: 4.056 +[2024-11-07 16:49:59,831][14395] Num frames 15100... +[2024-11-07 16:50:00,028][14395] Num frames 15200... +[2024-11-07 16:50:00,217][14395] Num frames 15300... +[2024-11-07 16:50:00,400][14395] Num frames 15400... +[2024-11-07 16:50:00,499][14395] Avg episode rewards: #0: 4.480, true rewards: #0: 4.059 +[2024-11-07 16:50:00,503][14395] Avg episode reward: 4.480, avg true_objective: 4.059 +[2024-11-07 16:50:00,679][14395] Num frames 15500... +[2024-11-07 16:50:00,881][14395] Num frames 15600... 
+[2024-11-07 16:50:01,073][14395] Num frames 15700... +[2024-11-07 16:50:01,278][14395] Num frames 15800... +[2024-11-07 16:50:01,354][14395] Avg episode rewards: #0: 4.464, true rewards: #0: 4.053 +[2024-11-07 16:50:01,355][14395] Avg episode reward: 4.464, avg true_objective: 4.053 +[2024-11-07 16:50:01,539][14395] Num frames 15900... +[2024-11-07 16:50:01,848][14395] Num frames 16000... +[2024-11-07 16:50:02,073][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.016 +[2024-11-07 16:50:02,076][14395] Avg episode reward: 4.416, avg true_objective: 4.016 +[2024-11-07 16:50:02,164][14395] Num frames 16100... +[2024-11-07 16:50:02,428][14395] Num frames 16200... +[2024-11-07 16:50:02,695][14395] Num frames 16300... +[2024-11-07 16:50:02,887][14395] Num frames 16400... +[2024-11-07 16:50:03,088][14395] Avg episode rewards: #0: 4.434, true rewards: #0: 4.020 +[2024-11-07 16:50:03,091][14395] Avg episode reward: 4.434, avg true_objective: 4.020 +[2024-11-07 16:50:03,148][14395] Num frames 16500... +[2024-11-07 16:50:03,334][14395] Num frames 16600... +[2024-11-07 16:50:03,535][14395] Num frames 16700... +[2024-11-07 16:50:03,733][14395] Num frames 16800... +[2024-11-07 16:50:03,906][14395] Avg episode rewards: #0: 4.420, true rewards: #0: 4.015 +[2024-11-07 16:50:03,912][14395] Avg episode reward: 4.420, avg true_objective: 4.015 +[2024-11-07 16:50:03,992][14395] Num frames 16900... +[2024-11-07 16:50:04,183][14395] Num frames 17000... +[2024-11-07 16:50:04,382][14395] Num frames 17100... +[2024-11-07 16:50:04,593][14395] Num frames 17200... +[2024-11-07 16:50:04,747][14395] Avg episode rewards: #0: 4.407, true rewards: #0: 4.011 +[2024-11-07 16:50:04,751][14395] Avg episode reward: 4.407, avg true_objective: 4.011 +[2024-11-07 16:50:04,867][14395] Num frames 17300... +[2024-11-07 16:50:05,055][14395] Num frames 17400... +[2024-11-07 16:50:05,266][14395] Num frames 17500... +[2024-11-07 16:50:05,466][14395] Num frames 17600... 
+[2024-11-07 16:50:05,586][14395] Avg episode rewards: #0: 4.394, true rewards: #0: 4.007 +[2024-11-07 16:50:05,589][14395] Avg episode reward: 4.394, avg true_objective: 4.007 +[2024-11-07 16:50:05,748][14395] Num frames 17700... +[2024-11-07 16:50:05,938][14395] Num frames 17800... +[2024-11-07 16:50:06,146][14395] Num frames 17900... +[2024-11-07 16:50:06,382][14395] Num frames 18000... +[2024-11-07 16:50:06,597][14395] Num frames 18100... +[2024-11-07 16:50:06,803][14395] Avg episode rewards: #0: 4.461, true rewards: #0: 4.039 +[2024-11-07 16:50:06,809][14395] Avg episode reward: 4.461, avg true_objective: 4.039 +[2024-11-07 16:50:06,875][14395] Num frames 18200... +[2024-11-07 16:50:07,076][14395] Num frames 18300... +[2024-11-07 16:50:07,256][14395] Num frames 18400... +[2024-11-07 16:50:07,437][14395] Num frames 18500... +[2024-11-07 16:50:07,605][14395] Avg episode rewards: #0: 4.448, true rewards: #0: 4.035 +[2024-11-07 16:50:07,608][14395] Avg episode reward: 4.448, avg true_objective: 4.035 +[2024-11-07 16:50:07,702][14395] Num frames 18600... +[2024-11-07 16:50:07,884][14395] Num frames 18700... +[2024-11-07 16:50:08,063][14395] Num frames 18800... +[2024-11-07 16:50:08,245][14395] Num frames 18900... +[2024-11-07 16:50:08,382][14395] Avg episode rewards: #0: 4.435, true rewards: #0: 4.031 +[2024-11-07 16:50:08,385][14395] Avg episode reward: 4.435, avg true_objective: 4.031 +[2024-11-07 16:50:08,507][14395] Num frames 19000... +[2024-11-07 16:50:08,694][14395] Num frames 19100... +[2024-11-07 16:50:08,876][14395] Num frames 19200... +[2024-11-07 16:50:09,060][14395] Num frames 19300... +[2024-11-07 16:50:09,168][14395] Avg episode rewards: #0: 4.423, true rewards: #0: 4.027 +[2024-11-07 16:50:09,169][14395] Avg episode reward: 4.423, avg true_objective: 4.027 +[2024-11-07 16:50:09,320][14395] Num frames 19400... +[2024-11-07 16:50:09,502][14395] Num frames 19500... +[2024-11-07 16:50:09,692][14395] Num frames 19600... 
+[2024-11-07 16:50:09,869][14395] Num frames 19700... +[2024-11-07 16:50:09,947][14395] Avg episode rewards: #0: 4.411, true rewards: #0: 4.023 +[2024-11-07 16:50:09,949][14395] Avg episode reward: 4.411, avg true_objective: 4.023 +[2024-11-07 16:50:10,124][14395] Num frames 19800... +[2024-11-07 16:50:10,299][14395] Num frames 19900... +[2024-11-07 16:50:10,478][14395] Num frames 20000... +[2024-11-07 16:50:10,715][14395] Avg episode rewards: #0: 4.399, true rewards: #0: 4.019 +[2024-11-07 16:50:10,719][14395] Avg episode reward: 4.399, avg true_objective: 4.019 +[2024-11-07 16:50:10,747][14395] Num frames 20100... +[2024-11-07 16:50:10,965][14395] Num frames 20200... +[2024-11-07 16:50:11,149][14395] Num frames 20300... +[2024-11-07 16:50:11,329][14395] Num frames 20400... +[2024-11-07 16:50:11,529][14395] Avg episode rewards: #0: 4.388, true rewards: #0: 4.016 +[2024-11-07 16:50:11,532][14395] Avg episode reward: 4.388, avg true_objective: 4.016 +[2024-11-07 16:50:11,579][14395] Num frames 20500... +[2024-11-07 16:50:11,820][14395] Num frames 20600... +[2024-11-07 16:50:12,007][14395] Num frames 20700... +[2024-11-07 16:50:12,189][14395] Num frames 20800... +[2024-11-07 16:50:12,366][14395] Avg episode rewards: #0: 4.378, true rewards: #0: 4.012 +[2024-11-07 16:50:12,369][14395] Avg episode reward: 4.378, avg true_objective: 4.012 +[2024-11-07 16:50:12,453][14395] Num frames 20900... +[2024-11-07 16:50:12,647][14395] Num frames 21000... +[2024-11-07 16:50:12,873][14395] Num frames 21100... +[2024-11-07 16:50:13,079][14395] Num frames 21200... +[2024-11-07 16:50:13,321][14395] Num frames 21300... +[2024-11-07 16:50:13,405][14395] Avg episode rewards: #0: 4.398, true rewards: #0: 4.021 +[2024-11-07 16:50:13,406][14395] Avg episode reward: 4.398, avg true_objective: 4.021 +[2024-11-07 16:50:13,607][14395] Num frames 21400... +[2024-11-07 16:50:13,842][14395] Num frames 21500... +[2024-11-07 16:50:14,067][14395] Num frames 21600... 
+[2024-11-07 16:50:14,333][14395] Avg episode rewards: #0: 4.388, true rewards: #0: 4.018 +[2024-11-07 16:50:14,334][14395] Avg episode reward: 4.388, avg true_objective: 4.018 +[2024-11-07 16:50:14,350][14395] Num frames 21700... +[2024-11-07 16:50:14,566][14395] Num frames 21800... +[2024-11-07 16:50:14,779][14395] Num frames 21900... +[2024-11-07 16:50:14,961][14395] Num frames 22000... +[2024-11-07 16:50:15,160][14395] Avg episode rewards: #0: 4.378, true rewards: #0: 4.015 +[2024-11-07 16:50:15,162][14395] Avg episode reward: 4.378, avg true_objective: 4.015 +[2024-11-07 16:50:15,203][14395] Num frames 22100... +[2024-11-07 16:50:15,383][14395] Num frames 22200... +[2024-11-07 16:50:15,563][14395] Num frames 22300... +[2024-11-07 16:50:15,761][14395] Num frames 22400... +[2024-11-07 16:50:16,021][14395] Avg episode rewards: #0: 4.392, true rewards: #0: 4.017 +[2024-11-07 16:50:16,024][14395] Avg episode reward: 4.392, avg true_objective: 4.017 +[2024-11-07 16:50:16,045][14395] Num frames 22500... +[2024-11-07 16:50:16,279][14395] Num frames 22600... +[2024-11-07 16:50:16,530][14395] Num frames 22700... +[2024-11-07 16:50:16,758][14395] Num frames 22800... +[2024-11-07 16:50:16,989][14395] Avg episode rewards: #0: 4.382, true rewards: #0: 4.014 +[2024-11-07 16:50:16,992][14395] Avg episode reward: 4.382, avg true_objective: 4.014 +[2024-11-07 16:50:17,064][14395] Num frames 22900... +[2024-11-07 16:50:17,345][14395] Num frames 23000... +[2024-11-07 16:50:17,632][14395] Num frames 23100... +[2024-11-07 16:50:17,873][14395] Num frames 23200... +[2024-11-07 16:50:18,298][14395] Avg episode rewards: #0: 4.373, true rewards: #0: 4.011 +[2024-11-07 16:50:18,301][14395] Avg episode reward: 4.373, avg true_objective: 4.011 +[2024-11-07 16:50:18,395][14395] Num frames 23300... +[2024-11-07 16:50:18,582][14395] Num frames 23400... +[2024-11-07 16:50:18,759][14395] Num frames 23500... +[2024-11-07 16:50:18,947][14395] Num frames 23600... 
+[2024-11-07 16:50:19,089][14395] Avg episode rewards: #0: 4.364, true rewards: #0: 4.008 +[2024-11-07 16:50:19,091][14395] Avg episode reward: 4.364, avg true_objective: 4.008 +[2024-11-07 16:50:19,233][14395] Num frames 23700... +[2024-11-07 16:50:19,551][14395] Num frames 23800... +[2024-11-07 16:50:19,792][14395] Num frames 23900... +[2024-11-07 16:50:19,985][14395] Num frames 24000... +[2024-11-07 16:50:20,105][14395] Avg episode rewards: #0: 4.355, true rewards: #0: 4.005 +[2024-11-07 16:50:20,110][14395] Avg episode reward: 4.355, avg true_objective: 4.005 +[2024-11-07 16:50:20,300][14395] Num frames 24100... +[2024-11-07 16:50:20,560][14395] Num frames 24200... +[2024-11-07 16:50:20,756][14395] Num frames 24300... +[2024-11-07 16:50:20,944][14395] Num frames 24400... +[2024-11-07 16:50:21,034][14395] Avg episode rewards: #0: 4.347, true rewards: #0: 4.003 +[2024-11-07 16:50:21,035][14395] Avg episode reward: 4.347, avg true_objective: 4.003 +[2024-11-07 16:50:21,197][14395] Num frames 24500... +[2024-11-07 16:50:21,402][14395] Num frames 24600... +[2024-11-07 16:50:21,595][14395] Num frames 24700... +[2024-11-07 16:50:21,796][14395] Num frames 24800... +[2024-11-07 16:50:21,992][14395] Num frames 24900... +[2024-11-07 16:50:22,162][14395] Avg episode rewards: #0: 4.397, true rewards: #0: 4.026 +[2024-11-07 16:50:22,164][14395] Avg episode reward: 4.397, avg true_objective: 4.026 +[2024-11-07 16:50:22,267][14395] Num frames 25000... +[2024-11-07 16:50:22,574][14395] Num frames 25100... +[2024-11-07 16:50:22,807][14395] Num frames 25200... +[2024-11-07 16:50:23,060][14395] Num frames 25300... +[2024-11-07 16:50:23,231][14395] Avg episode rewards: #0: 4.388, true rewards: #0: 4.023 +[2024-11-07 16:50:23,233][14395] Avg episode reward: 4.388, avg true_objective: 4.023 +[2024-11-07 16:50:23,389][14395] Num frames 25400... +[2024-11-07 16:50:23,644][14395] Num frames 25500... +[2024-11-07 16:50:23,871][14395] Num frames 25600... 
+[2024-11-07 16:50:24,398][14395] Num frames 25700... +[2024-11-07 16:50:24,541][14395] Avg episode rewards: #0: 4.379, true rewards: #0: 4.020 +[2024-11-07 16:50:24,543][14395] Avg episode reward: 4.379, avg true_objective: 4.020 +[2024-11-07 16:50:24,945][14395] Num frames 25800... +[2024-11-07 16:50:25,316][14395] Num frames 25900... +[2024-11-07 16:50:25,834][14395] Num frames 26000... +[2024-11-07 16:50:26,354][14395] Num frames 26100... +[2024-11-07 16:50:26,456][14395] Avg episode rewards: #0: 4.371, true rewards: #0: 4.017 +[2024-11-07 16:50:26,458][14395] Avg episode reward: 4.371, avg true_objective: 4.017 +[2024-11-07 16:50:26,851][14395] Num frames 26200... +[2024-11-07 16:50:27,258][14395] Num frames 26300... +[2024-11-07 16:50:27,731][14395] Num frames 26400... +[2024-11-07 16:50:28,213][14395] Avg episode rewards: #0: 4.363, true rewards: #0: 4.015 +[2024-11-07 16:50:28,215][14395] Avg episode reward: 4.363, avg true_objective: 4.015 +[2024-11-07 16:50:28,229][14395] Num frames 26500... +[2024-11-07 16:50:28,554][14395] Num frames 26600... +[2024-11-07 16:50:29,001][14395] Num frames 26700... +[2024-11-07 16:50:29,200][14395] Avg episode rewards: #0: 4.336, true rewards: #0: 3.993 +[2024-11-07 16:50:29,202][14395] Avg episode reward: 4.336, avg true_objective: 3.993 +[2024-11-07 16:50:31,294][14395] Num frames 26800... +[2024-11-07 16:50:31,551][14395] Num frames 26900... +[2024-11-07 16:50:31,873][14395] Num frames 27000... +[2024-11-07 16:50:32,161][14395] Num frames 27100... +[2024-11-07 16:50:32,332][14395] Avg episode rewards: #0: 4.329, true rewards: #0: 3.991 +[2024-11-07 16:50:32,334][14395] Avg episode reward: 4.329, avg true_objective: 3.991 +[2024-11-07 16:50:32,477][14395] Num frames 27200... +[2024-11-07 16:50:32,825][14395] Num frames 27300... +[2024-11-07 16:50:33,321][14395] Num frames 27400... +[2024-11-07 16:50:33,591][14395] Num frames 27500... 
+[2024-11-07 16:50:33,923][14395] Avg episode rewards: #0: 4.346, true rewards: #0: 3.998 +[2024-11-07 16:50:33,925][14395] Avg episode reward: 4.346, avg true_objective: 3.998 +[2024-11-07 16:50:33,958][14395] Num frames 27600... +[2024-11-07 16:50:34,165][14395] Num frames 27700... +[2024-11-07 16:50:34,390][14395] Num frames 27800... +[2024-11-07 16:50:34,585][14395] Num frames 27900... +[2024-11-07 16:50:34,805][14395] Avg episode rewards: #0: 4.338, true rewards: #0: 3.995 +[2024-11-07 16:50:34,806][14395] Avg episode reward: 4.338, avg true_objective: 3.995 +[2024-11-07 16:50:34,888][14395] Num frames 28000... +[2024-11-07 16:50:35,087][14395] Num frames 28100... +[2024-11-07 16:50:35,294][14395] Num frames 28200... +[2024-11-07 16:50:35,493][14395] Num frames 28300... +[2024-11-07 16:50:35,688][14395] Num frames 28400... +[2024-11-07 16:50:35,780][14395] Avg episode rewards: #0: 4.354, true rewards: #0: 4.002 +[2024-11-07 16:50:35,785][14395] Avg episode reward: 4.354, avg true_objective: 4.002 +[2024-11-07 16:50:35,967][14395] Num frames 28500... +[2024-11-07 16:50:36,152][14395] Num frames 28600... +[2024-11-07 16:50:36,348][14395] Num frames 28700... +[2024-11-07 16:50:36,548][14395] Num frames 28800... +[2024-11-07 16:50:36,601][14395] Avg episode rewards: #0: 4.347, true rewards: #0: 4.000 +[2024-11-07 16:50:36,604][14395] Avg episode reward: 4.347, avg true_objective: 4.000 +[2024-11-07 16:50:36,827][14395] Num frames 28900... +[2024-11-07 16:50:37,019][14395] Num frames 29000... +[2024-11-07 16:50:37,212][14395] Num frames 29100... +[2024-11-07 16:50:37,430][14395] Avg episode rewards: #0: 4.340, true rewards: #0: 3.998 +[2024-11-07 16:50:37,433][14395] Avg episode reward: 4.340, avg true_objective: 3.998 +[2024-11-07 16:50:37,484][14395] Num frames 29200... +[2024-11-07 16:50:37,676][14395] Num frames 29300... +[2024-11-07 16:50:37,869][14395] Num frames 29400... +[2024-11-07 16:50:38,059][14395] Num frames 29500... 
+[2024-11-07 16:50:38,246][14395] Avg episode rewards: #0: 4.334, true rewards: #0: 3.996 +[2024-11-07 16:50:38,249][14395] Avg episode reward: 4.334, avg true_objective: 3.996 +[2024-11-07 16:50:38,337][14395] Num frames 29600... +[2024-11-07 16:50:38,537][14395] Num frames 29700... +[2024-11-07 16:50:38,726][14395] Num frames 29800... +[2024-11-07 16:50:38,908][14395] Num frames 29900... +[2024-11-07 16:50:39,101][14395] Num frames 30000... +[2024-11-07 16:50:39,289][14395] Num frames 30100... +[2024-11-07 16:50:39,369][14395] Avg episode rewards: #0: 4.375, true rewards: #0: 4.015 +[2024-11-07 16:50:39,372][14395] Avg episode reward: 4.375, avg true_objective: 4.015 +[2024-11-07 16:50:39,558][14395] Num frames 30200... +[2024-11-07 16:50:39,758][14395] Num frames 30300... +[2024-11-07 16:50:39,957][14395] Num frames 30400... +[2024-11-07 16:50:40,215][14395] Avg episode rewards: #0: 4.368, true rewards: #0: 4.013 +[2024-11-07 16:50:40,217][14395] Avg episode reward: 4.368, avg true_objective: 4.013 +[2024-11-07 16:50:40,229][14395] Num frames 30500... +[2024-11-07 16:50:40,426][14395] Num frames 30600... +[2024-11-07 16:50:40,622][14395] Num frames 30700... +[2024-11-07 16:50:40,842][14395] Num frames 30800... +[2024-11-07 16:50:40,989][14395] Avg episode rewards: #0: 4.370, true rewards: #0: 4.006 +[2024-11-07 16:50:40,991][14395] Avg episode reward: 4.370, avg true_objective: 4.006 +[2024-11-07 16:50:41,099][14395] Num frames 30900... +[2024-11-07 16:50:41,286][14395] Num frames 31000... +[2024-11-07 16:50:41,505][14395] Num frames 31100... +[2024-11-07 16:50:41,704][14395] Num frames 31200... +[2024-11-07 16:50:41,893][14395] Num frames 31300... +[2024-11-07 16:50:42,146][14395] Avg episode rewards: #0: 4.409, true rewards: #0: 4.025 +[2024-11-07 16:50:42,149][14395] Avg episode reward: 4.409, avg true_objective: 4.025 +[2024-11-07 16:50:42,185][14395] Num frames 31400... +[2024-11-07 16:50:42,369][14395] Num frames 31500... 
+[2024-11-07 16:50:42,569][14395] Num frames 31600... +[2024-11-07 16:50:42,766][14395] Num frames 31700... +[2024-11-07 16:50:42,958][14395] Avg episode rewards: #0: 4.402, true rewards: #0: 4.022 +[2024-11-07 16:50:42,960][14395] Avg episode reward: 4.402, avg true_objective: 4.022 +[2024-11-07 16:50:43,023][14395] Num frames 31800... +[2024-11-07 16:50:43,215][14395] Num frames 31900... +[2024-11-07 16:50:43,411][14395] Num frames 32000... +[2024-11-07 16:50:43,594][14395] Num frames 32100... +[2024-11-07 16:50:43,778][14395] Avg episode rewards: #0: 4.395, true rewards: #0: 4.020 +[2024-11-07 16:50:43,782][14395] Avg episode reward: 4.395, avg true_objective: 4.020 +[2024-11-07 16:50:43,882][14395] Num frames 32200... +[2024-11-07 16:50:44,095][14395] Num frames 32300... +[2024-11-07 16:50:44,296][14395] Num frames 32400... +[2024-11-07 16:50:44,486][14395] Num frames 32500... +[2024-11-07 16:50:44,630][14395] Avg episode rewards: #0: 4.388, true rewards: #0: 4.018 +[2024-11-07 16:50:44,633][14395] Avg episode reward: 4.388, avg true_objective: 4.018 +[2024-11-07 16:50:44,747][14395] Num frames 32600... +[2024-11-07 16:50:44,940][14395] Num frames 32700... +[2024-11-07 16:50:45,161][14395] Num frames 32800... +[2024-11-07 16:50:45,407][14395] Num frames 32900... +[2024-11-07 16:50:45,526][14395] Avg episode rewards: #0: 4.381, true rewards: #0: 4.016 +[2024-11-07 16:50:45,530][14395] Avg episode reward: 4.381, avg true_objective: 4.016 +[2024-11-07 16:50:45,725][14395] Num frames 33000... +[2024-11-07 16:50:45,965][14395] Num frames 33100... +[2024-11-07 16:50:46,207][14395] Num frames 33200... +[2024-11-07 16:50:46,430][14395] Num frames 33300... +[2024-11-07 16:50:46,512][14395] Avg episode rewards: #0: 4.375, true rewards: #0: 4.013 +[2024-11-07 16:50:46,513][14395] Avg episode reward: 4.375, avg true_objective: 4.013 +[2024-11-07 16:50:46,724][14395] Num frames 33400... +[2024-11-07 16:50:46,966][14395] Num frames 33500... 
+[2024-11-07 16:50:47,184][14395] Num frames 33600... +[2024-11-07 16:50:47,471][14395] Avg episode rewards: #0: 4.369, true rewards: #0: 4.011 +[2024-11-07 16:50:47,474][14395] Avg episode reward: 4.369, avg true_objective: 4.011 +[2024-11-07 16:50:47,497][14395] Num frames 33700... +[2024-11-07 16:50:47,733][14395] Num frames 33800... +[2024-11-07 16:50:47,941][14395] Num frames 33900... +[2024-11-07 16:50:48,120][14395] Num frames 34000... +[2024-11-07 16:50:48,303][14395] Num frames 34100... +[2024-11-07 16:50:48,439][14395] Avg episode rewards: #0: 4.382, true rewards: #0: 4.017 +[2024-11-07 16:50:48,442][14395] Avg episode reward: 4.382, avg true_objective: 4.017 +[2024-11-07 16:50:48,555][14395] Num frames 34200... +[2024-11-07 16:50:48,745][14395] Num frames 34300... +[2024-11-07 16:50:48,934][14395] Num frames 34400... +[2024-11-07 16:50:49,168][14395] Num frames 34500... +[2024-11-07 16:50:49,405][14395] Avg episode rewards: #0: 4.394, true rewards: #0: 4.022 +[2024-11-07 16:50:49,407][14395] Avg episode reward: 4.394, avg true_objective: 4.022 +[2024-11-07 16:50:49,434][14395] Num frames 34600... +[2024-11-07 16:50:49,635][14395] Num frames 34700... +[2024-11-07 16:50:49,829][14395] Num frames 34800... +[2024-11-07 16:50:50,011][14395] Num frames 34900... +[2024-11-07 16:50:50,214][14395] Avg episode rewards: #0: 4.388, true rewards: #0: 4.020 +[2024-11-07 16:50:50,216][14395] Avg episode reward: 4.388, avg true_objective: 4.020 +[2024-11-07 16:50:50,270][14395] Num frames 35000... +[2024-11-07 16:50:50,506][14395] Num frames 35100... +[2024-11-07 16:50:50,718][14395] Num frames 35200... +[2024-11-07 16:50:50,967][14395] Num frames 35300... +[2024-11-07 16:50:51,165][14395] Avg episode rewards: #0: 4.382, true rewards: #0: 4.018 +[2024-11-07 16:50:51,170][14395] Avg episode reward: 4.382, avg true_objective: 4.018 +[2024-11-07 16:50:51,285][14395] Num frames 35400... +[2024-11-07 16:50:51,537][14395] Num frames 35500... 
+[2024-11-07 16:50:51,790][14395] Num frames 35600... +[2024-11-07 16:50:52,038][14395] Num frames 35700... +[2024-11-07 16:50:52,242][14395] Num frames 35800... +[2024-11-07 16:50:52,318][14395] Avg episode rewards: #0: 4.394, true rewards: #0: 4.023 +[2024-11-07 16:50:52,322][14395] Avg episode reward: 4.394, avg true_objective: 4.023 +[2024-11-07 16:50:52,540][14395] Num frames 35900... +[2024-11-07 16:50:52,737][14395] Num frames 36000... +[2024-11-07 16:50:52,952][14395] Num frames 36100... +[2024-11-07 16:50:53,231][14395] Avg episode rewards: #0: 4.388, true rewards: #0: 4.021 +[2024-11-07 16:50:53,233][14395] Avg episode reward: 4.388, avg true_objective: 4.021 +[2024-11-07 16:50:53,256][14395] Num frames 36200... +[2024-11-07 16:50:53,487][14395] Num frames 36300... +[2024-11-07 16:50:53,741][14395] Num frames 36400... +[2024-11-07 16:50:53,975][14395] Num frames 36500... +[2024-11-07 16:50:54,215][14395] Avg episode rewards: #0: 4.382, true rewards: #0: 4.019 +[2024-11-07 16:50:54,218][14395] Avg episode reward: 4.382, avg true_objective: 4.019 +[2024-11-07 16:50:54,287][14395] Num frames 36600... +[2024-11-07 16:50:54,515][14395] Num frames 36700... +[2024-11-07 16:50:54,741][14395] Num frames 36800... +[2024-11-07 16:50:54,947][14395] Num frames 36900... +[2024-11-07 16:50:55,127][14395] Avg episode rewards: #0: 4.376, true rewards: #0: 4.017 +[2024-11-07 16:50:55,131][14395] Avg episode reward: 4.376, avg true_objective: 4.017 +[2024-11-07 16:50:55,220][14395] Num frames 37000... +[2024-11-07 16:50:55,422][14395] Num frames 37100... +[2024-11-07 16:50:55,626][14395] Num frames 37200... +[2024-11-07 16:50:55,830][14395] Num frames 37300... +[2024-11-07 16:50:55,971][14395] Avg episode rewards: #0: 4.370, true rewards: #0: 4.015 +[2024-11-07 16:50:55,974][14395] Avg episode reward: 4.370, avg true_objective: 4.015 +[2024-11-07 16:50:56,100][14395] Num frames 37400... +[2024-11-07 16:50:56,319][14395] Num frames 37500... 
+[2024-11-07 16:50:56,549][14395] Num frames 37600... +[2024-11-07 16:50:56,793][14395] Num frames 37700... +[2024-11-07 16:50:56,911][14395] Avg episode rewards: #0: 4.365, true rewards: #0: 4.014 +[2024-11-07 16:50:56,925][14395] Avg episode reward: 4.365, avg true_objective: 4.014 +[2024-11-07 16:50:57,137][14395] Num frames 37800... +[2024-11-07 16:50:57,368][14395] Num frames 37900... +[2024-11-07 16:50:57,614][14395] Num frames 38000... +[2024-11-07 16:50:57,866][14395] Num frames 38100... +[2024-11-07 16:50:58,014][14395] Avg episode rewards: #0: 4.373, true rewards: #0: 4.015 +[2024-11-07 16:50:58,017][14395] Avg episode reward: 4.373, avg true_objective: 4.015 +[2024-11-07 16:50:58,173][14395] Num frames 38200... +[2024-11-07 16:50:58,420][14395] Num frames 38300... +[2024-11-07 16:50:58,652][14395] Num frames 38400... +[2024-11-07 16:50:58,878][14395] Num frames 38500... +[2024-11-07 16:50:58,998][14395] Avg episode rewards: #0: 4.368, true rewards: #0: 4.013 +[2024-11-07 16:50:59,002][14395] Avg episode reward: 4.368, avg true_objective: 4.013 +[2024-11-07 16:50:59,195][14395] Num frames 38600... +[2024-11-07 16:50:59,418][14395] Num frames 38700... +[2024-11-07 16:50:59,674][14395] Num frames 38800... +[2024-11-07 16:50:59,973][14395] Num frames 38900... +[2024-11-07 16:51:00,350][14395] Avg episode rewards: #0: 4.379, true rewards: #0: 4.018 +[2024-11-07 16:51:00,351][14395] Avg episode reward: 4.379, avg true_objective: 4.018 +[2024-11-07 16:51:00,436][14395] Num frames 39000... +[2024-11-07 16:51:00,782][14395] Num frames 39100... +[2024-11-07 16:51:01,099][14395] Num frames 39200... +[2024-11-07 16:51:01,383][14395] Num frames 39300... +[2024-11-07 16:51:01,681][14395] Avg episode rewards: #0: 4.387, true rewards: #0: 4.020 +[2024-11-07 16:51:01,683][14395] Avg episode reward: 4.387, avg true_objective: 4.020 +[2024-11-07 16:51:01,707][14395] Num frames 39400... +[2024-11-07 16:51:01,989][14395] Num frames 39500... 
+[2024-11-07 16:51:02,273][14395] Num frames 39600... +[2024-11-07 16:51:02,512][14395] Num frames 39700... +[2024-11-07 16:51:02,760][14395] Num frames 39800... +[2024-11-07 16:51:02,911][14395] Avg episode rewards: #0: 4.398, true rewards: #0: 4.024 +[2024-11-07 16:51:02,912][14395] Avg episode reward: 4.398, avg true_objective: 4.024 +[2024-11-07 16:51:03,086][14395] Num frames 39900... +[2024-11-07 16:51:03,329][14395] Num frames 40000... +[2024-11-07 16:51:05,794][14395] Avg episode rewards: #0: 4.380, true rewards: #0: 4.010 +[2024-11-07 16:51:05,795][14395] Avg episode reward: 4.380, avg true_objective: 4.010 +[2024-11-07 16:51:05,822][14395] Num frames 40100... +[2024-11-07 16:51:06,057][14395] Num frames 40200... +[2024-11-07 16:51:06,248][14395] Num frames 40300... +[2024-11-07 16:51:06,439][14395] Num frames 40400... +[2024-11-07 16:51:06,681][14395] Num frames 40500... +[2024-11-07 16:51:06,917][14395] Avg episode rewards: #0: 4.409, true rewards: #0: 4.019 +[2024-11-07 16:51:06,918][14395] Avg episode reward: 4.409, avg true_objective: 4.019 +[2024-11-07 16:51:06,975][14395] Num frames 40600... +[2024-11-07 16:51:07,158][14395] Num frames 40700... +[2024-11-07 16:51:07,361][14395] Num frames 40800... +[2024-11-07 16:51:07,568][14395] Num frames 40900... +[2024-11-07 16:51:07,745][14395] Avg episode rewards: #0: 4.409, true rewards: #0: 4.019 +[2024-11-07 16:51:07,750][14395] Avg episode reward: 4.409, avg true_objective: 4.019 +[2024-11-07 16:51:07,854][14395] Num frames 41000... +[2024-11-07 16:51:08,044][14395] Num frames 41100... +[2024-11-07 16:51:08,278][14395] Num frames 41200... +[2024-11-07 16:51:08,519][14395] Num frames 41300... +[2024-11-07 16:51:08,672][14395] Avg episode rewards: #0: 4.393, true rewards: #0: 4.013 +[2024-11-07 16:51:08,674][14395] Avg episode reward: 4.393, avg true_objective: 4.013 +[2024-11-07 16:51:08,810][14395] Num frames 41400... +[2024-11-07 16:51:09,001][14395] Num frames 41500... 
+[2024-11-07 16:51:09,236][14395] Num frames 41600... +[2024-11-07 16:51:09,493][14395] Num frames 41700... +[2024-11-07 16:51:09,703][14395] Avg episode rewards: #0: 4.393, true rewards: #0: 4.013 +[2024-11-07 16:51:09,704][14395] Avg episode reward: 4.393, avg true_objective: 4.013 +[2024-11-07 16:51:09,813][14395] Num frames 41800... +[2024-11-07 16:51:10,074][14395] Num frames 41900... +[2024-11-07 16:51:10,350][14395] Num frames 42000... +[2024-11-07 16:51:10,640][14395] Num frames 42100... +[2024-11-07 16:51:10,837][14395] Avg episode rewards: #0: 4.376, true rewards: #0: 4.006 +[2024-11-07 16:51:10,839][14395] Avg episode reward: 4.376, avg true_objective: 4.006 +[2024-11-07 16:51:10,995][14395] Num frames 42200... +[2024-11-07 16:51:11,250][14395] Num frames 42300... +[2024-11-07 16:51:11,469][14395] Num frames 42400... +[2024-11-07 16:51:11,707][14395] Num frames 42500... +[2024-11-07 16:51:11,929][14395] Num frames 42600... +[2024-11-07 16:51:12,184][14395] Num frames 42700... +[2024-11-07 16:51:12,384][14395] Avg episode rewards: #0: 4.429, true rewards: #0: 4.029 +[2024-11-07 16:51:12,385][14395] Avg episode reward: 4.429, avg true_objective: 4.029 +[2024-11-07 16:51:12,512][14395] Num frames 42800... +[2024-11-07 16:51:12,835][14395] Num frames 42900... +[2024-11-07 16:51:13,103][14395] Num frames 43000... +[2024-11-07 16:51:13,311][14395] Num frames 43100... +[2024-11-07 16:51:13,433][14395] Avg episode rewards: #0: 4.429, true rewards: #0: 4.029 +[2024-11-07 16:51:13,436][14395] Avg episode reward: 4.429, avg true_objective: 4.029 +[2024-11-07 16:51:13,561][14395] Num frames 43200... +[2024-11-07 16:51:13,784][14395] Num frames 43300... +[2024-11-07 16:51:13,991][14395] Num frames 43400... +[2024-11-07 16:51:14,206][14395] Num frames 43500... 
+[2024-11-07 16:51:14,306][14395] Avg episode rewards: #0: 4.429, true rewards: #0: 4.029 +[2024-11-07 16:51:14,311][14395] Avg episode reward: 4.429, avg true_objective: 4.029 +[2024-11-07 16:51:14,507][14395] Num frames 43600... +[2024-11-07 16:51:14,708][14395] Num frames 43700... +[2024-11-07 16:51:14,898][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.016 +[2024-11-07 16:51:14,901][14395] Avg episode reward: 4.416, avg true_objective: 4.016 +[2024-11-07 16:51:14,979][14395] Num frames 43800... +[2024-11-07 16:51:15,182][14395] Num frames 43900... +[2024-11-07 16:51:15,400][14395] Num frames 44000... +[2024-11-07 16:51:15,626][14395] Num frames 44100... +[2024-11-07 16:51:15,819][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.016 +[2024-11-07 16:51:15,822][14395] Avg episode reward: 4.416, avg true_objective: 4.016 +[2024-11-07 16:51:15,935][14395] Num frames 44200... +[2024-11-07 16:51:16,180][14395] Num frames 44300... +[2024-11-07 16:51:16,544][14395] Num frames 44400... +[2024-11-07 16:51:16,833][14395] Num frames 44500... +[2024-11-07 16:51:16,982][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.016 +[2024-11-07 16:51:16,985][14395] Avg episode reward: 4.416, avg true_objective: 4.016 +[2024-11-07 16:51:17,158][14395] Num frames 44600... +[2024-11-07 16:51:17,390][14395] Num frames 44700... +[2024-11-07 16:51:17,672][14395] Num frames 44800... +[2024-11-07 16:51:17,940][14395] Num frames 44900... +[2024-11-07 16:51:18,248][14395] Avg episode rewards: #0: 4.432, true rewards: #0: 4.022 +[2024-11-07 16:51:18,252][14395] Avg episode reward: 4.432, avg true_objective: 4.022 +[2024-11-07 16:51:18,286][14395] Num frames 45000... +[2024-11-07 16:51:18,535][14395] Num frames 45100... +[2024-11-07 16:51:18,787][14395] Num frames 45200... +[2024-11-07 16:51:19,042][14395] Num frames 45300... +[2024-11-07 16:51:19,283][14395] Num frames 45400... 
+[2024-11-07 16:51:19,423][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:19,427][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:19,598][14395] Num frames 45500... +[2024-11-07 16:51:19,822][14395] Num frames 45600... +[2024-11-07 16:51:20,094][14395] Num frames 45700... +[2024-11-07 16:51:20,305][14395] Num frames 45800... +[2024-11-07 16:51:20,425][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:20,426][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:20,621][14395] Num frames 45900... +[2024-11-07 16:51:20,871][14395] Num frames 46000... +[2024-11-07 16:51:21,111][14395] Num frames 46100... +[2024-11-07 16:51:21,372][14395] Num frames 46200... +[2024-11-07 16:51:21,450][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:21,454][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:21,688][14395] Num frames 46300... +[2024-11-07 16:51:21,927][14395] Num frames 46400... +[2024-11-07 16:51:22,171][14395] Num frames 46500... +[2024-11-07 16:51:22,458][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:22,462][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:22,498][14395] Num frames 46600... +[2024-11-07 16:51:22,757][14395] Num frames 46700... +[2024-11-07 16:51:23,021][14395] Num frames 46800... +[2024-11-07 16:51:23,248][14395] Num frames 46900... +[2024-11-07 16:51:23,476][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:23,479][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:23,556][14395] Num frames 47000... +[2024-11-07 16:51:23,774][14395] Num frames 47100... +[2024-11-07 16:51:23,986][14395] Num frames 47200... +[2024-11-07 16:51:24,229][14395] Num frames 47300... 
+[2024-11-07 16:51:24,437][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:24,443][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:24,571][14395] Num frames 47400... +[2024-11-07 16:51:24,811][14395] Num frames 47500... +[2024-11-07 16:51:25,095][14395] Num frames 47600... +[2024-11-07 16:51:25,350][14395] Num frames 47700... +[2024-11-07 16:51:25,509][14395] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 16:51:25,511][14395] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 16:51:25,626][14395] Num frames 47800... +[2024-11-07 16:51:25,873][14395] Num frames 47900... +[2024-11-07 16:51:26,119][14395] Num frames 48000... +[2024-11-07 16:51:26,392][14395] Num frames 48100... +[2024-11-07 16:51:26,522][14395] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 16:51:26,524][14395] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 16:51:26,723][14395] Num frames 48200... +[2024-11-07 16:51:26,981][14395] Num frames 48300... +[2024-11-07 16:51:27,242][14395] Num frames 48400... +[2024-11-07 16:51:27,492][14395] Num frames 48500... +[2024-11-07 16:51:27,581][14395] Avg episode rewards: #0: 4.403, true rewards: #0: 4.013 +[2024-11-07 16:51:27,583][14395] Avg episode reward: 4.403, avg true_objective: 4.013 +[2024-11-07 16:51:27,787][14395] Num frames 48600... +[2024-11-07 16:51:28,047][14395] Num frames 48700... +[2024-11-07 16:51:28,269][14395] Num frames 48800... +[2024-11-07 16:51:28,610][14395] Avg episode rewards: #0: 4.403, true rewards: #0: 4.013 +[2024-11-07 16:51:28,613][14395] Avg episode reward: 4.403, avg true_objective: 4.013 +[2024-11-07 16:51:28,629][14395] Num frames 48900... +[2024-11-07 16:51:28,923][14395] Num frames 49000... +[2024-11-07 16:51:29,129][14395] Num frames 49100... +[2024-11-07 16:51:29,341][14395] Num frames 49200... +[2024-11-07 16:51:29,563][14395] Num frames 49300... 
+[2024-11-07 16:51:29,645][14395] Avg episode rewards: #0: 4.400, true rewards: #0: 4.010 +[2024-11-07 16:51:29,647][14395] Avg episode reward: 4.400, avg true_objective: 4.010 +[2024-11-07 16:51:29,897][14395] Num frames 49400... +[2024-11-07 16:51:30,088][14395] Num frames 49500... +[2024-11-07 16:51:30,344][14395] Num frames 49600... +[2024-11-07 16:51:30,584][14395] Avg episode rewards: #0: 4.400, true rewards: #0: 4.010 +[2024-11-07 16:51:30,588][14395] Avg episode reward: 4.400, avg true_objective: 4.010 +[2024-11-07 16:51:30,611][14395] Num frames 49700... +[2024-11-07 16:51:30,872][14395] Num frames 49800... +[2024-11-07 16:51:31,068][14395] Num frames 49900... +[2024-11-07 16:51:31,509][14395] Num frames 50000... +[2024-11-07 16:51:31,761][14395] Avg episode rewards: #0: 4.400, true rewards: #0: 4.010 +[2024-11-07 16:51:31,763][14395] Avg episode reward: 4.400, avg true_objective: 4.010 +[2024-11-07 16:51:31,811][14395] Num frames 50100... +[2024-11-07 16:51:32,053][14395] Num frames 50200... +[2024-11-07 16:51:32,284][14395] Num frames 50300... +[2024-11-07 16:51:32,566][14395] Num frames 50400... +[2024-11-07 16:51:32,870][14395] Num frames 50500... +[2024-11-07 16:51:33,006][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.016 +[2024-11-07 16:51:33,008][14395] Avg episode reward: 4.416, avg true_objective: 4.016 +[2024-11-07 16:51:33,222][14395] Num frames 50600... +[2024-11-07 16:51:33,478][14395] Num frames 50700... +[2024-11-07 16:51:33,747][14395] Num frames 50800... +[2024-11-07 16:51:34,050][14395] Num frames 50900... +[2024-11-07 16:51:34,326][14395] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 16:51:34,327][14395] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 16:51:34,402][14395] Num frames 51000... +[2024-11-07 16:51:34,730][14395] Num frames 51100... +[2024-11-07 16:51:34,949][14395] Num frames 51200... +[2024-11-07 16:51:35,175][14395] Num frames 51300... 
+[2024-11-07 16:51:35,383][14395] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 16:51:35,385][14395] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 16:51:35,496][14395] Num frames 51400... +[2024-11-07 16:51:35,740][14395] Num frames 51500... +[2024-11-07 16:51:36,023][14395] Num frames 51600... +[2024-11-07 16:51:36,319][14395] Num frames 51700... +[2024-11-07 16:51:36,555][14395] Avg episode rewards: #0: 4.367, true rewards: #0: 3.997 +[2024-11-07 16:51:36,557][14395] Avg episode reward: 4.367, avg true_objective: 3.997 +[2024-11-07 16:51:36,726][14395] Num frames 51800... +[2024-11-07 16:51:36,976][14395] Num frames 51900... +[2024-11-07 16:51:37,229][14395] Num frames 52000... +[2024-11-07 16:51:37,519][14395] Avg episode rewards: #0: 4.374, true rewards: #0: 3.994 +[2024-11-07 16:51:37,520][14395] Avg episode reward: 4.374, avg true_objective: 3.994 +[2024-11-07 16:51:37,531][14395] Num frames 52100... +[2024-11-07 16:51:37,781][14395] Num frames 52200... +[2024-11-07 16:51:40,064][14395] Num frames 52300... +[2024-11-07 16:51:40,310][14395] Num frames 52400... +[2024-11-07 16:51:40,573][14395] Avg episode rewards: #0: 4.374, true rewards: #0: 3.994 +[2024-11-07 16:51:40,578][14395] Avg episode reward: 4.374, avg true_objective: 3.994 +[2024-11-07 16:51:40,660][14395] Num frames 52500... +[2024-11-07 16:51:40,900][14395] Num frames 52600... +[2024-11-07 16:51:41,131][14395] Num frames 52700... +[2024-11-07 16:51:41,381][14395] Num frames 52800... +[2024-11-07 16:51:41,598][14395] Avg episode rewards: #0: 4.374, true rewards: #0: 3.994 +[2024-11-07 16:51:41,599][14395] Avg episode reward: 4.374, avg true_objective: 3.994 +[2024-11-07 16:51:41,711][14395] Num frames 52900... +[2024-11-07 16:51:41,970][14395] Num frames 53000... +[2024-11-07 16:51:42,237][14395] Num frames 53100... +[2024-11-07 16:51:42,466][14395] Num frames 53200... +[2024-11-07 16:51:42,702][14395] Num frames 53300... 
+[2024-11-07 16:51:42,784][14395] Avg episode rewards: #0: 4.374, true rewards: #0: 3.994 +[2024-11-07 16:51:42,786][14395] Avg episode reward: 4.374, avg true_objective: 3.994 +[2024-11-07 16:51:43,037][14395] Num frames 53400... +[2024-11-07 16:51:43,348][14395] Num frames 53500... +[2024-11-07 16:51:43,595][14395] Num frames 53600... +[2024-11-07 16:51:43,912][14395] Avg episode rewards: #0: 4.374, true rewards: #0: 3.994 +[2024-11-07 16:51:43,915][14395] Avg episode reward: 4.374, avg true_objective: 3.994 +[2024-11-07 16:51:43,934][14395] Num frames 53700... +[2024-11-07 16:51:44,185][14395] Num frames 53800... +[2024-11-07 16:51:44,429][14395] Num frames 53900... +[2024-11-07 16:51:44,715][14395] Num frames 54000... +[2024-11-07 16:51:44,959][14395] Num frames 54100... +[2024-11-07 16:51:45,116][14395] Avg episode rewards: #0: 4.390, true rewards: #0: 4.000 +[2024-11-07 16:51:45,117][14395] Avg episode reward: 4.390, avg true_objective: 4.000 +[2024-11-07 16:51:45,354][14395] Num frames 54200... +[2024-11-07 16:51:45,570][14395] Num frames 54300... +[2024-11-07 16:51:45,816][14395] Num frames 54400... +[2024-11-07 16:51:45,976][14395] Avg episode rewards: #0: 4.377, true rewards: #0: 3.987 +[2024-11-07 16:51:45,980][14395] Avg episode reward: 4.377, avg true_objective: 3.987 +[2024-11-07 16:51:46,164][14395] Num frames 54500... +[2024-11-07 16:51:46,426][14395] Num frames 54600... +[2024-11-07 16:51:46,681][14395] Num frames 54700... +[2024-11-07 16:51:46,932][14395] Num frames 54800... +[2024-11-07 16:51:47,181][14395] Avg episode rewards: #0: 4.377, true rewards: #0: 3.987 +[2024-11-07 16:51:47,183][14395] Avg episode reward: 4.377, avg true_objective: 3.987 +[2024-11-07 16:51:47,273][14395] Num frames 54900... +[2024-11-07 16:51:47,529][14395] Num frames 55000... +[2024-11-07 16:51:47,795][14395] Num frames 55100... +[2024-11-07 16:51:48,035][14395] Num frames 55200... 
+[2024-11-07 16:51:48,242][14395] Avg episode rewards: #0: 4.354, true rewards: #0: 3.984 +[2024-11-07 16:51:48,244][14395] Avg episode reward: 4.354, avg true_objective: 3.984 +[2024-11-07 16:51:48,331][14395] Num frames 55300... +[2024-11-07 16:51:48,575][14395] Num frames 55400... +[2024-11-07 16:51:48,928][14395] Num frames 55500... +[2024-11-07 16:51:49,165][14395] Num frames 55600... +[2024-11-07 16:51:49,417][14395] Num frames 55700... +[2024-11-07 16:51:49,662][14395] Num frames 55800... +[2024-11-07 16:51:49,871][14395] Num frames 55900... +[2024-11-07 16:51:49,938][14395] Avg episode rewards: #0: 4.420, true rewards: #0: 4.010 +[2024-11-07 16:51:49,939][14395] Avg episode reward: 4.420, avg true_objective: 4.010 +[2024-11-07 16:51:50,164][14395] Num frames 56000... +[2024-11-07 16:51:50,407][14395] Num frames 56100... +[2024-11-07 16:51:50,662][14395] Num frames 56200... +[2024-11-07 16:51:50,934][14395] Avg episode rewards: #0: 4.432, true rewards: #0: 4.022 +[2024-11-07 16:51:50,937][14395] Avg episode reward: 4.432, avg true_objective: 4.022 +[2024-11-07 16:51:50,990][14395] Num frames 56300... +[2024-11-07 16:51:51,233][14395] Num frames 56400... +[2024-11-07 16:51:51,475][14395] Num frames 56500... +[2024-11-07 16:51:51,724][14395] Num frames 56600... +[2024-11-07 16:51:51,924][14395] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 16:51:51,925][14395] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 16:51:51,998][14395] Num frames 56700... +[2024-11-07 16:51:52,256][14395] Num frames 56800... +[2024-11-07 16:51:52,505][14395] Num frames 56900... +[2024-11-07 16:51:52,753][14395] Num frames 57000... +[2024-11-07 16:51:53,028][14395] Avg episode rewards: #0: 4.432, true rewards: #0: 4.022 +[2024-11-07 16:51:53,030][14395] Avg episode reward: 4.432, avg true_objective: 4.022 +[2024-11-07 16:51:53,069][14395] Num frames 57100... +[2024-11-07 16:51:53,301][14395] Num frames 57200... 
+[2024-11-07 16:51:53,529][14395] Num frames 57300... +[2024-11-07 16:51:53,787][14395] Num frames 57400... +[2024-11-07 16:51:54,012][14395] Num frames 57500... +[2024-11-07 16:51:54,153][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:54,155][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:54,324][14395] Num frames 57600... +[2024-11-07 16:51:54,581][14395] Num frames 57700... +[2024-11-07 16:51:54,824][14395] Num frames 57800... +[2024-11-07 16:51:55,076][14395] Num frames 57900... +[2024-11-07 16:51:55,185][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:51:55,188][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:51:55,391][14395] Num frames 58000... +[2024-11-07 16:51:55,571][14395] Num frames 58100... +[2024-11-07 16:51:55,817][14395] Num frames 58200... +[2024-11-07 16:51:56,039][14395] Num frames 58300... +[2024-11-07 16:51:56,104][14395] Avg episode rewards: #0: 4.413, true rewards: #0: 4.013 +[2024-11-07 16:51:56,105][14395] Avg episode reward: 4.413, avg true_objective: 4.013 +[2024-11-07 16:51:56,357][14395] Num frames 58400... +[2024-11-07 16:51:56,611][14395] Num frames 58500... +[2024-11-07 16:51:56,834][14395] Num frames 58600... +[2024-11-07 16:51:57,083][14395] Num frames 58700... +[2024-11-07 16:51:57,272][14395] Avg episode rewards: #0: 4.429, true rewards: #0: 4.019 +[2024-11-07 16:51:57,275][14395] Avg episode reward: 4.429, avg true_objective: 4.019 +[2024-11-07 16:51:57,420][14395] Num frames 58800... +[2024-11-07 16:51:57,690][14395] Num frames 58900... +[2024-11-07 16:51:57,909][14395] Num frames 59000... +[2024-11-07 16:51:58,159][14395] Num frames 59100... +[2024-11-07 16:51:58,420][14395] Num frames 59200... 
+[2024-11-07 16:51:58,473][14395] Avg episode rewards: #0: 4.446, true rewards: #0: 4.026 +[2024-11-07 16:51:58,477][14395] Avg episode reward: 4.446, avg true_objective: 4.026 +[2024-11-07 16:51:58,768][14395] Num frames 59300... +[2024-11-07 16:51:58,971][14395] Num frames 59400... +[2024-11-07 16:51:59,177][14395] Num frames 59500... +[2024-11-07 16:51:59,397][14395] Avg episode rewards: #0: 4.446, true rewards: #0: 4.026 +[2024-11-07 16:51:59,399][14395] Avg episode reward: 4.446, avg true_objective: 4.026 +[2024-11-07 16:51:59,436][14395] Num frames 59600... +[2024-11-07 16:51:59,640][14395] Num frames 59700... +[2024-11-07 16:51:59,845][14395] Num frames 59800... +[2024-11-07 16:52:00,028][14395] Num frames 59900... +[2024-11-07 16:52:00,216][14395] Avg episode rewards: #0: 4.446, true rewards: #0: 4.026 +[2024-11-07 16:52:00,217][14395] Avg episode reward: 4.446, avg true_objective: 4.026 +[2024-11-07 16:52:00,278][14395] Num frames 60000... +[2024-11-07 16:52:00,482][14395] Num frames 60100... +[2024-11-07 16:52:00,771][14395] Num frames 60200... +[2024-11-07 16:52:00,880][14395] Avg episode rewards: #0: 4.433, true rewards: #0: 4.013 +[2024-11-07 16:52:00,884][14395] Avg episode reward: 4.433, avg true_objective: 4.013 +[2024-11-07 16:52:01,084][14395] Num frames 60300... +[2024-11-07 16:52:01,344][14395] Num frames 60400... +[2024-11-07 16:52:01,610][14395] Num frames 60500... +[2024-11-07 16:52:01,877][14395] Num frames 60600... +[2024-11-07 16:52:01,954][14395] Avg episode rewards: #0: 4.433, true rewards: #0: 4.013 +[2024-11-07 16:52:01,955][14395] Avg episode reward: 4.433, avg true_objective: 4.013 +[2024-11-07 16:52:02,195][14395] Num frames 60700... +[2024-11-07 16:52:02,447][14395] Num frames 60800... +[2024-11-07 16:52:02,653][14395] Num frames 60900... +[2024-11-07 16:52:02,910][14395] Num frames 61000... 
+[2024-11-07 16:52:03,086][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.019 +[2024-11-07 16:52:03,088][14395] Avg episode reward: 4.449, avg true_objective: 4.019 +[2024-11-07 16:52:03,193][14395] Num frames 61100... +[2024-11-07 16:52:03,459][14395] Num frames 61200... +[2024-11-07 16:52:03,667][14395] Num frames 61300... +[2024-11-07 16:52:03,911][14395] Num frames 61400... +[2024-11-07 16:52:04,039][14395] Avg episode rewards: #0: 4.433, true rewards: #0: 4.013 +[2024-11-07 16:52:04,044][14395] Avg episode reward: 4.433, avg true_objective: 4.013 +[2024-11-07 16:52:04,219][14395] Num frames 61500... +[2024-11-07 16:52:04,436][14395] Num frames 61600... +[2024-11-07 16:52:04,697][14395] Num frames 61700... +[2024-11-07 16:52:04,910][14395] Num frames 61800... +[2024-11-07 16:52:05,017][14395] Avg episode rewards: #0: 4.433, true rewards: #0: 4.013 +[2024-11-07 16:52:05,018][14395] Avg episode reward: 4.433, avg true_objective: 4.013 +[2024-11-07 16:52:05,232][14395] Num frames 61900... +[2024-11-07 16:52:05,499][14395] Num frames 62000... +[2024-11-07 16:52:05,789][14395] Num frames 62100... +[2024-11-07 16:52:06,033][14395] Num frames 62200... +[2024-11-07 16:52:06,258][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.019 +[2024-11-07 16:52:06,261][14395] Avg episode reward: 4.449, avg true_objective: 4.019 +[2024-11-07 16:52:06,322][14395] Num frames 62300... +[2024-11-07 16:52:06,554][14395] Num frames 62400... +[2024-11-07 16:52:06,843][14395] Num frames 62500... +[2024-11-07 16:52:07,050][14395] Num frames 62600... +[2024-11-07 16:52:07,230][14395] Avg episode rewards: #0: 4.436, true rewards: #0: 4.016 +[2024-11-07 16:52:07,232][14395] Avg episode reward: 4.436, avg true_objective: 4.016 +[2024-11-07 16:52:07,325][14395] Num frames 62700... +[2024-11-07 16:52:07,585][14395] Num frames 62800... +[2024-11-07 16:52:07,819][14395] Num frames 62900... +[2024-11-07 16:52:08,044][14395] Num frames 63000... 
+[2024-11-07 16:52:08,186][14395] Avg episode rewards: #0: 4.436, true rewards: #0: 4.016 +[2024-11-07 16:52:08,187][14395] Avg episode reward: 4.436, avg true_objective: 4.016 +[2024-11-07 16:52:08,342][14395] Num frames 63100... +[2024-11-07 16:52:08,610][14395] Num frames 63200... +[2024-11-07 16:52:08,855][14395] Num frames 63300... +[2024-11-07 16:52:09,102][14395] Num frames 63400... +[2024-11-07 16:52:09,221][14395] Avg episode rewards: #0: 4.436, true rewards: #0: 4.016 +[2024-11-07 16:52:09,223][14395] Avg episode reward: 4.436, avg true_objective: 4.016 +[2024-11-07 16:52:09,422][14395] Num frames 63500... +[2024-11-07 16:52:09,664][14395] Num frames 63600... +[2024-11-07 16:52:09,873][14395] Num frames 63700... +[2024-11-07 16:52:10,115][14395] Num frames 63800... +[2024-11-07 16:52:10,191][14395] Avg episode rewards: #0: 4.436, true rewards: #0: 4.016 +[2024-11-07 16:52:10,192][14395] Avg episode reward: 4.436, avg true_objective: 4.016 +[2024-11-07 16:52:10,412][14395] Num frames 63900... +[2024-11-07 16:52:10,673][14395] Num frames 64000... +[2024-11-07 16:52:10,925][14395] Num frames 64100... +[2024-11-07 16:52:11,201][14395] Avg episode rewards: #0: 4.436, true rewards: #0: 4.016 +[2024-11-07 16:52:11,206][14395] Avg episode reward: 4.436, avg true_objective: 4.016 +[2024-11-07 16:52:11,246][14395] Num frames 64200... +[2024-11-07 16:52:11,479][14395] Num frames 64300... +[2024-11-07 16:52:11,802][14395] Num frames 64400... +[2024-11-07 16:52:12,061][14395] Num frames 64500... +[2024-11-07 16:52:14,337][14395] Avg episode rewards: #0: 4.436, true rewards: #0: 4.016 +[2024-11-07 16:52:14,342][14395] Avg episode reward: 4.436, avg true_objective: 4.016 +[2024-11-07 16:52:14,418][14395] Num frames 64600... +[2024-11-07 16:52:14,651][14395] Num frames 64700... +[2024-11-07 16:52:14,886][14395] Num frames 64800... +[2024-11-07 16:52:15,120][14395] Num frames 64900... +[2024-11-07 16:52:15,668][14395] Num frames 65000... 
+[2024-11-07 16:52:15,779][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.006 +[2024-11-07 16:52:15,782][14395] Avg episode reward: 4.416, avg true_objective: 4.006 +[2024-11-07 16:52:15,983][14395] Num frames 65100... +[2024-11-07 16:52:16,228][14395] Num frames 65200... +[2024-11-07 16:52:16,471][14395] Num frames 65300... +[2024-11-07 16:52:16,709][14395] Num frames 65400... +[2024-11-07 16:52:16,779][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.006 +[2024-11-07 16:52:16,784][14395] Avg episode reward: 4.416, avg true_objective: 4.006 +[2024-11-07 16:52:17,014][14395] Num frames 65500... +[2024-11-07 16:52:17,241][14395] Num frames 65600... +[2024-11-07 16:52:17,487][14395] Num frames 65700... +[2024-11-07 16:52:17,774][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.006 +[2024-11-07 16:52:17,778][14395] Avg episode reward: 4.416, avg true_objective: 4.006 +[2024-11-07 16:52:17,817][14395] Num frames 65800... +[2024-11-07 16:52:18,070][14395] Num frames 65900... +[2024-11-07 16:52:18,313][14395] Num frames 66000... +[2024-11-07 16:52:18,547][14395] Num frames 66100... +[2024-11-07 16:52:18,776][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.006 +[2024-11-07 16:52:18,778][14395] Avg episode reward: 4.416, avg true_objective: 4.006 +[2024-11-07 16:52:18,856][14395] Num frames 66200... +[2024-11-07 16:52:19,110][14395] Num frames 66300... +[2024-11-07 16:52:19,361][14395] Num frames 66400... +[2024-11-07 16:52:19,609][14395] Num frames 66500... +[2024-11-07 16:52:19,809][14395] Avg episode rewards: #0: 4.416, true rewards: #0: 4.006 +[2024-11-07 16:52:19,812][14395] Avg episode reward: 4.416, avg true_objective: 4.006 +[2024-11-07 16:52:19,946][14395] Num frames 66600... +[2024-11-07 16:52:20,191][14395] Num frames 66700... +[2024-11-07 16:52:20,610][14395] Num frames 66800... +[2024-11-07 16:52:20,866][14395] Num frames 66900... +[2024-11-07 16:52:21,109][14395] Num frames 67000... 
+[2024-11-07 16:52:21,184][14395] Avg episode rewards: #0: 4.446, true rewards: #0: 4.026 +[2024-11-07 16:52:21,188][14395] Avg episode reward: 4.446, avg true_objective: 4.026 +[2024-11-07 16:52:21,417][14395] Num frames 67100... +[2024-11-07 16:52:21,654][14395] Num frames 67200... +[2024-11-07 16:52:21,898][14395] Num frames 67300... +[2024-11-07 16:52:22,170][14395] Avg episode rewards: #0: 4.446, true rewards: #0: 4.026 +[2024-11-07 16:52:22,171][14395] Avg episode reward: 4.446, avg true_objective: 4.026 +[2024-11-07 16:52:22,192][14395] Num frames 67400... +[2024-11-07 16:52:22,428][14395] Num frames 67500... +[2024-11-07 16:52:22,679][14395] Num frames 67600... +[2024-11-07 16:52:22,929][14395] Num frames 67700... +[2024-11-07 16:52:23,170][14395] Num frames 67800... +[2024-11-07 16:52:23,372][14395] Num frames 67900... +[2024-11-07 16:52:23,598][14395] Num frames 68000... +[2024-11-07 16:52:23,651][14395] Avg episode rewards: #0: 4.482, true rewards: #0: 4.042 +[2024-11-07 16:52:23,654][14395] Avg episode reward: 4.482, avg true_objective: 4.042 +[2024-11-07 16:52:23,914][14395] Num frames 68100... +[2024-11-07 16:52:24,153][14395] Num frames 68200... +[2024-11-07 16:52:24,406][14395] Num frames 68300... +[2024-11-07 16:52:24,641][14395] Avg episode rewards: #0: 4.482, true rewards: #0: 4.042 +[2024-11-07 16:52:24,646][14395] Avg episode reward: 4.482, avg true_objective: 4.042 +[2024-11-07 16:52:24,694][14395] Num frames 68400... +[2024-11-07 16:52:24,881][14395] Num frames 68500... +[2024-11-07 16:52:25,065][14395] Num frames 68600... +[2024-11-07 16:52:25,247][14395] Num frames 68700... +[2024-11-07 16:52:25,429][14395] Num frames 68800... +[2024-11-07 16:52:25,562][14395] Avg episode rewards: #0: 4.482, true rewards: #0: 4.042 +[2024-11-07 16:52:25,566][14395] Avg episode reward: 4.482, avg true_objective: 4.042 +[2024-11-07 16:52:25,743][14395] Num frames 68900... +[2024-11-07 16:52:25,969][14395] Num frames 69000... 
+[2024-11-07 16:52:26,221][14395] Avg episode rewards: #0: 4.469, true rewards: #0: 4.029 +[2024-11-07 16:52:26,223][14395] Avg episode reward: 4.469, avg true_objective: 4.029 +[2024-11-07 16:52:26,269][14395] Num frames 69100... +[2024-11-07 16:52:26,493][14395] Num frames 69200... +[2024-11-07 16:52:26,761][14395] Num frames 69300... +[2024-11-07 16:52:27,002][14395] Num frames 69400... +[2024-11-07 16:52:27,249][14395] Avg episode rewards: #0: 4.469, true rewards: #0: 4.029 +[2024-11-07 16:52:27,252][14395] Avg episode reward: 4.469, avg true_objective: 4.029 +[2024-11-07 16:52:27,357][14395] Num frames 69500... +[2024-11-07 16:52:27,704][14395] Num frames 69600... +[2024-11-07 16:52:27,946][14395] Num frames 69700... +[2024-11-07 16:52:28,164][14395] Num frames 69800... +[2024-11-07 16:52:28,354][14395] Avg episode rewards: #0: 4.469, true rewards: #0: 4.029 +[2024-11-07 16:52:28,358][14395] Avg episode reward: 4.469, avg true_objective: 4.029 +[2024-11-07 16:52:28,503][14395] Num frames 69900... +[2024-11-07 16:52:28,775][14395] Num frames 70000... +[2024-11-07 16:52:29,033][14395] Num frames 70100... +[2024-11-07 16:52:29,239][14395] Num frames 70200... +[2024-11-07 16:52:29,457][14395] Num frames 70300... +[2024-11-07 16:52:29,696][14395] Num frames 70400... +[2024-11-07 16:52:29,748][14395] Avg episode rewards: #0: 4.469, true rewards: #0: 4.029 +[2024-11-07 16:52:29,752][14395] Avg episode reward: 4.469, avg true_objective: 4.029 +[2024-11-07 16:52:29,990][14395] Num frames 70500... +[2024-11-07 16:52:30,359][14395] Num frames 70600... +[2024-11-07 16:52:30,577][14395] Num frames 70700... +[2024-11-07 16:52:30,820][14395] Avg episode rewards: #0: 4.469, true rewards: #0: 4.029 +[2024-11-07 16:52:30,822][14395] Avg episode reward: 4.469, avg true_objective: 4.029 +[2024-11-07 16:52:30,859][14395] Num frames 70800... +[2024-11-07 16:52:31,093][14395] Num frames 70900... +[2024-11-07 16:52:31,351][14395] Num frames 71000... 
+[2024-11-07 16:52:31,584][14395] Num frames 71100... +[2024-11-07 16:52:31,807][14395] Avg episode rewards: #0: 4.462, true rewards: #0: 4.032 +[2024-11-07 16:52:31,810][14395] Avg episode reward: 4.462, avg true_objective: 4.032 +[2024-11-07 16:52:31,899][14395] Num frames 71200... +[2024-11-07 16:52:32,123][14395] Num frames 71300... +[2024-11-07 16:52:32,326][14395] Num frames 71400... +[2024-11-07 16:52:32,538][14395] Num frames 71500... +[2024-11-07 16:52:32,721][14395] Avg episode rewards: #0: 4.426, true rewards: #0: 4.016 +[2024-11-07 16:52:32,726][14395] Avg episode reward: 4.426, avg true_objective: 4.016 +[2024-11-07 16:52:32,833][14395] Num frames 71600... +[2024-11-07 16:52:33,022][14395] Num frames 71700... +[2024-11-07 16:52:33,210][14395] Num frames 71800... +[2024-11-07 16:52:33,396][14395] Num frames 71900... +[2024-11-07 16:52:33,523][14395] Avg episode rewards: #0: 4.426, true rewards: #0: 4.016 +[2024-11-07 16:52:33,526][14395] Avg episode reward: 4.426, avg true_objective: 4.016 +[2024-11-07 16:52:33,687][14395] Num frames 72000... +[2024-11-07 16:52:33,936][14395] Num frames 72100... +[2024-11-07 16:52:34,159][14395] Num frames 72200... +[2024-11-07 16:52:34,487][14395] Num frames 72300... +[2024-11-07 16:52:34,600][14395] Avg episode rewards: #0: 4.426, true rewards: #0: 4.016 +[2024-11-07 16:52:34,604][14395] Avg episode reward: 4.426, avg true_objective: 4.016 +[2024-11-07 16:52:34,858][14395] Num frames 72400... +[2024-11-07 16:52:35,137][14395] Num frames 72500... +[2024-11-07 16:52:35,424][14395] Num frames 72600... +[2024-11-07 16:52:35,731][14395] Num frames 72700... +[2024-11-07 16:52:35,798][14395] Avg episode rewards: #0: 4.426, true rewards: #0: 4.016 +[2024-11-07 16:52:35,803][14395] Avg episode reward: 4.426, avg true_objective: 4.016 +[2024-11-07 16:52:36,087][14395] Num frames 72800... +[2024-11-07 16:52:36,379][14395] Num frames 72900... +[2024-11-07 16:52:36,674][14395] Num frames 73000... 
+[2024-11-07 16:52:36,978][14395] Avg episode rewards: #0: 4.426, true rewards: #0: 4.016 +[2024-11-07 16:52:36,983][14395] Avg episode reward: 4.426, avg true_objective: 4.016 +[2024-11-07 16:52:37,036][14395] Num frames 73100... +[2024-11-07 16:52:37,327][14395] Num frames 73200... +[2024-11-07 16:52:37,618][14395] Num frames 73300... +[2024-11-07 16:52:37,906][14395] Num frames 73400... +[2024-11-07 16:52:38,179][14395] Avg episode rewards: #0: 4.426, true rewards: #0: 4.016 +[2024-11-07 16:52:38,181][14395] Avg episode reward: 4.426, avg true_objective: 4.016 +[2024-11-07 16:52:38,254][14395] Num frames 73500... +[2024-11-07 16:52:38,543][14395] Num frames 73600... +[2024-11-07 16:52:38,826][14395] Num frames 73700... +[2024-11-07 16:52:39,148][14395] Num frames 73800... +[2024-11-07 16:52:39,383][14395] Avg episode rewards: #0: 4.426, true rewards: #0: 4.016 +[2024-11-07 16:52:39,384][14395] Avg episode reward: 4.426, avg true_objective: 4.016 +[2024-11-07 16:52:39,540][14395] Num frames 73900... +[2024-11-07 16:52:39,794][14395] Num frames 74000... +[2024-11-07 16:52:40,005][14395] Num frames 74100... +[2024-11-07 16:52:40,785][14395] Num frames 74200... +[2024-11-07 16:52:40,971][14395] Avg episode rewards: #0: 4.410, true rewards: #0: 4.010 +[2024-11-07 16:52:40,974][14395] Avg episode reward: 4.410, avg true_objective: 4.010 +[2024-11-07 16:52:41,185][14395] Num frames 74300... +[2024-11-07 16:52:41,490][14395] Num frames 74400... +[2024-11-07 16:52:41,743][14395] Num frames 74500... +[2024-11-07 16:52:42,004][14395] Num frames 74600... +[2024-11-07 16:52:42,228][14395] Num frames 74700... +[2024-11-07 16:52:42,458][14395] Num frames 74800... +[2024-11-07 16:52:42,553][14395] Avg episode rewards: #0: 4.442, true rewards: #0: 4.022 +[2024-11-07 16:52:42,558][14395] Avg episode reward: 4.442, avg true_objective: 4.022 +[2024-11-07 16:52:42,805][14395] Num frames 74900... +[2024-11-07 16:52:43,106][14395] Num frames 75000... 
+[2024-11-07 16:52:43,376][14395] Num frames 75100... +[2024-11-07 16:52:43,665][14395] Num frames 75200... +[2024-11-07 16:52:43,802][14395] Avg episode rewards: #0: 4.456, true rewards: #0: 4.026 +[2024-11-07 16:52:43,803][14395] Avg episode reward: 4.456, avg true_objective: 4.026 +[2024-11-07 16:52:43,991][14395] Num frames 75300... +[2024-11-07 16:52:44,254][14395] Num frames 75400... +[2024-11-07 16:52:44,618][14395] Num frames 75500... +[2024-11-07 16:52:44,921][14395] Num frames 75600... +[2024-11-07 16:52:45,239][14395] Num frames 75700... +[2024-11-07 16:52:45,519][14395] Avg episode rewards: #0: 4.492, true rewards: #0: 4.042 +[2024-11-07 16:52:45,521][14395] Avg episode reward: 4.492, avg true_objective: 4.042 +[2024-11-07 16:52:45,585][14395] Num frames 75800... +[2024-11-07 16:52:45,955][14395] Num frames 75900... +[2024-11-07 16:52:46,264][14395] Num frames 76000... +[2024-11-07 16:52:48,565][14395] Num frames 76100... +[2024-11-07 16:52:48,788][14395] Avg episode rewards: #0: 4.475, true rewards: #0: 4.035 +[2024-11-07 16:52:48,793][14395] Avg episode reward: 4.475, avg true_objective: 4.035 +[2024-11-07 16:52:48,919][14395] Num frames 76200... +[2024-11-07 16:52:49,210][14395] Num frames 76300... +[2024-11-07 16:52:49,520][14395] Num frames 76400... +[2024-11-07 16:52:49,811][14395] Num frames 76500... +[2024-11-07 16:52:50,107][14395] Num frames 76600... +[2024-11-07 16:52:50,396][14395] Num frames 76700... +[2024-11-07 16:52:50,642][14395] Avg episode rewards: #0: 4.528, true rewards: #0: 4.058 +[2024-11-07 16:52:50,643][14395] Avg episode reward: 4.528, avg true_objective: 4.058 +[2024-11-07 16:52:50,752][14395] Num frames 76800... +[2024-11-07 16:52:51,053][14395] Num frames 76900... +[2024-11-07 16:52:51,350][14395] Num frames 77000... +[2024-11-07 16:52:51,710][14395] Num frames 77100... +[2024-11-07 16:52:52,022][14395] Num frames 77200... +[2024-11-07 16:52:52,271][14395] Num frames 77300... 
+[2024-11-07 16:52:52,358][14395] Avg episode rewards: #0: 4.564, true rewards: #0: 4.074 +[2024-11-07 16:52:52,361][14395] Avg episode reward: 4.564, avg true_objective: 4.074 +[2024-11-07 16:52:52,606][14395] Num frames 77400... +[2024-11-07 16:52:52,840][14395] Num frames 77500... +[2024-11-07 16:52:53,082][14395] Num frames 77600... +[2024-11-07 16:52:53,319][14395] Num frames 77700... +[2024-11-07 16:52:53,438][14395] Avg episode rewards: #0: 4.577, true rewards: #0: 4.077 +[2024-11-07 16:52:53,441][14395] Avg episode reward: 4.577, avg true_objective: 4.077 +[2024-11-07 16:52:53,643][14395] Num frames 77800... +[2024-11-07 16:52:53,896][14395] Num frames 77900... +[2024-11-07 16:52:54,139][14395] Num frames 78000... +[2024-11-07 16:52:54,386][14395] Num frames 78100... +[2024-11-07 16:52:54,556][14395] Avg episode rewards: #0: 4.600, true rewards: #0: 4.080 +[2024-11-07 16:52:54,558][14395] Avg episode reward: 4.600, avg true_objective: 4.080 +[2024-11-07 16:52:54,695][14395] Num frames 78200... +[2024-11-07 16:52:54,948][14395] Num frames 78300... +[2024-11-07 16:52:55,178][14395] Num frames 78400... +[2024-11-07 16:52:55,419][14395] Num frames 78500... +[2024-11-07 16:52:55,715][14395] Avg episode rewards: #0: 4.616, true rewards: #0: 4.086 +[2024-11-07 16:52:55,718][14395] Avg episode reward: 4.616, avg true_objective: 4.086 +[2024-11-07 16:52:55,758][14395] Num frames 78600... +[2024-11-07 16:52:55,997][14395] Num frames 78700... +[2024-11-07 16:52:56,232][14395] Num frames 78800... +[2024-11-07 16:52:56,463][14395] Num frames 78900... +[2024-11-07 16:52:56,697][14395] Num frames 79000... +[2024-11-07 16:52:56,852][14395] Avg episode rewards: #0: 4.620, true rewards: #0: 4.090 +[2024-11-07 16:52:56,854][14395] Avg episode reward: 4.620, avg true_objective: 4.090 +[2024-11-07 16:52:57,010][14395] Num frames 79100... +[2024-11-07 16:52:57,251][14395] Num frames 79200... +[2024-11-07 16:52:57,497][14395] Num frames 79300... 
+[2024-11-07 16:52:57,733][14395] Num frames 79400... +[2024-11-07 16:52:57,872][14395] Avg episode rewards: #0: 4.620, true rewards: #0: 4.090 +[2024-11-07 16:52:57,877][14395] Avg episode reward: 4.620, avg true_objective: 4.090 +[2024-11-07 16:52:58,105][14395] Num frames 79500... +[2024-11-07 16:52:58,354][14395] Num frames 79600... +[2024-11-07 16:52:58,609][14395] Num frames 79700... +[2024-11-07 16:52:58,851][14395] Num frames 79800... +[2024-11-07 16:52:59,079][14395] Avg episode rewards: #0: 4.620, true rewards: #0: 4.090 +[2024-11-07 16:52:59,084][14395] Avg episode reward: 4.620, avg true_objective: 4.090 +[2024-11-07 16:52:59,160][14395] Num frames 79900... +[2024-11-07 16:52:59,409][14395] Num frames 80000... +[2024-11-07 16:52:59,659][14395] Num frames 80100... +[2024-11-07 16:52:59,889][14395] Num frames 80200... +[2024-11-07 16:53:00,125][14395] Num frames 80300... +[2024-11-07 16:53:00,218][14395] Avg episode rewards: #0: 4.623, true rewards: #0: 4.093 +[2024-11-07 16:53:00,220][14395] Avg episode reward: 4.623, avg true_objective: 4.093 +[2024-11-07 16:53:00,413][14395] Num frames 80400... +[2024-11-07 16:53:00,769][14395] Num frames 80500... +[2024-11-07 16:53:01,039][14395] Num frames 80600... +[2024-11-07 16:53:01,433][14395] Num frames 80700... +[2024-11-07 16:53:01,505][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.086 +[2024-11-07 16:53:01,506][14395] Avg episode reward: 4.606, avg true_objective: 4.086 +[2024-11-07 16:53:01,815][14395] Num frames 80800... +[2024-11-07 16:53:02,139][14395] Num frames 80900... +[2024-11-07 16:53:02,381][14395] Num frames 81000... +[2024-11-07 16:53:02,640][14395] Num frames 81100... +[2024-11-07 16:53:02,828][14395] Avg episode rewards: #0: 4.636, true rewards: #0: 4.106 +[2024-11-07 16:53:02,832][14395] Avg episode reward: 4.636, avg true_objective: 4.106 +[2024-11-07 16:53:02,978][14395] Num frames 81200... +[2024-11-07 16:53:03,224][14395] Num frames 81300... 
+[2024-11-07 16:53:03,473][14395] Num frames 81400... +[2024-11-07 16:53:03,577][14395] Avg episode rewards: #0: 4.594, true rewards: #0: 4.084 +[2024-11-07 16:53:03,583][14395] Avg episode reward: 4.594, avg true_objective: 4.084 +[2024-11-07 16:53:03,796][14395] Num frames 81500... +[2024-11-07 16:53:04,031][14395] Num frames 81600... +[2024-11-07 16:53:04,270][14395] Num frames 81700... +[2024-11-07 16:53:04,526][14395] Num frames 81800... +[2024-11-07 16:53:04,674][14395] Avg episode rewards: #0: 4.607, true rewards: #0: 4.087 +[2024-11-07 16:53:04,679][14395] Avg episode reward: 4.607, avg true_objective: 4.087 +[2024-11-07 16:53:04,863][14395] Num frames 81900... +[2024-11-07 16:53:05,124][14395] Num frames 82000... +[2024-11-07 16:53:05,364][14395] Num frames 82100... +[2024-11-07 16:53:05,602][14395] Num frames 82200... +[2024-11-07 16:53:05,699][14395] Avg episode rewards: #0: 4.607, true rewards: #0: 4.087 +[2024-11-07 16:53:05,703][14395] Avg episode reward: 4.607, avg true_objective: 4.087 +[2024-11-07 16:53:05,870][14395] Num frames 82300... +[2024-11-07 16:53:06,111][14395] Num frames 82400... +[2024-11-07 16:53:06,342][14395] Num frames 82500... +[2024-11-07 16:53:06,586][14395] Num frames 82600... +[2024-11-07 16:53:06,798][14395] Avg episode rewards: #0: 4.611, true rewards: #0: 4.091 +[2024-11-07 16:53:06,802][14395] Avg episode reward: 4.611, avg true_objective: 4.091 +[2024-11-07 16:53:06,894][14395] Num frames 82700... +[2024-11-07 16:53:07,150][14395] Num frames 82800... +[2024-11-07 16:53:07,499][14395] Num frames 82900... +[2024-11-07 16:53:07,753][14395] Num frames 83000... +[2024-11-07 16:53:07,940][14395] Avg episode rewards: #0: 4.611, true rewards: #0: 4.091 +[2024-11-07 16:53:07,941][14395] Avg episode reward: 4.611, avg true_objective: 4.091 +[2024-11-07 16:53:08,095][14395] Num frames 83100... +[2024-11-07 16:53:08,385][14395] Num frames 83200... +[2024-11-07 16:53:08,623][14395] Num frames 83300... 
+[2024-11-07 16:53:08,899][14395] Num frames 83400... +[2024-11-07 16:53:09,138][14395] Avg episode rewards: #0: 4.571, true rewards: #0: 4.071 +[2024-11-07 16:53:09,142][14395] Avg episode reward: 4.571, avg true_objective: 4.071 +[2024-11-07 16:53:09,242][14395] Num frames 83500... +[2024-11-07 16:53:09,492][14395] Num frames 83600... +[2024-11-07 16:53:09,736][14395] Num frames 83700... +[2024-11-07 16:53:09,989][14395] Num frames 83800... +[2024-11-07 16:53:10,269][14395] Avg episode rewards: #0: 4.585, true rewards: #0: 4.075 +[2024-11-07 16:53:10,270][14395] Avg episode reward: 4.585, avg true_objective: 4.075 +[2024-11-07 16:53:10,320][14395] Num frames 83900... +[2024-11-07 16:53:10,600][14395] Num frames 84000... +[2024-11-07 16:53:10,885][14395] Num frames 84100... +[2024-11-07 16:53:11,171][14395] Num frames 84200... +[2024-11-07 16:53:11,428][14395] Avg episode rewards: #0: 4.585, true rewards: #0: 4.075 +[2024-11-07 16:53:11,430][14395] Avg episode reward: 4.585, avg true_objective: 4.075 +[2024-11-07 16:53:11,519][14395] Num frames 84300... +[2024-11-07 16:53:11,807][14395] Num frames 84400... +[2024-11-07 16:53:12,114][14395] Num frames 84500... +[2024-11-07 16:53:12,422][14395] Num frames 84600... +[2024-11-07 16:53:12,623][14395] Avg episode rewards: #0: 4.597, true rewards: #0: 4.087 +[2024-11-07 16:53:12,624][14395] Avg episode reward: 4.597, avg true_objective: 4.087 +[2024-11-07 16:53:12,745][14395] Num frames 84700... +[2024-11-07 16:53:13,023][14395] Num frames 84800... +[2024-11-07 16:53:13,372][14395] Num frames 84900... +[2024-11-07 16:53:13,714][14395] Num frames 85000... +[2024-11-07 16:53:13,885][14395] Avg episode rewards: #0: 4.597, true rewards: #0: 4.087 +[2024-11-07 16:53:13,886][14395] Avg episode reward: 4.597, avg true_objective: 4.087 +[2024-11-07 16:53:14,058][14395] Num frames 85100... +[2024-11-07 16:53:14,315][14395] Num frames 85200... 
+[2024-11-07 16:53:14,631][14395] Avg episode rewards: #0: 4.585, true rewards: #0: 4.075 +[2024-11-07 16:53:14,632][14395] Avg episode reward: 4.585, avg true_objective: 4.075 +[2024-11-07 16:53:14,659][14395] Num frames 85300... +[2024-11-07 16:53:14,931][14395] Num frames 85400... +[2024-11-07 16:53:15,253][14395] Num frames 85500... +[2024-11-07 16:53:15,505][14395] Num frames 85600... +[2024-11-07 16:53:15,900][14395] Avg episode rewards: #0: 4.568, true rewards: #0: 4.068 +[2024-11-07 16:53:15,902][14395] Avg episode reward: 4.568, avg true_objective: 4.068 +[2024-11-07 16:53:16,082][14395] Num frames 85700... +[2024-11-07 16:53:16,926][14395] Num frames 85800... +[2024-11-07 16:53:17,377][14395] Num frames 85900... +[2024-11-07 16:53:17,611][14395] Num frames 86000... +[2024-11-07 16:53:17,716][14395] Avg episode rewards: #0: 4.559, true rewards: #0: 4.059 +[2024-11-07 16:53:17,721][14395] Avg episode reward: 4.559, avg true_objective: 4.059 +[2024-11-07 16:53:17,894][14395] Num frames 86100... +[2024-11-07 16:53:18,234][14395] Num frames 86200... +[2024-11-07 16:53:18,487][14395] Num frames 86300... +[2024-11-07 16:53:18,741][14395] Num frames 86400... +[2024-11-07 16:53:18,827][14395] Avg episode rewards: #0: 4.559, true rewards: #0: 4.059 +[2024-11-07 16:53:18,828][14395] Avg episode reward: 4.559, avg true_objective: 4.059 +[2024-11-07 16:53:19,094][14395] Num frames 86500... +[2024-11-07 16:53:19,350][14395] Num frames 86600... +[2024-11-07 16:53:19,575][14395] Num frames 86700... +[2024-11-07 16:53:19,948][14395] Avg episode rewards: #0: 4.559, true rewards: #0: 4.059 +[2024-11-07 16:53:19,949][14395] Avg episode reward: 4.559, avg true_objective: 4.059 +[2024-11-07 16:53:19,969][14395] Num frames 86800... +[2024-11-07 16:53:20,265][14395] Num frames 86900... +[2024-11-07 16:53:20,532][14395] Num frames 87000... +[2024-11-07 16:53:22,869][14395] Num frames 87100... +[2024-11-07 16:53:23,154][14395] Num frames 87200... 
+[2024-11-07 16:53:23,324][14395] Avg episode rewards: #0: 4.575, true rewards: #0: 4.065 +[2024-11-07 16:53:23,327][14395] Avg episode reward: 4.575, avg true_objective: 4.065 +[2024-11-07 16:53:23,482][14395] Num frames 87300... +[2024-11-07 16:53:23,784][14395] Num frames 87400... +[2024-11-07 16:53:24,057][14395] Num frames 87500... +[2024-11-07 16:53:24,273][14395] Num frames 87600... +[2024-11-07 16:53:24,527][14395] Num frames 87700... +[2024-11-07 16:53:24,810][14395] Avg episode rewards: #0: 4.611, true rewards: #0: 4.081 +[2024-11-07 16:53:24,813][14395] Avg episode reward: 4.611, avg true_objective: 4.081 +[2024-11-07 16:53:24,876][14395] Num frames 87800... +[2024-11-07 16:53:25,156][14395] Num frames 87900... +[2024-11-07 16:53:25,453][14395] Num frames 88000... +[2024-11-07 16:53:25,719][14395] Num frames 88100... +[2024-11-07 16:53:25,954][14395] Avg episode rewards: #0: 4.611, true rewards: #0: 4.081 +[2024-11-07 16:53:25,957][14395] Avg episode reward: 4.611, avg true_objective: 4.081 +[2024-11-07 16:53:26,054][14395] Num frames 88200... +[2024-11-07 16:53:26,314][14395] Num frames 88300... +[2024-11-07 16:53:26,611][14395] Num frames 88400... +[2024-11-07 16:53:26,894][14395] Num frames 88500... +[2024-11-07 16:53:27,108][14395] Avg episode rewards: #0: 4.611, true rewards: #0: 4.081 +[2024-11-07 16:53:27,112][14395] Avg episode reward: 4.611, avg true_objective: 4.081 +[2024-11-07 16:53:27,256][14395] Num frames 88600... +[2024-11-07 16:53:27,531][14395] Num frames 88700... +[2024-11-07 16:53:27,803][14395] Num frames 88800... +[2024-11-07 16:53:28,044][14395] Num frames 88900... +[2024-11-07 16:53:28,106][14395] Avg episode rewards: #0: 4.608, true rewards: #0: 4.077 +[2024-11-07 16:53:28,107][14395] Avg episode reward: 4.608, avg true_objective: 4.077 +[2024-11-07 16:53:28,403][14395] Num frames 89000... +[2024-11-07 16:53:28,693][14395] Num frames 89100... +[2024-11-07 16:53:28,966][14395] Num frames 89200... 
+[2024-11-07 16:53:29,220][14395] Num frames 89300... +[2024-11-07 16:53:29,412][14395] Avg episode rewards: #0: 4.624, true rewards: #0: 4.084 +[2024-11-07 16:53:29,420][14395] Avg episode reward: 4.624, avg true_objective: 4.084 +[2024-11-07 16:53:29,557][14395] Num frames 89400... +[2024-11-07 16:53:29,832][14395] Num frames 89500... +[2024-11-07 16:53:30,094][14395] Num frames 89600... +[2024-11-07 16:53:30,342][14395] Num frames 89700... +[2024-11-07 16:53:30,619][14395] Num frames 89800... +[2024-11-07 16:53:30,765][14395] Avg episode rewards: #0: 4.644, true rewards: #0: 4.093 +[2024-11-07 16:53:30,770][14395] Avg episode reward: 4.644, avg true_objective: 4.093 +[2024-11-07 16:53:30,985][14395] Num frames 89900... +[2024-11-07 16:53:31,267][14395] Num frames 90000... +[2024-11-07 16:53:31,549][14395] Num frames 90100... +[2024-11-07 16:53:31,847][14395] Num frames 90200... +[2024-11-07 16:53:32,124][14395] Avg episode rewards: #0: 4.647, true rewards: #0: 4.097 +[2024-11-07 16:53:32,131][14395] Avg episode reward: 4.647, avg true_objective: 4.097 +[2024-11-07 16:53:32,202][14395] Num frames 90300... +[2024-11-07 16:53:32,497][14395] Num frames 90400... +[2024-11-07 16:53:32,758][14395] Num frames 90500... +[2024-11-07 16:53:33,014][14395] Num frames 90600... +[2024-11-07 16:53:33,311][14395] Avg episode rewards: #0: 4.660, true rewards: #0: 4.100 +[2024-11-07 16:53:33,313][14395] Avg episode reward: 4.660, avg true_objective: 4.100 +[2024-11-07 16:53:33,329][14395] Num frames 90700... +[2024-11-07 16:53:33,609][14395] Num frames 90800... +[2024-11-07 16:53:33,893][14395] Num frames 90900... +[2024-11-07 16:53:34,158][14395] Num frames 91000... +[2024-11-07 16:53:34,437][14395] Num frames 91100... +[2024-11-07 16:53:34,618][14395] Avg episode rewards: #0: 4.676, true rewards: #0: 4.106 +[2024-11-07 16:53:34,623][14395] Avg episode reward: 4.676, avg true_objective: 4.106 +[2024-11-07 16:53:34,802][14395] Num frames 91200... 
+[2024-11-07 16:53:35,079][14395] Num frames 91300... +[2024-11-07 16:53:35,360][14395] Num frames 91400... +[2024-11-07 16:53:35,644][14395] Num frames 91500... +[2024-11-07 16:53:35,788][14395] Avg episode rewards: #0: 4.660, true rewards: #0: 4.100 +[2024-11-07 16:53:35,792][14395] Avg episode reward: 4.660, avg true_objective: 4.100 +[2024-11-07 16:53:36,033][14395] Num frames 91600... +[2024-11-07 16:53:36,318][14395] Num frames 91700... +[2024-11-07 16:53:36,612][14395] Avg episode rewards: #0: 4.631, true rewards: #0: 4.081 +[2024-11-07 16:53:36,618][14395] Avg episode reward: 4.631, avg true_objective: 4.081 +[2024-11-07 16:53:36,682][14395] Num frames 91800... +[2024-11-07 16:53:36,939][14395] Num frames 91900... +[2024-11-07 16:53:37,220][14395] Num frames 92000... +[2024-11-07 16:53:37,540][14395] Num frames 92100... +[2024-11-07 16:53:37,823][14395] Num frames 92200... +[2024-11-07 16:53:37,974][14395] Avg episode rewards: #0: 4.647, true rewards: #0: 4.087 +[2024-11-07 16:53:37,976][14395] Avg episode reward: 4.647, avg true_objective: 4.087 +[2024-11-07 16:53:38,165][14395] Num frames 92300... +[2024-11-07 16:53:38,457][14395] Num frames 92400... +[2024-11-07 16:53:38,719][14395] Num frames 92500... +[2024-11-07 16:53:38,998][14395] Num frames 92600... +[2024-11-07 16:53:39,091][14395] Avg episode rewards: #0: 4.647, true rewards: #0: 4.087 +[2024-11-07 16:53:39,093][14395] Avg episode reward: 4.647, avg true_objective: 4.087 +[2024-11-07 16:53:39,339][14395] Num frames 92700... +[2024-11-07 16:53:39,641][14395] Num frames 92800... +[2024-11-07 16:53:39,954][14395] Num frames 92900... +[2024-11-07 16:53:40,299][14395] Avg episode rewards: #0: 4.640, true rewards: #0: 4.090 +[2024-11-07 16:53:40,304][14395] Avg episode reward: 4.640, avg true_objective: 4.090 +[2024-11-07 16:53:40,319][14395] Num frames 93000... +[2024-11-07 16:53:40,631][14395] Num frames 93100... +[2024-11-07 16:53:40,937][14395] Num frames 93200... 
+[2024-11-07 16:53:41,217][14395] Num frames 93300... +[2024-11-07 16:53:41,506][14395] Num frames 93400... +[2024-11-07 16:53:41,694][14395] Avg episode rewards: #0: 4.657, true rewards: #0: 4.097 +[2024-11-07 16:53:41,699][14395] Avg episode reward: 4.657, avg true_objective: 4.097 +[2024-11-07 16:53:41,925][14395] Num frames 93500... +[2024-11-07 16:53:42,238][14395] Num frames 93600... +[2024-11-07 16:53:42,750][14395] Num frames 93700... +[2024-11-07 16:53:43,064][14395] Num frames 93800... +[2024-11-07 16:53:43,213][14395] Avg episode rewards: #0: 4.657, true rewards: #0: 4.097 +[2024-11-07 16:53:43,215][14395] Avg episode reward: 4.657, avg true_objective: 4.097 +[2024-11-07 16:53:43,518][14395] Num frames 93900... +[2024-11-07 16:53:43,906][14395] Num frames 94000... +[2024-11-07 16:53:44,237][14395] Num frames 94100... +[2024-11-07 16:53:44,525][14395] Num frames 94200... +[2024-11-07 16:53:44,633][14395] Avg episode rewards: #0: 4.640, true rewards: #0: 4.090 +[2024-11-07 16:53:44,634][14395] Avg episode reward: 4.640, avg true_objective: 4.090 +[2024-11-07 16:53:44,853][14395] Num frames 94300... +[2024-11-07 16:53:45,052][14395] Num frames 94400... +[2024-11-07 16:53:45,301][14395] Num frames 94500... +[2024-11-07 16:53:45,541][14395] Num frames 94600... +[2024-11-07 16:53:45,755][14395] Avg episode rewards: #0: 4.657, true rewards: #0: 4.097 +[2024-11-07 16:53:45,757][14395] Avg episode reward: 4.657, avg true_objective: 4.097 +[2024-11-07 16:53:45,886][14395] Num frames 94700... +[2024-11-07 16:53:46,227][14395] Num frames 94800... +[2024-11-07 16:53:46,506][14395] Num frames 94900... +[2024-11-07 16:53:46,890][14395] Num frames 95000... +[2024-11-07 16:53:47,094][14395] Avg episode rewards: #0: 4.640, true rewards: #0: 4.090 +[2024-11-07 16:53:47,098][14395] Avg episode reward: 4.640, avg true_objective: 4.090 +[2024-11-07 16:53:47,260][14395] Num frames 95100... +[2024-11-07 16:53:47,526][14395] Num frames 95200... 
+[2024-11-07 16:53:47,805][14395] Num frames 95300... +[2024-11-07 16:53:48,070][14395] Num frames 95400... +[2024-11-07 16:53:48,206][14395] Avg episode rewards: #0: 4.640, true rewards: #0: 4.100 +[2024-11-07 16:53:48,210][14395] Avg episode reward: 4.640, avg true_objective: 4.100 +[2024-11-07 16:53:48,412][14395] Num frames 95500... +[2024-11-07 16:53:48,663][14395] Num frames 95600... +[2024-11-07 16:53:48,928][14395] Num frames 95700... +[2024-11-07 16:53:49,167][14395] Num frames 95800... +[2024-11-07 16:53:49,551][14395] Num frames 95900... +[2024-11-07 16:53:49,622][14395] Avg episode rewards: #0: 4.653, true rewards: #0: 4.102 +[2024-11-07 16:53:49,626][14395] Avg episode reward: 4.653, avg true_objective: 4.102 +[2024-11-07 16:53:49,893][14395] Num frames 96000... +[2024-11-07 16:53:50,159][14395] Num frames 96100... +[2024-11-07 16:53:50,409][14395] Num frames 96200... +[2024-11-07 16:53:50,649][14395] Avg episode rewards: #0: 4.653, true rewards: #0: 4.102 +[2024-11-07 16:53:50,650][14395] Avg episode reward: 4.653, avg true_objective: 4.102 +[2024-11-07 16:53:50,674][14395] Num frames 96300... +[2024-11-07 16:53:50,877][14395] Num frames 96400... +[2024-11-07 16:53:51,078][14395] Num frames 96500... +[2024-11-07 16:53:51,284][14395] Num frames 96600... +[2024-11-07 16:53:51,497][14395] Avg episode rewards: #0: 4.587, true rewards: #0: 4.077 +[2024-11-07 16:53:51,501][14395] Avg episode reward: 4.587, avg true_objective: 4.077 +[2024-11-07 16:53:51,572][14395] Num frames 96700... +[2024-11-07 16:53:51,801][14395] Num frames 96800... +[2024-11-07 16:53:52,011][14395] Num frames 96900... +[2024-11-07 16:53:52,230][14395] Num frames 97000... +[2024-11-07 16:53:52,428][14395] Avg episode rewards: #0: 4.587, true rewards: #0: 4.077 +[2024-11-07 16:53:52,429][14395] Avg episode reward: 4.587, avg true_objective: 4.077 +[2024-11-07 16:53:52,514][14395] Num frames 97100... +[2024-11-07 16:53:52,730][14395] Num frames 97200... 
+[2024-11-07 16:53:52,989][14395] Num frames 97300... +[2024-11-07 16:53:53,197][14395] Num frames 97400... +[2024-11-07 16:53:53,399][14395] Num frames 97500... +[2024-11-07 16:53:53,468][14395] Avg episode rewards: #0: 4.603, true rewards: #0: 4.083 +[2024-11-07 16:53:53,473][14395] Avg episode reward: 4.603, avg true_objective: 4.083 +[2024-11-07 16:53:53,689][14395] Num frames 97600... +[2024-11-07 16:53:53,891][14395] Num frames 97700... +[2024-11-07 16:53:54,082][14395] Num frames 97800... +[2024-11-07 16:53:54,367][14395] Avg episode rewards: #0: 4.590, true rewards: #0: 4.080 +[2024-11-07 16:53:54,368][14395] Avg episode reward: 4.590, avg true_objective: 4.080 +[2024-11-07 16:53:54,398][14395] Num frames 97900... +[2024-11-07 16:53:54,685][14395] Num frames 98000... +[2024-11-07 16:53:54,944][14395] Num frames 98100... +[2024-11-07 16:53:57,242][14395] Num frames 98200... +[2024-11-07 16:53:57,491][14395] Avg episode rewards: #0: 4.574, true rewards: #0: 4.074 +[2024-11-07 16:53:57,493][14395] Avg episode reward: 4.574, avg true_objective: 4.074 +[2024-11-07 16:53:57,575][14395] Num frames 98300... +[2024-11-07 16:53:57,842][14395] Num frames 98400... +[2024-11-07 16:53:58,072][14395] Num frames 98500... +[2024-11-07 16:53:58,312][14395] Num frames 98600... +[2024-11-07 16:53:58,512][14395] Avg episode rewards: #0: 4.574, true rewards: #0: 4.074 +[2024-11-07 16:53:58,514][14395] Avg episode reward: 4.574, avg true_objective: 4.074 +[2024-11-07 16:53:58,599][14395] Num frames 98700... +[2024-11-07 16:53:58,826][14395] Num frames 98800... +[2024-11-07 16:53:59,015][14395] Num frames 98900... +[2024-11-07 16:53:59,209][14395] Num frames 99000... +[2024-11-07 16:53:59,377][14395] Avg episode rewards: #0: 4.574, true rewards: #0: 4.074 +[2024-11-07 16:53:59,379][14395] Avg episode reward: 4.574, avg true_objective: 4.074 +[2024-11-07 16:53:59,500][14395] Num frames 99100... +[2024-11-07 16:53:59,704][14395] Num frames 99200... 
+[2024-11-07 16:53:59,941][14395] Num frames 99300... +[2024-11-07 16:54:00,149][14395] Num frames 99400... +[2024-11-07 16:54:00,258][14395] Avg episode rewards: #0: 4.557, true rewards: #0: 4.067 +[2024-11-07 16:54:00,259][14395] Avg episode reward: 4.557, avg true_objective: 4.067 +[2024-11-07 16:54:00,417][14395] Num frames 99500... +[2024-11-07 16:54:00,634][14395] Num frames 99600... +[2024-11-07 16:54:00,831][14395] Num frames 99700... +[2024-11-07 16:54:01,048][14395] Num frames 99800... +[2024-11-07 16:54:01,129][14395] Avg episode rewards: #0: 4.541, true rewards: #0: 4.061 +[2024-11-07 16:54:01,131][14395] Avg episode reward: 4.541, avg true_objective: 4.061 +[2024-11-07 16:54:01,313][14395] Num frames 99900... +[2024-11-07 16:54:01,510][14395] Num frames 100000... +[2024-11-07 16:54:01,701][14395] Num frames 100100... +[2024-11-07 16:54:01,946][14395] Avg episode rewards: #0: 4.541, true rewards: #0: 4.061 +[2024-11-07 16:54:01,947][14395] Avg episode reward: 4.541, avg true_objective: 4.061 +[2024-11-07 16:54:01,961][14395] Num frames 100200... +[2024-11-07 16:54:02,157][14395] Num frames 100300... +[2024-11-07 16:54:02,475][14395] Num frames 100400... +[2024-11-07 16:54:02,788][14395] Num frames 100500... +[2024-11-07 16:54:03,112][14395] Avg episode rewards: #0: 4.541, true rewards: #0: 4.061 +[2024-11-07 16:54:03,113][14395] Avg episode reward: 4.541, avg true_objective: 4.061 +[2024-11-07 16:54:03,231][14395] Num frames 100600... +[2024-11-07 16:54:03,495][14395] Num frames 100700... +[2024-11-07 16:54:03,720][14395] Num frames 100800... +[2024-11-07 16:54:03,911][14395] Num frames 100900... +[2024-11-07 16:54:04,142][14395] Avg episode rewards: #0: 4.567, true rewards: #0: 4.077 +[2024-11-07 16:54:04,145][14395] Avg episode reward: 4.567, avg true_objective: 4.077 +[2024-11-07 16:54:04,177][14395] Num frames 101000... +[2024-11-07 16:54:04,373][14395] Num frames 101100... +[2024-11-07 16:54:04,556][14395] Num frames 101200... 
+[2024-11-07 16:54:04,748][14395] Num frames 101300... +[2024-11-07 16:54:04,955][14395] Num frames 101400... +[2024-11-07 16:54:05,094][14395] Avg episode rewards: #0: 4.583, true rewards: #0: 4.083 +[2024-11-07 16:54:05,099][14395] Avg episode reward: 4.583, avg true_objective: 4.083 +[2024-11-07 16:54:05,246][14395] Num frames 101500... +[2024-11-07 16:54:05,435][14395] Num frames 101600... +[2024-11-07 16:54:05,618][14395] Num frames 101700... +[2024-11-07 16:54:05,806][14395] Num frames 101800... +[2024-11-07 16:54:05,909][14395] Avg episode rewards: #0: 4.567, true rewards: #0: 4.077 +[2024-11-07 16:54:05,911][14395] Avg episode reward: 4.567, avg true_objective: 4.077 +[2024-11-07 16:54:06,061][14395] Num frames 101900... +[2024-11-07 16:54:06,249][14395] Num frames 102000... +[2024-11-07 16:54:06,452][14395] Num frames 102100... +[2024-11-07 16:54:06,636][14395] Num frames 102200... +[2024-11-07 16:54:06,825][14395] Num frames 102300... +[2024-11-07 16:54:07,032][14395] Avg episode rewards: #0: 4.603, true rewards: #0: 4.093 +[2024-11-07 16:54:07,034][14395] Avg episode reward: 4.603, avg true_objective: 4.093 +[2024-11-07 16:54:07,118][14395] Num frames 102400... +[2024-11-07 16:54:07,348][14395] Num frames 102500... +[2024-11-07 16:54:07,540][14395] Num frames 102600... +[2024-11-07 16:54:07,726][14395] Num frames 102700... +[2024-11-07 16:54:07,885][14395] Avg episode rewards: #0: 4.603, true rewards: #0: 4.093 +[2024-11-07 16:54:07,890][14395] Avg episode reward: 4.603, avg true_objective: 4.093 +[2024-11-07 16:54:07,994][14395] Num frames 102800... +[2024-11-07 16:54:08,202][14395] Num frames 102900... +[2024-11-07 16:54:08,389][14395] Num frames 103000... +[2024-11-07 16:54:08,571][14395] Num frames 103100... +[2024-11-07 16:54:08,766][14395] Num frames 103200... 
+[2024-11-07 16:54:08,825][14395] Avg episode rewards: #0: 4.603, true rewards: #0: 4.093 +[2024-11-07 16:54:08,830][14395] Avg episode reward: 4.603, avg true_objective: 4.093 +[2024-11-07 16:54:09,076][14395] Num frames 103300... +[2024-11-07 16:54:09,316][14395] Num frames 103400... +[2024-11-07 16:54:09,537][14395] Num frames 103500... +[2024-11-07 16:54:09,762][14395] Num frames 103600... +[2024-11-07 16:54:09,863][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:09,868][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:10,068][14395] Num frames 103700... +[2024-11-07 16:54:10,300][14395] Num frames 103800... +[2024-11-07 16:54:10,919][14395] Num frames 103900... +[2024-11-07 16:54:11,144][14395] Num frames 104000... +[2024-11-07 16:54:12,461][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:12,465][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:13,690][14395] Num frames 104100... +[2024-11-07 16:54:13,912][14395] Num frames 104200... +[2024-11-07 16:54:14,125][14395] Num frames 104300... +[2024-11-07 16:54:14,376][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:14,382][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:14,439][14395] Num frames 104400... +[2024-11-07 16:54:14,662][14395] Num frames 104500... +[2024-11-07 16:54:14,876][14395] Num frames 104600... +[2024-11-07 16:54:15,117][14395] Num frames 104700... +[2024-11-07 16:54:15,380][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:15,382][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:15,454][14395] Num frames 104800... +[2024-11-07 16:54:15,708][14395] Num frames 104900... +[2024-11-07 16:54:15,945][14395] Num frames 105000... +[2024-11-07 16:54:16,206][14395] Num frames 105100... 
+[2024-11-07 16:54:16,395][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:16,396][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:16,512][14395] Num frames 105200... +[2024-11-07 16:54:16,811][14395] Num frames 105300... +[2024-11-07 16:54:17,063][14395] Num frames 105400... +[2024-11-07 16:54:17,308][14395] Num frames 105500... +[2024-11-07 16:54:17,506][14395] Avg episode rewards: #0: 4.609, true rewards: #0: 4.099 +[2024-11-07 16:54:17,508][14395] Avg episode reward: 4.609, avg true_objective: 4.099 +[2024-11-07 16:54:17,580][14395] Num frames 105600... +[2024-11-07 16:54:17,807][14395] Num frames 105700... +[2024-11-07 16:54:18,016][14395] Num frames 105800... +[2024-11-07 16:54:18,212][14395] Num frames 105900... +[2024-11-07 16:54:18,384][14395] Avg episode rewards: #0: 4.593, true rewards: #0: 4.093 +[2024-11-07 16:54:18,385][14395] Avg episode reward: 4.593, avg true_objective: 4.093 +[2024-11-07 16:54:18,495][14395] Num frames 106000... +[2024-11-07 16:54:18,762][14395] Num frames 106100... +[2024-11-07 16:54:18,997][14395] Num frames 106200... +[2024-11-07 16:54:19,191][14395] Num frames 106300... +[2024-11-07 16:54:19,391][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:19,394][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:19,475][14395] Num frames 106400... +[2024-11-07 16:54:19,690][14395] Num frames 106500... +[2024-11-07 16:54:19,890][14395] Num frames 106600... +[2024-11-07 16:54:20,097][14395] Num frames 106700... +[2024-11-07 16:54:20,270][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:20,274][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:20,382][14395] Num frames 106800... +[2024-11-07 16:54:20,575][14395] Num frames 106900... +[2024-11-07 16:54:20,767][14395] Num frames 107000... +[2024-11-07 16:54:20,956][14395] Num frames 107100... 
+[2024-11-07 16:54:21,084][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:21,087][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:21,243][14395] Num frames 107200... +[2024-11-07 16:54:21,444][14395] Num frames 107300... +[2024-11-07 16:54:21,650][14395] Num frames 107400... +[2024-11-07 16:54:21,850][14395] Num frames 107500... +[2024-11-07 16:54:22,073][14395] Avg episode rewards: #0: 4.623, true rewards: #0: 4.102 +[2024-11-07 16:54:22,077][14395] Avg episode reward: 4.623, avg true_objective: 4.102 +[2024-11-07 16:54:22,124][14395] Num frames 107600... +[2024-11-07 16:54:22,336][14395] Num frames 107700... +[2024-11-07 16:54:22,548][14395] Num frames 107800... +[2024-11-07 16:54:22,755][14395] Num frames 107900... +[2024-11-07 16:54:25,524][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:25,577][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:25,671][14395] Num frames 108000... +[2024-11-07 16:54:25,932][14395] Num frames 108100... +[2024-11-07 16:54:26,159][14395] Num frames 108200... +[2024-11-07 16:54:26,388][14395] Num frames 108300... +[2024-11-07 16:54:26,758][14395] Avg episode rewards: #0: 4.606, true rewards: #0: 4.096 +[2024-11-07 16:54:26,762][14395] Avg episode reward: 4.606, avg true_objective: 4.096 +[2024-11-07 16:54:26,880][14395] Num frames 108400... +[2024-11-07 16:54:27,121][14395] Num frames 108500... +[2024-11-07 16:54:27,360][14395] Num frames 108600... +[2024-11-07 16:54:27,593][14395] Num frames 108700... +[2024-11-07 16:54:27,819][14395] Num frames 108800... +[2024-11-07 16:54:27,879][14395] Avg episode rewards: #0: 4.570, true rewards: #0: 4.080 +[2024-11-07 16:54:27,880][14395] Avg episode reward: 4.570, avg true_objective: 4.080 +[2024-11-07 16:54:28,107][14395] Num frames 108900... +[2024-11-07 16:54:28,362][14395] Num frames 109000... +[2024-11-07 16:54:28,580][14395] Num frames 109100... 
+[2024-11-07 16:54:28,843][14395] Avg episode rewards: #0: 4.570, true rewards: #0: 4.080 +[2024-11-07 16:54:28,846][14395] Avg episode reward: 4.570, avg true_objective: 4.080 +[2024-11-07 16:54:28,887][14395] Num frames 109200... +[2024-11-07 16:54:29,116][14395] Num frames 109300... +[2024-11-07 16:54:29,327][14395] Num frames 109400... +[2024-11-07 16:54:29,558][14395] Num frames 109500... +[2024-11-07 16:54:29,760][14395] Avg episode rewards: #0: 4.554, true rewards: #0: 4.074 +[2024-11-07 16:54:29,768][14395] Avg episode reward: 4.554, avg true_objective: 4.074 +[2024-11-07 16:54:29,853][14395] Num frames 109600... +[2024-11-07 16:54:30,080][14395] Num frames 109700... +[2024-11-07 16:54:30,315][14395] Num frames 109800... +[2024-11-07 16:54:30,547][14395] Num frames 109900... +[2024-11-07 16:54:30,723][14395] Avg episode rewards: #0: 4.567, true rewards: #0: 4.086 +[2024-11-07 16:54:30,725][14395] Avg episode reward: 4.567, avg true_objective: 4.086 +[2024-11-07 16:54:30,834][14395] Num frames 110000... +[2024-11-07 16:54:31,083][14395] Num frames 110100... +[2024-11-07 16:54:31,350][14395] Num frames 110200... +[2024-11-07 16:54:34,075][14395] Num frames 110300... +[2024-11-07 16:54:34,291][14395] Avg episode rewards: #0: 4.570, true rewards: #0: 4.090 +[2024-11-07 16:54:35,148][14395] Avg episode reward: 4.570, avg true_objective: 4.090 +[2024-11-07 16:54:35,216][14395] Num frames 110400... +[2024-11-07 16:54:35,428][14395] Num frames 110500... +[2024-11-07 16:54:35,638][14395] Num frames 110600... +[2024-11-07 16:54:35,843][14395] Num frames 110700... +[2024-11-07 16:54:36,016][14395] Avg episode rewards: #0: 4.570, true rewards: #0: 4.090 +[2024-11-07 16:54:36,017][14395] Avg episode reward: 4.570, avg true_objective: 4.090 +[2024-11-07 16:54:36,106][14395] Num frames 110800... +[2024-11-07 16:54:36,329][14395] Num frames 110900... +[2024-11-07 16:54:36,546][14395] Num frames 111000... +[2024-11-07 16:54:36,754][14395] Num frames 111100... 
+[2024-11-07 16:54:36,968][14395] Num frames 111200... +[2024-11-07 16:54:37,030][14395] Avg episode rewards: #0: 4.550, true rewards: #0: 4.080 +[2024-11-07 16:54:37,035][14395] Avg episode reward: 4.550, avg true_objective: 4.080 +[2024-11-07 16:54:37,256][14395] Num frames 111300... +[2024-11-07 16:54:37,478][14395] Num frames 111400... +[2024-11-07 16:54:37,692][14395] Num frames 111500... +[2024-11-07 16:54:37,929][14395] Avg episode rewards: #0: 4.550, true rewards: #0: 4.080 +[2024-11-07 16:54:37,933][14395] Avg episode reward: 4.550, avg true_objective: 4.080 +[2024-11-07 16:54:37,981][14395] Num frames 111600... +[2024-11-07 16:54:38,192][14395] Num frames 111700... +[2024-11-07 16:54:38,416][14395] Num frames 111800... +[2024-11-07 16:54:38,625][14395] Num frames 111900... +[2024-11-07 16:54:38,808][14395] Avg episode rewards: #0: 4.550, true rewards: #0: 4.080 +[2024-11-07 16:54:38,811][14395] Avg episode reward: 4.550, avg true_objective: 4.080 +[2024-11-07 16:54:38,878][14395] Num frames 112000... +[2024-11-07 16:54:39,064][14395] Num frames 112100... +[2024-11-07 16:54:39,281][14395] Num frames 112200... +[2024-11-07 16:54:39,467][14395] Num frames 112300... +[2024-11-07 16:54:39,622][14395] Avg episode rewards: #0: 4.550, true rewards: #0: 4.080 +[2024-11-07 16:54:39,625][14395] Avg episode reward: 4.550, avg true_objective: 4.080 +[2024-11-07 16:54:39,737][14395] Num frames 112400... +[2024-11-07 16:54:39,929][14395] Num frames 112500... +[2024-11-07 16:54:40,120][14395] Num frames 112600... +[2024-11-07 16:54:40,303][14395] Num frames 112700... +[2024-11-07 16:54:40,428][14395] Avg episode rewards: #0: 4.550, true rewards: #0: 4.080 +[2024-11-07 16:54:40,431][14395] Avg episode reward: 4.550, avg true_objective: 4.080 +[2024-11-07 16:54:41,822][14395] Num frames 112800... +[2024-11-07 16:54:42,020][14395] Num frames 112900... +[2024-11-07 16:54:42,219][14395] Num frames 113000... +[2024-11-07 16:54:42,413][14395] Num frames 113100... 
+[2024-11-07 16:54:43,483][14395] Avg episode rewards: #0: 4.567, true rewards: #0: 4.086 +[2024-11-07 16:54:43,485][14395] Avg episode reward: 4.567, avg true_objective: 4.086 +[2024-11-07 16:54:43,518][14395] Num frames 113200... +[2024-11-07 16:54:43,747][14395] Num frames 113300... +[2024-11-07 16:54:44,167][14395] Num frames 113400... +[2024-11-07 16:54:44,364][14395] Num frames 113500... +[2024-11-07 16:54:44,558][14395] Num frames 113600... +[2024-11-07 16:54:44,750][14395] Num frames 113700... +[2024-11-07 16:54:44,938][14395] Num frames 113800... +[2024-11-07 16:54:45,044][14395] Avg episode rewards: #0: 4.632, true rewards: #0: 4.112 +[2024-11-07 16:54:45,046][14395] Avg episode reward: 4.632, avg true_objective: 4.112 +[2024-11-07 16:54:45,215][14395] Num frames 113900... +[2024-11-07 16:54:45,410][14395] Num frames 114000... +[2024-11-07 16:54:45,593][14395] Num frames 114100... +[2024-11-07 16:54:45,807][14395] Num frames 114200... +[2024-11-07 16:54:45,947][14395] Avg episode rewards: #0: 4.645, true rewards: #0: 4.115 +[2024-11-07 16:54:45,951][14395] Avg episode reward: 4.645, avg true_objective: 4.115 +[2024-11-07 16:54:46,110][14395] Num frames 114300... +[2024-11-07 16:54:46,331][14395] Num frames 114400... +[2024-11-07 16:54:46,541][14395] Num frames 114500... +[2024-11-07 16:54:46,777][14395] Num frames 114600... +[2024-11-07 16:54:46,900][14395] Avg episode rewards: #0: 4.645, true rewards: #0: 4.115 +[2024-11-07 16:54:46,902][14395] Avg episode reward: 4.645, avg true_objective: 4.115 +[2024-11-07 16:54:47,070][14395] Num frames 114700... +[2024-11-07 16:54:47,247][14395] Num frames 114800... +[2024-11-07 16:54:47,437][14395] Num frames 114900... +[2024-11-07 16:54:47,634][14395] Num frames 115000... +[2024-11-07 16:54:47,710][14395] Avg episode rewards: #0: 4.645, true rewards: #0: 4.115 +[2024-11-07 16:54:47,712][14395] Avg episode reward: 4.645, avg true_objective: 4.115 +[2024-11-07 16:54:47,983][14395] Num frames 115100... 
+[2024-11-07 16:54:48,214][14395] Num frames 115200... +[2024-11-07 16:54:48,522][14395] Num frames 115300... +[2024-11-07 16:54:48,710][14395] Avg episode rewards: #0: 4.652, true rewards: #0: 4.112 +[2024-11-07 16:54:48,711][14395] Avg episode reward: 4.652, avg true_objective: 4.112 +[2024-11-07 16:54:48,789][14395] Num frames 115400... +[2024-11-07 16:54:49,043][14395] Num frames 115500... +[2024-11-07 16:54:49,253][14395] Num frames 115600... +[2024-11-07 16:54:49,469][14395] Num frames 115700... +[2024-11-07 16:54:49,600][14395] Avg episode rewards: #0: 4.603, true rewards: #0: 4.093 +[2024-11-07 16:54:49,604][14395] Avg episode reward: 4.603, avg true_objective: 4.093 +[2024-11-07 16:54:49,732][14395] Num frames 115800... +[2024-11-07 16:54:49,942][14395] Num frames 115900... +[2024-11-07 16:54:50,147][14395] Num frames 116000... +[2024-11-07 16:54:50,335][14395] Num frames 116100... +[2024-11-07 16:54:50,451][14395] Avg episode rewards: #0: 4.590, true rewards: #0: 4.090 +[2024-11-07 16:54:50,453][14395] Avg episode reward: 4.590, avg true_objective: 4.090 +[2024-11-07 16:54:50,606][14395] Num frames 116200... +[2024-11-07 16:54:50,845][14395] Num frames 116300... +[2024-11-07 16:54:51,082][14395] Num frames 116400... +[2024-11-07 16:54:51,376][14395] Num frames 116500... +[2024-11-07 16:54:51,583][14395] Avg episode rewards: #0: 4.570, true rewards: #0: 4.080 +[2024-11-07 16:54:51,587][14395] Avg episode reward: 4.570, avg true_objective: 4.080 +[2024-11-07 16:54:51,649][14395] Num frames 116600... +[2024-11-07 16:54:51,869][14395] Num frames 116700... +[2024-11-07 16:54:52,089][14395] Num frames 116800... +[2024-11-07 16:54:52,278][14395] Num frames 116900... +[2024-11-07 16:54:52,458][14395] Avg episode rewards: #0: 4.570, true rewards: #0: 4.080 +[2024-11-07 16:54:52,462][14395] Avg episode reward: 4.570, avg true_objective: 4.080 +[2024-11-07 16:54:52,553][14395] Num frames 117000... +[2024-11-07 16:54:52,753][14395] Num frames 117100... 
+[2024-11-07 16:54:52,951][14395] Num frames 117200... +[2024-11-07 16:54:53,143][14395] Num frames 117300... +[2024-11-07 16:54:53,296][14395] Avg episode rewards: #0: 4.518, true rewards: #0: 4.058 +[2024-11-07 16:54:53,300][14395] Avg episode reward: 4.518, avg true_objective: 4.058 +[2024-11-07 16:54:53,422][14395] Num frames 117400... +[2024-11-07 16:54:53,627][14395] Num frames 117500... +[2024-11-07 16:54:53,834][14395] Num frames 117600... +[2024-11-07 16:54:54,032][14395] Num frames 117700... +[2024-11-07 16:54:54,264][14395] Avg episode rewards: #0: 4.498, true rewards: #0: 4.048 +[2024-11-07 16:54:54,267][14395] Avg episode reward: 4.498, avg true_objective: 4.048 +[2024-11-07 16:54:54,297][14395] Num frames 117800... +[2024-11-07 16:54:54,484][14395] Num frames 117900... +[2024-11-07 16:54:54,691][14395] Num frames 118000... +[2024-11-07 16:54:54,899][14395] Num frames 118100... +[2024-11-07 16:54:55,115][14395] Avg episode rewards: #0: 4.485, true rewards: #0: 4.045 +[2024-11-07 16:54:55,118][14395] Avg episode reward: 4.485, avg true_objective: 4.045 +[2024-11-07 16:54:55,190][14395] Num frames 118200... +[2024-11-07 16:54:55,418][14395] Num frames 118300... +[2024-11-07 16:54:55,634][14395] Num frames 118400... +[2024-11-07 16:54:55,837][14395] Num frames 118500... +[2024-11-07 16:54:56,022][14395] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 16:54:56,025][14395] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 16:54:56,127][14395] Num frames 118600... +[2024-11-07 16:54:56,326][14395] Num frames 118700... +[2024-11-07 16:54:56,531][14395] Num frames 118800... +[2024-11-07 16:54:56,724][14395] Num frames 118900... +[2024-11-07 16:54:56,941][14395] Avg episode rewards: #0: 4.459, true rewards: #0: 4.038 +[2024-11-07 16:54:56,947][14395] Avg episode reward: 4.459, avg true_objective: 4.038 +[2024-11-07 16:54:57,005][14395] Num frames 119000... +[2024-11-07 16:54:57,195][14395] Num frames 119100... 
+[2024-11-07 16:54:57,396][14395] Num frames 119200... +[2024-11-07 16:54:57,593][14395] Num frames 119300... +[2024-11-07 16:54:57,784][14395] Avg episode rewards: #0: 4.442, true rewards: #0: 4.032 +[2024-11-07 16:54:57,789][14395] Avg episode reward: 4.442, avg true_objective: 4.032 +[2024-11-07 16:54:57,887][14395] Num frames 119400... +[2024-11-07 16:54:58,084][14395] Num frames 119500... +[2024-11-07 16:54:58,283][14395] Num frames 119600... +[2024-11-07 16:54:58,485][14395] Num frames 119700... +[2024-11-07 16:54:58,630][14395] Avg episode rewards: #0: 4.442, true rewards: #0: 4.032 +[2024-11-07 16:54:58,631][14395] Avg episode reward: 4.442, avg true_objective: 4.032 +[2024-11-07 16:54:58,747][14395] Num frames 119800... +[2024-11-07 16:54:58,958][14395] Num frames 119900... +[2024-11-07 16:54:59,152][14395] Num frames 120000... +[2024-11-07 16:54:59,388][14395] Avg episode rewards: #0: 4.433, true rewards: #0: 4.022 +[2024-11-07 16:54:59,391][14395] Avg episode reward: 4.433, avg true_objective: 4.022 +[2024-11-07 16:54:59,407][14395] Num frames 120100... +[2024-11-07 16:54:59,635][14395] Num frames 120200... +[2024-11-07 16:54:59,816][14395] Num frames 120300... +[2024-11-07 16:55:00,037][14395] Num frames 120400... +[2024-11-07 16:55:00,326][14395] Num frames 120500... +[2024-11-07 16:55:00,500][14395] Avg episode rewards: #0: 4.433, true rewards: #0: 4.022 +[2024-11-07 16:55:00,503][14395] Avg episode reward: 4.433, avg true_objective: 4.022 +[2024-11-07 16:55:00,638][14395] Num frames 120600... +[2024-11-07 16:55:00,894][14395] Num frames 120700... +[2024-11-07 16:55:01,146][14395] Num frames 120800... +[2024-11-07 16:55:01,352][14395] Num frames 120900... +[2024-11-07 16:55:01,598][14395] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 16:55:01,599][14395] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 16:55:01,612][14395] Num frames 121000... +[2024-11-07 16:55:01,806][14395] Num frames 121100... 
+[2024-11-07 16:55:01,999][14395] Num frames 121200... +[2024-11-07 16:55:03,095][14395] Num frames 121300... +[2024-11-07 16:55:03,287][14395] Avg episode rewards: #0: 4.433, true rewards: #0: 4.022 +[2024-11-07 16:55:03,288][14395] Avg episode reward: 4.433, avg true_objective: 4.022 +[2024-11-07 16:55:03,328][14395] Num frames 121400... +[2024-11-07 16:55:03,519][14395] Num frames 121500... +[2024-11-07 16:55:03,707][14395] Num frames 121600... +[2024-11-07 16:55:03,900][14395] Num frames 121700... +[2024-11-07 16:55:04,074][14395] Avg episode rewards: #0: 4.444, true rewards: #0: 4.034 +[2024-11-07 16:55:04,079][14395] Avg episode reward: 4.444, avg true_objective: 4.034 +[2024-11-07 16:55:04,167][14395] Num frames 121800... +[2024-11-07 16:55:04,364][14395] Num frames 121900... +[2024-11-07 16:55:04,549][14395] Num frames 122000... +[2024-11-07 16:55:04,737][14395] Num frames 122100... +[2024-11-07 16:55:04,883][14395] Avg episode rewards: #0: 4.431, true rewards: #0: 4.031 +[2024-11-07 16:55:04,887][14395] Avg episode reward: 4.431, avg true_objective: 4.031 +[2024-11-07 16:55:05,016][14395] Num frames 122200... +[2024-11-07 16:55:05,204][14395] Num frames 122300... +[2024-11-07 16:55:05,397][14395] Num frames 122400... +[2024-11-07 16:55:05,593][14395] Num frames 122500... +[2024-11-07 16:55:05,827][14395] Avg episode rewards: #0: 4.448, true rewards: #0: 4.037 +[2024-11-07 16:55:05,828][14395] Avg episode reward: 4.448, avg true_objective: 4.037 +[2024-11-07 16:55:05,844][14395] Num frames 122600... +[2024-11-07 16:55:06,046][14395] Num frames 122700... +[2024-11-07 16:55:06,239][14395] Num frames 122800... +[2024-11-07 16:55:06,433][14395] Num frames 122900... +[2024-11-07 16:55:06,628][14395] Num frames 123000... +[2024-11-07 16:55:06,768][14395] Avg episode rewards: #0: 4.448, true rewards: #0: 4.037 +[2024-11-07 16:55:06,769][14395] Avg episode reward: 4.448, avg true_objective: 4.037 +[2024-11-07 16:55:06,878][14395] Num frames 123100... 
+[2024-11-07 16:55:07,064][14395] Num frames 123200... +[2024-11-07 16:55:07,755][14395] Num frames 123300... +[2024-11-07 16:55:11,298][14395] Num frames 123400... +[2024-11-07 16:55:11,431][14395] Avg episode rewards: #0: 4.448, true rewards: #0: 4.037 +[2024-11-07 16:55:11,433][14395] Avg episode reward: 4.448, avg true_objective: 4.037 +[2024-11-07 16:55:11,583][14395] Num frames 123500... +[2024-11-07 16:55:11,794][14395] Num frames 123600... +[2024-11-07 16:55:11,983][14395] Num frames 123700... +[2024-11-07 16:55:12,182][14395] Num frames 123800... +[2024-11-07 16:55:12,259][14395] Avg episode rewards: #0: 4.434, true rewards: #0: 4.034 +[2024-11-07 16:55:12,263][14395] Avg episode reward: 4.434, avg true_objective: 4.034 +[2024-11-07 16:55:12,485][14395] Num frames 123900... +[2024-11-07 16:55:12,686][14395] Num frames 124000... +[2024-11-07 16:55:12,883][14395] Num frames 124100... +[2024-11-07 16:55:13,117][14395] Avg episode rewards: #0: 4.421, true rewards: #0: 4.031 +[2024-11-07 16:55:13,120][14395] Avg episode reward: 4.421, avg true_objective: 4.031 +[2024-11-07 16:55:13,151][14395] Num frames 124200... +[2024-11-07 16:55:13,337][14395] Num frames 124300... +[2024-11-07 16:55:13,525][14395] Num frames 124400... +[2024-11-07 16:55:13,709][14395] Num frames 124500... +[2024-11-07 16:55:13,911][14395] Avg episode rewards: #0: 4.421, true rewards: #0: 4.031 +[2024-11-07 16:55:13,914][14395] Avg episode reward: 4.421, avg true_objective: 4.031 +[2024-11-07 16:55:13,976][14395] Num frames 124600... +[2024-11-07 16:55:14,168][14395] Num frames 124700... +[2024-11-07 16:55:14,360][14395] Num frames 124800... +[2024-11-07 16:55:14,571][14395] Num frames 124900... +[2024-11-07 16:55:14,774][14395] Avg episode rewards: #0: 4.421, true rewards: #0: 4.031 +[2024-11-07 16:55:14,775][14395] Avg episode reward: 4.421, avg true_objective: 4.031 +[2024-11-07 16:55:14,861][14395] Num frames 125000... +[2024-11-07 16:55:15,126][14395] Num frames 125100... 
+[2024-11-07 16:55:15,349][14395] Num frames 125200... +[2024-11-07 16:55:15,546][14395] Num frames 125300... +[2024-11-07 16:55:15,759][14395] Num frames 125400... +[2024-11-07 16:55:15,965][14395] Avg episode rewards: #0: 4.454, true rewards: #0: 4.044 +[2024-11-07 16:55:15,967][14395] Avg episode reward: 4.454, avg true_objective: 4.044 +[2024-11-07 16:55:16,026][14395] Num frames 125500... +[2024-11-07 16:55:16,304][14395] Num frames 125600... +[2024-11-07 16:55:16,553][14395] Num frames 125700... +[2024-11-07 16:55:16,809][14395] Num frames 125800... +[2024-11-07 16:55:16,984][14395] Avg episode rewards: #0: 4.467, true rewards: #0: 4.057 +[2024-11-07 16:55:16,987][14395] Avg episode reward: 4.467, avg true_objective: 4.057 +[2024-11-07 16:55:17,095][14395] Num frames 125900... +[2024-11-07 16:55:17,288][14395] Num frames 126000... +[2024-11-07 16:55:17,508][14395] Num frames 126100... +[2024-11-07 16:55:18,092][14395] Num frames 126200... +[2024-11-07 16:55:18,375][14395] Num frames 126300... +[2024-11-07 16:55:18,609][14395] Num frames 126400... +[2024-11-07 16:55:20,293][14395] Avg episode rewards: #0: 4.503, true rewards: #0: 4.073 +[2024-11-07 16:55:20,294][14395] Avg episode reward: 4.503, avg true_objective: 4.073 +[2024-11-07 16:55:20,488][14395] Num frames 126500... +[2024-11-07 16:55:21,805][14395] Num frames 126600... +[2024-11-07 16:55:22,068][14395] Num frames 126700... +[2024-11-07 16:55:22,311][14395] Num frames 126800... +[2024-11-07 16:55:22,527][14395] Num frames 126900... +[2024-11-07 16:55:22,748][14395] Num frames 127000... +[2024-11-07 16:55:22,825][14395] Avg episode rewards: #0: 4.548, true rewards: #0: 4.098 +[2024-11-07 16:55:22,826][14395] Avg episode reward: 4.548, avg true_objective: 4.098 +[2024-11-07 16:55:23,027][14395] Num frames 127100... +[2024-11-07 16:55:23,268][14395] Num frames 127200... +[2024-11-07 16:55:23,451][14395] Num frames 127300... 
+[2024-11-07 16:55:23,678][14395] Avg episode rewards: #0: 4.548, true rewards: #0: 4.098 +[2024-11-07 16:55:23,682][14395] Avg episode reward: 4.548, avg true_objective: 4.098 +[2024-11-07 16:55:23,713][14395] Num frames 127400... +[2024-11-07 16:55:23,900][14395] Num frames 127500... +[2024-11-07 16:55:24,093][14395] Num frames 127600... +[2024-11-07 16:55:24,305][14395] Num frames 127700... +[2024-11-07 16:55:24,520][14395] Num frames 127800... +[2024-11-07 16:55:24,664][14395] Avg episode rewards: #0: 4.565, true rewards: #0: 4.105 +[2024-11-07 16:55:24,665][14395] Avg episode reward: 4.565, avg true_objective: 4.105 +[2024-11-07 16:55:24,785][14395] Num frames 127900... +[2024-11-07 16:55:25,011][14395] Num frames 128000... +[2024-11-07 16:55:25,217][14395] Num frames 128100... +[2024-11-07 16:55:25,407][14395] Num frames 128200... +[2024-11-07 16:55:25,574][14395] Avg episode rewards: #0: 4.552, true rewards: #0: 4.101 +[2024-11-07 16:55:25,580][14395] Avg episode reward: 4.552, avg true_objective: 4.101 +[2024-11-07 16:55:25,678][14395] Num frames 128300... +[2024-11-07 16:55:25,876][14395] Num frames 128400... +[2024-11-07 16:55:26,073][14395] Num frames 128500... +[2024-11-07 16:55:26,261][14395] Num frames 128600... +[2024-11-07 16:55:26,449][14395] Num frames 128700... +[2024-11-07 16:55:26,517][14395] Avg episode rewards: #0: 4.532, true rewards: #0: 4.092 +[2024-11-07 16:55:26,521][14395] Avg episode reward: 4.532, avg true_objective: 4.092 +[2024-11-07 16:55:26,718][14395] Num frames 128800... +[2024-11-07 16:55:26,900][14395] Num frames 128900... +[2024-11-07 16:55:27,115][14395] Avg episode rewards: #0: 4.521, true rewards: #0: 4.081 +[2024-11-07 16:55:27,119][14395] Avg episode reward: 4.521, avg true_objective: 4.081 +[2024-11-07 16:55:27,175][14395] Num frames 129000... +[2024-11-07 16:55:27,381][14395] Num frames 129100... +[2024-11-07 16:55:27,581][14395] Num frames 129200... +[2024-11-07 16:55:27,773][14395] Num frames 129300... 
+[2024-11-07 16:55:27,971][14395] Avg episode rewards: #0: 4.521, true rewards: #0: 4.081 +[2024-11-07 16:55:27,973][14395] Avg episode reward: 4.521, avg true_objective: 4.081 +[2024-11-07 16:55:28,055][14395] Num frames 129400... +[2024-11-07 16:55:28,252][14395] Num frames 129500... +[2024-11-07 16:55:28,441][14395] Num frames 129600... +[2024-11-07 16:55:28,634][14395] Num frames 129700... +[2024-11-07 16:55:28,694][14395] Avg episode rewards: #0: 4.520, true rewards: #0: 4.080 +[2024-11-07 16:55:28,698][14395] Avg episode reward: 4.520, avg true_objective: 4.080 +[2024-11-07 16:55:28,908][14395] Num frames 129800... +[2024-11-07 16:55:29,098][14395] Num frames 129900... +[2024-11-07 16:55:29,288][14395] Num frames 130000... +[2024-11-07 16:55:29,468][14395] Num frames 130100... +[2024-11-07 16:55:29,557][14395] Avg episode rewards: #0: 4.517, true rewards: #0: 4.077 +[2024-11-07 16:55:29,559][14395] Avg episode reward: 4.517, avg true_objective: 4.077 +[2024-11-07 16:55:29,750][14395] Num frames 130200... +[2024-11-07 16:55:29,967][14395] Num frames 130300... +[2024-11-07 16:55:30,173][14395] Num frames 130400... +[2024-11-07 16:55:30,362][14395] Num frames 130500... +[2024-11-07 16:55:30,422][14395] Avg episode rewards: #0: 4.497, true rewards: #0: 4.067 +[2024-11-07 16:55:30,426][14395] Avg episode reward: 4.497, avg true_objective: 4.067 +[2024-11-07 16:55:30,620][14395] Num frames 130600... +[2024-11-07 16:55:30,822][14395] Num frames 130700... +[2024-11-07 16:55:31,002][14395] Num frames 130800... +[2024-11-07 16:55:31,073][14395] Avg episode rewards: #0: 4.473, true rewards: #0: 4.053 +[2024-11-07 16:55:31,076][14395] Avg episode reward: 4.473, avg true_objective: 4.053 +[2024-11-07 16:55:31,278][14395] Num frames 130900... +[2024-11-07 16:55:31,464][14395] Num frames 131000... +[2024-11-07 16:55:31,652][14395] Num frames 131100... +[2024-11-07 16:55:31,851][14395] Num frames 131200... 
+[2024-11-07 16:55:31,950][14395] Avg episode rewards: #0: 4.473, true rewards: #0: 4.053 +[2024-11-07 16:55:31,955][14395] Avg episode reward: 4.473, avg true_objective: 4.053 +[2024-11-07 16:55:32,125][14395] Num frames 131300... +[2024-11-07 16:55:32,313][14395] Num frames 131400... +[2024-11-07 16:55:32,606][14395] Num frames 131500... +[2024-11-07 16:55:32,795][14395] Num frames 131600... +[2024-11-07 16:55:32,863][14395] Avg episode rewards: #0: 4.456, true rewards: #0: 4.046 +[2024-11-07 16:55:32,865][14395] Avg episode reward: 4.456, avg true_objective: 4.046 +[2024-11-07 16:55:33,067][14395] Num frames 131700... +[2024-11-07 16:55:33,275][14395] Num frames 131800... +[2024-11-07 16:55:33,510][14395] Avg episode rewards: #0: 4.457, true rewards: #0: 4.037 +[2024-11-07 16:55:33,515][14395] Avg episode reward: 4.457, avg true_objective: 4.037 +[2024-11-07 16:55:33,545][14395] Num frames 131900... +[2024-11-07 16:55:33,738][14395] Num frames 132000... +[2024-11-07 16:55:33,952][14395] Num frames 132100... +[2024-11-07 16:55:34,144][14395] Num frames 132200... +[2024-11-07 16:55:34,346][14395] Avg episode rewards: #0: 4.470, true rewards: #0: 4.049 +[2024-11-07 16:55:34,349][14395] Avg episode reward: 4.470, avg true_objective: 4.049 +[2024-11-07 16:55:34,407][14395] Num frames 132300... +[2024-11-07 16:55:34,597][14395] Num frames 132400... +[2024-11-07 16:55:34,790][14395] Num frames 132500... +[2024-11-07 16:55:34,985][14395] Num frames 132600... +[2024-11-07 16:55:35,155][14395] Avg episode rewards: #0: 4.453, true rewards: #0: 4.043 +[2024-11-07 16:55:35,157][14395] Avg episode reward: 4.453, avg true_objective: 4.043 +[2024-11-07 16:55:35,242][14395] Num frames 132700... +[2024-11-07 16:55:35,451][14395] Num frames 132800... +[2024-11-07 16:55:35,645][14395] Num frames 132900... +[2024-11-07 16:55:35,852][14395] Num frames 133000... 
+[2024-11-07 16:56:34,547][14395] Avg episode rewards: #0: 4.453, true rewards: #0: 4.043 +[2024-11-07 16:56:35,186][14395] Avg episode reward: 4.453, avg true_objective: 4.043 +[2024-11-07 16:58:32,328][14395] Num frames 133100... +[2024-11-07 16:59:01,673][14395] Num frames 133200... +[2024-11-07 16:59:02,459][14395] Num frames 133300... +[2024-11-07 16:59:02,790][14395] Num frames 133400... +[2024-11-07 16:59:06,979][14395] Avg episode rewards: #0: 4.453, true rewards: #0: 4.043 +[2024-11-07 16:59:07,279][14395] Avg episode reward: 4.453, avg true_objective: 4.043 +[2024-11-07 17:02:09,363][14395] Num frames 133500... +[2024-11-07 17:11:47,637][14395] Num frames 133600... +[2024-11-07 17:21:28,546][14395] Num frames 133700... +[2024-11-07 17:34:34,423][14395] Num frames 133800... +[2024-11-07 17:37:08,638][14395] Avg episode rewards: #0: 4.437, true rewards: #0: 4.037 +[2024-11-07 17:37:09,069][14395] Avg episode reward: 4.437, avg true_objective: 4.037 +[2024-11-07 17:58:13,270][14395] Num frames 133900... +[2024-11-07 22:48:27,457][40007] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-07 22:48:27,493][40007] Rollout worker 0 uses device cpu +[2024-11-07 22:48:27,790][40007] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 22:48:27,791][40007] InferenceWorker_p0-w0: min num requests: 1 +[2024-11-07 22:48:27,797][40007] Starting all processes... +[2024-11-07 22:48:27,798][40007] Starting process learner_proc0 +[2024-11-07 22:48:27,876][40007] Starting all processes... 
+[2024-11-07 22:48:27,917][40007] Starting process inference_proc0-0 +[2024-11-07 22:48:27,918][40007] Starting process rollout_proc0 +[2024-11-07 22:48:30,816][40309] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 22:48:30,817][40308] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 22:48:30,817][40302] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 22:48:30,817][40308] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 22:48:30,818][40302] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 22:48:30,891][40308] Num visible devices: 1 +[2024-11-07 22:48:30,891][40302] Num visible devices: 1 +[2024-11-07 22:48:30,931][40302] Starting seed is not provided +[2024-11-07 22:48:30,931][40302] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 22:48:30,931][40302] Initializing actor-critic model on device cuda:0 +[2024-11-07 22:48:30,931][40302] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 22:48:30,935][40302] RunningMeanStd input shape: (1,) +[2024-11-07 22:48:30,948][40302] ConvEncoder: input_channels=3 +[2024-11-07 22:48:31,957][40302] Conv encoder output size: 512 +[2024-11-07 22:48:31,958][40302] Policy head output size: 512 +[2024-11-07 22:48:32,287][40302] Created Actor Critic model with architecture: +[2024-11-07 22:48:32,287][40302] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): 
RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 22:48:33,883][40302] Using optimizer +[2024-11-07 22:48:40,573][40302] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2024-11-07 22:48:41,038][40302] Loading model from checkpoint +[2024-11-07 22:48:41,042][40302] Loaded experiment state at self.train_step=4884, self.env_steps=20004864 +[2024-11-07 22:48:41,043][40302] Initialized policy 0 weights for model version 4884 +[2024-11-07 22:48:41,063][40302] LearnerWorker_p0 finished initialization! +[2024-11-07 22:48:41,064][40302] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 22:48:41,239][40308] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 22:48:41,241][40308] RunningMeanStd input shape: (1,) +[2024-11-07 22:48:41,253][40308] ConvEncoder: input_channels=3 +[2024-11-07 22:48:41,364][40308] Conv encoder output size: 512 +[2024-11-07 22:48:41,364][40308] Policy head output size: 512 +[2024-11-07 22:48:41,406][40007] Inference worker 0-0 is ready! +[2024-11-07 22:48:41,408][40007] All inference workers are ready! Signal rollout workers to start! 
+[2024-11-07 22:48:41,440][40309] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 22:48:43,973][40309] Decorrelating experience for 0 frames... +[2024-11-07 22:48:44,308][40007] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 20004864. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 22:48:44,318][40309] Decorrelating experience for 32 frames... +[2024-11-07 22:48:47,781][40007] Heartbeat connected on Batcher_0 +[2024-11-07 22:48:47,785][40007] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 22:48:47,796][40007] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 22:48:49,019][40007] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 22:48:49,308][40007] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 20004864. Throughput: 0: 1.0. Samples: 5. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 22:48:54,308][40007] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 20004864. Throughput: 0: 93.5. Samples: 935. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 22:48:54,310][40007] Avg episode reward: [(0, '4.568')] +[2024-11-07 22:48:58,369][40302] Signal inference workers to stop experience collection... +[2024-11-07 22:48:58,375][40308] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 22:48:59,309][40007] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 20004864. Throughput: 0: 109.1. Samples: 1636. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 22:48:59,310][40007] Avg episode reward: [(0, '4.482')] +[2024-11-07 22:49:04,309][40007] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 20004864. Throughput: 0: 102.6. Samples: 2052. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 22:49:04,314][40007] Avg episode reward: [(0, '4.482')] +[2024-11-07 22:49:04,343][40302] Signal inference workers to resume experience collection... 
+[2024-11-07 22:49:04,344][40308] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 22:49:04,345][40302] Stopping Batcher_0... +[2024-11-07 22:49:04,345][40302] Loop batcher_evt_loop terminating... +[2024-11-07 22:49:04,358][40007] Component Batcher_0 stopped! +[2024-11-07 22:49:04,360][40308] Weights refcount: 2 0 +[2024-11-07 22:49:04,364][40308] Stopping InferenceWorker_p0-w0... +[2024-11-07 22:49:04,365][40308] Loop inference_proc0-0_evt_loop terminating... +[2024-11-07 22:49:04,364][40007] Component InferenceWorker_p0-w0 stopped! +[2024-11-07 22:49:05,054][40309] Stopping RolloutWorker_w0... +[2024-11-07 22:49:05,055][40309] Loop rollout_proc0_evt_loop terminating... +[2024-11-07 22:49:05,054][40007] Component RolloutWorker_w0 stopped! +[2024-11-07 22:49:05,574][40302] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004886_20013056.pth... +[2024-11-07 22:49:05,698][40302] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004691_19214336.pth +[2024-11-07 22:49:05,711][40302] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004886_20013056.pth... +[2024-11-07 22:49:05,830][40302] Stopping LearnerWorker_p0... +[2024-11-07 22:49:05,831][40302] Loop learner_proc0_evt_loop terminating... +[2024-11-07 22:49:05,830][40007] Component LearnerWorker_p0 stopped! +[2024-11-07 22:49:05,832][40007] Waiting for process learner_proc0 to stop... +[2024-11-07 22:49:07,103][40007] Waiting for process inference_proc0-0 to join... +[2024-11-07 22:49:07,104][40007] Waiting for process rollout_proc0 to join... 
+[2024-11-07 22:49:07,105][40007] Batcher 0 profile tree view: +batching: 0.1261, releasing_batches: 0.0012 +[2024-11-07 22:49:07,107][40007] InferenceWorker_p0-w0 profile tree view: +update_model: 0.0807 +wait_policy: 0.0000 + wait_policy_total: 2.9407 +one_step: 0.0050 + handle_policy_step: 13.5353 + deserialize: 0.0998, stack: 0.0362, obs_to_device_normalize: 2.7488, forward: 8.5619, send_messages: 0.2222 + prepare_outputs: 1.5741 + to_cpu: 1.2227 +[2024-11-07 22:49:07,108][40007] Learner 0 profile tree view: +misc: 0.0000, prepare_batch: 3.7822 +train: 7.8824 + epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0024, kl_divergence: 0.0241, after_optimizer: 0.5007 + calculate_losses: 1.5819 + losses_init: 0.0000, forward_head: 0.6289, bptt_initial: 0.5272, tail: 0.0389, advantages_returns: 0.0028, losses: 0.2730 + bptt: 0.1021 + bptt_forward_core: 0.1016 + update: 5.7721 + clip: 0.3766 +[2024-11-07 22:49:07,111][40007] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.0039, enqueue_policy_requests: 0.2836, env_step: 2.4835, overhead: 0.1487, complete_rollouts: 0.0108 +save_policy_outputs: 0.2191 + split_output_tensors: 0.0756 +[2024-11-07 22:49:07,113][40007] Loop Runner_EvtLoop terminating... +[2024-11-07 22:49:07,114][40007] Runner profile tree view: +main_loop: 39.3175 +[2024-11-07 22:49:07,116][40007] Collected {0: 20013056}, FPS: 208.4 +[2024-11-07 22:49:07,412][40007] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-07 22:49:07,413][40007] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-07 22:49:07,414][40007] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-07 22:49:07,416][40007] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 22:49:07,419][40007] Adding new argument 'video_name'=None that is not in the saved config file! 
+[2024-11-07 22:49:07,420][40007] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2024-11-07 22:49:07,423][40007] Adding new argument 'max_num_episodes'=1000000 that is not in the saved config file! +[2024-11-07 22:49:07,424][40007] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-07 22:49:07,428][40007] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-07 22:49:07,429][40007] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-07 22:49:07,432][40007] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-07 22:49:07,433][40007] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-07 22:49:07,435][40007] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-07 22:49:07,436][40007] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-07 22:49:07,478][40007] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 22:49:07,485][40007] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 22:49:07,488][40007] RunningMeanStd input shape: (1,) +[2024-11-07 22:49:07,511][40007] ConvEncoder: input_channels=3 +[2024-11-07 22:49:07,648][40007] Conv encoder output size: 512 +[2024-11-07 22:49:07,649][40007] Policy head output size: 512 +[2024-11-07 22:49:08,368][40007] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004886_20013056.pth... +[2024-11-07 22:49:09,832][40007] Num frames 100... +[2024-11-07 22:49:10,163][40007] Num frames 200... +[2024-11-07 22:49:10,486][40007] Num frames 300... +[2024-11-07 22:49:10,797][40007] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-07 22:49:10,801][40007] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-07 22:49:10,863][40007] Num frames 400... 
+[2024-11-07 22:49:11,209][40007] Num frames 500... +[2024-11-07 22:49:11,533][40007] Num frames 600... +[2024-11-07 22:49:11,834][40007] Num frames 700... +[2024-11-07 22:49:12,165][40007] Num frames 800... +[2024-11-07 22:49:12,330][40007] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-07 22:49:12,332][40007] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-07 22:49:12,541][40007] Num frames 900... +[2024-11-07 22:49:14,275][40007] Num frames 1000... +[2024-11-07 22:49:14,626][40007] Num frames 1100... +[2024-11-07 22:49:14,877][40007] Num frames 1200... +[2024-11-07 22:49:14,976][40007] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-07 22:49:14,982][40007] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-07 22:49:15,177][40007] Num frames 1300... +[2024-11-07 22:49:15,409][40007] Num frames 1400... +[2024-11-07 22:49:15,627][40007] Num frames 1500... +[2024-11-07 22:49:15,866][40007] Num frames 1600... +[2024-11-07 22:49:15,918][40007] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000 +[2024-11-07 22:49:15,922][40007] Avg episode reward: 4.250, avg true_objective: 4.000 +[2024-11-07 22:49:16,166][40007] Num frames 1700... +[2024-11-07 22:49:16,408][40007] Num frames 1800... +[2024-11-07 22:49:16,661][40007] Num frames 1900... +[2024-11-07 22:49:16,878][40007] Num frames 2000... +[2024-11-07 22:49:16,971][40007] Avg episode rewards: #0: 4.432, true rewards: #0: 4.032 +[2024-11-07 22:49:16,975][40007] Avg episode reward: 4.432, avg true_objective: 4.032 +[2024-11-07 22:49:17,188][40007] Num frames 2100... +[2024-11-07 22:49:17,441][40007] Num frames 2200... +[2024-11-07 22:49:17,673][40007] Num frames 2300... +[2024-11-07 22:49:17,898][40007] Num frames 2400... +[2024-11-07 22:49:17,951][40007] Avg episode rewards: #0: 4.333, true rewards: #0: 4.000 +[2024-11-07 22:49:17,956][40007] Avg episode reward: 4.333, avg true_objective: 4.000 +[2024-11-07 22:49:18,182][40007] Num frames 2500... 
+[2024-11-07 22:49:18,408][40007] Num frames 2600... +[2024-11-07 22:49:18,609][40007] Num frames 2700... +[2024-11-07 22:49:18,797][40007] Num frames 2800... +[2024-11-07 22:49:18,953][40007] Avg episode rewards: #0: 4.497, true rewards: #0: 4.069 +[2024-11-07 22:49:18,956][40007] Avg episode reward: 4.497, avg true_objective: 4.069 +[2024-11-07 22:49:19,060][40007] Num frames 2900... +[2024-11-07 22:49:19,231][40007] Num frames 3000... +[2024-11-07 22:49:19,409][40007] Num frames 3100... +[2024-11-07 22:49:19,586][40007] Num frames 3200... +[2024-11-07 22:49:19,744][40007] Avg episode rewards: #0: 4.415, true rewards: #0: 4.040 +[2024-11-07 22:49:19,747][40007] Avg episode reward: 4.415, avg true_objective: 4.040 +[2024-11-07 22:49:19,916][40007] Num frames 3300... +[2024-11-07 22:49:20,115][40007] Num frames 3400... +[2024-11-07 22:49:20,307][40007] Num frames 3500... +[2024-11-07 22:49:20,511][40007] Num frames 3600... +[2024-11-07 22:49:20,727][40007] Avg episode rewards: #0: 4.533, true rewards: #0: 4.089 +[2024-11-07 22:49:20,732][40007] Avg episode reward: 4.533, avg true_objective: 4.089 +[2024-11-07 22:49:20,784][40007] Num frames 3700... +[2024-11-07 22:49:20,959][40007] Num frames 3800... +[2024-11-07 22:49:21,130][40007] Num frames 3900... +[2024-11-07 22:49:21,305][40007] Num frames 4000... +[2024-11-07 22:49:21,489][40007] Avg episode rewards: #0: 4.464, true rewards: #0: 4.064 +[2024-11-07 22:49:21,495][40007] Avg episode reward: 4.464, avg true_objective: 4.064 +[2024-11-07 22:49:21,572][40007] Num frames 4100... +[2024-11-07 22:49:21,743][40007] Num frames 4200... +[2024-11-07 22:49:21,902][40007] Num frames 4300... +[2024-11-07 22:49:22,066][40007] Num frames 4400... +[2024-11-07 22:49:22,198][40007] Avg episode rewards: #0: 4.407, true rewards: #0: 4.044 +[2024-11-07 22:49:22,200][40007] Avg episode reward: 4.407, avg true_objective: 4.044 +[2024-11-07 22:49:22,291][40007] Num frames 4500... +[2024-11-07 22:49:22,449][40007] Num frames 4600... 
+[2024-11-07 22:49:22,632][40007] Num frames 4700... +[2024-11-07 22:49:22,816][40007] Num frames 4800... +[2024-11-07 22:49:22,923][40007] Avg episode rewards: #0: 4.360, true rewards: #0: 4.027 +[2024-11-07 22:49:22,928][40007] Avg episode reward: 4.360, avg true_objective: 4.027 +[2024-11-07 22:49:23,091][40007] Num frames 4900... +[2024-11-07 22:49:23,259][40007] Num frames 5000... +[2024-11-07 22:49:23,457][40007] Avg episode rewards: #0: 4.222, true rewards: #0: 3.914 +[2024-11-07 22:49:23,458][40007] Avg episode reward: 4.222, avg true_objective: 3.914 +[2024-11-07 22:49:23,487][40007] Num frames 5100... +[2024-11-07 22:49:23,655][40007] Num frames 5200... +[2024-11-07 22:49:23,808][40007] Num frames 5300... +[2024-11-07 22:49:23,994][40007] Num frames 5400... +[2024-11-07 22:49:24,192][40007] Avg episode rewards: #0: 4.194, true rewards: #0: 3.909 +[2024-11-07 22:49:24,197][40007] Avg episode reward: 4.194, avg true_objective: 3.909 +[2024-11-07 22:49:24,273][40007] Num frames 5500... +[2024-11-07 22:49:24,445][40007] Num frames 5600... +[2024-11-07 22:49:24,632][40007] Num frames 5700... +[2024-11-07 22:49:24,790][40007] Num frames 5800... +[2024-11-07 22:49:24,934][40007] Avg episode rewards: #0: 4.171, true rewards: #0: 3.904 +[2024-11-07 22:49:24,936][40007] Avg episode reward: 4.171, avg true_objective: 3.904 +[2024-11-07 22:49:25,014][40007] Num frames 5900... +[2024-11-07 22:49:25,186][40007] Num frames 6000... +[2024-11-07 22:49:25,355][40007] Num frames 6100... +[2024-11-07 22:49:25,647][40007] Num frames 6200... +[2024-11-07 22:49:25,767][40007] Avg episode rewards: #0: 4.150, true rewards: #0: 3.900 +[2024-11-07 22:49:25,771][40007] Avg episode reward: 4.150, avg true_objective: 3.900 +[2024-11-07 22:49:25,919][40007] Num frames 6300... +[2024-11-07 22:49:26,140][40007] Num frames 6400... +[2024-11-07 22:49:26,384][40007] Num frames 6500... +[2024-11-07 22:49:26,579][40007] Num frames 6600... 
+[2024-11-07 22:49:26,674][40007] Avg episode rewards: #0: 4.132, true rewards: #0: 3.896 +[2024-11-07 22:49:26,678][40007] Avg episode reward: 4.132, avg true_objective: 3.896 +[2024-11-07 22:49:26,831][40007] Num frames 6700... +[2024-11-07 22:49:27,056][40007] Num frames 6800... +[2024-11-07 22:49:27,257][40007] Num frames 6900... +[2024-11-07 22:49:27,488][40007] Num frames 7000... +[2024-11-07 22:49:27,568][40007] Avg episode rewards: #0: 4.116, true rewards: #0: 3.893 +[2024-11-07 22:49:27,572][40007] Avg episode reward: 4.116, avg true_objective: 3.893 +[2024-11-07 22:49:27,784][40007] Num frames 7100... +[2024-11-07 22:49:28,056][40007] Num frames 7200... +[2024-11-07 22:49:28,297][40007] Num frames 7300... +[2024-11-07 22:49:28,536][40007] Num frames 7400... +[2024-11-07 22:49:28,724][40007] Avg episode rewards: #0: 4.187, true rewards: #0: 3.924 +[2024-11-07 22:49:28,727][40007] Avg episode reward: 4.187, avg true_objective: 3.924 +[2024-11-07 22:49:28,855][40007] Num frames 7500... +[2024-11-07 22:49:29,044][40007] Num frames 7600... +[2024-11-07 22:49:29,203][40007] Num frames 7700... +[2024-11-07 22:49:29,362][40007] Num frames 7800... +[2024-11-07 22:49:29,508][40007] Num frames 7900... +[2024-11-07 22:49:29,570][40007] Avg episode rewards: #0: 4.252, true rewards: #0: 3.952 +[2024-11-07 22:49:29,571][40007] Avg episode reward: 4.252, avg true_objective: 3.952 +[2024-11-07 22:49:29,725][40007] Num frames 8000... +[2024-11-07 22:49:29,880][40007] Num frames 8100... +[2024-11-07 22:49:30,028][40007] Avg episode rewards: #0: 4.171, true rewards: #0: 3.886 +[2024-11-07 22:49:30,032][40007] Avg episode reward: 4.171, avg true_objective: 3.886 +[2024-11-07 22:49:30,113][40007] Num frames 8200... +[2024-11-07 22:49:30,274][40007] Num frames 8300... +[2024-11-07 22:49:30,444][40007] Num frames 8400... +[2024-11-07 22:49:30,630][40007] Num frames 8500... 
+[2024-11-07 22:49:30,788][40007] Avg episode rewards: #0: 4.156, true rewards: #0: 3.884 +[2024-11-07 22:49:30,789][40007] Avg episode reward: 4.156, avg true_objective: 3.884 +[2024-11-07 22:49:30,894][40007] Num frames 8600... +[2024-11-07 22:49:31,096][40007] Num frames 8700... +[2024-11-07 22:49:31,287][40007] Num frames 8800... +[2024-11-07 22:49:31,519][40007] Num frames 8900... +[2024-11-07 22:49:31,648][40007] Avg episode rewards: #0: 4.143, true rewards: #0: 3.882 +[2024-11-07 22:49:31,650][40007] Avg episode reward: 4.143, avg true_objective: 3.882 +[2024-11-07 22:49:31,827][40007] Num frames 9000... +[2024-11-07 22:49:32,076][40007] Num frames 9100... +[2024-11-07 22:49:32,265][40007] Num frames 9200... +[2024-11-07 22:49:32,480][40007] Num frames 9300... +[2024-11-07 22:49:32,556][40007] Avg episode rewards: #0: 4.130, true rewards: #0: 3.880 +[2024-11-07 22:49:32,559][40007] Avg episode reward: 4.130, avg true_objective: 3.880 +[2024-11-07 22:49:32,740][40007] Num frames 9400... +[2024-11-07 22:49:32,964][40007] Num frames 9500... +[2024-11-07 22:49:33,146][40007] Num frames 9600... +[2024-11-07 22:49:33,391][40007] Avg episode rewards: #0: 4.118, true rewards: #0: 3.878 +[2024-11-07 22:49:33,393][40007] Avg episode reward: 4.118, avg true_objective: 3.878 +[2024-11-07 22:49:33,407][40007] Num frames 9700... +[2024-11-07 22:49:33,683][40007] Num frames 9800... +[2024-11-07 22:49:33,884][40007] Num frames 9900... +[2024-11-07 22:49:34,124][40007] Num frames 10000... +[2024-11-07 22:49:34,348][40007] Avg episode rewards: #0: 4.108, true rewards: #0: 3.877 +[2024-11-07 22:49:34,352][40007] Avg episode reward: 4.108, avg true_objective: 3.877 +[2024-11-07 22:49:34,403][40007] Num frames 10100... +[2024-11-07 22:49:34,591][40007] Num frames 10200... +[2024-11-07 22:49:34,756][40007] Num frames 10300... +[2024-11-07 22:49:34,935][40007] Num frames 10400... 
+[2024-11-07 22:49:35,049][40007] Avg episode rewards: #0: 4.086, true rewards: #0: 3.863 +[2024-11-07 22:49:35,053][40007] Avg episode reward: 4.086, avg true_objective: 3.863 +[2024-11-07 22:49:35,195][40007] Num frames 10500... +[2024-11-07 22:49:35,363][40007] Num frames 10600... +[2024-11-07 22:49:35,559][40007] Avg episode rewards: #0: 4.031, true rewards: #0: 3.817 +[2024-11-07 22:49:35,563][40007] Avg episode reward: 4.031, avg true_objective: 3.817 +[2024-11-07 22:49:35,609][40007] Num frames 10700... +[2024-11-07 22:49:35,781][40007] Num frames 10800... +[2024-11-07 22:49:35,951][40007] Num frames 10900... +[2024-11-07 22:49:36,121][40007] Num frames 11000... +[2024-11-07 22:49:36,306][40007] Avg episode rewards: #0: 4.024, true rewards: #0: 3.818 +[2024-11-07 22:49:36,310][40007] Avg episode reward: 4.024, avg true_objective: 3.818 +[2024-11-07 22:49:36,392][40007] Num frames 11100... +[2024-11-07 22:49:36,566][40007] Num frames 11200... +[2024-11-07 22:49:36,741][40007] Num frames 11300... +[2024-11-07 22:49:36,908][40007] Num frames 11400... +[2024-11-07 22:49:37,071][40007] Num frames 11500... +[2024-11-07 22:49:37,212][40007] Avg episode rewards: #0: 4.117, true rewards: #0: 3.850 +[2024-11-07 22:49:37,215][40007] Avg episode reward: 4.117, avg true_objective: 3.850 +[2024-11-07 22:49:37,315][40007] Num frames 11600... +[2024-11-07 22:49:37,482][40007] Num frames 11700... +[2024-11-07 22:49:37,656][40007] Num frames 11800... +[2024-11-07 22:49:37,824][40007] Num frames 11900... +[2024-11-07 22:49:37,993][40007] Num frames 12000... +[2024-11-07 22:49:38,107][40007] Avg episode rewards: #0: 4.171, true rewards: #0: 3.881 +[2024-11-07 22:49:38,111][40007] Avg episode reward: 4.171, avg true_objective: 3.881 +[2024-11-07 22:49:38,244][40007] Num frames 12100... +[2024-11-07 22:49:38,400][40007] Num frames 12200... +[2024-11-07 22:49:38,576][40007] Num frames 12300... +[2024-11-07 22:49:38,748][40007] Num frames 12400... 
+[2024-11-07 22:49:38,880][40007] Avg episode rewards: #0: 4.202, true rewards: #0: 3.890 +[2024-11-07 22:49:38,884][40007] Avg episode reward: 4.202, avg true_objective: 3.890 +[2024-11-07 22:49:38,990][40007] Num frames 12500... +[2024-11-07 22:49:39,174][40007] Num frames 12600... +[2024-11-07 22:49:39,348][40007] Num frames 12700... +[2024-11-07 22:49:39,512][40007] Num frames 12800... +[2024-11-07 22:49:39,622][40007] Avg episode rewards: #0: 4.191, true rewards: #0: 3.888 +[2024-11-07 22:49:39,627][40007] Avg episode reward: 4.191, avg true_objective: 3.888 +[2024-11-07 22:49:39,783][40007] Num frames 12900... +[2024-11-07 22:49:39,977][40007] Num frames 13000... +[2024-11-07 22:49:40,155][40007] Num frames 13100... +[2024-11-07 22:49:40,336][40007] Num frames 13200... +[2024-11-07 22:49:40,423][40007] Avg episode rewards: #0: 4.181, true rewards: #0: 3.887 +[2024-11-07 22:49:40,425][40007] Avg episode reward: 4.181, avg true_objective: 3.887 +[2024-11-07 22:49:40,622][40007] Num frames 13300... +[2024-11-07 22:49:40,845][40007] Num frames 13400... +[2024-11-07 22:49:41,047][40007] Num frames 13500... +[2024-11-07 22:49:41,259][40007] Num frames 13600... +[2024-11-07 22:49:41,468][40007] Num frames 13700... +[2024-11-07 22:49:41,666][40007] Avg episode rewards: #0: 4.274, true rewards: #0: 3.931 +[2024-11-07 22:49:41,668][40007] Avg episode reward: 4.274, avg true_objective: 3.931 +[2024-11-07 22:49:41,783][40007] Num frames 13800... +[2024-11-07 22:49:42,034][40007] Num frames 13900... +[2024-11-07 22:49:42,259][40007] Num frames 14000... +[2024-11-07 22:49:42,469][40007] Avg episode rewards: #0: 4.238, true rewards: #0: 3.905 +[2024-11-07 22:49:42,470][40007] Avg episode reward: 4.238, avg true_objective: 3.905 +[2024-11-07 22:49:42,574][40007] Num frames 14100... +[2024-11-07 22:49:42,825][40007] Num frames 14200... +[2024-11-07 22:49:43,010][40007] Num frames 14300... +[2024-11-07 22:49:43,275][40007] Num frames 14400... 
+[2024-11-07 22:49:43,418][40007] Avg episode rewards: #0: 4.228, true rewards: #0: 3.903 +[2024-11-07 22:49:43,422][40007] Avg episode reward: 4.228, avg true_objective: 3.903 +[2024-11-07 22:49:43,552][40007] Num frames 14500... +[2024-11-07 22:49:43,754][40007] Num frames 14600... +[2024-11-07 22:49:43,959][40007] Num frames 14700... +[2024-11-07 22:49:44,169][40007] Num frames 14800... +[2024-11-07 22:49:44,409][40007] Avg episode rewards: #0: 4.261, true rewards: #0: 3.918 +[2024-11-07 22:49:44,415][40007] Avg episode reward: 4.261, avg true_objective: 3.918 +[2024-11-07 22:49:44,461][40007] Num frames 14900... +[2024-11-07 22:49:44,656][40007] Num frames 15000... +[2024-11-07 22:49:44,852][40007] Num frames 15100... +[2024-11-07 22:49:45,050][40007] Num frames 15200... +[2024-11-07 22:49:45,138][40007] Avg episode rewards: #0: 4.235, true rewards: #0: 3.902 +[2024-11-07 22:49:45,139][40007] Avg episode reward: 4.235, avg true_objective: 3.902 +[2024-11-07 22:49:45,313][40007] Num frames 15300... +[2024-11-07 22:49:45,510][40007] Num frames 15400... +[2024-11-07 22:49:45,710][40007] Num frames 15500... +[2024-11-07 22:49:45,906][40007] Num frames 15600... +[2024-11-07 22:49:46,081][40007] Avg episode rewards: #0: 4.266, true rewards: #0: 3.916 +[2024-11-07 22:49:46,085][40007] Avg episode reward: 4.266, avg true_objective: 3.916 +[2024-11-07 22:49:46,179][40007] Num frames 15700... +[2024-11-07 22:49:47,795][40007] Num frames 15800... +[2024-11-07 22:49:47,977][40007] Num frames 15900... +[2024-11-07 22:49:48,166][40007] Num frames 16000... +[2024-11-07 22:49:48,316][40007] Avg episode rewards: #0: 4.256, true rewards: #0: 3.914 +[2024-11-07 22:49:48,318][40007] Avg episode reward: 4.256, avg true_objective: 3.914 +[2024-11-07 22:49:48,424][40007] Num frames 16100... +[2024-11-07 22:49:48,615][40007] Num frames 16200... +[2024-11-07 22:49:48,804][40007] Num frames 16300... +[2024-11-07 22:49:48,986][40007] Num frames 16400... 
+[2024-11-07 22:49:49,112][40007] Avg episode rewards: #0: 4.246, true rewards: #0: 3.912 +[2024-11-07 22:49:49,117][40007] Avg episode reward: 4.246, avg true_objective: 3.912 +[2024-11-07 22:49:49,277][40007] Num frames 16500... +[2024-11-07 22:49:49,471][40007] Num frames 16600... +[2024-11-07 22:49:49,656][40007] Num frames 16700... +[2024-11-07 22:49:49,838][40007] Num frames 16800... +[2024-11-07 22:49:49,983][40007] Avg episode rewards: #0: 4.267, true rewards: #0: 3.918 +[2024-11-07 22:49:49,989][40007] Avg episode reward: 4.267, avg true_objective: 3.918 +[2024-11-07 22:49:50,105][40007] Num frames 16900... +[2024-11-07 22:49:50,302][40007] Num frames 17000... +[2024-11-07 22:49:50,498][40007] Num frames 17100... +[2024-11-07 22:49:50,688][40007] Num frames 17200... +[2024-11-07 22:49:50,810][40007] Avg episode rewards: #0: 4.257, true rewards: #0: 3.916 +[2024-11-07 22:49:50,814][40007] Avg episode reward: 4.257, avg true_objective: 3.916 +[2024-11-07 22:49:50,973][40007] Num frames 17300... +[2024-11-07 22:49:51,167][40007] Num frames 17400... +[2024-11-07 22:49:51,362][40007] Num frames 17500... +[2024-11-07 22:49:51,553][40007] Num frames 17600... +[2024-11-07 22:49:51,640][40007] Avg episode rewards: #0: 4.248, true rewards: #0: 3.915 +[2024-11-07 22:49:51,645][40007] Avg episode reward: 4.248, avg true_objective: 3.915 +[2024-11-07 22:49:51,827][40007] Num frames 17700... +[2024-11-07 22:49:52,031][40007] Num frames 17800... +[2024-11-07 22:49:52,237][40007] Num frames 17900... +[2024-11-07 22:49:52,464][40007] Num frames 18000... +[2024-11-07 22:49:52,516][40007] Avg episode rewards: #0: 4.239, true rewards: #0: 3.913 +[2024-11-07 22:49:52,521][40007] Avg episode reward: 4.239, avg true_objective: 3.913 +[2024-11-07 22:49:52,730][40007] Num frames 18100... +[2024-11-07 22:49:52,940][40007] Num frames 18200... +[2024-11-07 22:49:53,135][40007] Num frames 18300... +[2024-11-07 22:49:53,346][40007] Num frames 18400... 
+[2024-11-07 22:49:53,507][40007] Avg episode rewards: #0: 4.266, true rewards: #0: 3.925 +[2024-11-07 22:49:53,508][40007] Avg episode reward: 4.266, avg true_objective: 3.925 +[2024-11-07 22:49:53,611][40007] Num frames 18500... +[2024-11-07 22:49:53,812][40007] Num frames 18600... +[2024-11-07 22:49:54,018][40007] Num frames 18700... +[2024-11-07 22:49:54,219][40007] Num frames 18800... +[2024-11-07 22:49:54,341][40007] Avg episode rewards: #0: 4.257, true rewards: #0: 3.923 +[2024-11-07 22:49:54,342][40007] Avg episode reward: 4.257, avg true_objective: 3.923 +[2024-11-07 22:49:54,495][40007] Num frames 18900... +[2024-11-07 22:49:54,676][40007] Num frames 19000... +[2024-11-07 22:49:54,862][40007] Num frames 19100... +[2024-11-07 22:49:55,048][40007] Num frames 19200... +[2024-11-07 22:49:55,236][40007] Num frames 19300... +[2024-11-07 22:49:55,314][40007] Avg episode rewards: #0: 4.309, true rewards: #0: 3.941 +[2024-11-07 22:49:55,315][40007] Avg episode reward: 4.309, avg true_objective: 3.941 +[2024-11-07 22:49:55,478][40007] Num frames 19400... +[2024-11-07 22:49:55,670][40007] Num frames 19500... +[2024-11-07 22:49:55,859][40007] Num frames 19600... +[2024-11-07 22:49:56,089][40007] Avg episode rewards: #0: 4.299, true rewards: #0: 3.939 +[2024-11-07 22:49:56,092][40007] Avg episode reward: 4.299, avg true_objective: 3.939 +[2024-11-07 22:49:56,108][40007] Num frames 19700... +[2024-11-07 22:49:56,375][40007] Num frames 19800... +[2024-11-07 22:49:56,567][40007] Num frames 19900... +[2024-11-07 22:49:56,764][40007] Num frames 20000... +[2024-11-07 22:49:56,974][40007] Avg episode rewards: #0: 4.290, true rewards: #0: 3.937 +[2024-11-07 22:49:56,978][40007] Avg episode reward: 4.290, avg true_objective: 3.937 +[2024-11-07 22:49:57,036][40007] Num frames 20100... +[2024-11-07 22:49:57,215][40007] Num frames 20200... +[2024-11-07 22:49:57,397][40007] Num frames 20300... +[2024-11-07 22:49:57,572][40007] Num frames 20400... 
+[2024-11-07 22:49:57,803][40007] Avg episode rewards: #0: 4.326, true rewards: #0: 3.942 +[2024-11-07 22:49:57,806][40007] Avg episode reward: 4.326, avg true_objective: 3.942 +[2024-11-07 22:49:57,827][40007] Num frames 20500... +[2024-11-07 22:49:58,021][40007] Num frames 20600... +[2024-11-07 22:49:58,205][40007] Num frames 20700... +[2024-11-07 22:49:58,407][40007] Num frames 20800... +[2024-11-07 22:49:58,590][40007] Num frames 20900... +[2024-11-07 22:49:58,777][40007] Num frames 21000... +[2024-11-07 22:49:58,909][40007] Avg episode rewards: #0: 4.385, true rewards: #0: 3.970 +[2024-11-07 22:49:58,912][40007] Avg episode reward: 4.385, avg true_objective: 3.970 +[2024-11-07 22:49:59,036][40007] Num frames 21100... +[2024-11-07 22:49:59,217][40007] Num frames 21200... +[2024-11-07 22:49:59,399][40007] Num frames 21300... +[2024-11-07 22:49:59,579][40007] Num frames 21400... +[2024-11-07 22:49:59,681][40007] Avg episode rewards: #0: 4.375, true rewards: #0: 3.967 +[2024-11-07 22:49:59,685][40007] Avg episode reward: 4.375, avg true_objective: 3.967 +[2024-11-07 22:49:59,843][40007] Num frames 21500... +[2024-11-07 22:50:00,022][40007] Num frames 21600... +[2024-11-07 22:50:00,206][40007] Num frames 21700... +[2024-11-07 22:50:00,401][40007] Num frames 21800... +[2024-11-07 22:50:00,608][40007] Avg episode rewards: #0: 4.395, true rewards: #0: 3.977 +[2024-11-07 22:50:00,609][40007] Avg episode reward: 4.395, avg true_objective: 3.977 +[2024-11-07 22:50:00,682][40007] Num frames 21900... +[2024-11-07 22:50:00,914][40007] Num frames 22000... +[2024-11-07 22:50:01,173][40007] Num frames 22100... +[2024-11-07 22:50:01,405][40007] Num frames 22200... +[2024-11-07 22:50:01,598][40007] Avg episode rewards: #0: 4.385, true rewards: #0: 3.974 +[2024-11-07 22:50:01,604][40007] Avg episode reward: 4.385, avg true_objective: 3.974 +[2024-11-07 22:50:01,723][40007] Num frames 22300... +[2024-11-07 22:50:01,937][40007] Num frames 22400... 
+[2024-11-07 22:50:02,177][40007] Num frames 22500... +[2024-11-07 22:50:02,359][40007] Num frames 22600... +[2024-11-07 22:50:02,545][40007] Num frames 22700... +[2024-11-07 22:50:02,714][40007] Avg episode rewards: #0: 4.433, true rewards: #0: 3.994 +[2024-11-07 22:50:02,719][40007] Avg episode reward: 4.433, avg true_objective: 3.994 +[2024-11-07 22:50:02,802][40007] Num frames 22800... +[2024-11-07 22:50:03,005][40007] Num frames 22900... +[2024-11-07 22:50:03,199][40007] Num frames 23000... +[2024-11-07 22:50:03,441][40007] Num frames 23100... +[2024-11-07 22:50:03,618][40007] Avg episode rewards: #0: 4.423, true rewards: #0: 3.992 +[2024-11-07 22:50:03,619][40007] Avg episode reward: 4.423, avg true_objective: 3.992 +[2024-11-07 22:50:03,728][40007] Num frames 23200... +[2024-11-07 22:50:03,945][40007] Num frames 23300... +[2024-11-07 22:50:04,178][40007] Num frames 23400... +[2024-11-07 22:50:04,396][40007] Num frames 23500... +[2024-11-07 22:50:04,526][40007] Avg episode rewards: #0: 4.413, true rewards: #0: 3.989 +[2024-11-07 22:50:04,528][40007] Avg episode reward: 4.413, avg true_objective: 3.989 +[2024-11-07 22:50:04,679][40007] Num frames 23600... +[2024-11-07 22:50:04,889][40007] Num frames 23700... +[2024-11-07 22:50:05,080][40007] Num frames 23800... +[2024-11-07 22:50:05,272][40007] Num frames 23900... +[2024-11-07 22:50:05,373][40007] Avg episode rewards: #0: 4.403, true rewards: #0: 3.987 +[2024-11-07 22:50:05,374][40007] Avg episode reward: 4.403, avg true_objective: 3.987 +[2024-11-07 22:50:05,534][40007] Num frames 24000... +[2024-11-07 22:50:05,732][40007] Num frames 24100... +[2024-11-07 22:50:05,958][40007] Num frames 24200... +[2024-11-07 22:50:06,207][40007] Avg episode rewards: #0: 4.409, true rewards: #0: 3.982 +[2024-11-07 22:50:06,208][40007] Avg episode reward: 4.409, avg true_objective: 3.982 +[2024-11-07 22:50:06,227][40007] Num frames 24300... +[2024-11-07 22:50:06,460][40007] Num frames 24400... 
+[2024-11-07 22:50:06,660][40007] Num frames 24500... +[2024-11-07 22:50:06,876][40007] Num frames 24600... +[2024-11-07 22:50:07,094][40007] Avg episode rewards: #0: 4.399, true rewards: #0: 3.980 +[2024-11-07 22:50:07,097][40007] Avg episode reward: 4.399, avg true_objective: 3.980 +[2024-11-07 22:50:07,176][40007] Num frames 24700... +[2024-11-07 22:50:07,389][40007] Num frames 24800... +[2024-11-07 22:50:07,590][40007] Num frames 24900... +[2024-11-07 22:50:07,795][40007] Num frames 25000... +[2024-11-07 22:50:07,977][40007] Avg episode rewards: #0: 4.390, true rewards: #0: 3.978 +[2024-11-07 22:50:07,982][40007] Avg episode reward: 4.390, avg true_objective: 3.978 +[2024-11-07 22:50:08,086][40007] Num frames 25100... +[2024-11-07 22:50:08,284][40007] Num frames 25200... +[2024-11-07 22:50:08,505][40007] Num frames 25300... +[2024-11-07 22:50:08,714][40007] Num frames 25400... +[2024-11-07 22:50:08,863][40007] Avg episode rewards: #0: 4.382, true rewards: #0: 3.976 +[2024-11-07 22:50:08,867][40007] Avg episode reward: 4.382, avg true_objective: 3.976 +[2024-11-07 22:50:09,002][40007] Num frames 25500... +[2024-11-07 22:50:09,206][40007] Num frames 25600... +[2024-11-07 22:50:09,417][40007] Num frames 25700... +[2024-11-07 22:50:09,615][40007] Num frames 25800... +[2024-11-07 22:50:09,731][40007] Avg episode rewards: #0: 4.374, true rewards: #0: 3.974 +[2024-11-07 22:50:09,735][40007] Avg episode reward: 4.374, avg true_objective: 3.974 +[2024-11-07 22:50:09,905][40007] Num frames 25900... +[2024-11-07 22:50:10,137][40007] Num frames 26000... +[2024-11-07 22:50:10,390][40007] Num frames 26100... +[2024-11-07 22:50:10,698][40007] Num frames 26200... +[2024-11-07 22:50:10,780][40007] Avg episode rewards: #0: 4.365, true rewards: #0: 3.972 +[2024-11-07 22:50:10,784][40007] Avg episode reward: 4.365, avg true_objective: 3.972 +[2024-11-07 22:50:10,980][40007] Num frames 26300... +[2024-11-07 22:50:11,179][40007] Num frames 26400... 
+[2024-11-07 22:50:11,372][40007] Num frames 26500... +[2024-11-07 22:50:11,562][40007] Num frames 26600... +[2024-11-07 22:50:11,730][40007] Avg episode rewards: #0: 4.382, true rewards: #0: 3.979 +[2024-11-07 22:50:11,734][40007] Avg episode reward: 4.382, avg true_objective: 3.979 +[2024-11-07 22:50:11,837][40007] Num frames 26700... +[2024-11-07 22:50:12,038][40007] Num frames 26800... +[2024-11-07 22:50:12,239][40007] Num frames 26900... +[2024-11-07 22:50:12,495][40007] Num frames 27000... +[2024-11-07 22:50:12,743][40007] Num frames 27100... +[2024-11-07 22:50:12,817][40007] Avg episode rewards: #0: 4.398, true rewards: #0: 3.986 +[2024-11-07 22:50:12,819][40007] Avg episode reward: 4.398, avg true_objective: 3.986 +[2024-11-07 22:50:13,051][40007] Num frames 27200... +[2024-11-07 22:50:13,264][40007] Num frames 27300... +[2024-11-07 22:50:13,473][40007] Num frames 27400... +[2024-11-07 22:50:13,688][40007] Avg episode rewards: #0: 4.400, true rewards: #0: 3.980 +[2024-11-07 22:50:13,689][40007] Avg episode reward: 4.400, avg true_objective: 3.980 +[2024-11-07 22:50:13,787][40007] Num frames 27500... +[2024-11-07 22:50:14,012][40007] Num frames 27600... +[2024-11-07 22:50:14,248][40007] Num frames 27700... +[2024-11-07 22:50:14,512][40007] Num frames 27800... +[2024-11-07 22:50:14,670][40007] Avg episode rewards: #0: 4.392, true rewards: #0: 3.978 +[2024-11-07 22:50:14,672][40007] Avg episode reward: 4.392, avg true_objective: 3.978 +[2024-11-07 22:50:14,824][40007] Num frames 27900... +[2024-11-07 22:50:15,031][40007] Num frames 28000... +[2024-11-07 22:50:15,229][40007] Num frames 28100... +[2024-11-07 22:50:15,458][40007] Num frames 28200... +[2024-11-07 22:50:15,593][40007] Avg episode rewards: #0: 4.384, true rewards: #0: 3.976 +[2024-11-07 22:50:15,594][40007] Avg episode reward: 4.384, avg true_objective: 3.976 +[2024-11-07 22:50:15,775][40007] Num frames 28300... +[2024-11-07 22:50:15,995][40007] Num frames 28400... 
+[2024-11-07 22:50:16,245][40007] Num frames 28500... +[2024-11-07 22:50:16,455][40007] Num frames 28600... +[2024-11-07 22:50:16,539][40007] Avg episode rewards: #0: 4.377, true rewards: #0: 3.974 +[2024-11-07 22:50:16,543][40007] Avg episode reward: 4.377, avg true_objective: 3.974 +[2024-11-07 22:50:16,746][40007] Num frames 28700... +[2024-11-07 22:50:16,954][40007] Num frames 28800... +[2024-11-07 22:50:17,146][40007] Avg episode rewards: #0: 4.352, true rewards: #0: 3.955 +[2024-11-07 22:50:17,150][40007] Avg episode reward: 4.352, avg true_objective: 3.955 +[2024-11-07 22:50:17,238][40007] Num frames 28900... +[2024-11-07 22:50:17,437][40007] Num frames 29000... +[2024-11-07 22:50:17,645][40007] Num frames 29100... +[2024-11-07 22:50:17,854][40007] Num frames 29200... +[2024-11-07 22:50:18,015][40007] Avg episode rewards: #0: 4.345, true rewards: #0: 3.953 +[2024-11-07 22:50:18,018][40007] Avg episode reward: 4.345, avg true_objective: 3.953 +[2024-11-07 22:50:18,128][40007] Num frames 29300... +[2024-11-07 22:50:18,318][40007] Num frames 29400... +[2024-11-07 22:50:18,507][40007] Num frames 29500... +[2024-11-07 22:50:18,712][40007] Num frames 29600... +[2024-11-07 22:50:18,842][40007] Avg episode rewards: #0: 4.338, true rewards: #0: 3.951 +[2024-11-07 22:50:18,845][40007] Avg episode reward: 4.338, avg true_objective: 3.951 +[2024-11-07 22:50:19,020][40007] Num frames 29700... +[2024-11-07 22:50:19,208][40007] Num frames 29800... +[2024-11-07 22:50:19,397][40007] Num frames 29900... +[2024-11-07 22:50:19,581][40007] Num frames 30000... +[2024-11-07 22:50:19,680][40007] Avg episode rewards: #0: 4.332, true rewards: #0: 3.950 +[2024-11-07 22:50:19,685][40007] Avg episode reward: 4.332, avg true_objective: 3.950 +[2024-11-07 22:50:19,840][40007] Num frames 30100... +[2024-11-07 22:50:21,472][40007] Num frames 30200... +[2024-11-07 22:50:21,655][40007] Num frames 30300... +[2024-11-07 22:50:21,842][40007] Num frames 30400... 
+[2024-11-07 22:50:21,904][40007] Avg episode rewards: #0: 4.325, true rewards: #0: 3.949 +[2024-11-07 22:50:21,907][40007] Avg episode reward: 4.325, avg true_objective: 3.949 +[2024-11-07 22:50:22,100][40007] Num frames 30500... +[2024-11-07 22:50:22,300][40007] Num frames 30600... +[2024-11-07 22:50:22,486][40007] Num frames 30700... +[2024-11-07 22:50:22,711][40007] Avg episode rewards: #0: 4.319, true rewards: #0: 3.947 +[2024-11-07 22:50:22,712][40007] Avg episode reward: 4.319, avg true_objective: 3.947 +[2024-11-07 22:50:22,740][40007] Num frames 30800... +[2024-11-07 22:50:22,933][40007] Num frames 30900... +[2024-11-07 22:50:23,122][40007] Num frames 31000... +[2024-11-07 22:50:23,310][40007] Num frames 31100... +[2024-11-07 22:50:23,499][40007] Avg episode rewards: #0: 4.313, true rewards: #0: 3.946 +[2024-11-07 22:50:23,502][40007] Avg episode reward: 4.313, avg true_objective: 3.946 +[2024-11-07 22:50:23,586][40007] Num frames 31200... +[2024-11-07 22:50:23,773][40007] Num frames 31300... +[2024-11-07 22:50:23,952][40007] Num frames 31400... +[2024-11-07 22:50:24,130][40007] Num frames 31500... +[2024-11-07 22:50:24,305][40007] Avg episode rewards: #0: 4.307, true rewards: #0: 3.944 +[2024-11-07 22:50:24,309][40007] Avg episode reward: 4.307, avg true_objective: 3.944 +[2024-11-07 22:50:24,424][40007] Num frames 31600... +[2024-11-07 22:50:24,618][40007] Num frames 31700... +[2024-11-07 22:50:24,800][40007] Num frames 31800... +[2024-11-07 22:50:24,986][40007] Num frames 31900... +[2024-11-07 22:50:25,179][40007] Num frames 32000... +[2024-11-07 22:50:25,378][40007] Avg episode rewards: #0: 4.342, true rewards: #0: 3.959 +[2024-11-07 22:50:25,382][40007] Avg episode reward: 4.342, avg true_objective: 3.959 +[2024-11-07 22:50:25,472][40007] Num frames 32100... +[2024-11-07 22:50:25,664][40007] Num frames 32200... +[2024-11-07 22:50:25,857][40007] Num frames 32300... +[2024-11-07 22:50:26,048][40007] Num frames 32400... 
+[2024-11-07 22:50:26,213][40007] Avg episode rewards: #0: 4.336, true rewards: #0: 3.958 +[2024-11-07 22:50:26,214][40007] Avg episode reward: 4.336, avg true_objective: 3.958 +[2024-11-07 22:50:26,308][40007] Num frames 32500... +[2024-11-07 22:50:26,501][40007] Num frames 32600... +[2024-11-07 22:50:26,696][40007] Num frames 32700... +[2024-11-07 22:50:26,886][40007] Num frames 32800... +[2024-11-07 22:50:27,012][40007] Avg episode rewards: #0: 4.330, true rewards: #0: 3.956 +[2024-11-07 22:50:27,016][40007] Avg episode reward: 4.330, avg true_objective: 3.956 +[2024-11-07 22:50:27,172][40007] Num frames 32900... +[2024-11-07 22:50:27,363][40007] Num frames 33000... +[2024-11-07 22:50:27,565][40007] Num frames 33100... +[2024-11-07 22:50:27,755][40007] Num frames 33200... +[2024-11-07 22:50:27,954][40007] Avg episode rewards: #0: 4.343, true rewards: #0: 3.962 +[2024-11-07 22:50:27,957][40007] Avg episode reward: 4.343, avg true_objective: 3.962 +[2024-11-07 22:50:28,007][40007] Num frames 33300... +[2024-11-07 22:50:28,228][40007] Num frames 33400... +[2024-11-07 22:50:28,525][40007] Num frames 33500... +[2024-11-07 22:50:28,692][40007] Num frames 33600... +[2024-11-07 22:50:28,865][40007] Avg episode rewards: #0: 4.337, true rewards: #0: 3.961 +[2024-11-07 22:50:28,870][40007] Avg episode reward: 4.337, avg true_objective: 3.961 +[2024-11-07 22:50:28,953][40007] Num frames 33700... +[2024-11-07 22:50:29,168][40007] Num frames 33800... +[2024-11-07 22:50:29,394][40007] Num frames 33900... +[2024-11-07 22:50:29,572][40007] Num frames 34000... +[2024-11-07 22:50:29,702][40007] Avg episode rewards: #0: 4.342, true rewards: #0: 3.958 +[2024-11-07 22:50:29,707][40007] Avg episode reward: 4.342, avg true_objective: 3.958 +[2024-11-07 22:50:29,834][40007] Num frames 34100... +[2024-11-07 22:50:30,007][40007] Num frames 34200... +[2024-11-07 22:50:30,187][40007] Num frames 34300... +[2024-11-07 22:50:30,371][40007] Num frames 34400... 
+[2024-11-07 22:50:30,584][40007] Avg episode rewards: #0: 4.355, true rewards: #0: 3.964 +[2024-11-07 22:50:30,587][40007] Avg episode reward: 4.355, avg true_objective: 3.964 +[2024-11-07 22:50:30,633][40007] Num frames 34500... +[2024-11-07 22:50:30,813][40007] Num frames 34600... +[2024-11-07 22:50:30,993][40007] Num frames 34700... +[2024-11-07 22:50:31,180][40007] Num frames 34800... +[2024-11-07 22:50:31,373][40007] Avg episode rewards: #0: 4.349, true rewards: #0: 3.962 +[2024-11-07 22:50:31,376][40007] Avg episode reward: 4.349, avg true_objective: 3.962 +[2024-11-07 22:50:31,446][40007] Num frames 34900... +[2024-11-07 22:50:31,621][40007] Num frames 35000... +[2024-11-07 22:50:31,796][40007] Num frames 35100... +[2024-11-07 22:50:31,980][40007] Num frames 35200... +[2024-11-07 22:50:32,137][40007] Avg episode rewards: #0: 4.343, true rewards: #0: 3.961 +[2024-11-07 22:50:32,141][40007] Avg episode reward: 4.343, avg true_objective: 3.961 +[2024-11-07 22:50:32,237][40007] Num frames 35300... +[2024-11-07 22:50:32,416][40007] Num frames 35400... +[2024-11-07 22:50:32,596][40007] Num frames 35500... +[2024-11-07 22:50:32,673][40007] Avg episode rewards: #0: 4.323, true rewards: #0: 3.946 +[2024-11-07 22:50:32,676][40007] Avg episode reward: 4.323, avg true_objective: 3.946 +[2024-11-07 22:50:32,861][40007] Num frames 35600... +[2024-11-07 22:50:33,042][40007] Num frames 35700... +[2024-11-07 22:50:33,232][40007] Num frames 35800... +[2024-11-07 22:50:33,435][40007] Num frames 35900... +[2024-11-07 22:50:33,594][40007] Avg episode rewards: #0: 4.336, true rewards: #0: 3.951 +[2024-11-07 22:50:33,598][40007] Avg episode reward: 4.336, avg true_objective: 3.951 +[2024-11-07 22:50:33,694][40007] Num frames 36000... +[2024-11-07 22:50:33,875][40007] Num frames 36100... +[2024-11-07 22:50:34,075][40007] Num frames 36200... +[2024-11-07 22:50:34,275][40007] Num frames 36300... 
+[2024-11-07 22:50:34,410][40007] Avg episode rewards: #0: 4.331, true rewards: #0: 3.950 +[2024-11-07 22:50:34,414][40007] Avg episode reward: 4.331, avg true_objective: 3.950 +[2024-11-07 22:50:34,541][40007] Num frames 36400... +[2024-11-07 22:50:34,746][40007] Num frames 36500... +[2024-11-07 22:50:34,950][40007] Num frames 36600... +[2024-11-07 22:50:35,157][40007] Num frames 36700... +[2024-11-07 22:50:35,293][40007] Avg episode rewards: #0: 4.325, true rewards: #0: 3.949 +[2024-11-07 22:50:35,295][40007] Avg episode reward: 4.325, avg true_objective: 3.949 +[2024-11-07 22:50:35,461][40007] Num frames 36800... +[2024-11-07 22:50:35,710][40007] Num frames 36900... +[2024-11-07 22:50:35,936][40007] Num frames 37000... +[2024-11-07 22:50:36,147][40007] Num frames 37100... +[2024-11-07 22:50:36,226][40007] Avg episode rewards: #0: 4.320, true rewards: #0: 3.948 +[2024-11-07 22:50:36,228][40007] Avg episode reward: 4.320, avg true_objective: 3.948 +[2024-11-07 22:50:36,432][40007] Num frames 37200... +[2024-11-07 22:50:36,636][40007] Num frames 37300... +[2024-11-07 22:50:36,854][40007] Num frames 37400... +[2024-11-07 22:50:37,101][40007] Avg episode rewards: #0: 4.315, true rewards: #0: 3.947 +[2024-11-07 22:50:37,103][40007] Avg episode reward: 4.315, avg true_objective: 3.947 +[2024-11-07 22:50:37,115][40007] Num frames 37500... +[2024-11-07 22:50:37,306][40007] Num frames 37600... +[2024-11-07 22:50:37,525][40007] Num frames 37700... +[2024-11-07 22:50:37,746][40007] Num frames 37800... +[2024-11-07 22:50:37,974][40007] Avg episode rewards: #0: 4.310, true rewards: #0: 3.946 +[2024-11-07 22:50:37,975][40007] Avg episode reward: 4.310, avg true_objective: 3.946 +[2024-11-07 22:50:38,024][40007] Num frames 37900... +[2024-11-07 22:50:38,269][40007] Num frames 38000... +[2024-11-07 22:50:38,499][40007] Num frames 38100... +[2024-11-07 22:50:38,714][40007] Num frames 38200... 
+[2024-11-07 22:50:38,905][40007] Avg episode rewards: #0: 4.305, true rewards: #0: 3.945 +[2024-11-07 22:50:38,909][40007] Avg episode reward: 4.305, avg true_objective: 3.945 +[2024-11-07 22:50:39,010][40007] Num frames 38300... +[2024-11-07 22:50:39,220][40007] Num frames 38400... +[2024-11-07 22:50:39,437][40007] Num frames 38500... +[2024-11-07 22:50:39,634][40007] Num frames 38600... +[2024-11-07 22:50:39,829][40007] Num frames 38700... +[2024-11-07 22:50:39,980][40007] Avg episode rewards: #0: 4.321, true rewards: #0: 3.953 +[2024-11-07 22:50:39,986][40007] Avg episode reward: 4.321, avg true_objective: 3.953 +[2024-11-07 22:50:40,136][40007] Num frames 38800... +[2024-11-07 22:50:40,374][40007] Num frames 38900... +[2024-11-07 22:50:40,586][40007] Num frames 39000... +[2024-11-07 22:50:40,808][40007] Num frames 39100... +[2024-11-07 22:50:40,931][40007] Avg episode rewards: #0: 4.316, true rewards: #0: 3.952 +[2024-11-07 22:50:40,937][40007] Avg episode reward: 4.316, avg true_objective: 3.952 +[2024-11-07 22:50:41,117][40007] Num frames 39200... +[2024-11-07 22:50:41,351][40007] Num frames 39300... +[2024-11-07 22:50:41,648][40007] Num frames 39400... +[2024-11-07 22:50:41,853][40007] Num frames 39500... +[2024-11-07 22:50:42,056][40007] Num frames 39600... +[2024-11-07 22:50:42,132][40007] Avg episode rewards: #0: 4.331, true rewards: #0: 3.961 +[2024-11-07 22:50:42,136][40007] Avg episode reward: 4.331, avg true_objective: 3.961 +[2024-11-07 22:50:42,355][40007] Num frames 39700... +[2024-11-07 22:50:42,550][40007] Num frames 39800... +[2024-11-07 22:50:42,741][40007] Num frames 39900... +[2024-11-07 22:50:42,985][40007] Avg episode rewards: #0: 4.331, true rewards: #0: 3.961 +[2024-11-07 22:50:42,989][40007] Avg episode reward: 4.331, avg true_objective: 3.961 +[2024-11-07 22:50:43,031][40007] Num frames 40000... +[2024-11-07 22:50:43,244][40007] Num frames 40100... +[2024-11-07 22:50:43,446][40007] Num frames 40200... 
+[2024-11-07 22:50:43,629][40007] Num frames 40300... +[2024-11-07 22:50:43,836][40007] Avg episode rewards: #0: 4.314, true rewards: #0: 3.954 +[2024-11-07 22:50:43,839][40007] Avg episode reward: 4.314, avg true_objective: 3.954 +[2024-11-07 22:50:43,907][40007] Num frames 40400... +[2024-11-07 22:50:44,085][40007] Num frames 40500... +[2024-11-07 22:50:44,264][40007] Num frames 40600... +[2024-11-07 22:50:44,445][40007] Num frames 40700... +[2024-11-07 22:50:44,609][40007] Avg episode rewards: #0: 4.314, true rewards: #0: 3.954 +[2024-11-07 22:50:44,611][40007] Avg episode reward: 4.314, avg true_objective: 3.954 +[2024-11-07 22:50:44,701][40007] Num frames 40800... +[2024-11-07 22:50:44,910][40007] Num frames 40900... +[2024-11-07 22:50:45,091][40007] Num frames 41000... +[2024-11-07 22:50:45,282][40007] Num frames 41100... +[2024-11-07 22:50:45,432][40007] Avg episode rewards: #0: 4.314, true rewards: #0: 3.954 +[2024-11-07 22:50:45,433][40007] Avg episode reward: 4.314, avg true_objective: 3.954 +[2024-11-07 22:50:45,544][40007] Num frames 41200... +[2024-11-07 22:50:45,724][40007] Num frames 41300... +[2024-11-07 22:50:45,909][40007] Num frames 41400... +[2024-11-07 22:50:46,077][40007] Avg episode rewards: #0: 4.305, true rewards: #0: 3.945 +[2024-11-07 22:50:46,083][40007] Avg episode reward: 4.305, avg true_objective: 3.945 +[2024-11-07 22:50:46,169][40007] Num frames 41500... +[2024-11-07 22:50:46,354][40007] Num frames 41600... +[2024-11-07 22:50:46,532][40007] Num frames 41700... +[2024-11-07 22:50:46,713][40007] Num frames 41800... +[2024-11-07 22:50:46,855][40007] Avg episode rewards: #0: 4.305, true rewards: #0: 3.945 +[2024-11-07 22:50:46,858][40007] Avg episode reward: 4.305, avg true_objective: 3.945 +[2024-11-07 22:50:46,968][40007] Num frames 41900... +[2024-11-07 22:50:47,148][40007] Num frames 42000... +[2024-11-07 22:50:47,328][40007] Num frames 42100... +[2024-11-07 22:50:47,502][40007] Num frames 42200... 
+[2024-11-07 22:50:47,614][40007] Avg episode rewards: #0: 4.288, true rewards: #0: 3.938 +[2024-11-07 22:50:47,618][40007] Avg episode reward: 4.288, avg true_objective: 3.938 +[2024-11-07 22:50:47,777][40007] Num frames 42300... +[2024-11-07 22:50:47,988][40007] Num frames 42400... +[2024-11-07 22:50:48,187][40007] Num frames 42500... +[2024-11-07 22:50:48,379][40007] Num frames 42600... +[2024-11-07 22:50:48,565][40007] Num frames 42700... +[2024-11-07 22:50:48,644][40007] Avg episode rewards: #0: 4.318, true rewards: #0: 3.948 +[2024-11-07 22:50:48,646][40007] Avg episode reward: 4.318, avg true_objective: 3.948 +[2024-11-07 22:50:48,834][40007] Num frames 42800... +[2024-11-07 22:50:49,026][40007] Num frames 42900... +[2024-11-07 22:50:49,220][40007] Num frames 43000... +[2024-11-07 22:50:49,471][40007] Avg episode rewards: #0: 4.301, true rewards: #0: 3.941 +[2024-11-07 22:50:49,476][40007] Avg episode reward: 4.301, avg true_objective: 3.941 +[2024-11-07 22:50:49,507][40007] Num frames 43100... +[2024-11-07 22:50:49,707][40007] Num frames 43200... +[2024-11-07 22:50:49,918][40007] Num frames 43300... +[2024-11-07 22:50:50,131][40007] Num frames 43400... +[2024-11-07 22:50:50,345][40007] Avg episode rewards: #0: 4.301, true rewards: #0: 3.941 +[2024-11-07 22:50:50,347][40007] Avg episode reward: 4.301, avg true_objective: 3.941 +[2024-11-07 22:50:50,409][40007] Num frames 43500... +[2024-11-07 22:50:50,613][40007] Num frames 43600... +[2024-11-07 22:50:50,804][40007] Num frames 43700... +[2024-11-07 22:50:50,994][40007] Num frames 43800... +[2024-11-07 22:50:51,195][40007] Num frames 43900... +[2024-11-07 22:50:51,390][40007] Num frames 44000... +[2024-11-07 22:50:51,495][40007] Avg episode rewards: #0: 4.337, true rewards: #0: 3.957 +[2024-11-07 22:50:51,500][40007] Avg episode reward: 4.337, avg true_objective: 3.957 +[2024-11-07 22:50:51,668][40007] Num frames 44100... +[2024-11-07 22:50:51,850][40007] Num frames 44200... 
+[2024-11-07 22:50:52,032][40007] Num frames 44300... +[2024-11-07 22:50:52,218][40007] Num frames 44400... +[2024-11-07 22:50:52,287][40007] Avg episode rewards: #0: 4.337, true rewards: #0: 3.957 +[2024-11-07 22:50:52,291][40007] Avg episode reward: 4.337, avg true_objective: 3.957 +[2024-11-07 22:50:52,482][40007] Num frames 44500... +[2024-11-07 22:50:52,672][40007] Num frames 44600... +[2024-11-07 22:50:52,863][40007] Num frames 44700... +[2024-11-07 22:50:53,093][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:50:53,097][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:50:53,139][40007] Num frames 44800... +[2024-11-07 22:50:53,328][40007] Num frames 44900... +[2024-11-07 22:50:53,515][40007] Num frames 45000... +[2024-11-07 22:50:55,158][40007] Num frames 45100... +[2024-11-07 22:50:55,351][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:50:55,354][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:50:55,431][40007] Num frames 45200... +[2024-11-07 22:50:55,615][40007] Num frames 45300... +[2024-11-07 22:50:55,807][40007] Num frames 45400... +[2024-11-07 22:50:55,992][40007] Num frames 45500... +[2024-11-07 22:50:56,171][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:50:56,176][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:50:56,280][40007] Num frames 45600... +[2024-11-07 22:50:56,463][40007] Num frames 45700... +[2024-11-07 22:50:56,651][40007] Num frames 45800... +[2024-11-07 22:50:56,835][40007] Num frames 45900... +[2024-11-07 22:50:56,975][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:50:56,976][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:50:57,093][40007] Num frames 46000... +[2024-11-07 22:50:57,281][40007] Num frames 46100... +[2024-11-07 22:50:57,461][40007] Num frames 46200... 
+[2024-11-07 22:50:57,647][40007] Num frames 46300... +[2024-11-07 22:50:57,758][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:50:57,762][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:50:57,923][40007] Num frames 46400... +[2024-11-07 22:50:58,116][40007] Num frames 46500... +[2024-11-07 22:50:58,363][40007] Num frames 46600... +[2024-11-07 22:50:58,570][40007] Num frames 46700... +[2024-11-07 22:50:58,650][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:50:58,654][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:50:58,848][40007] Num frames 46800... +[2024-11-07 22:50:59,045][40007] Num frames 46900... +[2024-11-07 22:50:59,243][40007] Num frames 47000... +[2024-11-07 22:50:59,487][40007] Avg episode rewards: #0: 4.334, true rewards: #0: 3.964 +[2024-11-07 22:50:59,492][40007] Avg episode reward: 4.334, avg true_objective: 3.964 +[2024-11-07 22:50:59,516][40007] Num frames 47100... +[2024-11-07 22:50:59,711][40007] Num frames 47200... +[2024-11-07 22:50:59,896][40007] Num frames 47300... +[2024-11-07 22:51:00,086][40007] Num frames 47400... +[2024-11-07 22:51:00,236][40007] Avg episode rewards: #0: 4.324, true rewards: #0: 3.954 +[2024-11-07 22:51:00,239][40007] Avg episode reward: 4.324, avg true_objective: 3.954 +[2024-11-07 22:51:00,357][40007] Num frames 47500... +[2024-11-07 22:51:00,552][40007] Num frames 47600... +[2024-11-07 22:51:00,753][40007] Num frames 47700... +[2024-11-07 22:51:00,936][40007] Num frames 47800... +[2024-11-07 22:51:01,054][40007] Avg episode rewards: #0: 4.337, true rewards: #0: 3.967 +[2024-11-07 22:51:01,056][40007] Avg episode reward: 4.337, avg true_objective: 3.967 +[2024-11-07 22:51:01,219][40007] Num frames 47900... +[2024-11-07 22:51:01,414][40007] Num frames 48000... +[2024-11-07 22:51:01,676][40007] Num frames 48100... +[2024-11-07 22:51:01,885][40007] Num frames 48200... 
+[2024-11-07 22:51:01,972][40007] Avg episode rewards: #0: 4.337, true rewards: #0: 3.967 +[2024-11-07 22:51:01,975][40007] Avg episode reward: 4.337, avg true_objective: 3.967 +[2024-11-07 22:51:02,174][40007] Num frames 48300... +[2024-11-07 22:51:02,396][40007] Num frames 48400... +[2024-11-07 22:51:02,617][40007] Num frames 48500... +[2024-11-07 22:51:02,790][40007] Num frames 48600... +[2024-11-07 22:51:02,902][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:51:02,905][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:51:03,049][40007] Num frames 48700... +[2024-11-07 22:51:03,228][40007] Num frames 48800... +[2024-11-07 22:51:03,418][40007] Num frames 48900... +[2024-11-07 22:51:03,597][40007] Num frames 49000... +[2024-11-07 22:51:03,680][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:51:03,683][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:51:03,854][40007] Num frames 49100... +[2024-11-07 22:51:04,032][40007] Num frames 49200... +[2024-11-07 22:51:04,270][40007] Num frames 49300... +[2024-11-07 22:51:04,500][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.970 +[2024-11-07 22:51:04,504][40007] Avg episode reward: 4.350, avg true_objective: 3.970 +[2024-11-07 22:51:04,519][40007] Num frames 49400... +[2024-11-07 22:51:04,720][40007] Num frames 49500... +[2024-11-07 22:51:04,905][40007] Num frames 49600... +[2024-11-07 22:51:05,080][40007] Num frames 49700... +[2024-11-07 22:51:05,268][40007] Num frames 49800... +[2024-11-07 22:51:05,416][40007] Avg episode rewards: #0: 4.367, true rewards: #0: 3.977 +[2024-11-07 22:51:05,420][40007] Avg episode reward: 4.367, avg true_objective: 3.977 +[2024-11-07 22:51:05,533][40007] Num frames 49900... +[2024-11-07 22:51:05,733][40007] Num frames 50000... +[2024-11-07 22:51:05,960][40007] Num frames 50100... +[2024-11-07 22:51:06,226][40007] Num frames 50200... 
+[2024-11-07 22:51:06,353][40007] Avg episode rewards: #0: 4.370, true rewards: #0: 3.980 +[2024-11-07 22:51:06,357][40007] Avg episode reward: 4.370, avg true_objective: 3.980 +[2024-11-07 22:51:06,496][40007] Num frames 50300... +[2024-11-07 22:51:06,672][40007] Num frames 50400... +[2024-11-07 22:51:06,852][40007] Num frames 50500... +[2024-11-07 22:51:07,035][40007] Num frames 50600... +[2024-11-07 22:51:07,249][40007] Avg episode rewards: #0: 4.399, true rewards: #0: 3.999 +[2024-11-07 22:51:07,256][40007] Avg episode reward: 4.399, avg true_objective: 3.999 +[2024-11-07 22:51:07,310][40007] Num frames 50700... +[2024-11-07 22:51:07,508][40007] Num frames 50800... +[2024-11-07 22:51:07,710][40007] Num frames 50900... +[2024-11-07 22:51:07,926][40007] Num frames 51000... +[2024-11-07 22:51:08,139][40007] Avg episode rewards: #0: 4.399, true rewards: #0: 3.999 +[2024-11-07 22:51:08,141][40007] Avg episode reward: 4.399, avg true_objective: 3.999 +[2024-11-07 22:51:08,238][40007] Num frames 51100... +[2024-11-07 22:51:08,450][40007] Num frames 51200... +[2024-11-07 22:51:08,691][40007] Num frames 51300... +[2024-11-07 22:51:08,955][40007] Num frames 51400... +[2024-11-07 22:51:09,144][40007] Avg episode rewards: #0: 4.370, true rewards: #0: 3.989 +[2024-11-07 22:51:09,146][40007] Avg episode reward: 4.370, avg true_objective: 3.989 +[2024-11-07 22:51:09,310][40007] Num frames 51500... +[2024-11-07 22:51:09,575][40007] Num frames 51600... +[2024-11-07 22:51:09,792][40007] Num frames 51700... +[2024-11-07 22:51:09,998][40007] Num frames 51800... +[2024-11-07 22:51:10,130][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.980 +[2024-11-07 22:51:10,132][40007] Avg episode reward: 4.350, avg true_objective: 3.980 +[2024-11-07 22:51:10,293][40007] Num frames 51900... +[2024-11-07 22:51:10,502][40007] Num frames 52000... +[2024-11-07 22:51:10,774][40007] Num frames 52100... +[2024-11-07 22:51:10,985][40007] Num frames 52200... 
+[2024-11-07 22:51:11,079][40007] Avg episode rewards: #0: 4.337, true rewards: #0: 3.977 +[2024-11-07 22:51:11,080][40007] Avg episode reward: 4.337, avg true_objective: 3.977 +[2024-11-07 22:51:11,275][40007] Num frames 52300... +[2024-11-07 22:51:11,464][40007] Num frames 52400... +[2024-11-07 22:51:11,647][40007] Num frames 52500... +[2024-11-07 22:51:11,914][40007] Avg episode rewards: #0: 4.337, true rewards: #0: 3.977 +[2024-11-07 22:51:11,915][40007] Avg episode reward: 4.337, avg true_objective: 3.977 +[2024-11-07 22:51:11,919][40007] Num frames 52600... +[2024-11-07 22:51:12,121][40007] Num frames 52700... +[2024-11-07 22:51:12,314][40007] Num frames 52800... +[2024-11-07 22:51:12,519][40007] Num frames 52900... +[2024-11-07 22:51:12,688][40007] Num frames 53000... +[2024-11-07 22:51:12,773][40007] Avg episode rewards: #0: 4.350, true rewards: #0: 3.980 +[2024-11-07 22:51:12,774][40007] Avg episode reward: 4.350, avg true_objective: 3.980 +[2024-11-07 22:51:12,945][40007] Num frames 53100... +[2024-11-07 22:51:13,130][40007] Num frames 53200... +[2024-11-07 22:51:13,315][40007] Num frames 53300... +[2024-11-07 22:51:13,562][40007] Avg episode rewards: #0: 4.314, true rewards: #0: 3.964 +[2024-11-07 22:51:13,564][40007] Avg episode reward: 4.314, avg true_objective: 3.964 +[2024-11-07 22:51:13,570][40007] Num frames 53400... +[2024-11-07 22:51:13,783][40007] Num frames 53500... +[2024-11-07 22:51:13,991][40007] Num frames 53600... +[2024-11-07 22:51:14,195][40007] Num frames 53700... +[2024-11-07 22:51:14,404][40007] Avg episode rewards: #0: 4.322, true rewards: #0: 3.972 +[2024-11-07 22:51:14,408][40007] Avg episode reward: 4.322, avg true_objective: 3.972 +[2024-11-07 22:51:14,469][40007] Num frames 53800... +[2024-11-07 22:51:14,659][40007] Num frames 53900... +[2024-11-07 22:51:14,869][40007] Num frames 54000... +[2024-11-07 22:51:15,059][40007] Num frames 54100... 
+[2024-11-07 22:51:15,257][40007] Avg episode rewards: #0: 4.322, true rewards: #0: 3.972 +[2024-11-07 22:51:15,261][40007] Avg episode reward: 4.322, avg true_objective: 3.972 +[2024-11-07 22:51:15,349][40007] Num frames 54200... +[2024-11-07 22:51:15,542][40007] Num frames 54300... +[2024-11-07 22:51:15,787][40007] Num frames 54400... +[2024-11-07 22:51:16,031][40007] Num frames 54500... +[2024-11-07 22:51:16,300][40007] Avg episode rewards: #0: 4.319, true rewards: #0: 3.969 +[2024-11-07 22:51:16,304][40007] Avg episode reward: 4.319, avg true_objective: 3.969 +[2024-11-07 22:51:16,367][40007] Num frames 54600... +[2024-11-07 22:51:16,601][40007] Num frames 54700... +[2024-11-07 22:51:16,833][40007] Num frames 54800... +[2024-11-07 22:51:17,063][40007] Num frames 54900... +[2024-11-07 22:51:17,257][40007] Avg episode rewards: #0: 4.325, true rewards: #0: 3.975 +[2024-11-07 22:51:17,259][40007] Avg episode reward: 4.325, avg true_objective: 3.975 +[2024-11-07 22:51:17,331][40007] Num frames 55000... +[2024-11-07 22:51:17,547][40007] Num frames 55100... +[2024-11-07 22:51:17,763][40007] Num frames 55200... +[2024-11-07 22:51:17,977][40007] Num frames 55300... +[2024-11-07 22:51:18,146][40007] Avg episode rewards: #0: 4.309, true rewards: #0: 3.969 +[2024-11-07 22:51:18,152][40007] Avg episode reward: 4.309, avg true_objective: 3.969 +[2024-11-07 22:51:18,275][40007] Num frames 55400... +[2024-11-07 22:51:18,486][40007] Num frames 55500... +[2024-11-07 22:51:18,704][40007] Num frames 55600... +[2024-11-07 22:51:18,943][40007] Num frames 55700... +[2024-11-07 22:51:19,074][40007] Avg episode rewards: #0: 4.309, true rewards: #0: 3.969 +[2024-11-07 22:51:19,079][40007] Avg episode reward: 4.309, avg true_objective: 3.969 +[2024-11-07 22:51:19,243][40007] Num frames 55800... +[2024-11-07 22:51:19,441][40007] Num frames 55900... +[2024-11-07 22:51:19,647][40007] Num frames 56000... 
+[2024-11-07 22:51:19,749][40007] Avg episode rewards: #0: 4.299, true rewards: #0: 3.959 +[2024-11-07 22:51:19,751][40007] Avg episode reward: 4.299, avg true_objective: 3.959 +[2024-11-07 22:51:19,942][40007] Num frames 56100... +[2024-11-07 22:51:20,166][40007] Num frames 56200... +[2024-11-07 22:51:20,386][40007] Num frames 56300... +[2024-11-07 22:51:20,604][40007] Num frames 56400... +[2024-11-07 22:51:20,691][40007] Avg episode rewards: #0: 4.286, true rewards: #0: 3.956 +[2024-11-07 22:51:20,694][40007] Avg episode reward: 4.286, avg true_objective: 3.956 +[2024-11-07 22:51:20,945][40007] Num frames 56500... +[2024-11-07 22:51:21,195][40007] Num frames 56600... +[2024-11-07 22:51:21,424][40007] Num frames 56700... +[2024-11-07 22:51:21,678][40007] Avg episode rewards: #0: 4.286, true rewards: #0: 3.956 +[2024-11-07 22:51:21,682][40007] Avg episode reward: 4.286, avg true_objective: 3.956 +[2024-11-07 22:51:21,727][40007] Num frames 56800... +[2024-11-07 22:51:21,938][40007] Num frames 56900... +[2024-11-07 22:51:22,163][40007] Num frames 57000... +[2024-11-07 22:51:22,389][40007] Num frames 57100... +[2024-11-07 22:51:22,604][40007] Num frames 57200... +[2024-11-07 22:51:22,804][40007] Avg episode rewards: #0: 4.315, true rewards: #0: 3.965 +[2024-11-07 22:51:22,806][40007] Avg episode reward: 4.315, avg true_objective: 3.965 +[2024-11-07 22:51:22,889][40007] Num frames 57300... +[2024-11-07 22:51:23,134][40007] Num frames 57400... +[2024-11-07 22:51:23,370][40007] Num frames 57500... +[2024-11-07 22:51:23,615][40007] Num frames 57600... +[2024-11-07 22:51:23,803][40007] Avg episode rewards: #0: 4.315, true rewards: #0: 3.965 +[2024-11-07 22:51:23,808][40007] Avg episode reward: 4.315, avg true_objective: 3.965 +[2024-11-07 22:51:23,940][40007] Num frames 57700... +[2024-11-07 22:51:24,183][40007] Num frames 57800... +[2024-11-07 22:51:24,423][40007] Num frames 57900... +[2024-11-07 22:51:24,661][40007] Num frames 58000... 
+[2024-11-07 22:51:24,884][40007] Avg episode rewards: #0: 4.312, true rewards: #0: 3.962 +[2024-11-07 22:51:24,885][40007] Avg episode reward: 4.312, avg true_objective: 3.962 +[2024-11-07 22:51:24,951][40007] Num frames 58100... +[2024-11-07 22:51:25,168][40007] Num frames 58200... +[2024-11-07 22:51:25,379][40007] Num frames 58300... +[2024-11-07 22:51:25,578][40007] Num frames 58400... +[2024-11-07 22:51:25,756][40007] Avg episode rewards: #0: 4.312, true rewards: #0: 3.962 +[2024-11-07 22:51:25,761][40007] Avg episode reward: 4.312, avg true_objective: 3.962 +[2024-11-07 22:51:25,937][40007] Num frames 58500... +[2024-11-07 22:51:26,144][40007] Num frames 58600... +[2024-11-07 22:51:26,354][40007] Num frames 58700... +[2024-11-07 22:51:26,550][40007] Num frames 58800... +[2024-11-07 22:51:26,688][40007] Avg episode rewards: #0: 4.283, true rewards: #0: 3.953 +[2024-11-07 22:51:26,694][40007] Avg episode reward: 4.283, avg true_objective: 3.953 +[2024-11-07 22:51:26,829][40007] Num frames 58900... +[2024-11-07 22:51:27,035][40007] Num frames 59000... +[2024-11-07 22:51:27,264][40007] Num frames 59100... +[2024-11-07 22:51:28,963][40007] Num frames 59200... +[2024-11-07 22:51:29,077][40007] Avg episode rewards: #0: 4.283, true rewards: #0: 3.953 +[2024-11-07 22:51:29,082][40007] Avg episode reward: 4.283, avg true_objective: 3.953 +[2024-11-07 22:51:29,273][40007] Num frames 59300... +[2024-11-07 22:51:29,503][40007] Num frames 59400... +[2024-11-07 22:51:29,721][40007] Num frames 59500... +[2024-11-07 22:51:29,960][40007] Num frames 59600... +[2024-11-07 22:51:30,113][40007] Avg episode rewards: #0: 4.296, true rewards: #0: 3.956 +[2024-11-07 22:51:30,117][40007] Avg episode reward: 4.296, avg true_objective: 3.956 +[2024-11-07 22:51:30,285][40007] Num frames 59700... +[2024-11-07 22:51:30,522][40007] Num frames 59800... +[2024-11-07 22:51:30,761][40007] Num frames 59900... +[2024-11-07 22:51:31,013][40007] Num frames 60000... 
+[2024-11-07 22:51:31,249][40007] Num frames 60100... +[2024-11-07 22:51:31,430][40007] Avg episode rewards: #0: 4.305, true rewards: #0: 3.965 +[2024-11-07 22:51:31,434][40007] Avg episode reward: 4.305, avg true_objective: 3.965 +[2024-11-07 22:51:31,570][40007] Num frames 60200... +[2024-11-07 22:51:31,807][40007] Num frames 60300... +[2024-11-07 22:51:32,058][40007] Num frames 60400... +[2024-11-07 22:51:32,280][40007] Num frames 60500... +[2024-11-07 22:51:32,567][40007] Avg episode rewards: #0: 4.286, true rewards: #0: 3.956 +[2024-11-07 22:51:32,571][40007] Avg episode reward: 4.286, avg true_objective: 3.956 +[2024-11-07 22:51:32,586][40007] Num frames 60600... +[2024-11-07 22:51:32,837][40007] Num frames 60700... +[2024-11-07 22:51:33,090][40007] Num frames 60800... +[2024-11-07 22:51:33,309][40007] Num frames 60900... +[2024-11-07 22:51:33,558][40007] Avg episode rewards: #0: 4.286, true rewards: #0: 3.956 +[2024-11-07 22:51:33,562][40007] Avg episode reward: 4.286, avg true_objective: 3.956 +[2024-11-07 22:51:33,622][40007] Num frames 61000... +[2024-11-07 22:51:33,855][40007] Num frames 61100... +[2024-11-07 22:51:34,063][40007] Num frames 61200... +[2024-11-07 22:51:34,250][40007] Num frames 61300... +[2024-11-07 22:51:34,435][40007] Avg episode rewards: #0: 4.269, true rewards: #0: 3.949 +[2024-11-07 22:51:34,438][40007] Avg episode reward: 4.269, avg true_objective: 3.949 +[2024-11-07 22:51:34,542][40007] Num frames 61400... +[2024-11-07 22:51:34,762][40007] Num frames 61500... +[2024-11-07 22:51:34,976][40007] Num frames 61600... +[2024-11-07 22:51:35,191][40007] Num frames 61700... +[2024-11-07 22:51:35,356][40007] Avg episode rewards: #0: 4.269, true rewards: #0: 3.949 +[2024-11-07 22:51:35,360][40007] Avg episode reward: 4.269, avg true_objective: 3.949 +[2024-11-07 22:51:35,486][40007] Num frames 61800... +[2024-11-07 22:51:35,701][40007] Num frames 61900... +[2024-11-07 22:51:35,920][40007] Num frames 62000... 
+[2024-11-07 22:51:36,138][40007] Num frames 62100... +[2024-11-07 22:51:36,365][40007] Num frames 62200... +[2024-11-07 22:51:36,563][40007] Avg episode rewards: #0: 4.269, true rewards: #0: 3.949 +[2024-11-07 22:51:36,568][40007] Avg episode reward: 4.269, avg true_objective: 3.949 +[2024-11-07 22:51:36,663][40007] Num frames 62300... +[2024-11-07 22:51:36,879][40007] Num frames 62400... +[2024-11-07 22:51:37,096][40007] Num frames 62500... +[2024-11-07 22:51:37,316][40007] Num frames 62600... +[2024-11-07 22:51:37,543][40007] Avg episode rewards: #0: 4.283, true rewards: #0: 3.953 +[2024-11-07 22:51:37,547][40007] Avg episode reward: 4.283, avg true_objective: 3.953 +[2024-11-07 22:51:37,618][40007] Num frames 62700... +[2024-11-07 22:51:37,843][40007] Num frames 62800... +[2024-11-07 22:51:38,073][40007] Num frames 62900... +[2024-11-07 22:51:38,286][40007] Num frames 63000... +[2024-11-07 22:51:38,535][40007] Avg episode rewards: #0: 4.306, true rewards: #0: 3.956 +[2024-11-07 22:51:38,539][40007] Avg episode reward: 4.306, avg true_objective: 3.956 +[2024-11-07 22:51:38,578][40007] Num frames 63100... +[2024-11-07 22:51:38,810][40007] Num frames 63200... +[2024-11-07 22:51:39,027][40007] Num frames 63300... +[2024-11-07 22:51:39,216][40007] Num frames 63400... +[2024-11-07 22:51:39,408][40007] Num frames 63500... +[2024-11-07 22:51:39,634][40007] Num frames 63600... +[2024-11-07 22:51:39,706][40007] Avg episode rewards: #0: 4.349, true rewards: #0: 3.969 +[2024-11-07 22:51:39,710][40007] Avg episode reward: 4.349, avg true_objective: 3.969 +[2024-11-07 22:51:39,962][40007] Num frames 63700... +[2024-11-07 22:51:40,186][40007] Num frames 63800... +[2024-11-07 22:51:40,422][40007] Num frames 63900... +[2024-11-07 22:51:40,664][40007] Num frames 64000... 
+[2024-11-07 22:51:40,870][40007] Avg episode rewards: #0: 4.356, true rewards: #0: 3.976 +[2024-11-07 22:51:40,871][40007] Avg episode reward: 4.356, avg true_objective: 3.976 +[2024-11-07 22:51:40,989][40007] Num frames 64100... +[2024-11-07 22:51:41,229][40007] Num frames 64200... +[2024-11-07 22:51:41,441][40007] Num frames 64300... +[2024-11-07 22:51:41,700][40007] Num frames 64400... +[2024-11-07 22:51:41,855][40007] Avg episode rewards: #0: 4.356, true rewards: #0: 3.976 +[2024-11-07 22:51:41,860][40007] Avg episode reward: 4.356, avg true_objective: 3.976 +[2024-11-07 22:51:42,028][40007] Num frames 64500... +[2024-11-07 22:51:42,268][40007] Num frames 64600... +[2024-11-07 22:51:42,498][40007] Num frames 64700... +[2024-11-07 22:51:42,721][40007] Num frames 64800... +[2024-11-07 22:51:42,835][40007] Avg episode rewards: #0: 4.356, true rewards: #0: 3.976 +[2024-11-07 22:51:42,836][40007] Avg episode reward: 4.356, avg true_objective: 3.976 +[2024-11-07 22:51:43,025][40007] Num frames 64900... +[2024-11-07 22:51:43,258][40007] Num frames 65000... +[2024-11-07 22:51:43,602][40007] Num frames 65100... +[2024-11-07 22:51:43,866][40007] Num frames 65200... +[2024-11-07 22:51:43,940][40007] Avg episode rewards: #0: 4.356, true rewards: #0: 3.976 +[2024-11-07 22:51:43,942][40007] Avg episode reward: 4.356, avg true_objective: 3.976 +[2024-11-07 22:51:44,264][40007] Num frames 65300... +[2024-11-07 22:51:44,527][40007] Num frames 65400... +[2024-11-07 22:51:44,752][40007] Num frames 65500... +[2024-11-07 22:51:45,059][40007] Avg episode rewards: #0: 4.356, true rewards: #0: 3.976 +[2024-11-07 22:51:45,060][40007] Avg episode reward: 4.356, avg true_objective: 3.976 +[2024-11-07 22:51:45,085][40007] Num frames 65600... +[2024-11-07 22:51:45,431][40007] Num frames 65700... +[2024-11-07 22:51:45,800][40007] Num frames 65800... +[2024-11-07 22:51:46,038][40007] Num frames 65900... 
+[2024-11-07 22:51:46,383][40007] Avg episode rewards: #0: 4.356, true rewards: #0: 3.976 +[2024-11-07 22:51:46,388][40007] Avg episode reward: 4.356, avg true_objective: 3.976 +[2024-11-07 22:51:46,494][40007] Num frames 66000... +[2024-11-07 22:51:46,694][40007] Num frames 66100... +[2024-11-07 22:51:46,886][40007] Num frames 66200... +[2024-11-07 22:51:47,084][40007] Num frames 66300... +[2024-11-07 22:51:47,256][40007] Avg episode rewards: #0: 4.340, true rewards: #0: 3.970 +[2024-11-07 22:51:47,260][40007] Avg episode reward: 4.340, avg true_objective: 3.970 +[2024-11-07 22:51:47,366][40007] Num frames 66400... +[2024-11-07 22:51:47,578][40007] Num frames 66500... +[2024-11-07 22:51:47,808][40007] Num frames 66600... +[2024-11-07 22:51:48,036][40007] Num frames 66700... +[2024-11-07 22:51:48,188][40007] Avg episode rewards: #0: 4.323, true rewards: #0: 3.963 +[2024-11-07 22:51:48,190][40007] Avg episode reward: 4.323, avg true_objective: 3.963 +[2024-11-07 22:51:48,308][40007] Num frames 66800... +[2024-11-07 22:51:48,499][40007] Num frames 66900... +[2024-11-07 22:51:48,692][40007] Num frames 67000... +[2024-11-07 22:51:48,888][40007] Num frames 67100... +[2024-11-07 22:51:49,000][40007] Avg episode rewards: #0: 4.317, true rewards: #0: 3.967 +[2024-11-07 22:51:49,004][40007] Avg episode reward: 4.317, avg true_objective: 3.967 +[2024-11-07 22:51:49,159][40007] Num frames 67200... +[2024-11-07 22:51:49,367][40007] Num frames 67300... +[2024-11-07 22:51:49,564][40007] Num frames 67400... +[2024-11-07 22:51:49,761][40007] Num frames 67500... +[2024-11-07 22:51:49,841][40007] Avg episode rewards: #0: 4.317, true rewards: #0: 3.967 +[2024-11-07 22:51:49,847][40007] Avg episode reward: 4.317, avg true_objective: 3.967 +[2024-11-07 22:51:50,048][40007] Num frames 67600... +[2024-11-07 22:51:50,240][40007] Num frames 67700... +[2024-11-07 22:51:50,423][40007] Num frames 67800... +[2024-11-07 22:51:50,611][40007] Num frames 67900... 
+[2024-11-07 22:51:50,720][40007] Avg episode rewards: #0: 4.330, true rewards: #0: 3.970 +[2024-11-07 22:51:50,724][40007] Avg episode reward: 4.330, avg true_objective: 3.970 +[2024-11-07 22:51:50,865][40007] Num frames 68000... +[2024-11-07 22:51:51,059][40007] Num frames 68100... +[2024-11-07 22:51:51,233][40007] Num frames 68200... +[2024-11-07 22:51:51,405][40007] Num frames 68300... +[2024-11-07 22:51:51,598][40007] Avg episode rewards: #0: 4.346, true rewards: #0: 3.976 +[2024-11-07 22:51:51,601][40007] Avg episode reward: 4.346, avg true_objective: 3.976 +[2024-11-07 22:51:51,670][40007] Num frames 68400... +[2024-11-07 22:51:51,853][40007] Num frames 68500... +[2024-11-07 22:51:52,037][40007] Num frames 68600... +[2024-11-07 22:51:52,153][40007] Avg episode rewards: #0: 4.346, true rewards: #0: 3.976 +[2024-11-07 22:51:52,155][40007] Avg episode reward: 4.346, avg true_objective: 3.976 +[2024-11-07 22:51:52,312][40007] Num frames 68700... +[2024-11-07 22:51:52,487][40007] Num frames 68800... +[2024-11-07 22:51:52,665][40007] Num frames 68900... +[2024-11-07 22:51:52,842][40007] Num frames 69000... +[2024-11-07 22:51:52,990][40007] Avg episode rewards: #0: 4.359, true rewards: #0: 3.979 +[2024-11-07 22:51:52,994][40007] Avg episode reward: 4.359, avg true_objective: 3.979 +[2024-11-07 22:51:53,116][40007] Num frames 69100... +[2024-11-07 22:51:53,296][40007] Num frames 69200... +[2024-11-07 22:51:53,470][40007] Num frames 69300... +[2024-11-07 22:51:53,645][40007] Num frames 69400... +[2024-11-07 22:51:53,758][40007] Avg episode rewards: #0: 4.359, true rewards: #0: 3.979 +[2024-11-07 22:51:53,763][40007] Avg episode reward: 4.359, avg true_objective: 3.979 +[2024-11-07 22:51:53,905][40007] Num frames 69500... +[2024-11-07 22:51:54,076][40007] Num frames 69600... +[2024-11-07 22:51:54,278][40007] Num frames 69700... +[2024-11-07 22:51:54,488][40007] Num frames 69800... 
+[2024-11-07 22:51:54,576][40007] Avg episode rewards: #0: 4.359, true rewards: #0: 3.979 +[2024-11-07 22:51:54,582][40007] Avg episode reward: 4.359, avg true_objective: 3.979 +[2024-11-07 22:51:54,768][40007] Num frames 69900... +[2024-11-07 22:51:54,974][40007] Num frames 70000... +[2024-11-07 22:51:55,173][40007] Num frames 70100... +[2024-11-07 22:51:55,365][40007] Num frames 70200... +[2024-11-07 22:51:55,544][40007] Avg episode rewards: #0: 4.376, true rewards: #0: 3.986 +[2024-11-07 22:51:55,549][40007] Avg episode reward: 4.376, avg true_objective: 3.986 +[2024-11-07 22:51:55,638][40007] Num frames 70300... +[2024-11-07 22:51:55,831][40007] Num frames 70400... +[2024-11-07 22:51:56,026][40007] Num frames 70500... +[2024-11-07 22:51:56,211][40007] Num frames 70600... +[2024-11-07 22:51:56,357][40007] Avg episode rewards: #0: 4.376, true rewards: #0: 3.986 +[2024-11-07 22:51:56,361][40007] Avg episode reward: 4.376, avg true_objective: 3.986 +[2024-11-07 22:51:56,489][40007] Num frames 70700... +[2024-11-07 22:51:56,696][40007] Num frames 70800... +[2024-11-07 22:51:56,911][40007] Num frames 70900... +[2024-11-07 22:51:57,106][40007] Num frames 71000... +[2024-11-07 22:51:57,230][40007] Avg episode rewards: #0: 4.376, true rewards: #0: 3.986 +[2024-11-07 22:51:57,236][40007] Avg episode reward: 4.376, avg true_objective: 3.986 +[2024-11-07 22:51:57,391][40007] Num frames 71100... +[2024-11-07 22:51:57,585][40007] Num frames 71200... +[2024-11-07 22:51:57,781][40007] Num frames 71300... +[2024-11-07 22:51:57,978][40007] Num frames 71400... +[2024-11-07 22:51:58,066][40007] Avg episode rewards: #0: 4.376, true rewards: #0: 3.986 +[2024-11-07 22:51:58,070][40007] Avg episode reward: 4.376, avg true_objective: 3.986 +[2024-11-07 22:51:58,255][40007] Num frames 71500... +[2024-11-07 22:51:58,451][40007] Num frames 71600... +[2024-11-07 22:51:58,650][40007] Num frames 71700... +[2024-11-07 22:51:58,878][40007] Num frames 71800... 
+[2024-11-07 22:51:59,078][40007] Avg episode rewards: #0: 4.359, true rewards: #0: 3.979 +[2024-11-07 22:51:59,081][40007] Avg episode reward: 4.359, avg true_objective: 3.979 +[2024-11-07 22:51:59,169][40007] Num frames 71900... +[2024-11-07 22:51:59,418][40007] Num frames 72000... +[2024-11-07 22:51:59,639][40007] Num frames 72100... +[2024-11-07 22:51:59,849][40007] Num frames 72200... +[2024-11-07 22:52:00,046][40007] Num frames 72300... +[2024-11-07 22:52:00,128][40007] Avg episode rewards: #0: 4.376, true rewards: #0: 3.986 +[2024-11-07 22:52:00,130][40007] Avg episode reward: 4.376, avg true_objective: 3.986 +[2024-11-07 22:52:00,374][40007] Num frames 72400... +[2024-11-07 22:52:00,586][40007] Num frames 72500... +[2024-11-07 22:52:00,809][40007] Num frames 72600... +[2024-11-07 22:52:01,030][40007] Num frames 72700... +[2024-11-07 22:52:02,617][40007] Avg episode rewards: #0: 4.379, true rewards: #0: 3.989 +[2024-11-07 22:52:02,620][40007] Avg episode reward: 4.379, avg true_objective: 3.989 +[2024-11-07 22:52:02,801][40007] Num frames 72800... +[2024-11-07 22:52:02,984][40007] Num frames 72900... +[2024-11-07 22:52:03,171][40007] Num frames 73000... +[2024-11-07 22:52:03,360][40007] Num frames 73100... +[2024-11-07 22:52:03,439][40007] Avg episode rewards: #0: 4.363, true rewards: #0: 3.983 +[2024-11-07 22:52:03,443][40007] Avg episode reward: 4.363, avg true_objective: 3.983 +[2024-11-07 22:52:03,630][40007] Num frames 73200... +[2024-11-07 22:52:03,817][40007] Num frames 73300... +[2024-11-07 22:52:04,005][40007] Num frames 73400... +[2024-11-07 22:52:04,243][40007] Avg episode rewards: #0: 4.363, true rewards: #0: 3.983 +[2024-11-07 22:52:04,247][40007] Avg episode reward: 4.363, avg true_objective: 3.983 +[2024-11-07 22:52:04,276][40007] Num frames 73500... +[2024-11-07 22:52:04,493][40007] Num frames 73600... +[2024-11-07 22:52:04,696][40007] Num frames 73700... +[2024-11-07 22:52:04,898][40007] Num frames 73800... 
+[2024-11-07 22:52:05,107][40007] Avg episode rewards: #0: 4.354, true rewards: #0: 3.984 +[2024-11-07 22:52:05,111][40007] Avg episode reward: 4.354, avg true_objective: 3.984 +[2024-11-07 22:52:05,181][40007] Num frames 73900... +[2024-11-07 22:52:05,380][40007] Num frames 74000... +[2024-11-07 22:52:05,575][40007] Num frames 74100... +[2024-11-07 22:52:05,794][40007] Num frames 74200... +[2024-11-07 22:52:05,982][40007] Avg episode rewards: #0: 4.338, true rewards: #0: 3.978 +[2024-11-07 22:52:05,987][40007] Avg episode reward: 4.338, avg true_objective: 3.978 +[2024-11-07 22:52:06,085][40007] Num frames 74300... +[2024-11-07 22:52:06,285][40007] Num frames 74400... +[2024-11-07 22:52:06,479][40007] Num frames 74500... +[2024-11-07 22:52:06,670][40007] Num frames 74600... +[2024-11-07 22:52:06,823][40007] Avg episode rewards: #0: 4.338, true rewards: #0: 3.978 +[2024-11-07 22:52:06,827][40007] Avg episode reward: 4.338, avg true_objective: 3.978 +[2024-11-07 22:52:06,947][40007] Num frames 74700... +[2024-11-07 22:52:07,146][40007] Num frames 74800... +[2024-11-07 22:52:07,335][40007] Num frames 74900... +[2024-11-07 22:52:07,522][40007] Num frames 75000... +[2024-11-07 22:52:07,765][40007] Avg episode rewards: #0: 4.354, true rewards: #0: 3.984 +[2024-11-07 22:52:07,768][40007] Avg episode reward: 4.354, avg true_objective: 3.984 +[2024-11-07 22:52:07,801][40007] Num frames 75100... +[2024-11-07 22:52:08,004][40007] Num frames 75200... +[2024-11-07 22:52:08,197][40007] Num frames 75300... +[2024-11-07 22:52:08,380][40007] Num frames 75400... +[2024-11-07 22:52:08,581][40007] Num frames 75500... +[2024-11-07 22:52:08,723][40007] Avg episode rewards: #0: 4.383, true rewards: #0: 4.003 +[2024-11-07 22:52:08,726][40007] Avg episode reward: 4.383, avg true_objective: 4.003 +[2024-11-07 22:52:08,857][40007] Num frames 75600... +[2024-11-07 22:52:09,031][40007] Num frames 75700... +[2024-11-07 22:52:09,205][40007] Num frames 75800... 
+[2024-11-07 22:52:09,385][40007] Num frames 75900... +[2024-11-07 22:52:09,654][40007] Avg episode rewards: #0: 4.383, true rewards: #0: 4.003 +[2024-11-07 22:52:09,656][40007] Avg episode reward: 4.383, avg true_objective: 4.003 +[2024-11-07 22:52:09,686][40007] Num frames 76000... +[2024-11-07 22:52:09,882][40007] Num frames 76100... +[2024-11-07 22:52:10,057][40007] Num frames 76200... +[2024-11-07 22:52:10,241][40007] Num frames 76300... +[2024-11-07 22:52:10,416][40007] Num frames 76400... +[2024-11-07 22:52:10,558][40007] Avg episode rewards: #0: 4.400, true rewards: #0: 4.010 +[2024-11-07 22:52:10,562][40007] Avg episode reward: 4.400, avg true_objective: 4.010 +[2024-11-07 22:52:10,689][40007] Num frames 76500... +[2024-11-07 22:52:10,875][40007] Num frames 76600... +[2024-11-07 22:52:11,076][40007] Num frames 76700... +[2024-11-07 22:52:11,252][40007] Num frames 76800... +[2024-11-07 22:52:11,402][40007] Avg episode rewards: #0: 4.403, true rewards: #0: 4.013 +[2024-11-07 22:52:11,405][40007] Avg episode reward: 4.403, avg true_objective: 4.013 +[2024-11-07 22:52:11,506][40007] Num frames 76900... +[2024-11-07 22:52:11,678][40007] Num frames 77000... +[2024-11-07 22:52:11,870][40007] Num frames 77100... +[2024-11-07 22:52:12,038][40007] Num frames 77200... +[2024-11-07 22:52:12,165][40007] Avg episode rewards: #0: 4.403, true rewards: #0: 4.013 +[2024-11-07 22:52:12,171][40007] Avg episode reward: 4.403, avg true_objective: 4.013 +[2024-11-07 22:52:12,304][40007] Num frames 77300... +[2024-11-07 22:52:12,506][40007] Num frames 77400... +[2024-11-07 22:52:12,727][40007] Num frames 77500... +[2024-11-07 22:52:12,933][40007] Num frames 77600... +[2024-11-07 22:52:13,159][40007] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 22:52:13,160][40007] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 22:52:13,188][40007] Num frames 77700... +[2024-11-07 22:52:13,399][40007] Num frames 77800... 
+[2024-11-07 22:52:13,631][40007] Num frames 77900... +[2024-11-07 22:52:13,866][40007] Num frames 78000... +[2024-11-07 22:52:14,069][40007] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 22:52:14,071][40007] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 22:52:14,143][40007] Num frames 78100... +[2024-11-07 22:52:14,332][40007] Num frames 78200... +[2024-11-07 22:52:14,513][40007] Num frames 78300... +[2024-11-07 22:52:14,692][40007] Num frames 78400... +[2024-11-07 22:52:14,858][40007] Avg episode rewards: #0: 4.419, true rewards: #0: 4.019 +[2024-11-07 22:52:14,862][40007] Avg episode reward: 4.419, avg true_objective: 4.019 +[2024-11-07 22:52:14,963][40007] Num frames 78500... +[2024-11-07 22:52:15,165][40007] Num frames 78600... +[2024-11-07 22:52:15,396][40007] Num frames 78700... +[2024-11-07 22:52:15,612][40007] Num frames 78800... +[2024-11-07 22:52:15,787][40007] Avg episode rewards: #0: 4.400, true rewards: #0: 4.010 +[2024-11-07 22:52:15,788][40007] Avg episode reward: 4.400, avg true_objective: 4.010 +[2024-11-07 22:52:15,922][40007] Num frames 78900... +[2024-11-07 22:52:16,120][40007] Num frames 79000... +[2024-11-07 22:52:16,299][40007] Num frames 79100... +[2024-11-07 22:52:16,470][40007] Num frames 79200... +[2024-11-07 22:52:16,568][40007] Avg episode rewards: #0: 4.400, true rewards: #0: 4.010 +[2024-11-07 22:52:16,572][40007] Avg episode reward: 4.400, avg true_objective: 4.010 +[2024-11-07 22:52:16,736][40007] Num frames 79300... +[2024-11-07 22:52:16,938][40007] Num frames 79400... +[2024-11-07 22:52:17,144][40007] Num frames 79500... +[2024-11-07 22:52:17,341][40007] Num frames 79600... +[2024-11-07 22:52:17,413][40007] Avg episode rewards: #0: 4.380, true rewards: #0: 4.000 +[2024-11-07 22:52:17,416][40007] Avg episode reward: 4.380, avg true_objective: 4.000 +[2024-11-07 22:52:17,601][40007] Num frames 79700... +[2024-11-07 22:52:17,784][40007] Num frames 79800... 
+[2024-11-07 22:52:17,957][40007] Num frames 79900... +[2024-11-07 22:52:18,126][40007] Num frames 80000... +[2024-11-07 22:52:18,285][40007] Avg episode rewards: #0: 4.396, true rewards: #0: 4.006 +[2024-11-07 22:52:18,289][40007] Avg episode reward: 4.396, avg true_objective: 4.006 +[2024-11-07 22:52:18,382][40007] Num frames 80100... +[2024-11-07 22:52:18,575][40007] Num frames 80200... +[2024-11-07 22:52:18,765][40007] Num frames 80300... +[2024-11-07 22:52:18,944][40007] Num frames 80400... +[2024-11-07 22:52:19,071][40007] Avg episode rewards: #0: 4.396, true rewards: #0: 4.006 +[2024-11-07 22:52:19,074][40007] Avg episode reward: 4.396, avg true_objective: 4.006 +[2024-11-07 22:52:19,204][40007] Num frames 80500... +[2024-11-07 22:52:19,385][40007] Num frames 80600... +[2024-11-07 22:52:19,571][40007] Num frames 80700... +[2024-11-07 22:52:19,745][40007] Num frames 80800... +[2024-11-07 22:52:19,925][40007] Num frames 80900... +[2024-11-07 22:52:20,127][40007] Avg episode rewards: #0: 4.432, true rewards: #0: 4.022 +[2024-11-07 22:52:20,131][40007] Avg episode reward: 4.432, avg true_objective: 4.022 +[2024-11-07 22:52:20,180][40007] Num frames 81000... +[2024-11-07 22:52:20,389][40007] Num frames 81100... +[2024-11-07 22:52:20,631][40007] Num frames 81200... +[2024-11-07 22:52:20,866][40007] Num frames 81300... +[2024-11-07 22:52:21,105][40007] Num frames 81400... +[2024-11-07 22:52:21,229][40007] Avg episode rewards: #0: 4.449, true rewards: #0: 4.029 +[2024-11-07 22:52:21,232][40007] Avg episode reward: 4.449, avg true_objective: 4.029 +[2024-11-07 22:52:21,392][40007] Num frames 81500... +[2024-11-07 22:52:21,596][40007] Num frames 81600... +[2024-11-07 22:52:21,787][40007] Num frames 81700... +[2024-11-07 22:52:21,990][40007] Num frames 81800... 
+[2024-11-07 22:52:22,137][40007] Avg episode rewards: #0: 4.458, true rewards: #0: 4.038 +[2024-11-07 22:52:22,142][40007] Avg episode reward: 4.458, avg true_objective: 4.038 +[2024-11-07 22:52:22,267][40007] Num frames 81900... +[2024-11-07 22:52:22,465][40007] Num frames 82000... +[2024-11-07 22:52:22,668][40007] Num frames 82100... +[2024-11-07 22:52:22,858][40007] Num frames 82200... +[2024-11-07 22:52:23,086][40007] Avg episode rewards: #0: 4.475, true rewards: #0: 4.045 +[2024-11-07 22:52:23,089][40007] Avg episode reward: 4.475, avg true_objective: 4.045 +[2024-11-07 22:52:23,116][40007] Num frames 82300... +[2024-11-07 22:52:23,322][40007] Num frames 82400... +[2024-11-07 22:52:23,536][40007] Num frames 82500... +[2024-11-07 22:52:23,768][40007] Num frames 82600... +[2024-11-07 22:52:23,999][40007] Num frames 82700... +[2024-11-07 22:52:24,158][40007] Avg episode rewards: #0: 4.491, true rewards: #0: 4.051 +[2024-11-07 22:52:24,162][40007] Avg episode reward: 4.491, avg true_objective: 4.051 +[2024-11-07 22:52:24,302][40007] Num frames 82800... +[2024-11-07 22:52:24,529][40007] Num frames 82900... +[2024-11-07 22:52:24,750][40007] Num frames 83000... +[2024-11-07 22:52:24,962][40007] Num frames 83100... +[2024-11-07 22:52:25,082][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:25,086][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:25,275][40007] Num frames 83200... +[2024-11-07 22:52:25,492][40007] Num frames 83300... +[2024-11-07 22:52:25,712][40007] Num frames 83400... +[2024-11-07 22:52:25,930][40007] Num frames 83500... +[2024-11-07 22:52:26,011][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:26,016][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:26,230][40007] Num frames 83600... +[2024-11-07 22:52:26,444][40007] Num frames 83700... +[2024-11-07 22:52:26,668][40007] Num frames 83800... 
+[2024-11-07 22:52:26,933][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:26,937][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:26,968][40007] Num frames 83900... +[2024-11-07 22:52:27,198][40007] Num frames 84000... +[2024-11-07 22:52:27,412][40007] Num frames 84100... +[2024-11-07 22:52:27,645][40007] Num frames 84200... +[2024-11-07 22:52:27,902][40007] Avg episode rewards: #0: 4.426, true rewards: #0: 4.026 +[2024-11-07 22:52:27,906][40007] Avg episode reward: 4.426, avg true_objective: 4.026 +[2024-11-07 22:52:27,982][40007] Num frames 84300... +[2024-11-07 22:52:28,230][40007] Num frames 84400... +[2024-11-07 22:52:28,459][40007] Num frames 84500... +[2024-11-07 22:52:28,707][40007] Num frames 84600... +[2024-11-07 22:52:28,941][40007] Num frames 84700... +[2024-11-07 22:52:29,059][40007] Avg episode rewards: #0: 4.442, true rewards: #0: 4.032 +[2024-11-07 22:52:29,060][40007] Avg episode reward: 4.442, avg true_objective: 4.032 +[2024-11-07 22:52:29,241][40007] Num frames 84800... +[2024-11-07 22:52:29,477][40007] Num frames 84900... +[2024-11-07 22:52:29,713][40007] Num frames 85000... +[2024-11-07 22:52:29,945][40007] Num frames 85100... +[2024-11-07 22:52:30,033][40007] Avg episode rewards: #0: 4.442, true rewards: #0: 4.032 +[2024-11-07 22:52:30,038][40007] Avg episode reward: 4.442, avg true_objective: 4.032 +[2024-11-07 22:52:30,279][40007] Num frames 85200... +[2024-11-07 22:52:30,501][40007] Num frames 85300... +[2024-11-07 22:52:30,738][40007] Num frames 85400... +[2024-11-07 22:52:31,023][40007] Num frames 85500... +[2024-11-07 22:52:31,223][40007] Avg episode rewards: #0: 4.458, true rewards: #0: 4.038 +[2024-11-07 22:52:31,227][40007] Avg episode reward: 4.458, avg true_objective: 4.038 +[2024-11-07 22:52:31,348][40007] Num frames 85600... +[2024-11-07 22:52:31,581][40007] Num frames 85700... +[2024-11-07 22:52:31,833][40007] Num frames 85800... 
+[2024-11-07 22:52:32,061][40007] Num frames 85900... +[2024-11-07 22:52:32,278][40007] Num frames 86000... +[2024-11-07 22:52:32,487][40007] Num frames 86100... +[2024-11-07 22:52:32,553][40007] Avg episode rewards: #0: 4.494, true rewards: #0: 4.054 +[2024-11-07 22:52:32,556][40007] Avg episode reward: 4.494, avg true_objective: 4.054 +[2024-11-07 22:52:32,776][40007] Num frames 86200... +[2024-11-07 22:52:32,969][40007] Num frames 86300... +[2024-11-07 22:52:33,169][40007] Num frames 86400... +[2024-11-07 22:52:33,402][40007] Avg episode rewards: #0: 4.494, true rewards: #0: 4.054 +[2024-11-07 22:52:33,404][40007] Avg episode reward: 4.494, avg true_objective: 4.054 +[2024-11-07 22:52:33,447][40007] Num frames 86500... +[2024-11-07 22:52:33,658][40007] Num frames 86600... +[2024-11-07 22:52:33,871][40007] Num frames 86700... +[2024-11-07 22:52:34,065][40007] Num frames 86800... +[2024-11-07 22:52:34,258][40007] Avg episode rewards: #0: 4.494, true rewards: #0: 4.054 +[2024-11-07 22:52:34,261][40007] Avg episode reward: 4.494, avg true_objective: 4.054 +[2024-11-07 22:52:34,341][40007] Num frames 86900... +[2024-11-07 22:52:34,539][40007] Num frames 87000... +[2024-11-07 22:52:34,746][40007] Num frames 87100... +[2024-11-07 22:52:36,443][40007] Num frames 87200... +[2024-11-07 22:52:36,615][40007] Avg episode rewards: #0: 4.494, true rewards: #0: 4.054 +[2024-11-07 22:52:36,617][40007] Avg episode reward: 4.494, avg true_objective: 4.054 +[2024-11-07 22:52:36,729][40007] Num frames 87300... +[2024-11-07 22:52:36,923][40007] Num frames 87400... +[2024-11-07 22:52:37,109][40007] Num frames 87500... +[2024-11-07 22:52:37,302][40007] Num frames 87600... +[2024-11-07 22:52:37,510][40007] Num frames 87700... +[2024-11-07 22:52:37,571][40007] Avg episode rewards: #0: 4.511, true rewards: #0: 4.061 +[2024-11-07 22:52:37,576][40007] Avg episode reward: 4.511, avg true_objective: 4.061 +[2024-11-07 22:52:37,806][40007] Num frames 87800... 
+[2024-11-07 22:52:38,116][40007] Num frames 87900... +[2024-11-07 22:52:38,324][40007] Num frames 88000... +[2024-11-07 22:52:38,579][40007] Avg episode rewards: #0: 4.504, true rewards: #0: 4.064 +[2024-11-07 22:52:38,580][40007] Avg episode reward: 4.504, avg true_objective: 4.064 +[2024-11-07 22:52:38,625][40007] Num frames 88100... +[2024-11-07 22:52:38,828][40007] Num frames 88200... +[2024-11-07 22:52:39,034][40007] Num frames 88300... +[2024-11-07 22:52:39,246][40007] Num frames 88400... +[2024-11-07 22:52:39,452][40007] Avg episode rewards: #0: 4.504, true rewards: #0: 4.064 +[2024-11-07 22:52:39,458][40007] Avg episode reward: 4.504, avg true_objective: 4.064 +[2024-11-07 22:52:39,553][40007] Num frames 88500... +[2024-11-07 22:52:39,773][40007] Num frames 88600... +[2024-11-07 22:52:39,979][40007] Num frames 88700... +[2024-11-07 22:52:40,174][40007] Num frames 88800... +[2024-11-07 22:52:40,363][40007] Num frames 88900... +[2024-11-07 22:52:40,460][40007] Avg episode rewards: #0: 4.520, true rewards: #0: 4.070 +[2024-11-07 22:52:40,462][40007] Avg episode reward: 4.520, avg true_objective: 4.070 +[2024-11-07 22:52:40,637][40007] Num frames 89000... +[2024-11-07 22:52:40,831][40007] Num frames 89100... +[2024-11-07 22:52:41,019][40007] Num frames 89200... +[2024-11-07 22:52:41,205][40007] Num frames 89300... +[2024-11-07 22:52:41,266][40007] Avg episode rewards: #0: 4.507, true rewards: #0: 4.067 +[2024-11-07 22:52:41,270][40007] Avg episode reward: 4.507, avg true_objective: 4.067 +[2024-11-07 22:52:41,490][40007] Num frames 89400... +[2024-11-07 22:52:41,695][40007] Num frames 89500... +[2024-11-07 22:52:41,878][40007] Num frames 89600... +[2024-11-07 22:52:42,102][40007] Avg episode rewards: #0: 4.507, true rewards: #0: 4.067 +[2024-11-07 22:52:42,104][40007] Avg episode reward: 4.507, avg true_objective: 4.067 +[2024-11-07 22:52:42,137][40007] Num frames 89700... +[2024-11-07 22:52:42,336][40007] Num frames 89800... 
+[2024-11-07 22:52:42,584][40007] Num frames 89900... +[2024-11-07 22:52:42,848][40007] Num frames 90000... +[2024-11-07 22:52:43,069][40007] Avg episode rewards: #0: 4.507, true rewards: #0: 4.067 +[2024-11-07 22:52:43,072][40007] Avg episode reward: 4.507, avg true_objective: 4.067 +[2024-11-07 22:52:43,184][40007] Num frames 90100... +[2024-11-07 22:52:43,448][40007] Num frames 90200... +[2024-11-07 22:52:43,636][40007] Num frames 90300... +[2024-11-07 22:52:43,827][40007] Num frames 90400... +[2024-11-07 22:52:43,987][40007] Avg episode rewards: #0: 4.491, true rewards: #0: 4.061 +[2024-11-07 22:52:43,990][40007] Avg episode reward: 4.491, avg true_objective: 4.061 +[2024-11-07 22:52:44,095][40007] Num frames 90500... +[2024-11-07 22:52:44,417][40007] Num frames 90600... +[2024-11-07 22:52:44,715][40007] Num frames 90700... +[2024-11-07 22:52:44,810][40007] Avg episode rewards: #0: 4.478, true rewards: #0: 4.048 +[2024-11-07 22:52:44,811][40007] Avg episode reward: 4.478, avg true_objective: 4.048 +[2024-11-07 22:52:45,010][40007] Num frames 90800... +[2024-11-07 22:52:45,276][40007] Num frames 90900... +[2024-11-07 22:52:45,534][40007] Num frames 91000... +[2024-11-07 22:52:45,917][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:45,921][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:45,954][40007] Num frames 91100... +[2024-11-07 22:52:46,264][40007] Num frames 91200... +[2024-11-07 22:52:46,594][40007] Num frames 91300... +[2024-11-07 22:52:46,876][40007] Num frames 91400... +[2024-11-07 22:52:47,218][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:47,220][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:47,273][40007] Num frames 91500... +[2024-11-07 22:52:47,664][40007] Num frames 91600... +[2024-11-07 22:52:47,968][40007] Num frames 91700... +[2024-11-07 22:52:48,434][40007] Num frames 91800... 
+[2024-11-07 22:52:48,719][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:48,721][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:48,903][40007] Num frames 91900... +[2024-11-07 22:52:49,512][40007] Num frames 92000... +[2024-11-07 22:52:49,927][40007] Num frames 92100... +[2024-11-07 22:52:50,332][40007] Num frames 92200... +[2024-11-07 22:52:50,678][40007] Avg episode rewards: #0: 4.475, true rewards: #0: 4.045 +[2024-11-07 22:52:50,680][40007] Avg episode reward: 4.475, avg true_objective: 4.045 +[2024-11-07 22:52:50,768][40007] Num frames 92300... +[2024-11-07 22:52:51,219][40007] Num frames 92400... +[2024-11-07 22:52:51,909][40007] Num frames 92500... +[2024-11-07 22:52:52,274][40007] Num frames 92600... +[2024-11-07 22:52:52,605][40007] Avg episode rewards: #0: 4.475, true rewards: #0: 4.045 +[2024-11-07 22:52:52,607][40007] Avg episode reward: 4.475, avg true_objective: 4.045 +[2024-11-07 22:52:53,344][40007] Num frames 92700... +[2024-11-07 22:52:55,512][40007] Num frames 92800... +[2024-11-07 22:52:55,832][40007] Num frames 92900... +[2024-11-07 22:52:56,086][40007] Num frames 93000... +[2024-11-07 22:52:56,241][40007] Avg episode rewards: #0: 4.475, true rewards: #0: 4.045 +[2024-11-07 22:52:56,245][40007] Avg episode reward: 4.475, avg true_objective: 4.045 +[2024-11-07 22:52:56,463][40007] Num frames 93100... +[2024-11-07 22:52:56,687][40007] Num frames 93200... +[2024-11-07 22:52:56,898][40007] Num frames 93300... +[2024-11-07 22:52:57,124][40007] Num frames 93400... +[2024-11-07 22:52:57,254][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:57,257][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:57,436][40007] Num frames 93500... +[2024-11-07 22:52:57,930][40007] Num frames 93600... +[2024-11-07 22:52:58,171][40007] Num frames 93700... +[2024-11-07 22:52:58,364][40007] Num frames 93800... 
+[2024-11-07 22:52:58,461][40007] Avg episode rewards: #0: 4.462, true rewards: #0: 4.042 +[2024-11-07 22:52:58,466][40007] Avg episode reward: 4.462, avg true_objective: 4.042 +[2024-11-07 22:52:58,654][40007] Num frames 93900... +[2024-11-07 22:52:59,064][40007] Num frames 94000... +[2024-11-07 22:52:59,686][40007] Num frames 94100... +[2024-11-07 22:53:00,140][40007] Num frames 94200... +[2024-11-07 22:53:00,262][40007] Avg episode rewards: #0: 4.465, true rewards: #0: 4.045 +[2024-11-07 22:53:00,263][40007] Avg episode reward: 4.465, avg true_objective: 4.045 +[2024-11-07 22:53:00,407][40007] Num frames 94300... +[2024-11-07 22:53:00,723][40007] Num frames 94400... +[2024-11-07 22:53:00,960][40007] Num frames 94500... +[2024-11-07 22:53:01,205][40007] Num frames 94600... +[2024-11-07 22:53:01,331][40007] Avg episode rewards: #0: 4.465, true rewards: #0: 4.045 +[2024-11-07 22:53:01,333][40007] Avg episode reward: 4.465, avg true_objective: 4.045 +[2024-11-07 22:53:01,558][40007] Num frames 94700... +[2024-11-07 22:53:01,748][40007] Num frames 94800... +[2024-11-07 22:53:02,013][40007] Num frames 94900... +[2024-11-07 22:53:02,238][40007] Num frames 95000... +[2024-11-07 22:53:02,421][40007] Avg episode rewards: #0: 4.468, true rewards: #0: 4.048 +[2024-11-07 22:53:02,424][40007] Avg episode reward: 4.468, avg true_objective: 4.048 +[2024-11-07 22:53:02,508][40007] Num frames 95100... +[2024-11-07 22:53:02,711][40007] Num frames 95200... +[2024-11-07 22:53:02,929][40007] Num frames 95300... +[2024-11-07 22:53:03,140][40007] Num frames 95400... +[2024-11-07 22:53:03,411][40007] Avg episode rewards: #0: 4.468, true rewards: #0: 4.048 +[2024-11-07 22:53:03,416][40007] Avg episode reward: 4.468, avg true_objective: 4.048 +[2024-11-07 22:53:03,537][40007] Num frames 95500... +[2024-11-07 22:53:03,738][40007] Num frames 95600... +[2024-11-07 22:53:03,926][40007] Num frames 95700... +[2024-11-07 22:53:04,133][40007] Num frames 95800... 
+[2024-11-07 22:53:04,311][40007] Avg episode rewards: #0: 4.481, true rewards: #0: 4.051 +[2024-11-07 22:53:04,315][40007] Avg episode reward: 4.481, avg true_objective: 4.051 +[2024-11-07 22:53:04,405][40007] Num frames 95900... +[2024-11-07 22:53:04,638][40007] Num frames 96000... +[2024-11-07 22:53:04,848][40007] Num frames 96100... +[2024-11-07 22:53:05,052][40007] Num frames 96200... +[2024-11-07 22:53:05,205][40007] Avg episode rewards: #0: 4.481, true rewards: #0: 4.051 +[2024-11-07 22:53:05,206][40007] Avg episode reward: 4.481, avg true_objective: 4.051 +[2024-11-07 22:53:05,317][40007] Num frames 96300... +[2024-11-07 22:53:05,517][40007] Num frames 96400... +[2024-11-07 22:53:05,716][40007] Num frames 96500... +[2024-11-07 22:53:05,915][40007] Num frames 96600... +[2024-11-07 22:53:06,202][40007] Num frames 96700... +[2024-11-07 22:53:06,409][40007] Num frames 96800... +[2024-11-07 22:53:06,512][40007] Avg episode rewards: #0: 4.540, true rewards: #0: 4.080 +[2024-11-07 22:53:06,514][40007] Avg episode reward: 4.540, avg true_objective: 4.080 +[2024-11-07 22:53:06,678][40007] Num frames 96900... +[2024-11-07 22:53:06,906][40007] Num frames 97000... +[2024-11-07 22:53:07,104][40007] Num frames 97100... +[2024-11-07 22:53:07,315][40007] Num frames 97200... +[2024-11-07 22:53:07,389][40007] Avg episode rewards: #0: 4.540, true rewards: #0: 4.080 +[2024-11-07 22:53:07,391][40007] Avg episode reward: 4.540, avg true_objective: 4.080 +[2024-11-07 22:53:07,574][40007] Num frames 97300... +[2024-11-07 22:53:07,758][40007] Num frames 97400... +[2024-11-07 22:53:07,970][40007] Num frames 97500... +[2024-11-07 22:53:08,229][40007] Avg episode rewards: #0: 4.540, true rewards: #0: 4.080 +[2024-11-07 22:53:08,231][40007] Avg episode reward: 4.540, avg true_objective: 4.080 +[2024-11-07 22:53:08,266][40007] Num frames 97600... +[2024-11-07 22:53:08,485][40007] Num frames 97700... +[2024-11-07 22:53:10,404][40007] Num frames 97800... 
+[2024-11-07 22:53:10,638][40007] Num frames 97900... +[2024-11-07 22:53:10,858][40007] Num frames 98000... +[2024-11-07 22:53:11,002][40007] Avg episode rewards: #0: 4.527, true rewards: #0: 4.077 +[2024-11-07 22:53:11,007][40007] Avg episode reward: 4.527, avg true_objective: 4.077 +[2024-11-07 22:53:11,165][40007] Num frames 98100... +[2024-11-07 22:53:11,381][40007] Num frames 98200... +[2024-11-07 22:53:11,585][40007] Num frames 98300... +[2024-11-07 22:53:11,778][40007] Num frames 98400... +[2024-11-07 22:53:12,077][40007] Avg episode rewards: #0: 4.543, true rewards: #0: 4.083 +[2024-11-07 22:53:12,083][40007] Avg episode reward: 4.543, avg true_objective: 4.083 +[2024-11-07 22:53:12,126][40007] Num frames 98500... +[2024-11-07 22:53:12,368][40007] Num frames 98600... +[2024-11-07 22:53:12,606][40007] Num frames 98700... +[2024-11-07 22:53:12,822][40007] Num frames 98800... +[2024-11-07 22:53:13,046][40007] Num frames 98900... +[2024-11-07 22:53:13,179][40007] Avg episode rewards: #0: 4.546, true rewards: #0: 4.086 +[2024-11-07 22:53:13,181][40007] Avg episode reward: 4.546, avg true_objective: 4.086 +[2024-11-07 22:53:13,328][40007] Num frames 99000... +[2024-11-07 22:53:13,587][40007] Num frames 99100... +[2024-11-07 22:53:13,792][40007] Num frames 99200... +[2024-11-07 22:53:13,996][40007] Num frames 99300... +[2024-11-07 22:53:14,092][40007] Avg episode rewards: #0: 4.546, true rewards: #0: 4.086 +[2024-11-07 22:53:14,094][40007] Avg episode reward: 4.546, avg true_objective: 4.086 +[2024-11-07 22:53:14,275][40007] Num frames 99400... +[2024-11-07 22:53:14,479][40007] Num frames 99500... +[2024-11-07 22:53:14,680][40007] Num frames 99600... +[2024-11-07 22:53:14,871][40007] Num frames 99700... +[2024-11-07 22:53:14,933][40007] Avg episode rewards: #0: 4.546, true rewards: #0: 4.086 +[2024-11-07 22:53:14,935][40007] Avg episode reward: 4.546, avg true_objective: 4.086 +[2024-11-07 22:53:15,156][40007] Num frames 99800... 
+[2024-11-07 22:53:15,357][40007] Num frames 99900... +[2024-11-07 22:53:15,544][40007] Num frames 100000... +[2024-11-07 22:53:15,802][40007] Avg episode rewards: #0: 4.546, true rewards: #0: 4.086 +[2024-11-07 22:53:15,806][40007] Avg episode reward: 4.546, avg true_objective: 4.086 +[2024-11-07 22:53:15,854][40007] Num frames 100100... +[2024-11-07 22:53:16,056][40007] Num frames 100200... +[2024-11-07 22:53:16,307][40007] Num frames 100300... +[2024-11-07 22:53:16,514][40007] Num frames 100400... +[2024-11-07 22:53:16,731][40007] Num frames 100500... +[2024-11-07 22:53:16,866][40007] Avg episode rewards: #0: 4.550, true rewards: #0: 4.090 +[2024-11-07 22:53:16,871][40007] Avg episode reward: 4.550, avg true_objective: 4.090 +[2024-11-07 22:53:17,136][40007] Num frames 100600... +[2024-11-07 22:53:17,487][40007] Num frames 100700... +[2024-11-07 22:53:18,563][40007] Num frames 100800... +[2024-11-07 22:53:19,121][40007] Num frames 100900... +[2024-11-07 22:53:19,588][40007] Avg episode rewards: #0: 4.517, true rewards: #0: 4.077 +[2024-11-07 22:53:19,590][40007] Avg episode reward: 4.517, avg true_objective: 4.077 +[2024-11-07 22:53:20,202][40007] Num frames 101000... +[2024-11-07 22:53:20,461][40007] Num frames 101100... +[2024-11-07 22:53:20,726][40007] Num frames 101200... +[2024-11-07 22:53:20,961][40007] Num frames 101300... +[2024-11-07 22:53:21,028][40007] Avg episode rewards: #0: 4.500, true rewards: #0: 4.070 +[2024-11-07 22:53:21,030][40007] Avg episode reward: 4.500, avg true_objective: 4.070 +[2024-11-07 22:53:21,274][40007] Num frames 101400... +[2024-11-07 22:53:21,503][40007] Num frames 101500... +[2024-11-07 22:53:21,718][40007] Num frames 101600... +[2024-11-07 22:53:22,188][40007] Avg episode rewards: #0: 4.500, true rewards: #0: 4.070 +[2024-11-07 22:53:22,190][40007] Avg episode reward: 4.500, avg true_objective: 4.070 +[2024-11-07 22:53:22,248][40007] Num frames 101700... +[2024-11-07 22:53:22,518][40007] Num frames 101800... 
+[2024-11-07 22:53:22,781][40007] Num frames 101900... +[2024-11-07 22:53:23,035][40007] Num frames 102000... +[2024-11-07 22:53:23,592][40007] Avg episode rewards: #0: 4.500, true rewards: #0: 4.070 +[2024-11-07 22:53:23,594][40007] Avg episode reward: 4.500, avg true_objective: 4.070 +[2024-11-07 22:53:23,787][40007] Num frames 102100... +[2024-11-07 22:53:24,282][40007] Num frames 102200... +[2024-11-07 22:53:24,572][40007] Num frames 102300... +[2024-11-07 22:53:24,934][40007] Num frames 102400... +[2024-11-07 22:53:25,170][40007] Avg episode rewards: #0: 4.500, true rewards: #0: 4.070 +[2024-11-07 22:53:25,172][40007] Avg episode reward: 4.500, avg true_objective: 4.070 +[2024-11-07 22:53:25,293][40007] Num frames 102500... +[2024-11-07 22:53:25,514][40007] Num frames 102600... +[2024-11-07 22:53:25,702][40007] Num frames 102700... +[2024-11-07 22:53:25,884][40007] Num frames 102800... +[2024-11-07 22:53:26,022][40007] Avg episode rewards: #0: 4.468, true rewards: #0: 4.058 +[2024-11-07 22:53:26,026][40007] Avg episode reward: 4.468, avg true_objective: 4.058 +[2024-11-07 22:53:26,162][40007] Num frames 102900... +[2024-11-07 22:53:26,363][40007] Num frames 103000... +[2024-11-07 22:53:26,560][40007] Num frames 103100... +[2024-11-07 22:53:26,761][40007] Num frames 103200... +[2024-11-07 22:53:26,866][40007] Avg episode rewards: #0: 4.454, true rewards: #0: 4.054 +[2024-11-07 22:53:26,869][40007] Avg episode reward: 4.454, avg true_objective: 4.054 +[2024-11-07 22:53:27,061][40007] Num frames 103300... +[2024-11-07 22:53:27,334][40007] Num frames 103400... +[2024-11-07 22:53:27,630][40007] Num frames 103500... +[2024-11-07 22:53:27,886][40007] Num frames 103600... +[2024-11-07 22:53:28,131][40007] Avg episode rewards: #0: 4.448, true rewards: #0: 4.058 +[2024-11-07 22:53:28,134][40007] Avg episode reward: 4.448, avg true_objective: 4.058 +[2024-11-07 22:53:28,212][40007] Num frames 103700... +[2024-11-07 22:53:28,414][40007] Num frames 103800... 
+[2024-11-07 22:53:28,622][40007] Num frames 103900... +[2024-11-07 22:53:28,856][40007] Num frames 104000... +[2024-11-07 22:53:29,076][40007] Num frames 104100... +[2024-11-07 22:53:29,227][40007] Avg episode rewards: #0: 4.424, true rewards: #0: 4.054 +[2024-11-07 22:53:29,231][40007] Avg episode reward: 4.424, avg true_objective: 4.054 +[2024-11-07 22:53:29,353][40007] Num frames 104200... +[2024-11-07 22:53:29,532][40007] Num frames 104300... +[2024-11-07 22:53:29,743][40007] Num frames 104400... +[2024-11-07 22:53:30,021][40007] Num frames 104500... +[2024-11-07 22:53:30,168][40007] Avg episode rewards: #0: 4.408, true rewards: #0: 4.048 +[2024-11-07 22:53:30,171][40007] Avg episode reward: 4.408, avg true_objective: 4.048 +[2024-11-07 22:53:30,346][40007] Num frames 104600... +[2024-11-07 22:53:30,606][40007] Num frames 104700... +[2024-11-07 22:53:30,828][40007] Num frames 104800... +[2024-11-07 22:53:31,089][40007] Num frames 104900... +[2024-11-07 22:53:31,342][40007] Avg episode rewards: #0: 4.424, true rewards: #0: 4.054 +[2024-11-07 22:53:31,347][40007] Avg episode reward: 4.424, avg true_objective: 4.054 +[2024-11-07 22:53:31,416][40007] Num frames 105000... +[2024-11-07 22:53:31,657][40007] Num frames 105100... +[2024-11-07 22:53:31,895][40007] Num frames 105200... +[2024-11-07 22:53:32,146][40007] Num frames 105300... +[2024-11-07 22:53:32,359][40007] Avg episode rewards: #0: 4.424, true rewards: #0: 4.054 +[2024-11-07 22:53:32,361][40007] Avg episode reward: 4.424, avg true_objective: 4.054 +[2024-11-07 22:53:32,461][40007] Num frames 105400... +[2024-11-07 22:53:32,717][40007] Num frames 105500... +[2024-11-07 22:53:32,987][40007] Num frames 105600... +[2024-11-07 22:53:33,275][40007] Num frames 105700... +[2024-11-07 22:53:33,435][40007] Avg episode rewards: #0: 4.424, true rewards: #0: 4.054 +[2024-11-07 22:53:33,438][40007] Avg episode reward: 4.424, avg true_objective: 4.054 +[2024-11-07 22:53:33,639][40007] Num frames 105800... 
+[2024-11-07 22:53:33,864][40007] Num frames 105900... +[2024-11-07 22:53:34,093][40007] Num frames 106000... +[2024-11-07 22:53:34,405][40007] Num frames 106100... +[2024-11-07 22:53:34,568][40007] Avg episode rewards: #0: 4.424, true rewards: #0: 4.054 +[2024-11-07 22:53:34,572][40007] Avg episode reward: 4.424, avg true_objective: 4.054 +[2024-11-07 22:53:34,803][40007] Num frames 106200... +[2024-11-07 22:53:35,091][40007] Num frames 106300... +[2024-11-07 22:53:35,435][40007] Num frames 106400... +[2024-11-07 22:53:36,061][40007] Num frames 106500... +[2024-11-07 22:53:36,175][40007] Avg episode rewards: #0: 4.424, true rewards: #0: 4.054 +[2024-11-07 22:53:36,179][40007] Avg episode reward: 4.424, avg true_objective: 4.054 +[2024-11-07 22:53:36,376][40007] Num frames 106600... +[2024-11-07 22:53:36,647][40007] Num frames 106700... +[2024-11-07 22:53:37,405][40007] Num frames 106800... +[2024-11-07 22:53:37,643][40007] Num frames 106900... +[2024-11-07 22:53:37,863][40007] Avg episode rewards: #0: 4.441, true rewards: #0: 4.061 +[2024-11-07 22:53:37,869][40007] Avg episode reward: 4.441, avg true_objective: 4.061 +[2024-11-07 22:53:37,983][40007] Num frames 107000... +[2024-11-07 22:53:38,221][40007] Num frames 107100... +[2024-11-07 22:53:38,486][40007] Num frames 107200... +[2024-11-07 22:53:38,709][40007] Num frames 107300... +[2024-11-07 22:53:38,925][40007] Avg episode rewards: #0: 4.464, true rewards: #0: 4.064 +[2024-11-07 22:53:38,928][40007] Avg episode reward: 4.464, avg true_objective: 4.064 +[2024-11-07 22:53:38,991][40007] Num frames 107400... +[2024-11-07 22:53:39,193][40007] Num frames 107500... +[2024-11-07 22:53:39,393][40007] Num frames 107600... +[2024-11-07 22:53:39,597][40007] Num frames 107700... +[2024-11-07 22:53:39,796][40007] Num frames 107800... 
+[2024-11-07 22:53:39,925][40007] Avg episode rewards: #0: 4.480, true rewards: #0: 4.070 +[2024-11-07 22:53:39,935][40007] Avg episode reward: 4.480, avg true_objective: 4.070 +[2024-11-07 22:53:40,085][40007] Num frames 107900... +[2024-11-07 22:53:40,283][40007] Num frames 108000... +[2024-11-07 22:53:40,573][40007] Num frames 108100... +[2024-11-07 22:53:40,849][40007] Num frames 108200... +[2024-11-07 22:53:41,115][40007] Avg episode rewards: #0: 4.497, true rewards: #0: 4.077 +[2024-11-07 22:53:41,118][40007] Avg episode reward: 4.497, avg true_objective: 4.077 +[2024-11-07 22:53:41,183][40007] Num frames 108300... +[2024-11-07 22:53:41,390][40007] Num frames 108400... +[2024-11-07 22:53:41,583][40007] Num frames 108500... +[2024-11-07 22:53:41,775][40007] Num frames 108600... +[2024-11-07 22:53:41,969][40007] Avg episode rewards: #0: 4.484, true rewards: #0: 4.074 +[2024-11-07 22:53:41,975][40007] Avg episode reward: 4.484, avg true_objective: 4.074 +[2024-11-07 22:53:42,076][40007] Num frames 108700... +[2024-11-07 22:53:42,274][40007] Num frames 108800... +[2024-11-07 22:53:42,498][40007] Num frames 108900... +[2024-11-07 22:53:44,138][40007] Num frames 109000... +[2024-11-07 22:53:44,289][40007] Avg episode rewards: #0: 4.467, true rewards: #0: 4.067 +[2024-11-07 22:53:44,290][40007] Avg episode reward: 4.467, avg true_objective: 4.067 +[2024-11-07 22:53:44,437][40007] Num frames 109100... +[2024-11-07 22:53:44,813][40007] Num frames 109200... +[2024-11-07 22:53:45,141][40007] Num frames 109300... +[2024-11-07 22:53:45,403][40007] Num frames 109400... +[2024-11-07 22:53:45,969][40007] Avg episode rewards: #0: 4.496, true rewards: #0: 4.086 +[2024-11-07 22:53:45,971][40007] Avg episode reward: 4.496, avg true_objective: 4.086 +[2024-11-07 22:53:45,990][40007] Num frames 109500... +[2024-11-07 22:53:46,251][40007] Num frames 109600... +[2024-11-07 22:53:46,561][40007] Num frames 109700... +[2024-11-07 22:53:47,056][40007] Num frames 109800... 
+[2024-11-07 22:53:47,460][40007] Avg episode rewards: #0: 4.483, true rewards: #0: 4.083 +[2024-11-07 22:53:47,461][40007] Avg episode reward: 4.483, avg true_objective: 4.083 +[2024-11-07 22:53:47,540][40007] Num frames 109900... +[2024-11-07 22:53:47,970][40007] Num frames 110000... +[2024-11-07 22:53:48,288][40007] Num frames 110100... +[2024-11-07 22:53:48,570][40007] Num frames 110200... +[2024-11-07 22:53:48,803][40007] Avg episode rewards: #0: 4.483, true rewards: #0: 4.083 +[2024-11-07 22:53:48,809][40007] Avg episode reward: 4.483, avg true_objective: 4.083 +[2024-11-07 22:53:48,918][40007] Num frames 110300... +[2024-11-07 22:53:49,153][40007] Num frames 110400... +[2024-11-07 22:53:49,354][40007] Num frames 110500... +[2024-11-07 22:53:49,684][40007] Num frames 110600... +[2024-11-07 22:53:49,842][40007] Avg episode rewards: #0: 4.483, true rewards: #0: 4.083 +[2024-11-07 22:53:49,845][40007] Avg episode reward: 4.483, avg true_objective: 4.083 +[2024-11-07 22:53:50,008][40007] Num frames 110700... +[2024-11-07 22:53:50,227][40007] Num frames 110800... +[2024-11-07 22:53:50,436][40007] Num frames 110900... +[2024-11-07 22:53:50,672][40007] Num frames 111000... +[2024-11-07 22:53:50,799][40007] Avg episode rewards: #0: 4.467, true rewards: #0: 4.077 +[2024-11-07 22:53:50,801][40007] Avg episode reward: 4.467, avg true_objective: 4.077 +[2024-11-07 22:53:50,976][40007] Num frames 111100... +[2024-11-07 22:53:51,244][40007] Num frames 111200... +[2024-11-07 22:53:51,471][40007] Num frames 111300... +[2024-11-07 22:53:51,716][40007] Num frames 111400... +[2024-11-07 22:53:51,962][40007] Num frames 111500... +[2024-11-07 22:53:52,224][40007] Num frames 111600... +[2024-11-07 22:53:52,383][40007] Avg episode rewards: #0: 4.519, true rewards: #0: 4.099 +[2024-11-07 22:53:52,385][40007] Avg episode reward: 4.519, avg true_objective: 4.099 +[2024-11-07 22:53:52,517][40007] Num frames 111700... +[2024-11-07 22:53:52,756][40007] Num frames 111800... 
+[2024-11-07 22:53:53,008][40007] Num frames 111900... +[2024-11-07 22:53:53,278][40007] Num frames 112000... +[2024-11-07 22:53:53,399][40007] Avg episode rewards: #0: 4.519, true rewards: #0: 4.099 +[2024-11-07 22:53:53,400][40007] Avg episode reward: 4.519, avg true_objective: 4.099 +[2024-11-07 22:53:53,614][40007] Num frames 112100... +[2024-11-07 22:53:53,859][40007] Num frames 112200... +[2024-11-07 22:53:54,159][40007] Num frames 112300... +[2024-11-07 22:53:54,451][40007] Num frames 112400... +[2024-11-07 22:53:54,523][40007] Avg episode rewards: #0: 4.519, true rewards: #0: 4.099 +[2024-11-07 22:53:54,526][40007] Avg episode reward: 4.519, avg true_objective: 4.099 +[2024-11-07 22:53:54,788][40007] Num frames 112500... +[2024-11-07 22:53:55,103][40007] Num frames 112600... +[2024-11-07 22:53:55,395][40007] Num frames 112700... +[2024-11-07 22:53:55,702][40007] Avg episode rewards: #0: 4.503, true rewards: #0: 4.093 +[2024-11-07 22:53:55,707][40007] Avg episode reward: 4.503, avg true_objective: 4.093 +[2024-11-07 22:53:55,768][40007] Num frames 112800... +[2024-11-07 22:53:56,005][40007] Num frames 112900... +[2024-11-07 22:53:56,259][40007] Num frames 113000... +[2024-11-07 22:53:56,479][40007] Num frames 113100... +[2024-11-07 22:53:56,745][40007] Avg episode rewards: #0: 4.486, true rewards: #0: 4.086 +[2024-11-07 22:53:56,747][40007] Avg episode reward: 4.486, avg true_objective: 4.086 +[2024-11-07 22:53:56,798][40007] Num frames 113200... +[2024-11-07 22:53:56,996][40007] Num frames 113300... +[2024-11-07 22:53:57,232][40007] Num frames 113400... +[2024-11-07 22:53:57,467][40007] Num frames 113500... +[2024-11-07 22:53:57,696][40007] Num frames 113600... +[2024-11-07 22:53:57,940][40007] Num frames 113700... +[2024-11-07 22:53:58,050][40007] Avg episode rewards: #0: 4.519, true rewards: #0: 4.099 +[2024-11-07 22:53:58,056][40007] Avg episode reward: 4.519, avg true_objective: 4.099 +[2024-11-07 22:53:58,276][40007] Num frames 113800... 
+[2024-11-07 22:53:58,501][40007] Num frames 113900... +[2024-11-07 22:53:58,755][40007] Num frames 114000... +[2024-11-07 22:53:58,885][40007] Avg episode rewards: #0: 4.512, true rewards: #0: 4.092 +[2024-11-07 22:53:58,889][40007] Avg episode reward: 4.512, avg true_objective: 4.092 +[2024-11-07 22:53:59,052][40007] Num frames 114100... +[2024-11-07 22:53:59,301][40007] Num frames 114200... +[2024-11-07 22:53:59,517][40007] Num frames 114300... +[2024-11-07 22:53:59,765][40007] Num frames 114400... +[2024-11-07 22:53:59,857][40007] Avg episode rewards: #0: 4.512, true rewards: #0: 4.092 +[2024-11-07 22:53:59,862][40007] Avg episode reward: 4.512, avg true_objective: 4.092 +[2024-11-07 22:54:00,063][40007] Num frames 114500... +[2024-11-07 22:54:00,293][40007] Num frames 114600... +[2024-11-07 22:54:00,523][40007] Num frames 114700... +[2024-11-07 22:54:00,738][40007] Num frames 114800... +[2024-11-07 22:54:00,944][40007] Avg episode rewards: #0: 4.529, true rewards: #0: 4.098 +[2024-11-07 22:54:00,946][40007] Avg episode reward: 4.529, avg true_objective: 4.098 +[2024-11-07 22:54:01,082][40007] Num frames 114900... +[2024-11-07 22:54:01,369][40007] Num frames 115000... +[2024-11-07 22:54:01,632][40007] Num frames 115100... +[2024-11-07 22:54:01,907][40007] Num frames 115200... +[2024-11-07 22:54:02,109][40007] Avg episode rewards: #0: 4.529, true rewards: #0: 4.098 +[2024-11-07 22:54:02,112][40007] Avg episode reward: 4.529, avg true_objective: 4.098 +[2024-11-07 22:54:02,255][40007] Num frames 115300... +[2024-11-07 22:54:02,484][40007] Num frames 115400... +[2024-11-07 22:54:02,720][40007] Num frames 115500... +[2024-11-07 22:54:02,958][40007] Num frames 115600... +[2024-11-07 22:54:03,151][40007] Avg episode rewards: #0: 4.529, true rewards: #0: 4.098 +[2024-11-07 22:54:03,152][40007] Avg episode reward: 4.529, avg true_objective: 4.098 +[2024-11-07 22:54:03,326][40007] Num frames 115700... +[2024-11-07 22:54:03,565][40007] Num frames 115800... 
+[2024-11-07 22:54:03,811][40007] Num frames 115900... +[2024-11-07 22:54:04,067][40007] Num frames 116000... +[2024-11-07 22:54:04,163][40007] Avg episode rewards: #0: 4.512, true rewards: #0: 4.092 +[2024-11-07 22:54:04,165][40007] Avg episode reward: 4.512, avg true_objective: 4.092 +[2024-11-07 22:54:04,382][40007] Num frames 116100... +[2024-11-07 22:54:04,620][40007] Num frames 116200... +[2024-11-07 22:54:04,845][40007] Num frames 116300... +[2024-11-07 22:54:05,095][40007] Avg episode rewards: #0: 4.496, true rewards: #0: 4.086 +[2024-11-07 22:54:05,098][40007] Avg episode reward: 4.496, avg true_objective: 4.086 +[2024-11-07 22:54:05,106][40007] Num frames 116400... +[2024-11-07 22:54:05,336][40007] Num frames 116500... +[2024-11-07 22:54:05,532][40007] Num frames 116600... +[2024-11-07 22:54:05,790][40007] Num frames 116700... +[2024-11-07 22:54:06,029][40007] Avg episode rewards: #0: 4.479, true rewards: #0: 4.079 +[2024-11-07 22:54:06,036][40007] Avg episode reward: 4.479, avg true_objective: 4.079 +[2024-11-07 22:54:06,414][40007] Num frames 116800... +[2024-11-07 22:54:06,648][40007] Num frames 116900... +[2024-11-07 22:54:06,855][40007] Num frames 117000... +[2024-11-07 22:54:07,071][40007] Num frames 117100... +[2024-11-07 22:54:07,277][40007] Avg episode rewards: #0: 4.463, true rewards: #0: 4.073 +[2024-11-07 22:54:07,278][40007] Avg episode reward: 4.463, avg true_objective: 4.073 +[2024-11-07 22:54:07,354][40007] Num frames 117200... +[2024-11-07 22:54:07,568][40007] Num frames 117300... +[2024-11-07 22:54:07,777][40007] Num frames 117400... +[2024-11-07 22:54:08,049][40007] Num frames 117500... +[2024-11-07 22:54:08,269][40007] Num frames 117600... +[2024-11-07 22:54:08,360][40007] Avg episode rewards: #0: 4.476, true rewards: #0: 4.076 +[2024-11-07 22:54:08,363][40007] Avg episode reward: 4.476, avg true_objective: 4.076 +[2024-11-07 22:54:08,548][40007] Num frames 117700... +[2024-11-07 22:54:08,752][40007] Num frames 117800... 
+[2024-11-07 22:54:08,953][40007] Num frames 117900... +[2024-11-07 22:54:09,211][40007] Avg episode rewards: #0: 4.476, true rewards: #0: 4.076 +[2024-11-07 22:54:09,216][40007] Avg episode reward: 4.476, avg true_objective: 4.076 +[2024-11-07 22:54:09,228][40007] Num frames 118000... +[2024-11-07 22:54:09,449][40007] Num frames 118100... +[2024-11-07 22:54:09,682][40007] Num frames 118200... +[2024-11-07 22:54:09,910][40007] Num frames 118300... +[2024-11-07 22:54:10,124][40007] Num frames 118400... +[2024-11-07 22:54:10,361][40007] Num frames 118500... +[2024-11-07 22:54:10,514][40007] Avg episode rewards: #0: 4.496, true rewards: #0: 4.086 +[2024-11-07 22:54:10,521][40007] Avg episode reward: 4.496, avg true_objective: 4.086 +[2024-11-07 22:54:10,653][40007] Num frames 118600... +[2024-11-07 22:54:10,878][40007] Num frames 118700... +[2024-11-07 22:54:11,125][40007] Num frames 118800... +[2024-11-07 22:54:11,374][40007] Num frames 118900... +[2024-11-07 22:54:11,643][40007] Avg episode rewards: #0: 4.512, true rewards: #0: 4.092 +[2024-11-07 22:54:11,645][40007] Avg episode reward: 4.512, avg true_objective: 4.092 +[2024-11-07 22:54:11,681][40007] Num frames 119000... +[2024-11-07 22:54:11,915][40007] Num frames 119100... +[2024-11-07 22:54:12,138][40007] Num frames 119200... +[2024-11-07 22:54:12,364][40007] Num frames 119300... +[2024-11-07 22:54:12,591][40007] Num frames 119400... +[2024-11-07 22:54:12,664][40007] Avg episode rewards: #0: 4.525, true rewards: #0: 4.095 +[2024-11-07 22:54:12,669][40007] Avg episode reward: 4.525, avg true_objective: 4.095 +[2024-11-07 22:54:12,904][40007] Num frames 119500... +[2024-11-07 22:54:13,125][40007] Num frames 119600... +[2024-11-07 22:54:13,328][40007] Num frames 119700... +[2024-11-07 22:54:13,592][40007] Num frames 119800... 
+[2024-11-07 22:54:14,047][40007] Avg episode rewards: #0: 4.542, true rewards: #0: 4.102 +[2024-11-07 22:54:14,048][40007] Avg episode reward: 4.542, avg true_objective: 4.102 +[2024-11-07 22:54:14,162][40007] Num frames 119900... +[2024-11-07 22:54:14,432][40007] Num frames 120000... +[2024-11-07 22:54:14,673][40007] Num frames 120100... +[2024-11-07 22:54:14,905][40007] Num frames 120200... +[2024-11-07 22:54:15,121][40007] Num frames 120300... +[2024-11-07 22:54:44,296][40007] Avg episode rewards: #0: 4.558, true rewards: #0: 4.108 +[2024-11-07 22:54:44,748][40007] Avg episode reward: 4.558, avg true_objective: 4.108 +[2024-11-07 23:13:49,674][41694] Saving configuration to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json... +[2024-11-07 23:13:49,700][41694] Rollout worker 0 uses device cpu +[2024-11-07 23:13:49,701][41694] Rollout worker 1 uses device cpu +[2024-11-07 23:13:49,702][41694] Rollout worker 2 uses device cpu +[2024-11-07 23:13:49,703][41694] Rollout worker 3 uses device cpu +[2024-11-07 23:13:49,704][41694] Rollout worker 4 uses device cpu +[2024-11-07 23:13:49,705][41694] Rollout worker 5 uses device cpu +[2024-11-07 23:13:49,708][41694] Rollout worker 6 uses device cpu +[2024-11-07 23:13:49,710][41694] Rollout worker 7 uses device cpu +[2024-11-07 23:13:50,027][41694] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 23:13:50,029][41694] InferenceWorker_p0-w0: min num requests: 2 +[2024-11-07 23:13:50,062][41694] Starting all processes... +[2024-11-07 23:13:50,064][41694] Starting process learner_proc0 +[2024-11-07 23:13:50,313][41694] Starting all processes... 
+[2024-11-07 23:13:50,372][41694] Starting process inference_proc0-0 +[2024-11-07 23:13:50,373][41694] Starting process rollout_proc0 +[2024-11-07 23:13:50,374][41694] Starting process rollout_proc1 +[2024-11-07 23:13:50,374][41694] Starting process rollout_proc2 +[2024-11-07 23:13:50,375][41694] Starting process rollout_proc3 +[2024-11-07 23:13:50,376][41694] Starting process rollout_proc4 +[2024-11-07 23:13:50,376][41694] Starting process rollout_proc5 +[2024-11-07 23:13:50,377][41694] Starting process rollout_proc6 +[2024-11-07 23:13:50,378][41694] Starting process rollout_proc7 +[2024-11-07 23:13:56,106][42009] Worker 5 uses CPU cores [5] +[2024-11-07 23:13:57,184][42004] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 23:13:57,184][42004] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2024-11-07 23:13:57,333][42010] Worker 4 uses CPU cores [4] +[2024-11-07 23:13:57,431][42004] Num visible devices: 1 +[2024-11-07 23:13:57,692][42007] Worker 2 uses CPU cores [2] +[2024-11-07 23:13:57,971][42005] Worker 0 uses CPU cores [0] +[2024-11-07 23:13:57,981][42017] Worker 6 uses CPU cores [6] +[2024-11-07 23:13:58,028][42008] Worker 3 uses CPU cores [3] +[2024-11-07 23:13:58,229][41991] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 23:13:58,230][41991] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2024-11-07 23:13:58,251][41991] Num visible devices: 1 +[2024-11-07 23:13:58,259][41991] Starting seed is not provided +[2024-11-07 23:13:58,260][41991] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 23:13:58,260][41991] Initializing actor-critic model on device cuda:0 +[2024-11-07 23:13:58,260][41991] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 23:13:58,264][41991] RunningMeanStd input shape: (1,) +[2024-11-07 23:13:58,279][41991] ConvEncoder: input_channels=3 +[2024-11-07 23:13:58,521][42018] Worker 7 uses CPU 
cores [0, 1, 2, 3, 4, 5, 6] +[2024-11-07 23:13:58,619][42006] Worker 1 uses CPU cores [1] +[2024-11-07 23:13:59,504][41991] Conv encoder output size: 512 +[2024-11-07 23:13:59,505][41991] Policy head output size: 512 +[2024-11-07 23:13:59,884][41991] Created Actor Critic model with architecture: +[2024-11-07 23:13:59,885][41991] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2024-11-07 23:14:02,026][41991] Using optimizer +[2024-11-07 23:14:08,174][41991] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004886_20013056.pth... 
+[2024-11-07 23:14:08,593][41991] Loading model from checkpoint +[2024-11-07 23:14:08,598][41991] Loaded experiment state at self.train_step=4886, self.env_steps=20013056 +[2024-11-07 23:14:08,598][41991] Initialized policy 0 weights for model version 4886 +[2024-11-07 23:14:08,608][41991] LearnerWorker_p0 finished initialization! +[2024-11-07 23:14:08,608][41991] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2024-11-07 23:14:08,861][42004] RunningMeanStd input shape: (3, 72, 128) +[2024-11-07 23:14:08,862][42004] RunningMeanStd input shape: (1,) +[2024-11-07 23:14:08,875][42004] ConvEncoder: input_channels=3 +[2024-11-07 23:14:08,978][42004] Conv encoder output size: 512 +[2024-11-07 23:14:08,978][42004] Policy head output size: 512 +[2024-11-07 23:14:09,034][41694] Inference worker 0-0 is ready! +[2024-11-07 23:14:09,035][41694] All inference workers are ready! Signal rollout workers to start! +[2024-11-07 23:14:09,147][42008] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:09,148][42010] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:09,157][42009] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:09,159][42007] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:09,164][42005] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:09,165][42006] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:09,190][42017] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:09,204][42018] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-07 23:14:10,019][41694] Heartbeat connected on Batcher_0 +[2024-11-07 23:14:10,026][41694] Heartbeat connected on LearnerWorker_p0 +[2024-11-07 23:14:10,074][41694] Heartbeat connected on InferenceWorker_p0-w0 +[2024-11-07 23:14:12,072][42018] Decorrelating experience for 0 frames... 
+[2024-11-07 23:14:12,072][42010] Decorrelating experience for 0 frames... +[2024-11-07 23:14:12,072][42006] Decorrelating experience for 0 frames... +[2024-11-07 23:14:12,072][42009] Decorrelating experience for 0 frames... +[2024-11-07 23:14:12,072][42005] Decorrelating experience for 0 frames... +[2024-11-07 23:14:12,072][42007] Decorrelating experience for 0 frames... +[2024-11-07 23:14:12,074][42017] Decorrelating experience for 0 frames... +[2024-11-07 23:14:12,358][42008] Decorrelating experience for 0 frames... +[2024-11-07 23:14:12,422][42006] Decorrelating experience for 32 frames... +[2024-11-07 23:14:12,450][42005] Decorrelating experience for 32 frames... +[2024-11-07 23:14:12,458][42007] Decorrelating experience for 32 frames... +[2024-11-07 23:14:12,497][42018] Decorrelating experience for 32 frames... +[2024-11-07 23:14:12,746][42008] Decorrelating experience for 32 frames... +[2024-11-07 23:14:12,844][42010] Decorrelating experience for 32 frames... +[2024-11-07 23:14:12,932][41694] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 20013056. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 23:14:12,955][42007] Decorrelating experience for 64 frames... +[2024-11-07 23:14:12,959][42006] Decorrelating experience for 64 frames... +[2024-11-07 23:14:13,040][42005] Decorrelating experience for 64 frames... +[2024-11-07 23:14:13,050][42018] Decorrelating experience for 64 frames... +[2024-11-07 23:14:13,271][42009] Decorrelating experience for 32 frames... +[2024-11-07 23:14:13,363][42010] Decorrelating experience for 64 frames... +[2024-11-07 23:14:13,416][42008] Decorrelating experience for 64 frames... +[2024-11-07 23:14:13,492][42007] Decorrelating experience for 96 frames... +[2024-11-07 23:14:13,547][42005] Decorrelating experience for 96 frames... 
+[2024-11-07 23:14:13,573][41694] Heartbeat connected on RolloutWorker_w2 +[2024-11-07 23:14:13,627][42018] Decorrelating experience for 96 frames... +[2024-11-07 23:14:13,659][41694] Heartbeat connected on RolloutWorker_w0 +[2024-11-07 23:14:13,773][42006] Decorrelating experience for 96 frames... +[2024-11-07 23:14:13,812][41694] Heartbeat connected on RolloutWorker_w7 +[2024-11-07 23:14:13,866][42009] Decorrelating experience for 64 frames... +[2024-11-07 23:14:13,875][41694] Heartbeat connected on RolloutWorker_w1 +[2024-11-07 23:14:13,922][42008] Decorrelating experience for 96 frames... +[2024-11-07 23:14:14,007][41694] Heartbeat connected on RolloutWorker_w3 +[2024-11-07 23:14:14,011][42010] Decorrelating experience for 96 frames... +[2024-11-07 23:14:14,079][41694] Heartbeat connected on RolloutWorker_w4 +[2024-11-07 23:14:14,160][42017] Decorrelating experience for 32 frames... +[2024-11-07 23:14:14,288][42009] Decorrelating experience for 96 frames... +[2024-11-07 23:14:14,336][41694] Heartbeat connected on RolloutWorker_w5 +[2024-11-07 23:14:14,529][42017] Decorrelating experience for 64 frames... +[2024-11-07 23:14:14,847][42017] Decorrelating experience for 96 frames... +[2024-11-07 23:14:14,896][41694] Heartbeat connected on RolloutWorker_w6 +[2024-11-07 23:14:17,932][41694] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 20013056. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 23:14:21,624][41991] Signal inference workers to stop experience collection... +[2024-11-07 23:14:21,635][42004] InferenceWorker_p0-w0: stopping experience collection +[2024-11-07 23:14:22,932][41694] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 20013056. Throughput: 0: 122.4. Samples: 1224. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 23:14:22,935][41694] Avg episode reward: [(0, '2.104')] +[2024-11-07 23:14:27,931][41694] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). 
Total num frames: 20013056. Throughput: 0: 150.5. Samples: 2258. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2024-11-07 23:14:27,933][41694] Avg episode reward: [(0, '2.104')] +[2024-11-07 23:14:30,864][41991] Signal inference workers to resume experience collection... +[2024-11-07 23:14:30,864][42004] InferenceWorker_p0-w0: resuming experience collection +[2024-11-07 23:14:32,932][41694] Fps is (10 sec: 1638.5, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 20029440. Throughput: 0: 112.9. Samples: 2258. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2024-11-07 23:14:32,935][41694] Avg episode reward: [(0, '3.393')] +[2024-11-07 23:14:36,549][42004] Updated weights for policy 0, policy_version 4896 (0.0195) +[2024-11-07 23:14:37,932][41694] Fps is (10 sec: 4505.3, 60 sec: 1802.2, 300 sec: 1802.2). Total num frames: 20058112. Throughput: 0: 408.4. Samples: 10210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:14:37,936][41694] Avg episode reward: [(0, '4.164')] +[2024-11-07 23:14:42,932][41694] Fps is (10 sec: 4915.2, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 20078592. Throughput: 0: 558.8. Samples: 16764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:14:42,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-07 23:14:47,102][42004] Updated weights for policy 0, policy_version 4906 (0.0075) +[2024-11-07 23:14:47,932][41694] Fps is (10 sec: 3686.5, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 20094976. Throughput: 0: 575.8. Samples: 20154. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:14:47,934][41694] Avg episode reward: [(0, '4.318')] +[2024-11-07 23:14:52,932][41694] Fps is (10 sec: 2457.5, 60 sec: 2252.8, 300 sec: 2252.8). Total num frames: 20103168. Throughput: 0: 574.5. Samples: 22980. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:14:52,935][41694] Avg episode reward: [(0, '4.267')] +[2024-11-07 23:14:57,935][41694] Fps is (10 sec: 2048.0, 60 sec: 2275.6, 300 sec: 2275.6). Total num frames: 20115456. Throughput: 0: 583.4. Samples: 26254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:14:57,938][41694] Avg episode reward: [(0, '4.157')] +[2024-11-07 23:15:01,763][42004] Updated weights for policy 0, policy_version 4916 (0.0068) +[2024-11-07 23:15:02,932][41694] Fps is (10 sec: 3686.6, 60 sec: 2539.5, 300 sec: 2539.5). Total num frames: 20140032. Throughput: 0: 642.8. Samples: 28926. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:15:02,934][41694] Avg episode reward: [(0, '4.345')] +[2024-11-07 23:15:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 2904.4, 300 sec: 2904.4). Total num frames: 20172800. Throughput: 0: 809.1. Samples: 37632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:15:07,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-07 23:15:08,163][42004] Updated weights for policy 0, policy_version 4926 (0.0029) +[2024-11-07 23:15:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 3208.5, 300 sec: 3208.5). Total num frames: 20205568. Throughput: 0: 1032.8. Samples: 48736. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:15:12,934][41694] Avg episode reward: [(0, '4.309')] +[2024-11-07 23:15:14,684][42004] Updated weights for policy 0, policy_version 4936 (0.0036) +[2024-11-07 23:15:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 3754.7, 300 sec: 3465.9). Total num frames: 20238336. Throughput: 0: 1116.1. Samples: 52484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:15:17,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-07 23:15:20,923][42004] Updated weights for policy 0, policy_version 4946 (0.0042) +[2024-11-07 23:15:22,932][41694] Fps is (10 sec: 6553.7, 60 sec: 4300.8, 300 sec: 3686.4). Total num frames: 20271104. Throughput: 0: 1158.6. 
Samples: 62346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-07 23:15:22,934][41694] Avg episode reward: [(0, '4.498')] +[2024-11-07 23:15:27,932][41694] Fps is (10 sec: 5324.8, 60 sec: 4642.1, 300 sec: 3713.7). Total num frames: 20291584. Throughput: 0: 1180.7. Samples: 69894. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:15:27,933][41694] Avg episode reward: [(0, '4.436')] +[2024-11-07 23:15:28,550][42004] Updated weights for policy 0, policy_version 4956 (0.0032) +[2024-11-07 23:15:32,932][41694] Fps is (10 sec: 6143.6, 60 sec: 5051.7, 300 sec: 3993.6). Total num frames: 20332544. Throughput: 0: 1225.3. Samples: 75292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:15:32,935][41694] Avg episode reward: [(0, '4.524')] +[2024-11-07 23:15:33,956][42004] Updated weights for policy 0, policy_version 4966 (0.0026) +[2024-11-07 23:15:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 5120.0, 300 sec: 4144.2). Total num frames: 20365312. Throughput: 0: 1398.1. Samples: 85892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:15:37,934][41694] Avg episode reward: [(0, '4.442')] +[2024-11-07 23:15:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004972_20365312.pth... +[2024-11-07 23:15:38,073][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth +[2024-11-07 23:15:40,320][42004] Updated weights for policy 0, policy_version 4976 (0.0030) +[2024-11-07 23:15:42,932][41694] Fps is (10 sec: 6963.6, 60 sec: 5393.1, 300 sec: 4323.6). Total num frames: 20402176. Throughput: 0: 1554.9. Samples: 96226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:15:42,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-07 23:15:45,861][42004] Updated weights for policy 0, policy_version 4986 (0.0027) +[2024-11-07 23:15:47,931][41694] Fps is (10 sec: 6963.4, 60 sec: 5666.2, 300 sec: 4440.9). 
Total num frames: 20434944. Throughput: 0: 1614.1. Samples: 101560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:15:47,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-07 23:15:51,414][42004] Updated weights for policy 0, policy_version 4996 (0.0027) +[2024-11-07 23:15:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6144.1, 300 sec: 4587.5). Total num frames: 20471808. Throughput: 0: 1670.3. Samples: 112796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:15:52,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-07 23:15:57,128][42004] Updated weights for policy 0, policy_version 5006 (0.0047) +[2024-11-07 23:15:59,216][41694] Fps is (10 sec: 6533.8, 60 sec: 6416.3, 300 sec: 4663.1). Total num frames: 20508672. Throughput: 0: 1615.9. Samples: 123526. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:15:59,217][41694] Avg episode reward: [(0, '4.346')] +[2024-11-07 23:16:02,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6553.6, 300 sec: 4729.0). Total num frames: 20533248. Throughput: 0: 1640.8. Samples: 126320. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:16:02,933][41694] Avg episode reward: [(0, '4.581')] +[2024-11-07 23:16:04,614][42004] Updated weights for policy 0, policy_version 5016 (0.0027) +[2024-11-07 23:16:07,932][41694] Fps is (10 sec: 7049.1, 60 sec: 6621.9, 300 sec: 4844.0). Total num frames: 20570112. Throughput: 0: 1649.9. Samples: 136590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:16:07,933][41694] Avg episode reward: [(0, '4.181')] +[2024-11-07 23:16:10,252][42004] Updated weights for policy 0, policy_version 5026 (0.0044) +[2024-11-07 23:16:12,934][41694] Fps is (10 sec: 6961.5, 60 sec: 6621.6, 300 sec: 4915.1). Total num frames: 20602880. Throughput: 0: 1718.6. Samples: 147236. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:16:12,938][41694] Avg episode reward: [(0, '4.410')] +[2024-11-07 23:16:16,077][42004] Updated weights for policy 0, policy_version 5036 (0.0029) +[2024-11-07 23:16:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 5013.5). Total num frames: 20639744. Throughput: 0: 1710.3. Samples: 152256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:16:17,933][41694] Avg episode reward: [(0, '4.456')] +[2024-11-07 23:16:21,467][42004] Updated weights for policy 0, policy_version 5046 (0.0032) +[2024-11-07 23:16:22,932][41694] Fps is (10 sec: 7374.6, 60 sec: 6758.4, 300 sec: 5104.2). Total num frames: 20676608. Throughput: 0: 1728.0. Samples: 163654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:16:22,934][41694] Avg episode reward: [(0, '4.516')] +[2024-11-07 23:16:27,084][42004] Updated weights for policy 0, policy_version 5056 (0.0040) +[2024-11-07 23:16:27,932][41694] Fps is (10 sec: 7372.3, 60 sec: 7031.4, 300 sec: 5188.2). Total num frames: 20713472. Throughput: 0: 1742.8. Samples: 174652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:16:27,934][41694] Avg episode reward: [(0, '4.304')] +[2024-11-07 23:16:33,140][41694] Fps is (10 sec: 6018.6, 60 sec: 6735.1, 300 sec: 5170.8). Total num frames: 20738048. Throughput: 0: 1737.1. Samples: 180092. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:16:33,142][41694] Avg episode reward: [(0, '4.632')] +[2024-11-07 23:16:34,644][42004] Updated weights for policy 0, policy_version 5066 (0.0025) +[2024-11-07 23:16:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.3, 300 sec: 5225.9). Total num frames: 20770816. Throughput: 0: 1661.1. Samples: 187546. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 23:16:37,935][41694] Avg episode reward: [(0, '4.672')] +[2024-11-07 23:16:40,113][42004] Updated weights for policy 0, policy_version 5076 (0.0031) +[2024-11-07 23:16:42,934][41694] Fps is (10 sec: 7109.4, 60 sec: 6758.1, 300 sec: 5297.4). Total num frames: 20807680. Throughput: 0: 1720.1. Samples: 198726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:16:42,937][41694] Avg episode reward: [(0, '4.650')] +[2024-11-07 23:16:46,338][42004] Updated weights for policy 0, policy_version 5086 (0.0036) +[2024-11-07 23:16:47,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6758.4, 300 sec: 5338.0). Total num frames: 20840448. Throughput: 0: 1714.4. Samples: 203470. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:16:47,934][41694] Avg episode reward: [(0, '4.359')] +[2024-11-07 23:16:51,674][42004] Updated weights for policy 0, policy_version 5096 (0.0042) +[2024-11-07 23:16:52,932][41694] Fps is (10 sec: 7374.5, 60 sec: 6826.6, 300 sec: 5427.2). Total num frames: 20881408. Throughput: 0: 1735.5. Samples: 214688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:16:52,934][41694] Avg episode reward: [(0, '4.543')] +[2024-11-07 23:16:57,541][42004] Updated weights for policy 0, policy_version 5106 (0.0038) +[2024-11-07 23:16:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6906.2, 300 sec: 5461.3). Total num frames: 20914176. Throughput: 0: 1731.8. Samples: 225164. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:16:57,934][41694] Avg episode reward: [(0, '4.574')] +[2024-11-07 23:17:02,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6963.2, 300 sec: 5517.5). Total num frames: 20951040. Throughput: 0: 1747.8. Samples: 230908. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:17:02,935][41694] Avg episode reward: [(0, '4.391')] +[2024-11-07 23:17:03,411][42004] Updated weights for policy 0, policy_version 5116 (0.0028) +[2024-11-07 23:17:07,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 5500.3). Total num frames: 20975616. Throughput: 0: 1697.2. Samples: 240026. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:17:07,933][41694] Avg episode reward: [(0, '4.340')] +[2024-11-07 23:17:10,440][42004] Updated weights for policy 0, policy_version 5126 (0.0049) +[2024-11-07 23:17:12,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6827.0, 300 sec: 5552.4). Total num frames: 21012480. Throughput: 0: 1662.6. Samples: 249466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:17:12,933][41694] Avg episode reward: [(0, '4.520')] +[2024-11-07 23:17:16,420][42004] Updated weights for policy 0, policy_version 5136 (0.0027) +[2024-11-07 23:17:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 5579.4). Total num frames: 21045248. Throughput: 0: 1661.7. Samples: 254522. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:17:17,935][41694] Avg episode reward: [(0, '4.523')] +[2024-11-07 23:17:22,272][42004] Updated weights for policy 0, policy_version 5146 (0.0024) +[2024-11-07 23:17:22,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6758.4, 300 sec: 5626.6). Total num frames: 21082112. Throughput: 0: 1710.8. Samples: 264532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:17:22,934][41694] Avg episode reward: [(0, '4.213')] +[2024-11-07 23:17:27,768][42004] Updated weights for policy 0, policy_version 5156 (0.0033) +[2024-11-07 23:17:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.5, 300 sec: 5671.4). Total num frames: 21118976. Throughput: 0: 1717.5. Samples: 276010. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:17:27,937][41694] Avg episode reward: [(0, '4.523')] +[2024-11-07 23:17:32,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6987.5, 300 sec: 5713.9). Total num frames: 21155840. Throughput: 0: 1737.2. Samples: 281646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:17:32,938][41694] Avg episode reward: [(0, '4.265')] +[2024-11-07 23:17:33,205][42004] Updated weights for policy 0, policy_version 5166 (0.0033) +[2024-11-07 23:17:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.6, 300 sec: 5754.4). Total num frames: 21192704. Throughput: 0: 1722.6. Samples: 292206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:17:37,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-07 23:17:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005174_21192704.pth... +[2024-11-07 23:17:38,053][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004886_20013056.pth +[2024-11-07 23:17:40,601][42004] Updated weights for policy 0, policy_version 5176 (0.0030) +[2024-11-07 23:17:42,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6827.0, 300 sec: 5734.4). Total num frames: 21217280. Throughput: 0: 1670.8. Samples: 300350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:17:42,934][41694] Avg episode reward: [(0, '4.284')] +[2024-11-07 23:17:46,033][42004] Updated weights for policy 0, policy_version 5186 (0.0028) +[2024-11-07 23:17:47,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6894.9, 300 sec: 5772.5). Total num frames: 21254144. Throughput: 0: 1669.1. Samples: 306018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:17:47,933][41694] Avg episode reward: [(0, '4.708')] +[2024-11-07 23:17:51,536][42004] Updated weights for policy 0, policy_version 5196 (0.0033) +[2024-11-07 23:17:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 5808.9). 
Total num frames: 21291008. Throughput: 0: 1718.7. Samples: 317366. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:17:52,933][41694] Avg episode reward: [(0, '4.642')] +[2024-11-07 23:17:57,743][42004] Updated weights for policy 0, policy_version 5206 (0.0032) +[2024-11-07 23:17:57,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 5825.4). Total num frames: 21323776. Throughput: 0: 1733.9. Samples: 327490. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:17:57,933][41694] Avg episode reward: [(0, '4.614')] +[2024-11-07 23:18:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.4, 300 sec: 5841.3). Total num frames: 21356544. Throughput: 0: 1731.6. Samples: 332444. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:18:02,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-07 23:18:04,001][42004] Updated weights for policy 0, policy_version 5216 (0.0026) +[2024-11-07 23:18:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 5873.8). Total num frames: 21393408. Throughput: 0: 1729.5. Samples: 342358. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:18:07,936][41694] Avg episode reward: [(0, '4.212')] +[2024-11-07 23:18:09,558][42004] Updated weights for policy 0, policy_version 5226 (0.0030) +[2024-11-07 23:18:14,270][41694] Fps is (10 sec: 6141.4, 60 sec: 6744.5, 300 sec: 5855.4). Total num frames: 21426176. Throughput: 0: 1674.0. Samples: 353580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:18:14,273][41694] Avg episode reward: [(0, '4.370')] +[2024-11-07 23:18:16,659][42004] Updated weights for policy 0, policy_version 5236 (0.0027) +[2024-11-07 23:18:17,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.6, 300 sec: 5884.9). Total num frames: 21454848. Throughput: 0: 1654.8. Samples: 356110. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:18:17,933][41694] Avg episode reward: [(0, '4.339')] +[2024-11-07 23:18:22,263][42004] Updated weights for policy 0, policy_version 5246 (0.0035) +[2024-11-07 23:18:22,931][41694] Fps is (10 sec: 7566.0, 60 sec: 6826.7, 300 sec: 5914.6). Total num frames: 21491712. Throughput: 0: 1663.6. Samples: 367066. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:18:22,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-07 23:18:27,832][42004] Updated weights for policy 0, policy_version 5256 (0.0029) +[2024-11-07 23:18:27,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6826.6, 300 sec: 5943.2). Total num frames: 21528576. Throughput: 0: 1729.6. Samples: 378184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 23:18:27,934][41694] Avg episode reward: [(0, '4.379')] +[2024-11-07 23:18:32,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 5955.0). Total num frames: 21561344. Throughput: 0: 1719.5. Samples: 383396. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:18:32,933][41694] Avg episode reward: [(0, '4.289')] +[2024-11-07 23:18:34,228][42004] Updated weights for policy 0, policy_version 5266 (0.0037) +[2024-11-07 23:18:37,931][41694] Fps is (10 sec: 6554.0, 60 sec: 6690.1, 300 sec: 5966.3). Total num frames: 21594112. Throughput: 0: 1688.1. Samples: 393332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:18:37,933][41694] Avg episode reward: [(0, '4.271')] +[2024-11-07 23:18:39,927][42004] Updated weights for policy 0, policy_version 5276 (0.0034) +[2024-11-07 23:18:42,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 5992.3). Total num frames: 21630976. Throughput: 0: 1702.7. Samples: 404112. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:18:42,934][41694] Avg episode reward: [(0, '4.371')] +[2024-11-07 23:18:45,483][42004] Updated weights for policy 0, policy_version 5286 (0.0026) +[2024-11-07 23:18:47,984][41694] Fps is (10 sec: 6112.0, 60 sec: 6684.3, 300 sec: 5971.6). Total num frames: 21655552. Throughput: 0: 1710.9. Samples: 409522. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:18:47,986][41694] Avg episode reward: [(0, '4.212')] +[2024-11-07 23:18:52,505][42004] Updated weights for policy 0, policy_version 5296 (0.0030) +[2024-11-07 23:18:52,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6690.1, 300 sec: 5997.7). Total num frames: 21692416. Throughput: 0: 1678.9. Samples: 417910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:18:52,933][41694] Avg episode reward: [(0, '4.309')] +[2024-11-07 23:18:57,932][41694] Fps is (10 sec: 6176.2, 60 sec: 6553.6, 300 sec: 5978.7). Total num frames: 21716992. Throughput: 0: 1672.5. Samples: 426604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:18:57,934][41694] Avg episode reward: [(0, '4.440')] +[2024-11-07 23:19:00,980][42004] Updated weights for policy 0, policy_version 5306 (0.0030) +[2024-11-07 23:19:02,932][41694] Fps is (10 sec: 4505.7, 60 sec: 6348.8, 300 sec: 5946.3). Total num frames: 21737472. Throughput: 0: 1635.7. Samples: 429716. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:19:02,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-07 23:19:07,932][41694] Fps is (10 sec: 4505.6, 60 sec: 6144.0, 300 sec: 5928.8). Total num frames: 21762048. Throughput: 0: 1537.3. Samples: 436246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:19:07,936][41694] Avg episode reward: [(0, '4.461')] +[2024-11-07 23:19:10,443][42004] Updated weights for policy 0, policy_version 5316 (0.0047) +[2024-11-07 23:19:12,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6074.7, 300 sec: 5998.2). Total num frames: 21782528. 
Throughput: 0: 1438.8. Samples: 442930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:19:12,935][41694] Avg episode reward: [(0, '4.373')] +[2024-11-07 23:19:17,938][41694] Fps is (10 sec: 4912.2, 60 sec: 5938.6, 300 sec: 6095.3). Total num frames: 21811200. Throughput: 0: 1410.2. Samples: 446864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:19:17,944][41694] Avg episode reward: [(0, '4.178')] +[2024-11-07 23:19:18,276][42004] Updated weights for policy 0, policy_version 5326 (0.0037) +[2024-11-07 23:19:22,932][41694] Fps is (10 sec: 4915.2, 60 sec: 5666.1, 300 sec: 6164.8). Total num frames: 21831680. Throughput: 0: 1349.5. Samples: 454060. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:19:22,942][41694] Avg episode reward: [(0, '4.190')] +[2024-11-07 23:19:26,855][42004] Updated weights for policy 0, policy_version 5336 (0.0041) +[2024-11-07 23:19:27,932][41694] Fps is (10 sec: 4918.1, 60 sec: 5529.6, 300 sec: 6206.5). Total num frames: 21860352. Throughput: 0: 1281.0. Samples: 461756. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:19:27,934][41694] Avg episode reward: [(0, '4.514')] +[2024-11-07 23:19:32,932][41694] Fps is (10 sec: 5734.2, 60 sec: 5461.3, 300 sec: 6206.5). Total num frames: 21889024. Throughput: 0: 1256.3. Samples: 465990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:19:32,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-07 23:19:34,427][42004] Updated weights for policy 0, policy_version 5346 (0.0041) +[2024-11-07 23:19:37,932][41694] Fps is (10 sec: 5324.8, 60 sec: 5324.8, 300 sec: 6220.4). Total num frames: 21913600. Throughput: 0: 1240.8. Samples: 473744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:19:37,933][41694] Avg episode reward: [(0, '4.723')] +[2024-11-07 23:19:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005350_21913600.pth... 
+[2024-11-07 23:19:38,175][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000004972_20365312.pth +[2024-11-07 23:19:42,932][41694] Fps is (10 sec: 4505.7, 60 sec: 5051.7, 300 sec: 6234.3). Total num frames: 21934080. Throughput: 0: 1191.4. Samples: 480218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:19:42,934][41694] Avg episode reward: [(0, '4.619')] +[2024-11-07 23:19:43,630][42004] Updated weights for policy 0, policy_version 5356 (0.0051) +[2024-11-07 23:19:47,936][41694] Fps is (10 sec: 4094.4, 60 sec: 4987.5, 300 sec: 6275.8). Total num frames: 21954560. Throughput: 0: 1187.8. Samples: 483174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:19:47,937][41694] Avg episode reward: [(0, '4.444')] +[2024-11-07 23:19:52,815][42004] Updated weights for policy 0, policy_version 5366 (0.0027) +[2024-11-07 23:19:52,931][41694] Fps is (10 sec: 4505.6, 60 sec: 4778.7, 300 sec: 6317.6). Total num frames: 21979136. Throughput: 0: 1189.9. Samples: 489792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:19:52,935][41694] Avg episode reward: [(0, '4.538')] +[2024-11-07 23:19:57,950][41694] Fps is (10 sec: 4090.2, 60 sec: 4640.7, 300 sec: 6289.4). Total num frames: 21995520. Throughput: 0: 1165.0. Samples: 495378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:19:57,961][41694] Avg episode reward: [(0, '4.533')] +[2024-11-07 23:20:02,487][42004] Updated weights for policy 0, policy_version 5376 (0.0033) +[2024-11-07 23:20:02,932][41694] Fps is (10 sec: 4095.8, 60 sec: 4710.4, 300 sec: 6262.0). Total num frames: 22020096. Throughput: 0: 1163.1. Samples: 499198. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:20:02,935][41694] Avg episode reward: [(0, '4.614')] +[2024-11-07 23:20:07,934][41694] Fps is (10 sec: 4923.3, 60 sec: 4710.3, 300 sec: 6234.2). Total num frames: 22044672. Throughput: 0: 1167.5. Samples: 506600. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:20:07,936][41694] Avg episode reward: [(0, '4.482')] +[2024-11-07 23:20:10,718][42004] Updated weights for policy 0, policy_version 5386 (0.0024) +[2024-11-07 23:20:12,933][41694] Fps is (10 sec: 5324.3, 60 sec: 4846.8, 300 sec: 6220.3). Total num frames: 22073344. Throughput: 0: 1176.6. Samples: 514704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:20:12,935][41694] Avg episode reward: [(0, '4.449')] +[2024-11-07 23:20:17,932][41694] Fps is (10 sec: 5325.7, 60 sec: 4779.2, 300 sec: 6192.6). Total num frames: 22097920. Throughput: 0: 1167.0. Samples: 518506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:20:17,934][41694] Avg episode reward: [(0, '4.275')] +[2024-11-07 23:20:18,027][42004] Updated weights for policy 0, policy_version 5396 (0.0057) +[2024-11-07 23:20:22,932][41694] Fps is (10 sec: 5325.5, 60 sec: 4915.2, 300 sec: 6220.4). Total num frames: 22126592. Throughput: 0: 1177.3. Samples: 526722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:20:22,935][41694] Avg episode reward: [(0, '4.287')] +[2024-11-07 23:20:25,432][42004] Updated weights for policy 0, policy_version 5406 (0.0041) +[2024-11-07 23:20:29,142][41694] Fps is (10 sec: 5115.1, 60 sec: 4818.0, 300 sec: 6153.5). Total num frames: 22155264. Throughput: 0: 1197.7. Samples: 535564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:20:29,145][41694] Avg episode reward: [(0, '4.491')] +[2024-11-07 23:20:32,932][41694] Fps is (10 sec: 5324.7, 60 sec: 4846.9, 300 sec: 6150.9). Total num frames: 22179840. Throughput: 0: 1220.7. Samples: 538100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:20:32,938][41694] Avg episode reward: [(0, '4.315')] +[2024-11-07 23:20:33,213][42004] Updated weights for policy 0, policy_version 5416 (0.0034) +[2024-11-07 23:20:37,932][41694] Fps is (10 sec: 6058.3, 60 sec: 4915.2, 300 sec: 6123.2). Total num frames: 22208512. 
Throughput: 0: 1277.1. Samples: 547262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:20:37,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-07 23:20:39,932][42004] Updated weights for policy 0, policy_version 5426 (0.0052) +[2024-11-07 23:20:42,932][41694] Fps is (10 sec: 6144.0, 60 sec: 5120.0, 300 sec: 6123.2). Total num frames: 22241280. Throughput: 0: 1373.2. Samples: 557148. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:20:42,934][41694] Avg episode reward: [(0, '4.375')] +[2024-11-07 23:20:46,436][42004] Updated weights for policy 0, policy_version 5436 (0.0036) +[2024-11-07 23:20:47,932][41694] Fps is (10 sec: 6553.3, 60 sec: 5325.1, 300 sec: 6109.3). Total num frames: 22274048. Throughput: 0: 1388.6. Samples: 561686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:20:47,934][41694] Avg episode reward: [(0, '4.320')] +[2024-11-07 23:20:52,627][42004] Updated weights for policy 0, policy_version 5446 (0.0035) +[2024-11-07 23:20:52,932][41694] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 6122.0). Total num frames: 22306816. Throughput: 0: 1446.7. Samples: 571700. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:20:52,935][41694] Avg episode reward: [(0, '4.444')] +[2024-11-07 23:20:57,932][41694] Fps is (10 sec: 6144.3, 60 sec: 5667.8, 300 sec: 6109.3). Total num frames: 22335488. Throughput: 0: 1457.7. Samples: 580300. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:20:57,933][41694] Avg episode reward: [(0, '4.487')] +[2024-11-07 23:20:59,892][42004] Updated weights for policy 0, policy_version 5456 (0.0037) +[2024-11-07 23:21:02,932][41694] Fps is (10 sec: 4915.3, 60 sec: 5597.9, 300 sec: 6053.7). Total num frames: 22355968. Throughput: 0: 1471.6. Samples: 584726. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:21:02,934][41694] Avg episode reward: [(0, '4.447')] +[2024-11-07 23:21:07,843][42004] Updated weights for policy 0, policy_version 5466 (0.0037) +[2024-11-07 23:21:07,932][41694] Fps is (10 sec: 5324.8, 60 sec: 5734.6, 300 sec: 6053.8). Total num frames: 22388736. Throughput: 0: 1440.8. Samples: 591556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:21:07,933][41694] Avg episode reward: [(0, '4.593')] +[2024-11-07 23:21:12,937][41694] Fps is (10 sec: 6140.8, 60 sec: 5734.0, 300 sec: 6025.9). Total num frames: 22417408. Throughput: 0: 1489.6. Samples: 600802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:21:12,941][41694] Avg episode reward: [(0, '4.328')] +[2024-11-07 23:21:14,990][42004] Updated weights for policy 0, policy_version 5476 (0.0043) +[2024-11-07 23:21:17,931][41694] Fps is (10 sec: 5734.5, 60 sec: 5802.7, 300 sec: 5998.2). Total num frames: 22446080. Throughput: 0: 1484.7. Samples: 604910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:21:17,934][41694] Avg episode reward: [(0, '4.296')] +[2024-11-07 23:21:21,421][42004] Updated weights for policy 0, policy_version 5486 (0.0031) +[2024-11-07 23:21:22,931][41694] Fps is (10 sec: 6147.2, 60 sec: 5870.9, 300 sec: 5984.3). Total num frames: 22478848. Throughput: 0: 1492.1. Samples: 614406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:21:22,933][41694] Avg episode reward: [(0, '4.217')] +[2024-11-07 23:21:27,498][42004] Updated weights for policy 0, policy_version 5496 (0.0032) +[2024-11-07 23:21:27,931][41694] Fps is (10 sec: 6553.5, 60 sec: 6061.5, 300 sec: 6016.3). Total num frames: 22511616. Throughput: 0: 1499.8. Samples: 624638. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:21:27,938][41694] Avg episode reward: [(0, '4.468')] +[2024-11-07 23:21:32,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6007.5, 300 sec: 5998.2). Total num frames: 22540288. 
Throughput: 0: 1503.3. Samples: 629336. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:21:32,935][41694] Avg episode reward: [(0, '4.506')] +[2024-11-07 23:21:34,659][42004] Updated weights for policy 0, policy_version 5506 (0.0043) +[2024-11-07 23:21:37,932][41694] Fps is (10 sec: 4915.2, 60 sec: 5870.9, 300 sec: 5942.7). Total num frames: 22560768. Throughput: 0: 1440.4. Samples: 636518. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:21:37,937][41694] Avg episode reward: [(0, '4.585')] +[2024-11-07 23:21:37,954][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005509_22564864.pth... +[2024-11-07 23:21:38,088][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005174_21192704.pth +[2024-11-07 23:21:42,231][42004] Updated weights for policy 0, policy_version 5516 (0.0044) +[2024-11-07 23:21:42,932][41694] Fps is (10 sec: 5734.5, 60 sec: 5939.2, 300 sec: 5956.6). Total num frames: 22597632. Throughput: 0: 1449.9. Samples: 645546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:21:42,934][41694] Avg episode reward: [(0, '4.721')] +[2024-11-07 23:21:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 5939.2, 300 sec: 5928.8). Total num frames: 22630400. Throughput: 0: 1452.7. Samples: 650096. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:21:47,934][41694] Avg episode reward: [(0, '4.529')] +[2024-11-07 23:21:48,520][42004] Updated weights for policy 0, policy_version 5526 (0.0062) +[2024-11-07 23:21:52,932][41694] Fps is (10 sec: 6553.6, 60 sec: 5939.2, 300 sec: 5928.8). Total num frames: 22663168. Throughput: 0: 1524.6. Samples: 660162. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:21:52,933][41694] Avg episode reward: [(0, '4.368')] +[2024-11-07 23:21:54,601][42004] Updated weights for policy 0, policy_version 5536 (0.0032) +[2024-11-07 23:21:57,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6007.5, 300 sec: 5914.9). Total num frames: 22695936. Throughput: 0: 1541.2. Samples: 670150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:21:57,934][41694] Avg episode reward: [(0, '4.420')] +[2024-11-07 23:22:00,899][42004] Updated weights for policy 0, policy_version 5546 (0.0038) +[2024-11-07 23:22:02,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6144.0, 300 sec: 5928.8). Total num frames: 22724608. Throughput: 0: 1560.8. Samples: 675148. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:22:02,936][41694] Avg episode reward: [(0, '4.534')] +[2024-11-07 23:22:07,544][42004] Updated weights for policy 0, policy_version 5556 (0.0030) +[2024-11-07 23:22:07,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6144.0, 300 sec: 5914.9). Total num frames: 22757376. Throughput: 0: 1560.0. Samples: 684608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:22:07,933][41694] Avg episode reward: [(0, '4.492')] +[2024-11-07 23:22:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6076.3, 300 sec: 5887.1). Total num frames: 22781952. Throughput: 0: 1495.3. Samples: 691928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:22:12,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-07 23:22:14,920][42004] Updated weights for policy 0, policy_version 5566 (0.0039) +[2024-11-07 23:22:17,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6212.3, 300 sec: 5887.1). Total num frames: 22818816. Throughput: 0: 1510.2. Samples: 697294. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:22:17,933][41694] Avg episode reward: [(0, '4.337')] +[2024-11-07 23:22:20,818][42004] Updated weights for policy 0, policy_version 5576 (0.0026) +[2024-11-07 23:22:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6212.3, 300 sec: 5873.2). Total num frames: 22851584. Throughput: 0: 1581.9. Samples: 707702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:22:22,934][41694] Avg episode reward: [(0, '4.418')] +[2024-11-07 23:22:26,369][42004] Updated weights for policy 0, policy_version 5586 (0.0028) +[2024-11-07 23:22:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6280.5, 300 sec: 5873.2). Total num frames: 22888448. Throughput: 0: 1627.2. Samples: 718768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:22:27,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-07 23:22:32,216][42004] Updated weights for policy 0, policy_version 5596 (0.0031) +[2024-11-07 23:22:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6417.1, 300 sec: 5873.2). Total num frames: 22925312. Throughput: 0: 1646.0. Samples: 724164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:22:32,933][41694] Avg episode reward: [(0, '4.653')] +[2024-11-07 23:22:37,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 5887.1). Total num frames: 22953984. Throughput: 0: 1642.1. Samples: 734056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:22:37,934][41694] Avg episode reward: [(0, '4.336')] +[2024-11-07 23:22:38,714][42004] Updated weights for policy 0, policy_version 5606 (0.0027) +[2024-11-07 23:22:44,075][41694] Fps is (10 sec: 5513.7, 60 sec: 6364.1, 300 sec: 5850.6). Total num frames: 22986752. Throughput: 0: 1595.8. Samples: 743786. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:22:44,076][41694] Avg episode reward: [(0, '4.503')] +[2024-11-07 23:22:45,978][42004] Updated weights for policy 0, policy_version 5616 (0.0037) +[2024-11-07 23:22:47,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6417.1, 300 sec: 5845.5). Total num frames: 23015424. Throughput: 0: 1588.3. Samples: 746620. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:22:47,933][41694] Avg episode reward: [(0, '4.465')] +[2024-11-07 23:22:51,740][42004] Updated weights for policy 0, policy_version 5626 (0.0028) +[2024-11-07 23:22:52,932][41694] Fps is (10 sec: 7398.7, 60 sec: 6485.2, 300 sec: 5859.3). Total num frames: 23052288. Throughput: 0: 1613.8. Samples: 757230. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:22:52,935][41694] Avg episode reward: [(0, '4.584')] +[2024-11-07 23:22:57,215][42004] Updated weights for policy 0, policy_version 5636 (0.0028) +[2024-11-07 23:22:57,933][41694] Fps is (10 sec: 7372.1, 60 sec: 6553.5, 300 sec: 5873.2). Total num frames: 23089152. Throughput: 0: 1699.5. Samples: 768408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:22:57,936][41694] Avg episode reward: [(0, '4.578')] +[2024-11-07 23:23:02,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6621.9, 300 sec: 5859.4). Total num frames: 23121920. Throughput: 0: 1702.7. Samples: 773914. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:23:02,933][41694] Avg episode reward: [(0, '4.717')] +[2024-11-07 23:23:03,287][42004] Updated weights for policy 0, policy_version 5646 (0.0034) +[2024-11-07 23:23:07,931][41694] Fps is (10 sec: 6964.0, 60 sec: 6690.2, 300 sec: 5900.0). Total num frames: 23158784. Throughput: 0: 1695.4. Samples: 783996. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:23:07,933][41694] Avg episode reward: [(0, '4.331')] +[2024-11-07 23:23:09,043][42004] Updated weights for policy 0, policy_version 5656 (0.0038) +[2024-11-07 23:23:12,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.4, 300 sec: 5873.2). Total num frames: 23187456. Throughput: 0: 1667.9. Samples: 793824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:23:12,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-07 23:23:16,067][42004] Updated weights for policy 0, policy_version 5666 (0.0044) +[2024-11-07 23:23:17,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6485.3, 300 sec: 5817.7). Total num frames: 23207936. Throughput: 0: 1645.0. Samples: 798188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:23:17,935][41694] Avg episode reward: [(0, '4.365')] +[2024-11-07 23:23:22,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 5817.7). Total num frames: 23244800. Throughput: 0: 1583.9. Samples: 805330. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:23:22,933][41694] Avg episode reward: [(0, '4.525')] +[2024-11-07 23:23:23,421][42004] Updated weights for policy 0, policy_version 5676 (0.0037) +[2024-11-07 23:23:27,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 5817.7). Total num frames: 23277568. Throughput: 0: 1649.5. Samples: 816126. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:23:27,935][41694] Avg episode reward: [(0, '4.498')] +[2024-11-07 23:23:29,165][42004] Updated weights for policy 0, policy_version 5686 (0.0028) +[2024-11-07 23:23:32,933][41694] Fps is (10 sec: 6552.6, 60 sec: 6416.9, 300 sec: 5817.7). Total num frames: 23310336. Throughput: 0: 1664.1. Samples: 821506. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:23:32,935][41694] Avg episode reward: [(0, '4.521')] +[2024-11-07 23:23:36,403][42004] Updated weights for policy 0, policy_version 5696 (0.0046) +[2024-11-07 23:23:37,932][41694] Fps is (10 sec: 6143.5, 60 sec: 6417.0, 300 sec: 5789.9). Total num frames: 23339008. Throughput: 0: 1610.0. Samples: 829680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:23:37,934][41694] Avg episode reward: [(0, '4.387')] +[2024-11-07 23:23:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005698_23339008.pth... +[2024-11-07 23:23:38,105][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005350_21913600.pth +[2024-11-07 23:23:42,932][41694] Fps is (10 sec: 5735.2, 60 sec: 6472.1, 300 sec: 5804.8). Total num frames: 23367680. Throughput: 0: 1552.7. Samples: 838278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:23:42,933][41694] Avg episode reward: [(0, '4.341')] +[2024-11-07 23:23:43,328][42004] Updated weights for policy 0, policy_version 5706 (0.0031) +[2024-11-07 23:23:47,932][41694] Fps is (10 sec: 5734.8, 60 sec: 6348.8, 300 sec: 5776.1). Total num frames: 23396352. Throughput: 0: 1531.5. Samples: 842830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:23:47,940][41694] Avg episode reward: [(0, '4.441')] +[2024-11-07 23:23:51,790][42004] Updated weights for policy 0, policy_version 5716 (0.0031) +[2024-11-07 23:23:52,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6075.8, 300 sec: 5762.2). Total num frames: 23416832. Throughput: 0: 1460.5. Samples: 849720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:23:52,933][41694] Avg episode reward: [(0, '4.404')] +[2024-11-07 23:23:57,932][41694] Fps is (10 sec: 4915.2, 60 sec: 5939.3, 300 sec: 5789.9). Total num frames: 23445504. Throughput: 0: 1420.7. Samples: 857754. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:23:57,937][41694] Avg episode reward: [(0, '4.474')] +[2024-11-07 23:23:59,290][42004] Updated weights for policy 0, policy_version 5726 (0.0042) +[2024-11-07 23:24:02,933][41694] Fps is (10 sec: 5323.8, 60 sec: 5802.5, 300 sec: 5789.9). Total num frames: 23470080. Throughput: 0: 1411.7. Samples: 861716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:24:02,936][41694] Avg episode reward: [(0, '4.493')] +[2024-11-07 23:24:06,806][42004] Updated weights for policy 0, policy_version 5736 (0.0044) +[2024-11-07 23:24:07,931][41694] Fps is (10 sec: 5325.0, 60 sec: 5666.1, 300 sec: 5817.7). Total num frames: 23498752. Throughput: 0: 1434.7. Samples: 869892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:07,935][41694] Avg episode reward: [(0, '4.515')] +[2024-11-07 23:24:12,933][41694] Fps is (10 sec: 5734.7, 60 sec: 5666.0, 300 sec: 5817.8). Total num frames: 23527424. Throughput: 0: 1397.9. Samples: 879034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:12,936][41694] Avg episode reward: [(0, '4.633')] +[2024-11-07 23:24:13,903][42004] Updated weights for policy 0, policy_version 5746 (0.0032) +[2024-11-07 23:24:17,937][41694] Fps is (10 sec: 5732.5, 60 sec: 5802.4, 300 sec: 5845.4). Total num frames: 23556096. Throughput: 0: 1359.2. Samples: 882672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:17,940][41694] Avg episode reward: [(0, '4.432')] +[2024-11-07 23:24:21,250][42004] Updated weights for policy 0, policy_version 5756 (0.0034) +[2024-11-07 23:24:22,931][41694] Fps is (10 sec: 5735.2, 60 sec: 5666.1, 300 sec: 5845.5). Total num frames: 23584768. Throughput: 0: 1368.1. Samples: 891244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:24:22,936][41694] Avg episode reward: [(0, '4.290')] +[2024-11-07 23:24:27,931][41694] Fps is (10 sec: 4916.8, 60 sec: 5461.3, 300 sec: 5817.7). Total num frames: 23605248. 
Throughput: 0: 1330.7. Samples: 898160. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:24:27,938][41694] Avg episode reward: [(0, '4.351')] +[2024-11-07 23:24:29,028][42004] Updated weights for policy 0, policy_version 5766 (0.0037) +[2024-11-07 23:24:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 5529.7, 300 sec: 5859.4). Total num frames: 23642112. Throughput: 0: 1352.1. Samples: 903676. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:32,938][41694] Avg episode reward: [(0, '4.261')] +[2024-11-07 23:24:34,852][42004] Updated weights for policy 0, policy_version 5776 (0.0028) +[2024-11-07 23:24:37,931][41694] Fps is (10 sec: 6963.2, 60 sec: 5598.0, 300 sec: 5901.0). Total num frames: 23674880. Throughput: 0: 1424.4. Samples: 913818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:37,934][41694] Avg episode reward: [(0, '4.400')] +[2024-11-07 23:24:41,092][42004] Updated weights for policy 0, policy_version 5786 (0.0030) +[2024-11-07 23:24:42,932][41694] Fps is (10 sec: 6553.5, 60 sec: 5666.1, 300 sec: 5942.7). Total num frames: 23707648. Throughput: 0: 1462.8. Samples: 923580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:42,935][41694] Avg episode reward: [(0, '4.343')] +[2024-11-07 23:24:47,311][42004] Updated weights for policy 0, policy_version 5796 (0.0042) +[2024-11-07 23:24:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 5802.7, 300 sec: 5984.3). Total num frames: 23744512. Throughput: 0: 1483.7. Samples: 928478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:47,934][41694] Avg episode reward: [(0, '4.496')] +[2024-11-07 23:24:52,932][41694] Fps is (10 sec: 6553.6, 60 sec: 5939.2, 300 sec: 6026.3). Total num frames: 23773184. Throughput: 0: 1524.3. Samples: 938484. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:24:52,933][41694] Avg episode reward: [(0, '4.529')] +[2024-11-07 23:24:54,130][42004] Updated weights for policy 0, policy_version 5806 (0.0042) +[2024-11-07 23:24:58,973][41694] Fps is (10 sec: 4822.6, 60 sec: 5770.8, 300 sec: 6004.8). Total num frames: 23797760. Throughput: 0: 1474.5. Samples: 946918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:24:58,976][41694] Avg episode reward: [(0, '4.464')] +[2024-11-07 23:25:02,260][42004] Updated weights for policy 0, policy_version 5816 (0.0037) +[2024-11-07 23:25:02,932][41694] Fps is (10 sec: 4915.2, 60 sec: 5871.1, 300 sec: 6026.0). Total num frames: 23822336. Throughput: 0: 1485.3. Samples: 949506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:25:02,934][41694] Avg episode reward: [(0, '4.542')] +[2024-11-07 23:25:07,932][41694] Fps is (10 sec: 6858.1, 60 sec: 6007.5, 300 sec: 6053.8). Total num frames: 23859200. Throughput: 0: 1510.0. Samples: 959192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:07,938][41694] Avg episode reward: [(0, '4.595')] +[2024-11-07 23:25:08,392][42004] Updated weights for policy 0, policy_version 5826 (0.0035) +[2024-11-07 23:25:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6075.9, 300 sec: 6081.5). Total num frames: 23891968. Throughput: 0: 1589.9. Samples: 969704. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:12,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-07 23:25:14,685][42004] Updated weights for policy 0, policy_version 5836 (0.0042) +[2024-11-07 23:25:17,932][41694] Fps is (10 sec: 6143.5, 60 sec: 6076.0, 300 sec: 6081.5). Total num frames: 23920640. Throughput: 0: 1562.5. Samples: 973990. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:25:17,935][41694] Avg episode reward: [(0, '4.223')] +[2024-11-07 23:25:22,028][42004] Updated weights for policy 0, policy_version 5846 (0.0033) +[2024-11-07 23:25:22,933][41694] Fps is (10 sec: 5733.6, 60 sec: 6075.6, 300 sec: 6106.6). Total num frames: 23949312. Throughput: 0: 1517.5. Samples: 982108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:25:22,935][41694] Avg episode reward: [(0, '4.454')] +[2024-11-07 23:25:27,932][41694] Fps is (10 sec: 6144.5, 60 sec: 6280.5, 300 sec: 6109.3). Total num frames: 23982080. Throughput: 0: 1522.0. Samples: 992070. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:27,934][41694] Avg episode reward: [(0, '4.394')] +[2024-11-07 23:25:28,323][42004] Updated weights for policy 0, policy_version 5856 (0.0045) +[2024-11-07 23:25:32,932][41694] Fps is (10 sec: 5325.5, 60 sec: 6007.5, 300 sec: 6081.5). Total num frames: 24002560. Throughput: 0: 1513.8. Samples: 996598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:32,934][41694] Avg episode reward: [(0, '4.512')] +[2024-11-07 23:25:36,148][42004] Updated weights for policy 0, policy_version 5866 (0.0036) +[2024-11-07 23:25:37,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6007.5, 300 sec: 6081.5). Total num frames: 24035328. Throughput: 0: 1457.8. Samples: 1004084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:37,934][41694] Avg episode reward: [(0, '4.605')] +[2024-11-07 23:25:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005869_24039424.pth... +[2024-11-07 23:25:38,135][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005509_22564864.pth +[2024-11-07 23:25:42,091][42004] Updated weights for policy 0, policy_version 5876 (0.0042) +[2024-11-07 23:25:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6075.8, 300 sec: 6095.4). 
Total num frames: 24072192. Throughput: 0: 1534.2. Samples: 1014358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:25:42,935][41694] Avg episode reward: [(0, '4.627')] +[2024-11-07 23:25:47,633][42004] Updated weights for policy 0, policy_version 5886 (0.0028) +[2024-11-07 23:25:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6075.7, 300 sec: 6109.3). Total num frames: 24109056. Throughput: 0: 1561.3. Samples: 1019766. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:47,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-07 23:25:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6212.3, 300 sec: 6137.1). Total num frames: 24145920. Throughput: 0: 1594.9. Samples: 1030962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:52,933][41694] Avg episode reward: [(0, '4.589')] +[2024-11-07 23:25:53,347][42004] Updated weights for policy 0, policy_version 5896 (0.0037) +[2024-11-07 23:25:57,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6460.8, 300 sec: 6178.7). Total num frames: 24178688. Throughput: 0: 1592.7. Samples: 1041378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:25:57,935][41694] Avg episode reward: [(0, '4.396')] +[2024-11-07 23:25:59,291][42004] Updated weights for policy 0, policy_version 5906 (0.0039) +[2024-11-07 23:26:02,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6417.0, 300 sec: 6164.8). Total num frames: 24207360. Throughput: 0: 1604.0. Samples: 1046168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:26:02,934][41694] Avg episode reward: [(0, '4.481')] +[2024-11-07 23:26:07,932][41694] Fps is (10 sec: 4915.5, 60 sec: 6144.0, 300 sec: 6137.2). Total num frames: 24227840. Throughput: 0: 1571.6. Samples: 1052826. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:26:07,934][41694] Avg episode reward: [(0, '4.459')] +[2024-11-07 23:26:08,280][42004] Updated weights for policy 0, policy_version 5916 (0.0022) +[2024-11-07 23:26:12,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6144.0, 300 sec: 6150.9). Total num frames: 24260608. Throughput: 0: 1546.5. Samples: 1061662. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:26:12,936][41694] Avg episode reward: [(0, '4.420')] +[2024-11-07 23:26:14,568][42004] Updated weights for policy 0, policy_version 5926 (0.0036) +[2024-11-07 23:26:17,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6212.4, 300 sec: 6150.9). Total num frames: 24293376. Throughput: 0: 1554.6. Samples: 1066554. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:26:17,933][41694] Avg episode reward: [(0, '4.313')] +[2024-11-07 23:26:20,283][42004] Updated weights for policy 0, policy_version 5936 (0.0027) +[2024-11-07 23:26:22,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6349.0, 300 sec: 6164.8). Total num frames: 24330240. Throughput: 0: 1628.2. Samples: 1077354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:26:22,934][41694] Avg episode reward: [(0, '4.285')] +[2024-11-07 23:26:26,023][42004] Updated weights for policy 0, policy_version 5946 (0.0043) +[2024-11-07 23:26:27,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6417.1, 300 sec: 6192.6). Total num frames: 24367104. Throughput: 0: 1640.8. Samples: 1088194. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:26:27,934][41694] Avg episode reward: [(0, '4.440')] +[2024-11-07 23:26:31,565][42004] Updated weights for policy 0, policy_version 5956 (0.0034) +[2024-11-07 23:26:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6248.1). Total num frames: 24403968. Throughput: 0: 1640.7. Samples: 1093596. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:26:32,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-07 23:26:37,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6220.4). Total num frames: 24432640. Throughput: 0: 1608.9. Samples: 1103362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:26:37,936][41694] Avg episode reward: [(0, '4.403')] +[2024-11-07 23:26:38,146][42004] Updated weights for policy 0, policy_version 5966 (0.0043) +[2024-11-07 23:26:42,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6348.8, 300 sec: 6178.7). Total num frames: 24453120. Throughput: 0: 1527.2. Samples: 1110102. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:26:42,933][41694] Avg episode reward: [(0, '4.515')] +[2024-11-07 23:26:45,798][42004] Updated weights for policy 0, policy_version 5976 (0.0037) +[2024-11-07 23:26:47,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6348.8, 300 sec: 6192.6). Total num frames: 24489984. Throughput: 0: 1545.0. Samples: 1115694. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:26:47,935][41694] Avg episode reward: [(0, '4.521')] +[2024-11-07 23:26:51,308][42004] Updated weights for policy 0, policy_version 5986 (0.0038) +[2024-11-07 23:26:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6348.8, 300 sec: 6206.5). Total num frames: 24526848. Throughput: 0: 1644.8. Samples: 1126842. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:26:52,934][41694] Avg episode reward: [(0, '4.375')] +[2024-11-07 23:26:57,167][42004] Updated weights for policy 0, policy_version 5996 (0.0029) +[2024-11-07 23:26:57,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6417.2, 300 sec: 6234.3). Total num frames: 24563712. Throughput: 0: 1681.5. Samples: 1137328. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:26:57,934][41694] Avg episode reward: [(0, '4.338')] +[2024-11-07 23:27:02,899][42004] Updated weights for policy 0, policy_version 6006 (0.0031) +[2024-11-07 23:27:02,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.7, 300 sec: 6248.1). Total num frames: 24600576. Throughput: 0: 1693.2. Samples: 1142748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:27:02,934][41694] Avg episode reward: [(0, '4.259')] +[2024-11-07 23:27:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6275.9). Total num frames: 24633344. Throughput: 0: 1687.2. Samples: 1153278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:27:07,933][41694] Avg episode reward: [(0, '4.312')] +[2024-11-07 23:27:08,868][42004] Updated weights for policy 0, policy_version 6016 (0.0045) +[2024-11-07 23:27:14,568][41694] Fps is (10 sec: 5279.7, 60 sec: 6512.5, 300 sec: 6213.7). Total num frames: 24662016. Throughput: 0: 1593.0. Samples: 1162488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:27:14,572][41694] Avg episode reward: [(0, '4.386')] +[2024-11-07 23:27:17,005][42004] Updated weights for policy 0, policy_version 6026 (0.0035) +[2024-11-07 23:27:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.6, 300 sec: 6220.4). Total num frames: 24686592. Throughput: 0: 1581.9. Samples: 1164784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:27:17,934][41694] Avg episode reward: [(0, '4.582')] +[2024-11-07 23:27:22,838][42004] Updated weights for policy 0, policy_version 6036 (0.0031) +[2024-11-07 23:27:22,931][41694] Fps is (10 sec: 7346.7, 60 sec: 6553.6, 300 sec: 6220.4). Total num frames: 24723456. Throughput: 0: 1593.1. Samples: 1175052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:27:22,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-07 23:27:27,934][41694] Fps is (10 sec: 7371.6, 60 sec: 6553.4, 300 sec: 6220.3). Total num frames: 24760320. 
Throughput: 0: 1701.3. Samples: 1186664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:27:27,937][41694] Avg episode reward: [(0, '4.607')] +[2024-11-07 23:27:28,032][42004] Updated weights for policy 0, policy_version 6046 (0.0020) +[2024-11-07 23:27:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6248.1). Total num frames: 24797184. Throughput: 0: 1700.2. Samples: 1192204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:27:32,933][41694] Avg episode reward: [(0, '4.558')] +[2024-11-07 23:27:33,635][42004] Updated weights for policy 0, policy_version 6056 (0.0034) +[2024-11-07 23:27:37,931][41694] Fps is (10 sec: 7374.2, 60 sec: 6690.1, 300 sec: 6286.4). Total num frames: 24834048. Throughput: 0: 1699.2. Samples: 1203304. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:27:37,934][41694] Avg episode reward: [(0, '4.557')] +[2024-11-07 23:27:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006063_24834048.pth... +[2024-11-07 23:27:38,148][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005698_23339008.pth +[2024-11-07 23:27:39,400][42004] Updated weights for policy 0, policy_version 6066 (0.0027) +[2024-11-07 23:27:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6289.8). Total num frames: 24870912. Throughput: 0: 1706.1. Samples: 1214104. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:27:42,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-07 23:27:45,275][42004] Updated weights for policy 0, policy_version 6076 (0.0037) +[2024-11-07 23:27:48,571][41694] Fps is (10 sec: 5774.7, 60 sec: 6687.2, 300 sec: 6234.6). Total num frames: 24895488. Throughput: 0: 1669.3. Samples: 1218934. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:27:48,573][41694] Avg episode reward: [(0, '4.463')] +[2024-11-07 23:27:52,702][42004] Updated weights for policy 0, policy_version 6086 (0.0032) +[2024-11-07 23:27:52,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6690.1, 300 sec: 6234.3). Total num frames: 24928256. Throughput: 0: 1625.6. Samples: 1226432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:27:52,934][41694] Avg episode reward: [(0, '4.220')] +[2024-11-07 23:27:57,932][41694] Fps is (10 sec: 7438.9, 60 sec: 6690.1, 300 sec: 6248.1). Total num frames: 24965120. Throughput: 0: 1735.3. Samples: 1237738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:27:57,933][41694] Avg episode reward: [(0, '4.487')] +[2024-11-07 23:27:58,424][42004] Updated weights for policy 0, policy_version 6096 (0.0036) +[2024-11-07 23:28:02,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.8, 300 sec: 6234.2). Total num frames: 24997888. Throughput: 0: 1741.2. Samples: 1243138. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:28:02,934][41694] Avg episode reward: [(0, '4.184')] +[2024-11-07 23:28:04,183][42004] Updated weights for policy 0, policy_version 6106 (0.0028) +[2024-11-07 23:28:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6275.9). Total num frames: 25038848. Throughput: 0: 1753.1. Samples: 1253942. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:28:07,934][41694] Avg episode reward: [(0, '4.459')] +[2024-11-07 23:28:09,356][42004] Updated weights for policy 0, policy_version 6116 (0.0027) +[2024-11-07 23:28:12,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7088.3, 300 sec: 6331.4). Total num frames: 25075712. Throughput: 0: 1755.5. Samples: 1265658. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:28:12,934][41694] Avg episode reward: [(0, '4.400')] +[2024-11-07 23:28:14,875][42004] Updated weights for policy 0, policy_version 6126 (0.0026) +[2024-11-07 23:28:17,932][41694] Fps is (10 sec: 6963.4, 60 sec: 7031.5, 300 sec: 6317.6). Total num frames: 25108480. Throughput: 0: 1750.7. Samples: 1270986. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:28:17,933][41694] Avg episode reward: [(0, '4.429')] +[2024-11-07 23:28:22,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6275.9). Total num frames: 25128960. Throughput: 0: 1705.7. Samples: 1280062. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:28:22,936][41694] Avg episode reward: [(0, '4.511')] +[2024-11-07 23:28:23,347][42004] Updated weights for policy 0, policy_version 6136 (0.0054) +[2024-11-07 23:28:27,933][41694] Fps is (10 sec: 5733.8, 60 sec: 6758.5, 300 sec: 6289.8). Total num frames: 25165824. Throughput: 0: 1638.4. Samples: 1287836. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:28:27,935][41694] Avg episode reward: [(0, '4.550')] +[2024-11-07 23:28:28,862][42004] Updated weights for policy 0, policy_version 6146 (0.0032) +[2024-11-07 23:28:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6317.6). Total num frames: 25202688. Throughput: 0: 1670.0. Samples: 1293014. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:28:32,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-07 23:28:34,259][42004] Updated weights for policy 0, policy_version 6156 (0.0028) +[2024-11-07 23:28:37,931][41694] Fps is (10 sec: 7373.6, 60 sec: 6758.4, 300 sec: 6345.3). Total num frames: 25239552. Throughput: 0: 1741.2. Samples: 1304786. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:28:37,933][41694] Avg episode reward: [(0, '4.312')] +[2024-11-07 23:28:39,750][42004] Updated weights for policy 0, policy_version 6166 (0.0031) +[2024-11-07 23:28:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6387.0). Total num frames: 25280512. Throughput: 0: 1740.6. Samples: 1316064. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:28:42,933][41694] Avg episode reward: [(0, '4.270')] +[2024-11-07 23:28:44,818][42004] Updated weights for policy 0, policy_version 6176 (0.0026) +[2024-11-07 23:28:47,934][41694] Fps is (10 sec: 7780.6, 60 sec: 7106.9, 300 sec: 6442.5). Total num frames: 25317376. Throughput: 0: 1756.0. Samples: 1322164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:28:47,936][41694] Avg episode reward: [(0, '4.397')] +[2024-11-07 23:28:50,273][42004] Updated weights for policy 0, policy_version 6186 (0.0034) +[2024-11-07 23:28:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.8, 300 sec: 6470.3). Total num frames: 25354240. Throughput: 0: 1758.3. Samples: 1333064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:28:52,934][41694] Avg episode reward: [(0, '4.416')] +[2024-11-07 23:28:57,932][41694] Fps is (10 sec: 5735.7, 60 sec: 6826.7, 300 sec: 6456.4). Total num frames: 25374720. Throughput: 0: 1651.6. Samples: 1339978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:28:57,934][41694] Avg episode reward: [(0, '4.388')] +[2024-11-07 23:28:58,624][42004] Updated weights for policy 0, policy_version 6196 (0.0036) +[2024-11-07 23:29:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.7, 300 sec: 6470.3). Total num frames: 25407488. Throughput: 0: 1642.7. Samples: 1344906. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:29:02,934][41694] Avg episode reward: [(0, '4.371')] +[2024-11-07 23:29:04,766][42004] Updated weights for policy 0, policy_version 6206 (0.0037) +[2024-11-07 23:29:07,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6690.2, 300 sec: 6484.2). Total num frames: 25440256. Throughput: 0: 1655.1. Samples: 1354540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:29:07,935][41694] Avg episode reward: [(0, '4.580')] +[2024-11-07 23:29:10,362][42004] Updated weights for policy 0, policy_version 6216 (0.0031) +[2024-11-07 23:29:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6525.9). Total num frames: 25481216. Throughput: 0: 1738.2. Samples: 1366052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:29:12,936][41694] Avg episode reward: [(0, '4.344')] +[2024-11-07 23:29:15,596][42004] Updated weights for policy 0, policy_version 6226 (0.0033) +[2024-11-07 23:29:17,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6553.6). Total num frames: 25518080. Throughput: 0: 1751.9. Samples: 1371850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:29:17,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-07 23:29:21,080][42004] Updated weights for policy 0, policy_version 6236 (0.0038) +[2024-11-07 23:29:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6609.1). Total num frames: 25554944. Throughput: 0: 1741.4. Samples: 1383150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:29:22,934][41694] Avg episode reward: [(0, '4.333')] +[2024-11-07 23:29:27,082][42004] Updated weights for policy 0, policy_version 6246 (0.0034) +[2024-11-07 23:29:27,931][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.6, 300 sec: 6595.3). Total num frames: 25587712. Throughput: 0: 1713.4. Samples: 1393168. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:29:27,933][41694] Avg episode reward: [(0, '4.414')] +[2024-11-07 23:29:32,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6553.6). Total num frames: 25608192. Throughput: 0: 1674.3. Samples: 1397506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:29:32,935][41694] Avg episode reward: [(0, '4.278')] +[2024-11-07 23:29:34,946][42004] Updated weights for policy 0, policy_version 6256 (0.0032) +[2024-11-07 23:29:37,933][41694] Fps is (10 sec: 5733.7, 60 sec: 6758.3, 300 sec: 6567.5). Total num frames: 25645056. Throughput: 0: 1610.5. Samples: 1405538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:29:37,946][41694] Avg episode reward: [(0, '4.174')] +[2024-11-07 23:29:37,979][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006261_25645056.pth... +[2024-11-07 23:29:38,276][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000005869_24039424.pth +[2024-11-07 23:29:40,981][42004] Updated weights for policy 0, policy_version 6266 (0.0027) +[2024-11-07 23:29:42,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.8, 300 sec: 6553.6). Total num frames: 25677824. Throughput: 0: 1682.3. Samples: 1415680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 23:29:42,936][41694] Avg episode reward: [(0, '4.345')] +[2024-11-07 23:29:46,446][42004] Updated weights for policy 0, policy_version 6276 (0.0023) +[2024-11-07 23:29:47,932][41694] Fps is (10 sec: 6963.8, 60 sec: 6622.1, 300 sec: 6581.4). Total num frames: 25714688. Throughput: 0: 1699.9. Samples: 1421402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:29:47,933][41694] Avg episode reward: [(0, '4.369')] +[2024-11-07 23:29:51,667][42004] Updated weights for policy 0, policy_version 6286 (0.0028) +[2024-11-07 23:29:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6660.4). 
Total num frames: 25755648. Throughput: 0: 1746.4. Samples: 1433126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:29:52,933][41694] Avg episode reward: [(0, '4.363')] +[2024-11-07 23:29:57,308][42004] Updated weights for policy 0, policy_version 6296 (0.0029) +[2024-11-07 23:29:57,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 25788416. Throughput: 0: 1739.1. Samples: 1444312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:29:57,933][41694] Avg episode reward: [(0, '4.267')] +[2024-11-07 23:30:02,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6895.0, 300 sec: 6650.8). Total num frames: 25821184. Throughput: 0: 1711.0. Samples: 1448844. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:30:02,933][41694] Avg episode reward: [(0, '4.644')] +[2024-11-07 23:30:03,872][42004] Updated weights for policy 0, policy_version 6306 (0.0030) +[2024-11-07 23:30:07,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6609.1). Total num frames: 25841664. Throughput: 0: 1596.0. Samples: 1454970. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:30:07,934][41694] Avg episode reward: [(0, '4.767')] +[2024-11-07 23:30:11,725][42004] Updated weights for policy 0, policy_version 6316 (0.0028) +[2024-11-07 23:30:12,935][41694] Fps is (10 sec: 5322.9, 60 sec: 6553.2, 300 sec: 6623.0). Total num frames: 25874432. Throughput: 0: 1606.9. Samples: 1465484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:30:12,941][41694] Avg episode reward: [(0, '4.262')] +[2024-11-07 23:30:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 25907200. Throughput: 0: 1615.0. Samples: 1470182. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:30:17,934][41694] Avg episode reward: [(0, '4.420')] +[2024-11-07 23:30:18,105][42004] Updated weights for policy 0, policy_version 6326 (0.0030) +[2024-11-07 23:30:22,931][41694] Fps is (10 sec: 6965.7, 60 sec: 6485.4, 300 sec: 6650.8). Total num frames: 25944064. Throughput: 0: 1664.0. Samples: 1480418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:30:22,933][41694] Avg episode reward: [(0, '4.478')] +[2024-11-07 23:30:23,705][42004] Updated weights for policy 0, policy_version 6336 (0.0033) +[2024-11-07 23:30:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 25985024. Throughput: 0: 1699.7. Samples: 1492166. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:30:27,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-07 23:30:28,901][42004] Updated weights for policy 0, policy_version 6346 (0.0032) +[2024-11-07 23:30:32,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 26017792. Throughput: 0: 1700.3. Samples: 1497914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:30:32,933][41694] Avg episode reward: [(0, '4.400')] +[2024-11-07 23:30:35,290][42004] Updated weights for policy 0, policy_version 6356 (0.0025) +[2024-11-07 23:30:37,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.5, 300 sec: 6706.3). Total num frames: 26050560. Throughput: 0: 1646.8. Samples: 1507232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:30:37,933][41694] Avg episode reward: [(0, '4.198')] +[2024-11-07 23:30:42,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 26066944. Throughput: 0: 1541.0. Samples: 1513656. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:30:42,936][41694] Avg episode reward: [(0, '4.423')] +[2024-11-07 23:30:43,647][42004] Updated weights for policy 0, policy_version 6366 (0.0034) +[2024-11-07 23:30:47,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.4, 300 sec: 6636.9). Total num frames: 26103808. Throughput: 0: 1551.4. Samples: 1518658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:30:47,935][41694] Avg episode reward: [(0, '4.585')] +[2024-11-07 23:30:49,286][42004] Updated weights for policy 0, policy_version 6376 (0.0027) +[2024-11-07 23:30:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6417.1, 300 sec: 6650.8). Total num frames: 26140672. Throughput: 0: 1660.4. Samples: 1529686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:30:52,935][41694] Avg episode reward: [(0, '4.446')] +[2024-11-07 23:30:54,577][42004] Updated weights for policy 0, policy_version 6386 (0.0024) +[2024-11-07 23:30:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6692.5). Total num frames: 26181632. Throughput: 0: 1693.6. Samples: 1541690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:30:57,934][41694] Avg episode reward: [(0, '4.274')] +[2024-11-07 23:30:59,951][42004] Updated weights for policy 0, policy_version 6396 (0.0026) +[2024-11-07 23:31:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 26214400. Throughput: 0: 1710.0. Samples: 1547130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:31:02,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-07 23:31:05,896][42004] Updated weights for policy 0, policy_version 6406 (0.0026) +[2024-11-07 23:31:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 26251264. Throughput: 0: 1713.1. Samples: 1557506. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:31:07,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-07 23:31:12,325][42004] Updated weights for policy 0, policy_version 6416 (0.0038) +[2024-11-07 23:31:14,825][41694] Fps is (10 sec: 5510.5, 60 sec: 6552.1, 300 sec: 6691.2). Total num frames: 26279936. Throughput: 0: 1595.6. Samples: 1566988. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:31:14,828][41694] Avg episode reward: [(0, '4.498')] +[2024-11-07 23:31:17,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 26300416. Throughput: 0: 1569.3. Samples: 1568534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:31:17,934][41694] Avg episode reward: [(0, '4.447')] +[2024-11-07 23:31:20,553][42004] Updated weights for policy 0, policy_version 6426 (0.0037) +[2024-11-07 23:31:22,935][41694] Fps is (10 sec: 7073.2, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 26337280. Throughput: 0: 1585.1. Samples: 1578560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:31:22,938][41694] Avg episode reward: [(0, '4.505')] +[2024-11-07 23:31:26,110][42004] Updated weights for policy 0, policy_version 6436 (0.0028) +[2024-11-07 23:31:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 26374144. Throughput: 0: 1692.4. Samples: 1589814. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:31:27,935][41694] Avg episode reward: [(0, '4.490')] +[2024-11-07 23:31:31,425][42004] Updated weights for policy 0, policy_version 6446 (0.0035) +[2024-11-07 23:31:32,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 26411008. Throughput: 0: 1705.7. Samples: 1595414. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:31:32,933][41694] Avg episode reward: [(0, '4.736')] +[2024-11-07 23:31:36,890][42004] Updated weights for policy 0, policy_version 6456 (0.0024) +[2024-11-07 23:31:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 26447872. Throughput: 0: 1714.3. Samples: 1606832. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:31:37,936][41694] Avg episode reward: [(0, '4.123')] +[2024-11-07 23:31:37,956][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006457_26447872.pth... +[2024-11-07 23:31:38,129][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006063_24834048.pth +[2024-11-07 23:31:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 26480640. Throughput: 0: 1660.9. Samples: 1616430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:31:42,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-07 23:31:43,488][42004] Updated weights for policy 0, policy_version 6466 (0.0029) +[2024-11-07 23:31:49,066][41694] Fps is (10 sec: 5518.1, 60 sec: 6633.0, 300 sec: 6694.5). Total num frames: 26509312. Throughput: 0: 1607.5. Samples: 1621290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:31:49,070][41694] Avg episode reward: [(0, '4.351')] +[2024-11-07 23:31:51,036][42004] Updated weights for policy 0, policy_version 6476 (0.0033) +[2024-11-07 23:31:52,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 26537984. Throughput: 0: 1583.4. Samples: 1628760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:31:52,933][41694] Avg episode reward: [(0, '4.316')] +[2024-11-07 23:31:56,644][42004] Updated weights for policy 0, policy_version 6486 (0.0035) +[2024-11-07 23:31:57,932][41694] Fps is (10 sec: 7392.2, 60 sec: 6553.6, 300 sec: 6692.4). 
Total num frames: 26574848. Throughput: 0: 1691.3. Samples: 1639894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:31:57,934][41694] Avg episode reward: [(0, '4.255')] +[2024-11-07 23:32:02,388][42004] Updated weights for policy 0, policy_version 6496 (0.0033) +[2024-11-07 23:32:02,933][41694] Fps is (10 sec: 6962.4, 60 sec: 6553.5, 300 sec: 6692.4). Total num frames: 26607616. Throughput: 0: 1706.4. Samples: 1645324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:32:02,940][41694] Avg episode reward: [(0, '4.344')] +[2024-11-07 23:32:07,799][42004] Updated weights for policy 0, policy_version 6506 (0.0029) +[2024-11-07 23:32:07,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.9, 300 sec: 6771.7). Total num frames: 26648576. Throughput: 0: 1727.1. Samples: 1656278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:32:07,933][41694] Avg episode reward: [(0, '4.291')] +[2024-11-07 23:32:12,932][41694] Fps is (10 sec: 7783.3, 60 sec: 6978.6, 300 sec: 6775.8). Total num frames: 26685440. Throughput: 0: 1731.3. Samples: 1667722. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:32:12,937][41694] Avg episode reward: [(0, '4.245')] +[2024-11-07 23:32:13,506][42004] Updated weights for policy 0, policy_version 6516 (0.0035) +[2024-11-07 23:32:17,934][41694] Fps is (10 sec: 6551.9, 60 sec: 6894.7, 300 sec: 6747.9). Total num frames: 26714112. Throughput: 0: 1708.9. Samples: 1672320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:32:17,936][41694] Avg episode reward: [(0, '4.466')] +[2024-11-07 23:32:19,598][42004] Updated weights for policy 0, policy_version 6526 (0.0035) +[2024-11-07 23:32:23,300][41694] Fps is (10 sec: 5135.4, 60 sec: 6649.3, 300 sec: 6698.0). Total num frames: 26738688. Throughput: 0: 1671.5. Samples: 1682668. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:32:23,302][41694] Avg episode reward: [(0, '4.474')] +[2024-11-07 23:32:27,437][42004] Updated weights for policy 0, policy_version 6536 (0.0032) +[2024-11-07 23:32:27,933][41694] Fps is (10 sec: 5735.0, 60 sec: 6621.7, 300 sec: 6692.4). Total num frames: 26771456. Throughput: 0: 1627.1. Samples: 1689652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:32:27,936][41694] Avg episode reward: [(0, '4.388')] +[2024-11-07 23:32:32,931][41694] Fps is (10 sec: 7229.9, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 26808320. Throughput: 0: 1677.7. Samples: 1694882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:32:32,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-07 23:32:32,964][42004] Updated weights for policy 0, policy_version 6546 (0.0028) +[2024-11-07 23:32:37,932][41694] Fps is (10 sec: 7783.4, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 26849280. Throughput: 0: 1726.2. Samples: 1706440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:32:37,935][41694] Avg episode reward: [(0, '4.323')] +[2024-11-07 23:32:38,655][42004] Updated weights for policy 0, policy_version 6556 (0.0023) +[2024-11-07 23:32:42,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6758.3, 300 sec: 6762.6). Total num frames: 26886144. Throughput: 0: 1730.4. Samples: 1717762. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:32:42,934][41694] Avg episode reward: [(0, '4.472')] +[2024-11-07 23:32:43,818][42004] Updated weights for policy 0, policy_version 6566 (0.0030) +[2024-11-07 23:32:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6958.2, 300 sec: 6748.0). Total num frames: 26918912. Throughput: 0: 1735.9. Samples: 1723438. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:32:47,933][41694] Avg episode reward: [(0, '4.246')] +[2024-11-07 23:32:50,126][42004] Updated weights for policy 0, policy_version 6576 (0.0037) +[2024-11-07 23:32:52,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 26951680. Throughput: 0: 1699.9. Samples: 1732776. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:32:52,936][41694] Avg episode reward: [(0, '4.361')] +[2024-11-07 23:32:57,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 26972160. Throughput: 0: 1623.1. Samples: 1740760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:32:57,933][41694] Avg episode reward: [(0, '4.287')] +[2024-11-07 23:32:58,240][42004] Updated weights for policy 0, policy_version 6586 (0.0032) +[2024-11-07 23:33:02,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6622.0, 300 sec: 6664.7). Total num frames: 27004928. Throughput: 0: 1601.0. Samples: 1744360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:33:02,956][41694] Avg episode reward: [(0, '4.384')] +[2024-11-07 23:33:04,735][42004] Updated weights for policy 0, policy_version 6596 (0.0034) +[2024-11-07 23:33:07,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 27037696. Throughput: 0: 1605.3. Samples: 1754314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:33:07,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-07 23:33:10,331][42004] Updated weights for policy 0, policy_version 6606 (0.0031) +[2024-11-07 23:33:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 27074560. Throughput: 0: 1674.8. Samples: 1765018. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:33:12,933][41694] Avg episode reward: [(0, '4.316')] +[2024-11-07 23:33:15,771][42004] Updated weights for policy 0, policy_version 6616 (0.0035) +[2024-11-07 23:33:17,933][41694] Fps is (10 sec: 7371.7, 60 sec: 6622.0, 300 sec: 6720.2). Total num frames: 27111424. Throughput: 0: 1690.1. Samples: 1770940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:33:17,937][41694] Avg episode reward: [(0, '4.372')] +[2024-11-07 23:33:21,960][42004] Updated weights for policy 0, policy_version 6626 (0.0036) +[2024-11-07 23:33:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6800.2, 300 sec: 6706.4). Total num frames: 27144192. Throughput: 0: 1663.5. Samples: 1781296. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:33:22,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-07 23:33:27,834][42004] Updated weights for policy 0, policy_version 6636 (0.0025) +[2024-11-07 23:33:27,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 27181056. Throughput: 0: 1635.1. Samples: 1791344. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:33:27,935][41694] Avg episode reward: [(0, '4.340')] +[2024-11-07 23:33:32,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 27197440. Throughput: 0: 1623.2. Samples: 1796482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:33:32,933][41694] Avg episode reward: [(0, '4.405')] +[2024-11-07 23:33:36,020][42004] Updated weights for policy 0, policy_version 6646 (0.0038) +[2024-11-07 23:33:37,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 27234304. Throughput: 0: 1559.6. Samples: 1802956. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:33:37,934][41694] Avg episode reward: [(0, '4.531')] +[2024-11-07 23:33:37,963][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006649_27234304.pth... +[2024-11-07 23:33:38,064][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006261_25645056.pth +[2024-11-07 23:33:41,940][42004] Updated weights for policy 0, policy_version 6656 (0.0026) +[2024-11-07 23:33:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6348.9, 300 sec: 6609.2). Total num frames: 27267072. Throughput: 0: 1613.7. Samples: 1813376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:33:42,934][41694] Avg episode reward: [(0, '4.512')] +[2024-11-07 23:33:47,348][42004] Updated weights for policy 0, policy_version 6666 (0.0027) +[2024-11-07 23:33:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6623.0). Total num frames: 27308032. Throughput: 0: 1661.6. Samples: 1819134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:33:47,934][41694] Avg episode reward: [(0, '4.558')] +[2024-11-07 23:33:52,886][42004] Updated weights for policy 0, policy_version 6676 (0.0033) +[2024-11-07 23:33:52,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 27344896. Throughput: 0: 1688.5. Samples: 1830296. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:33:52,934][41694] Avg episode reward: [(0, '4.525')] +[2024-11-07 23:33:57,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 27373568. Throughput: 0: 1669.3. Samples: 1840136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:33:57,934][41694] Avg episode reward: [(0, '4.520')] +[2024-11-07 23:33:59,676][42004] Updated weights for policy 0, policy_version 6686 (0.0055) +[2024-11-07 23:34:02,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6690.1, 300 sec: 6664.7). 
Total num frames: 27406336. Throughput: 0: 1635.7. Samples: 1844546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:34:02,934][41694] Avg episode reward: [(0, '4.449')] +[2024-11-07 23:34:07,644][42004] Updated weights for policy 0, policy_version 6696 (0.0032) +[2024-11-07 23:34:07,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6485.3, 300 sec: 6595.2). Total num frames: 27426816. Throughput: 0: 1574.2. Samples: 1852138. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:34:07,935][41694] Avg episode reward: [(0, '4.383')] +[2024-11-07 23:34:12,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6417.1, 300 sec: 6581.4). Total num frames: 27459584. Throughput: 0: 1551.9. Samples: 1861178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:34:12,933][41694] Avg episode reward: [(0, '4.420')] +[2024-11-07 23:34:13,807][42004] Updated weights for policy 0, policy_version 6706 (0.0038) +[2024-11-07 23:34:17,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6417.2, 300 sec: 6581.4). Total num frames: 27496448. Throughput: 0: 1552.7. Samples: 1866354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:34:17,934][41694] Avg episode reward: [(0, '4.432')] +[2024-11-07 23:34:19,622][42004] Updated weights for policy 0, policy_version 6716 (0.0040) +[2024-11-07 23:34:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 6581.4). Total num frames: 27529216. Throughput: 0: 1643.8. Samples: 1876926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:34:22,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-07 23:34:25,154][42004] Updated weights for policy 0, policy_version 6726 (0.0036) +[2024-11-07 23:34:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 6636.9). Total num frames: 27566080. Throughput: 0: 1657.0. Samples: 1887942. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 23:34:27,933][41694] Avg episode reward: [(0, '4.383')] +[2024-11-07 23:34:31,611][42004] Updated weights for policy 0, policy_version 6736 (0.0034) +[2024-11-07 23:34:32,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6623.1). Total num frames: 27598848. Throughput: 0: 1636.0. Samples: 1892752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:34:32,933][41694] Avg episode reward: [(0, '4.414')] +[2024-11-07 23:34:37,221][42004] Updated weights for policy 0, policy_version 6746 (0.0030) +[2024-11-07 23:34:37,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 27631616. Throughput: 0: 1620.8. Samples: 1903232. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:34:37,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-07 23:34:42,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6417.0, 300 sec: 6567.5). Total num frames: 27652096. Throughput: 0: 1542.2. Samples: 1909536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:34:42,935][41694] Avg episode reward: [(0, '4.474')] +[2024-11-07 23:34:45,717][42004] Updated weights for policy 0, policy_version 6756 (0.0031) +[2024-11-07 23:34:47,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6280.5, 300 sec: 6539.7). Total num frames: 27684864. Throughput: 0: 1553.8. Samples: 1914468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:34:47,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-07 23:34:51,262][42004] Updated weights for policy 0, policy_version 6766 (0.0036) +[2024-11-07 23:34:52,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6280.6, 300 sec: 6553.6). Total num frames: 27721728. Throughput: 0: 1631.2. Samples: 1925542. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:34:52,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-07 23:34:56,878][42004] Updated weights for policy 0, policy_version 6776 (0.0026) +[2024-11-07 23:34:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.1, 300 sec: 6567.5). Total num frames: 27758592. Throughput: 0: 1668.8. Samples: 1936272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:34:57,933][41694] Avg episode reward: [(0, '4.197')] +[2024-11-07 23:35:02,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6348.8, 300 sec: 6595.3). Total num frames: 27787264. Throughput: 0: 1661.4. Samples: 1941118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:35:02,935][41694] Avg episode reward: [(0, '4.297')] +[2024-11-07 23:35:03,745][42004] Updated weights for policy 0, policy_version 6786 (0.0044) +[2024-11-07 23:35:07,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6553.7, 300 sec: 6595.3). Total num frames: 27820032. Throughput: 0: 1616.7. Samples: 1949678. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:35:07,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-07 23:35:10,120][42004] Updated weights for policy 0, policy_version 6796 (0.0040) +[2024-11-07 23:35:12,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.3, 300 sec: 6581.4). Total num frames: 27848704. Throughput: 0: 1591.9. Samples: 1959578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:35:12,937][41694] Avg episode reward: [(0, '4.258')] +[2024-11-07 23:35:17,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6212.3, 300 sec: 6525.8). Total num frames: 27869184. Throughput: 0: 1514.7. Samples: 1960914. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:35:17,933][41694] Avg episode reward: [(0, '4.260')] +[2024-11-07 23:35:18,777][42004] Updated weights for policy 0, policy_version 6806 (0.0040) +[2024-11-07 23:35:22,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6212.3, 300 sec: 6498.1). Total num frames: 27901952. 
Throughput: 0: 1482.0. Samples: 1969922. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:35:22,935][41694] Avg episode reward: [(0, '4.312')] +[2024-11-07 23:35:25,065][42004] Updated weights for policy 0, policy_version 6816 (0.0034) +[2024-11-07 23:35:27,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6144.0, 300 sec: 6498.1). Total num frames: 27934720. Throughput: 0: 1579.8. Samples: 1980628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:35:27,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-07 23:35:31,043][42004] Updated weights for policy 0, policy_version 6826 (0.0045) +[2024-11-07 23:35:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6212.3, 300 sec: 6511.9). Total num frames: 27971584. Throughput: 0: 1580.7. Samples: 1985600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:35:32,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-07 23:35:37,216][42004] Updated weights for policy 0, policy_version 6836 (0.0028) +[2024-11-07 23:35:37,932][41694] Fps is (10 sec: 6553.1, 60 sec: 6143.9, 300 sec: 6553.6). Total num frames: 28000256. Throughput: 0: 1561.3. Samples: 1995800. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:35:37,935][41694] Avg episode reward: [(0, '4.242')] +[2024-11-07 23:35:37,953][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006837_28004352.pth... +[2024-11-07 23:35:38,084][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006457_26447872.pth +[2024-11-07 23:35:42,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6539.7). Total num frames: 28033024. Throughput: 0: 1520.1. Samples: 2004678. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:35:42,934][41694] Avg episode reward: [(0, '4.357')] +[2024-11-07 23:35:43,895][42004] Updated weights for policy 0, policy_version 6846 (0.0041) +[2024-11-07 23:35:48,720][41694] Fps is (10 sec: 5695.5, 60 sec: 6199.1, 300 sec: 6494.6). Total num frames: 28061696. Throughput: 0: 1500.1. Samples: 2009804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:35:48,722][41694] Avg episode reward: [(0, '4.508')] +[2024-11-07 23:35:51,215][42004] Updated weights for policy 0, policy_version 6856 (0.0029) +[2024-11-07 23:35:52,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6212.3, 300 sec: 6484.2). Total num frames: 28094464. Throughput: 0: 1510.9. Samples: 2017668. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:35:52,935][41694] Avg episode reward: [(0, '4.507')] +[2024-11-07 23:35:57,027][42004] Updated weights for policy 0, policy_version 6866 (0.0037) +[2024-11-07 23:35:57,931][41694] Fps is (10 sec: 7114.6, 60 sec: 6144.0, 300 sec: 6484.2). Total num frames: 28127232. Throughput: 0: 1528.8. Samples: 2028376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:35:57,933][41694] Avg episode reward: [(0, '4.262')] +[2024-11-07 23:36:02,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6212.3, 300 sec: 6470.3). Total num frames: 28160000. Throughput: 0: 1612.7. Samples: 2033484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:02,934][41694] Avg episode reward: [(0, '4.313')] +[2024-11-07 23:36:03,208][42004] Updated weights for policy 0, policy_version 6876 (0.0024) +[2024-11-07 23:36:07,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6280.5, 300 sec: 6540.0). Total num frames: 28196864. Throughput: 0: 1633.7. Samples: 2043440. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:07,933][41694] Avg episode reward: [(0, '4.443')] +[2024-11-07 23:36:09,357][42004] Updated weights for policy 0, policy_version 6886 (0.0029) +[2024-11-07 23:36:12,933][41694] Fps is (10 sec: 6143.3, 60 sec: 6212.1, 300 sec: 6511.9). Total num frames: 28221440. Throughput: 0: 1586.7. Samples: 2052030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:12,934][41694] Avg episode reward: [(0, '4.583')] +[2024-11-07 23:36:16,686][42004] Updated weights for policy 0, policy_version 6896 (0.0031) +[2024-11-07 23:36:17,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 28254208. Throughput: 0: 1568.1. Samples: 2056166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:17,933][41694] Avg episode reward: [(0, '4.473')] +[2024-11-07 23:36:22,932][41694] Fps is (10 sec: 5325.3, 60 sec: 6212.2, 300 sec: 6442.5). Total num frames: 28274688. Throughput: 0: 1565.6. Samples: 2066250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:22,934][41694] Avg episode reward: [(0, '4.542')] +[2024-11-07 23:36:24,342][42004] Updated weights for policy 0, policy_version 6906 (0.0027) +[2024-11-07 23:36:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6442.5). Total num frames: 28311552. Throughput: 0: 1542.6. Samples: 2074094. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:27,933][41694] Avg episode reward: [(0, '4.454')] +[2024-11-07 23:36:29,886][42004] Updated weights for policy 0, policy_version 6916 (0.0031) +[2024-11-07 23:36:32,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6280.5, 300 sec: 6442.5). Total num frames: 28348416. Throughput: 0: 1584.8. Samples: 2079870. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:36:32,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-07 23:36:35,174][42004] Updated weights for policy 0, policy_version 6926 (0.0030) +[2024-11-07 23:36:37,933][41694] Fps is (10 sec: 7781.2, 60 sec: 6485.3, 300 sec: 6470.3). Total num frames: 28389376. Throughput: 0: 1642.2. Samples: 2091568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:37,935][41694] Avg episode reward: [(0, '4.357')] +[2024-11-07 23:36:40,714][42004] Updated weights for policy 0, policy_version 6936 (0.0025) +[2024-11-07 23:36:42,940][41694] Fps is (10 sec: 7776.0, 60 sec: 6552.7, 300 sec: 6523.0). Total num frames: 28426240. Throughput: 0: 1647.9. Samples: 2102544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:42,945][41694] Avg episode reward: [(0, '4.534')] +[2024-11-07 23:36:46,680][42004] Updated weights for policy 0, policy_version 6946 (0.0031) +[2024-11-07 23:36:47,934][41694] Fps is (10 sec: 6553.0, 60 sec: 6640.6, 300 sec: 6498.0). Total num frames: 28454912. Throughput: 0: 1642.8. Samples: 2107412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:47,936][41694] Avg episode reward: [(0, '4.543')] +[2024-11-07 23:36:52,425][42004] Updated weights for policy 0, policy_version 6956 (0.0025) +[2024-11-07 23:36:52,931][41694] Fps is (10 sec: 6559.1, 60 sec: 6621.9, 300 sec: 6498.1). Total num frames: 28491776. Throughput: 0: 1652.1. Samples: 2117786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:36:52,933][41694] Avg episode reward: [(0, '4.337')] +[2024-11-07 23:36:57,932][41694] Fps is (10 sec: 6145.5, 60 sec: 6485.3, 300 sec: 6470.3). Total num frames: 28516352. Throughput: 0: 1641.8. Samples: 2125908. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:36:57,933][41694] Avg episode reward: [(0, '4.450')] +[2024-11-07 23:36:59,540][42004] Updated weights for policy 0, policy_version 6966 (0.0025) +[2024-11-07 23:37:02,933][41694] Fps is (10 sec: 6143.7, 60 sec: 6553.6, 300 sec: 6456.4). Total num frames: 28553216. Throughput: 0: 1677.8. Samples: 2131668. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:37:02,936][41694] Avg episode reward: [(0, '4.487')] +[2024-11-07 23:37:05,420][42004] Updated weights for policy 0, policy_version 6976 (0.0028) +[2024-11-07 23:37:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6456.4). Total num frames: 28590080. Throughput: 0: 1687.2. Samples: 2142172. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:37:07,934][41694] Avg episode reward: [(0, '4.614')] +[2024-11-07 23:37:10,950][42004] Updated weights for policy 0, policy_version 6986 (0.0029) +[2024-11-07 23:37:12,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6758.5, 300 sec: 6484.2). Total num frames: 28626944. Throughput: 0: 1761.0. Samples: 2153340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:12,933][41694] Avg episode reward: [(0, '4.357')] +[2024-11-07 23:37:16,506][42004] Updated weights for policy 0, policy_version 6996 (0.0030) +[2024-11-07 23:37:17,933][41694] Fps is (10 sec: 7372.1, 60 sec: 6826.5, 300 sec: 6534.0). Total num frames: 28663808. Throughput: 0: 1752.6. Samples: 2158738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:17,936][41694] Avg episode reward: [(0, '4.373')] +[2024-11-07 23:37:22,863][42004] Updated weights for policy 0, policy_version 7006 (0.0035) +[2024-11-07 23:37:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.5, 300 sec: 6525.9). Total num frames: 28696576. Throughput: 0: 1704.5. Samples: 2168266. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:22,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-07 23:37:27,931][41694] Fps is (10 sec: 6964.0, 60 sec: 7031.5, 300 sec: 6525.8). Total num frames: 28733440. Throughput: 0: 1713.5. Samples: 2179636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:27,936][41694] Avg episode reward: [(0, '4.392')] +[2024-11-07 23:37:28,237][42004] Updated weights for policy 0, policy_version 7016 (0.0031) +[2024-11-07 23:37:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6470.3). Total num frames: 28758016. Throughput: 0: 1686.6. Samples: 2183306. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:32,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-07 23:37:35,541][42004] Updated weights for policy 0, policy_version 7026 (0.0023) +[2024-11-07 23:37:37,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.6, 300 sec: 6470.3). Total num frames: 28794880. Throughput: 0: 1671.7. Samples: 2193012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:37,934][41694] Avg episode reward: [(0, '4.403')] +[2024-11-07 23:37:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007030_28794880.pth... +[2024-11-07 23:37:38,313][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006649_27234304.pth +[2024-11-07 23:37:41,453][42004] Updated weights for policy 0, policy_version 7036 (0.0037) +[2024-11-07 23:37:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6691.1, 300 sec: 6470.3). Total num frames: 28827648. Throughput: 0: 1719.7. Samples: 2203296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:42,935][41694] Avg episode reward: [(0, '4.417')] +[2024-11-07 23:37:47,367][42004] Updated weights for policy 0, policy_version 7046 (0.0042) +[2024-11-07 23:37:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.9, 300 sec: 6484.2). 
Total num frames: 28864512. Throughput: 0: 1701.7. Samples: 2208242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:37:47,934][41694] Avg episode reward: [(0, '4.587')] +[2024-11-07 23:37:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6525.8). Total num frames: 28897280. Throughput: 0: 1713.9. Samples: 2219296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:37:52,933][41694] Avg episode reward: [(0, '4.347')] +[2024-11-07 23:37:53,155][42004] Updated weights for policy 0, policy_version 7056 (0.0028) +[2024-11-07 23:37:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6539.7). Total num frames: 28934144. Throughput: 0: 1698.1. Samples: 2229754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:37:57,934][41694] Avg episode reward: [(0, '4.401')] +[2024-11-07 23:37:58,757][42004] Updated weights for policy 0, policy_version 7066 (0.0025) +[2024-11-07 23:38:02,933][41694] Fps is (10 sec: 6963.2, 60 sec: 6895.0, 300 sec: 6539.7). Total num frames: 28966912. Throughput: 0: 1688.5. Samples: 2234720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:38:02,937][41694] Avg episode reward: [(0, '4.730')] +[2024-11-07 23:38:06,920][42004] Updated weights for policy 0, policy_version 7076 (0.0043) +[2024-11-07 23:38:07,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6484.2). Total num frames: 28987392. Throughput: 0: 1628.7. Samples: 2241558. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:38:07,933][41694] Avg episode reward: [(0, '4.298')] +[2024-11-07 23:38:12,416][42004] Updated weights for policy 0, policy_version 7086 (0.0032) +[2024-11-07 23:38:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.1, 300 sec: 6498.1). Total num frames: 29028352. Throughput: 0: 1629.2. Samples: 2252948. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:38:12,936][41694] Avg episode reward: [(0, '4.234')] +[2024-11-07 23:38:17,724][42004] Updated weights for policy 0, policy_version 7096 (0.0027) +[2024-11-07 23:38:17,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6690.2, 300 sec: 6511.9). Total num frames: 29065216. Throughput: 0: 1675.5. Samples: 2258702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:38:17,935][41694] Avg episode reward: [(0, '4.413')] +[2024-11-07 23:38:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6512.0). Total num frames: 29102080. Throughput: 0: 1709.8. Samples: 2269954. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:38:22,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-07 23:38:23,124][42004] Updated weights for policy 0, policy_version 7106 (0.0023) +[2024-11-07 23:38:27,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6690.1, 300 sec: 6567.5). Total num frames: 29134848. Throughput: 0: 1714.2. Samples: 2280436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:38:27,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-07 23:38:29,286][42004] Updated weights for policy 0, policy_version 7116 (0.0035) +[2024-11-07 23:38:32,933][41694] Fps is (10 sec: 6962.2, 60 sec: 6894.8, 300 sec: 6567.5). Total num frames: 29171712. Throughput: 0: 1722.4. Samples: 2285752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:38:32,935][41694] Avg episode reward: [(0, '4.423')] +[2024-11-07 23:38:34,570][42004] Updated weights for policy 0, policy_version 7126 (0.0031) +[2024-11-07 23:38:38,835][41694] Fps is (10 sec: 6386.3, 60 sec: 6725.4, 300 sec: 6547.4). Total num frames: 29204480. Throughput: 0: 1695.9. Samples: 2297144. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:38:38,840][41694] Avg episode reward: [(0, '4.423')] +[2024-11-07 23:38:42,085][42004] Updated weights for policy 0, policy_version 7136 (0.0024) +[2024-11-07 23:38:42,932][41694] Fps is (10 sec: 6144.8, 60 sec: 6758.4, 300 sec: 6525.8). Total num frames: 29233152. Throughput: 0: 1665.6. Samples: 2304706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:38:42,934][41694] Avg episode reward: [(0, '4.462')] +[2024-11-07 23:38:47,366][42004] Updated weights for policy 0, policy_version 7146 (0.0023) +[2024-11-07 23:38:47,931][41694] Fps is (10 sec: 7654.7, 60 sec: 6826.7, 300 sec: 6539.7). Total num frames: 29274112. Throughput: 0: 1684.8. Samples: 2310538. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:38:47,934][41694] Avg episode reward: [(0, '4.296')] +[2024-11-07 23:38:52,635][42004] Updated weights for policy 0, policy_version 7156 (0.0025) +[2024-11-07 23:38:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6894.9, 300 sec: 6567.5). Total num frames: 29310976. Throughput: 0: 1789.6. Samples: 2322092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:38:52,935][41694] Avg episode reward: [(0, '4.236')] +[2024-11-07 23:38:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6581.4). Total num frames: 29347840. Throughput: 0: 1790.4. Samples: 2333518. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:38:57,933][41694] Avg episode reward: [(0, '4.367')] +[2024-11-07 23:38:58,183][42004] Updated weights for policy 0, policy_version 7166 (0.0029) +[2024-11-07 23:39:02,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 6609.2). Total num frames: 29376512. Throughput: 0: 1765.2. Samples: 2338134. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:39:02,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-07 23:39:04,730][42004] Updated weights for policy 0, policy_version 7176 (0.0026) +[2024-11-07 23:39:07,931][41694] Fps is (10 sec: 6963.4, 60 sec: 7168.0, 300 sec: 6636.9). Total num frames: 29417472. Throughput: 0: 1739.2. Samples: 2348220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:39:07,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-07 23:39:10,452][42004] Updated weights for policy 0, policy_version 7186 (0.0032) +[2024-11-07 23:39:12,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6581.4). Total num frames: 29437952. Throughput: 0: 1705.8. Samples: 2357196. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:39:12,933][41694] Avg episode reward: [(0, '4.447')] +[2024-11-07 23:39:17,779][42004] Updated weights for policy 0, policy_version 7196 (0.0041) +[2024-11-07 23:39:17,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6826.6, 300 sec: 6595.2). Total num frames: 29474816. Throughput: 0: 1675.8. Samples: 2361162. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:39:17,934][41694] Avg episode reward: [(0, '4.473')] +[2024-11-07 23:39:22,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6826.6, 300 sec: 6595.2). Total num frames: 29511680. Throughput: 0: 1704.5. Samples: 2372306. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:39:22,934][41694] Avg episode reward: [(0, '4.372')] +[2024-11-07 23:39:23,161][42004] Updated weights for policy 0, policy_version 7206 (0.0029) +[2024-11-07 23:39:27,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6894.9, 300 sec: 6609.1). Total num frames: 29548544. Throughput: 0: 1759.6. Samples: 2383888. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:39:27,934][41694] Avg episode reward: [(0, '4.267')] +[2024-11-07 23:39:28,544][42004] Updated weights for policy 0, policy_version 7216 (0.0035) +[2024-11-07 23:39:32,934][41694] Fps is (10 sec: 7371.1, 60 sec: 6894.8, 300 sec: 6623.0). Total num frames: 29585408. Throughput: 0: 1752.9. Samples: 2389424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:39:32,937][41694] Avg episode reward: [(0, '4.498')] +[2024-11-07 23:39:34,526][42004] Updated weights for policy 0, policy_version 7226 (0.0034) +[2024-11-07 23:39:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7069.6, 300 sec: 6678.6). Total num frames: 29622272. Throughput: 0: 1719.0. Samples: 2399448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:39:37,934][41694] Avg episode reward: [(0, '4.575')] +[2024-11-07 23:39:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007232_29622272.pth... +[2024-11-07 23:39:38,101][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000006837_28004352.pth +[2024-11-07 23:39:40,531][42004] Updated weights for policy 0, policy_version 7236 (0.0035) +[2024-11-07 23:39:42,932][41694] Fps is (10 sec: 6964.6, 60 sec: 7031.4, 300 sec: 6678.5). Total num frames: 29655040. Throughput: 0: 1698.1. Samples: 2409932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:39:42,933][41694] Avg episode reward: [(0, '4.399')] +[2024-11-07 23:39:47,917][42004] Updated weights for policy 0, policy_version 7246 (0.0036) +[2024-11-07 23:39:47,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 29679616. Throughput: 0: 1716.7. Samples: 2415386. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:39:47,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-07 23:39:52,931][41694] Fps is (10 sec: 5734.8, 60 sec: 6690.1, 300 sec: 6623.0). 
Total num frames: 29712384. Throughput: 0: 1664.1. Samples: 2423106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:39:52,934][41694] Avg episode reward: [(0, '4.511')] +[2024-11-07 23:39:53,506][42004] Updated weights for policy 0, policy_version 7256 (0.0028) +[2024-11-07 23:39:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 29753344. Throughput: 0: 1715.4. Samples: 2434390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:39:57,938][41694] Avg episode reward: [(0, '4.313')] +[2024-11-07 23:39:58,925][42004] Updated weights for policy 0, policy_version 7266 (0.0033) +[2024-11-07 23:40:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 29786112. Throughput: 0: 1743.3. Samples: 2439608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:40:02,933][41694] Avg episode reward: [(0, '4.306')] +[2024-11-07 23:40:05,201][42004] Updated weights for policy 0, policy_version 7276 (0.0036) +[2024-11-07 23:40:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 29818880. Throughput: 0: 1711.8. Samples: 2449336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:40:07,933][41694] Avg episode reward: [(0, '4.446')] +[2024-11-07 23:40:11,719][42004] Updated weights for policy 0, policy_version 7286 (0.0036) +[2024-11-07 23:40:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6894.9, 300 sec: 6720.2). Total num frames: 29851648. Throughput: 0: 1670.8. Samples: 2459072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:40:12,935][41694] Avg episode reward: [(0, '4.376')] +[2024-11-07 23:40:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.5, 300 sec: 6706.3). Total num frames: 29880320. Throughput: 0: 1650.3. Samples: 2463684. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:40:17,934][41694] Avg episode reward: [(0, '4.472')] +[2024-11-07 23:40:17,990][42004] Updated weights for policy 0, policy_version 7296 (0.0028) +[2024-11-07 23:40:22,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 29904896. Throughput: 0: 1589.1. Samples: 2470958. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:40:22,936][41694] Avg episode reward: [(0, '4.443')] +[2024-11-07 23:40:25,770][42004] Updated weights for policy 0, policy_version 7306 (0.0029) +[2024-11-07 23:40:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 29937664. Throughput: 0: 1583.5. Samples: 2481190. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:40:27,934][41694] Avg episode reward: [(0, '4.312')] +[2024-11-07 23:40:31,630][42004] Updated weights for policy 0, policy_version 7316 (0.0033) +[2024-11-07 23:40:32,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.3, 300 sec: 6678.6). Total num frames: 29970432. Throughput: 0: 1581.0. Samples: 2486532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:40:32,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-07 23:40:37,935][41694] Fps is (10 sec: 6551.6, 60 sec: 6348.5, 300 sec: 6678.5). Total num frames: 30003200. Throughput: 0: 1598.4. Samples: 2495040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:40:37,936][41694] Avg episode reward: [(0, '4.268')] +[2024-11-07 23:40:38,340][42004] Updated weights for policy 0, policy_version 7326 (0.0037) +[2024-11-07 23:40:42,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6348.8, 300 sec: 6710.4). Total num frames: 30035968. Throughput: 0: 1569.4. Samples: 2505012. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:40:42,934][41694] Avg episode reward: [(0, '4.261')] +[2024-11-07 23:40:44,978][42004] Updated weights for policy 0, policy_version 7336 (0.0027) +[2024-11-07 23:40:47,931][41694] Fps is (10 sec: 6555.7, 60 sec: 6485.3, 300 sec: 6692.4). Total num frames: 30068736. Throughput: 0: 1557.0. Samples: 2509672. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:40:47,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-07 23:40:50,471][42004] Updated weights for policy 0, policy_version 7346 (0.0024) +[2024-11-07 23:40:52,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 30105600. Throughput: 0: 1589.9. Samples: 2520880. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:40:52,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-07 23:40:57,783][42004] Updated weights for policy 0, policy_version 7356 (0.0024) +[2024-11-07 23:40:57,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6280.5, 300 sec: 6678.6). Total num frames: 30130176. Throughput: 0: 1554.0. Samples: 2529004. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:40:57,934][41694] Avg episode reward: [(0, '4.319')] +[2024-11-07 23:41:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6664.7). Total num frames: 30162944. Throughput: 0: 1565.9. Samples: 2534150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:02,934][41694] Avg episode reward: [(0, '4.352')] +[2024-11-07 23:41:03,770][42004] Updated weights for policy 0, policy_version 7366 (0.0035) +[2024-11-07 23:41:07,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6692.5). Total num frames: 30195712. Throughput: 0: 1623.1. Samples: 2543998. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:07,934][41694] Avg episode reward: [(0, '4.331')] +[2024-11-07 23:41:11,132][42004] Updated weights for policy 0, policy_version 7376 (0.0036) +[2024-11-07 23:41:12,937][41694] Fps is (10 sec: 5733.4, 60 sec: 6143.8, 300 sec: 6664.6). Total num frames: 30220288. Throughput: 0: 1572.6. Samples: 2551960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:41:12,940][41694] Avg episode reward: [(0, '4.510')] +[2024-11-07 23:41:17,648][42004] Updated weights for policy 0, policy_version 7386 (0.0028) +[2024-11-07 23:41:17,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6212.3, 300 sec: 6706.3). Total num frames: 30253056. Throughput: 0: 1545.7. Samples: 2556088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:17,933][41694] Avg episode reward: [(0, '4.291')] +[2024-11-07 23:41:22,932][41694] Fps is (10 sec: 6554.8, 60 sec: 6348.8, 300 sec: 6692.4). Total num frames: 30285824. Throughput: 0: 1585.5. Samples: 2566384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:22,936][41694] Avg episode reward: [(0, '4.512')] +[2024-11-07 23:41:23,443][42004] Updated weights for policy 0, policy_version 7396 (0.0029) +[2024-11-07 23:41:29,096][41694] Fps is (10 sec: 6236.8, 60 sec: 6294.9, 300 sec: 6666.1). Total num frames: 30322688. Throughput: 0: 1579.8. Samples: 2577942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:29,098][41694] Avg episode reward: [(0, '4.493')] +[2024-11-07 23:41:30,630][42004] Updated weights for policy 0, policy_version 7406 (0.0031) +[2024-11-07 23:41:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6280.6, 300 sec: 6636.9). Total num frames: 30347264. Throughput: 0: 1566.0. Samples: 2580144. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:41:32,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-07 23:41:36,136][42004] Updated weights for policy 0, policy_version 7416 (0.0030) +[2024-11-07 23:41:37,931][41694] Fps is (10 sec: 7417.6, 60 sec: 6417.4, 300 sec: 6651.0). Total num frames: 30388224. Throughput: 0: 1567.7. Samples: 2591426. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:41:37,934][41694] Avg episode reward: [(0, '4.476')] +[2024-11-07 23:41:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007419_30388224.pth... +[2024-11-07 23:41:38,089][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007030_28794880.pth +[2024-11-07 23:41:41,616][42004] Updated weights for policy 0, policy_version 7426 (0.0026) +[2024-11-07 23:41:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6485.4, 300 sec: 6678.6). Total num frames: 30425088. Throughput: 0: 1633.2. Samples: 2602498. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:42,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-07 23:41:47,462][42004] Updated weights for policy 0, policy_version 7436 (0.0039) +[2024-11-07 23:41:47,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 30457856. Throughput: 0: 1637.4. Samples: 2607832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:47,935][41694] Avg episode reward: [(0, '4.175')] +[2024-11-07 23:41:52,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6692.4). Total num frames: 30490624. Throughput: 0: 1636.4. Samples: 2617638. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:41:52,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-07 23:41:53,486][42004] Updated weights for policy 0, policy_version 7446 (0.0041) +[2024-11-07 23:41:57,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6706.3). 
Total num frames: 30531584. Throughput: 0: 1714.3. Samples: 2629100. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:41:57,934][41694] Avg episode reward: [(0, '4.306')] +[2024-11-07 23:41:58,736][42004] Updated weights for policy 0, policy_version 7456 (0.0030) +[2024-11-07 23:42:03,170][41694] Fps is (10 sec: 6000.9, 60 sec: 6459.7, 300 sec: 6645.4). Total num frames: 30552064. Throughput: 0: 1735.1. Samples: 2634582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:42:03,172][41694] Avg episode reward: [(0, '4.232')] +[2024-11-07 23:42:06,492][42004] Updated weights for policy 0, policy_version 7466 (0.0029) +[2024-11-07 23:42:07,931][41694] Fps is (10 sec: 5734.7, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 30588928. Throughput: 0: 1676.6. Samples: 2641830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:42:07,936][41694] Avg episode reward: [(0, '4.475')] +[2024-11-07 23:42:12,091][42004] Updated weights for policy 0, policy_version 7476 (0.0027) +[2024-11-07 23:42:12,933][41694] Fps is (10 sec: 7552.2, 60 sec: 6758.5, 300 sec: 6650.8). Total num frames: 30625792. Throughput: 0: 1706.3. Samples: 2652738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:42:12,935][41694] Avg episode reward: [(0, '4.405')] +[2024-11-07 23:42:17,356][42004] Updated weights for policy 0, policy_version 7486 (0.0027) +[2024-11-07 23:42:17,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6678.6). Total num frames: 30666752. Throughput: 0: 1744.3. Samples: 2658640. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:42:17,934][41694] Avg episode reward: [(0, '4.454')] +[2024-11-07 23:42:22,932][41694] Fps is (10 sec: 7373.4, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 30699520. Throughput: 0: 1737.9. Samples: 2669632. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:42:22,944][41694] Avg episode reward: [(0, '4.489')] +[2024-11-07 23:42:23,446][42004] Updated weights for policy 0, policy_version 7496 (0.0036) +[2024-11-07 23:42:27,932][41694] Fps is (10 sec: 6962.9, 60 sec: 7031.4, 300 sec: 6706.3). Total num frames: 30736384. Throughput: 0: 1720.3. Samples: 2679912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:42:27,935][41694] Avg episode reward: [(0, '4.171')] +[2024-11-07 23:42:28,892][42004] Updated weights for policy 0, policy_version 7506 (0.0028) +[2024-11-07 23:42:32,931][41694] Fps is (10 sec: 7373.0, 60 sec: 7099.7, 300 sec: 6706.3). Total num frames: 30773248. Throughput: 0: 1728.6. Samples: 2685616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:42:32,933][41694] Avg episode reward: [(0, '4.259')] +[2024-11-07 23:42:34,351][42004] Updated weights for policy 0, policy_version 7516 (0.0031) +[2024-11-07 23:42:37,931][41694] Fps is (10 sec: 6144.4, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 30797824. Throughput: 0: 1739.8. Samples: 2695928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:42:37,933][41694] Avg episode reward: [(0, '4.165')] +[2024-11-07 23:42:41,464][42004] Updated weights for policy 0, policy_version 7526 (0.0024) +[2024-11-07 23:42:42,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6826.6, 300 sec: 6678.6). Total num frames: 30834688. Throughput: 0: 1688.6. Samples: 2705086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:42:42,933][41694] Avg episode reward: [(0, '4.362')] +[2024-11-07 23:42:46,980][42004] Updated weights for policy 0, policy_version 7536 (0.0031) +[2024-11-07 23:42:47,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6895.0, 300 sec: 6692.4). Total num frames: 30871552. Throughput: 0: 1700.7. Samples: 2710710. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:42:47,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-07 23:42:52,268][42004] Updated weights for policy 0, policy_version 7546 (0.0029) +[2024-11-07 23:42:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6706.3). Total num frames: 30912512. Throughput: 0: 1786.5. Samples: 2722222. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:42:52,933][41694] Avg episode reward: [(0, '4.207')] +[2024-11-07 23:42:57,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6706.3). Total num frames: 30945280. Throughput: 0: 1776.5. Samples: 2732678. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:42:57,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-07 23:42:58,160][42004] Updated weights for policy 0, policy_version 7556 (0.0036) +[2024-11-07 23:43:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 7128.1, 300 sec: 6748.0). Total num frames: 30978048. Throughput: 0: 1770.9. Samples: 2738332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:43:02,936][41694] Avg episode reward: [(0, '4.550')] +[2024-11-07 23:43:04,028][42004] Updated weights for policy 0, policy_version 7566 (0.0032) +[2024-11-07 23:43:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.0, 300 sec: 6748.0). Total num frames: 31019008. Throughput: 0: 1763.9. Samples: 2749006. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:43:07,933][41694] Avg episode reward: [(0, '4.286')] +[2024-11-07 23:43:09,315][42004] Updated weights for policy 0, policy_version 7576 (0.0032) +[2024-11-07 23:43:12,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6895.1, 300 sec: 6692.5). Total num frames: 31039488. Throughput: 0: 1703.1. Samples: 2756552. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:43:12,933][41694] Avg episode reward: [(0, '4.395')] +[2024-11-07 23:43:16,856][42004] Updated weights for policy 0, policy_version 7586 (0.0036) +[2024-11-07 23:43:17,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6706.3). Total num frames: 31080448. Throughput: 0: 1696.9. Samples: 2761978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:43:17,934][41694] Avg episode reward: [(0, '4.625')] +[2024-11-07 23:43:22,466][42004] Updated weights for policy 0, policy_version 7596 (0.0025) +[2024-11-07 23:43:22,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6894.9, 300 sec: 6706.3). Total num frames: 31113216. Throughput: 0: 1723.0. Samples: 2773464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:43:22,935][41694] Avg episode reward: [(0, '4.416')] +[2024-11-07 23:43:27,898][42004] Updated weights for policy 0, policy_version 7606 (0.0041) +[2024-11-07 23:43:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.3, 300 sec: 6720.2). Total num frames: 31154176. Throughput: 0: 1768.6. Samples: 2784674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:43:27,934][41694] Avg episode reward: [(0, '4.370')] +[2024-11-07 23:43:32,933][41694] Fps is (10 sec: 7372.5, 60 sec: 6894.8, 300 sec: 6740.8). Total num frames: 31186944. Throughput: 0: 1751.6. Samples: 2789532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:43:32,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-07 23:43:33,794][42004] Updated weights for policy 0, policy_version 7616 (0.0048) +[2024-11-07 23:43:37,933][41694] Fps is (10 sec: 6962.5, 60 sec: 7099.6, 300 sec: 6748.0). Total num frames: 31223808. Throughput: 0: 1734.4. Samples: 2800270. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:43:37,935][41694] Avg episode reward: [(0, '4.368')] +[2024-11-07 23:43:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007623_31223808.pth... +[2024-11-07 23:43:38,079][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007232_29622272.pth +[2024-11-07 23:43:39,221][42004] Updated weights for policy 0, policy_version 7626 (0.0029) +[2024-11-07 23:43:42,932][41694] Fps is (10 sec: 7373.7, 60 sec: 7099.7, 300 sec: 6734.1). Total num frames: 31260672. Throughput: 0: 1744.9. Samples: 2811198. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:43:42,933][41694] Avg episode reward: [(0, '4.580')] +[2024-11-07 23:43:46,907][42004] Updated weights for policy 0, policy_version 7636 (0.0040) +[2024-11-07 23:43:47,932][41694] Fps is (10 sec: 5734.7, 60 sec: 6826.6, 300 sec: 6678.6). Total num frames: 31281152. Throughput: 0: 1689.4. Samples: 2814356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:43:47,934][41694] Avg episode reward: [(0, '4.323')] +[2024-11-07 23:43:52,286][42004] Updated weights for policy 0, policy_version 7646 (0.0027) +[2024-11-07 23:43:52,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6826.6, 300 sec: 6692.4). Total num frames: 31322112. Throughput: 0: 1682.7. Samples: 2824728. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:43:52,934][41694] Avg episode reward: [(0, '4.393')] +[2024-11-07 23:43:57,493][42004] Updated weights for policy 0, policy_version 7656 (0.0024) +[2024-11-07 23:43:57,931][41694] Fps is (10 sec: 7782.9, 60 sec: 6895.0, 300 sec: 6720.2). Total num frames: 31358976. Throughput: 0: 1772.1. Samples: 2836294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:43:57,933][41694] Avg episode reward: [(0, '4.429')] +[2024-11-07 23:44:02,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6894.9, 300 sec: 6692.4). 
Total num frames: 31391744. Throughput: 0: 1780.1. Samples: 2842084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:44:02,933][41694] Avg episode reward: [(0, '4.272')] +[2024-11-07 23:44:03,719][42004] Updated weights for policy 0, policy_version 7666 (0.0042) +[2024-11-07 23:44:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 31428608. Throughput: 0: 1731.7. Samples: 2851390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:07,933][41694] Avg episode reward: [(0, '4.273')] +[2024-11-07 23:44:09,379][42004] Updated weights for policy 0, policy_version 7676 (0.0030) +[2024-11-07 23:44:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 31465472. Throughput: 0: 1737.1. Samples: 2862844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:12,933][41694] Avg episode reward: [(0, '4.122')] +[2024-11-07 23:44:15,367][42004] Updated weights for policy 0, policy_version 7686 (0.0028) +[2024-11-07 23:44:19,627][41694] Fps is (10 sec: 5953.8, 60 sec: 6771.9, 300 sec: 6695.6). Total num frames: 31498240. Throughput: 0: 1667.2. Samples: 2867380. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:19,629][41694] Avg episode reward: [(0, '4.429')] +[2024-11-07 23:44:22,617][42004] Updated weights for policy 0, policy_version 7696 (0.0032) +[2024-11-07 23:44:22,934][41694] Fps is (10 sec: 5733.2, 60 sec: 6826.5, 300 sec: 6692.4). Total num frames: 31522816. Throughput: 0: 1663.2. Samples: 2875114. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:44:22,938][41694] Avg episode reward: [(0, '4.558')] +[2024-11-07 23:44:27,932][41694] Fps is (10 sec: 7398.2, 60 sec: 6758.4, 300 sec: 6692.5). Total num frames: 31559680. Throughput: 0: 1666.8. Samples: 2886204. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:27,934][41694] Avg episode reward: [(0, '4.573')] +[2024-11-07 23:44:28,208][42004] Updated weights for policy 0, policy_version 7706 (0.0028) +[2024-11-07 23:44:32,931][41694] Fps is (10 sec: 7374.4, 60 sec: 6826.8, 300 sec: 6692.5). Total num frames: 31596544. Throughput: 0: 1722.4. Samples: 2891864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:32,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-07 23:44:33,667][42004] Updated weights for policy 0, policy_version 7716 (0.0026) +[2024-11-07 23:44:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.8, 300 sec: 6706.3). Total num frames: 31633408. Throughput: 0: 1741.3. Samples: 2903084. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:44:37,934][41694] Avg episode reward: [(0, '4.566')] +[2024-11-07 23:44:39,457][42004] Updated weights for policy 0, policy_version 7726 (0.0034) +[2024-11-07 23:44:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 31666176. Throughput: 0: 1710.0. Samples: 2913242. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:42,934][41694] Avg episode reward: [(0, '4.174')] +[2024-11-07 23:44:45,460][42004] Updated weights for policy 0, policy_version 7736 (0.0035) +[2024-11-07 23:44:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.5, 300 sec: 6748.0). Total num frames: 31703040. Throughput: 0: 1694.3. Samples: 2918326. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:47,938][41694] Avg episode reward: [(0, '4.470')] +[2024-11-07 23:44:50,879][42004] Updated weights for policy 0, policy_version 7746 (0.0036) +[2024-11-07 23:44:53,714][41694] Fps is (10 sec: 6077.8, 60 sec: 6738.8, 300 sec: 6688.6). Total num frames: 31731712. Throughput: 0: 1712.0. Samples: 2929768. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:44:53,716][41694] Avg episode reward: [(0, '4.537')] +[2024-11-07 23:44:57,933][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 31764480. Throughput: 0: 1664.2. Samples: 2937734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:44:57,935][41694] Avg episode reward: [(0, '4.537')] +[2024-11-07 23:44:57,989][42004] Updated weights for policy 0, policy_version 7756 (0.0039) +[2024-11-07 23:45:02,931][41694] Fps is (10 sec: 7554.7, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 31801344. Throughput: 0: 1753.6. Samples: 2943320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:02,935][41694] Avg episode reward: [(0, '4.457')] +[2024-11-07 23:45:03,816][42004] Updated weights for policy 0, policy_version 7766 (0.0033) +[2024-11-07 23:45:07,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 31838208. Throughput: 0: 1751.3. Samples: 2953920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:07,933][41694] Avg episode reward: [(0, '4.341')] +[2024-11-07 23:45:09,678][42004] Updated weights for policy 0, policy_version 7776 (0.0030) +[2024-11-07 23:45:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 31866880. Throughput: 0: 1717.7. Samples: 2963500. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:45:12,934][41694] Avg episode reward: [(0, '4.306')] +[2024-11-07 23:45:16,644][42004] Updated weights for policy 0, policy_version 7786 (0.0034) +[2024-11-07 23:45:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6884.7, 300 sec: 6761.9). Total num frames: 31899648. Throughput: 0: 1685.6. Samples: 2967718. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:17,933][41694] Avg episode reward: [(0, '4.417')] +[2024-11-07 23:45:22,543][42004] Updated weights for policy 0, policy_version 7796 (0.0031) +[2024-11-07 23:45:22,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6826.9, 300 sec: 6761.9). Total num frames: 31932416. Throughput: 0: 1657.6. Samples: 2977674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:22,933][41694] Avg episode reward: [(0, '4.477')] +[2024-11-07 23:45:27,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 31952896. Throughput: 0: 1623.9. Samples: 2986316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:27,936][41694] Avg episode reward: [(0, '4.243')] +[2024-11-07 23:45:30,334][42004] Updated weights for policy 0, policy_version 7806 (0.0034) +[2024-11-07 23:45:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6734.2). Total num frames: 31989760. Throughput: 0: 1599.4. Samples: 2990300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:45:32,933][41694] Avg episode reward: [(0, '4.282')] +[2024-11-07 23:45:36,131][42004] Updated weights for policy 0, policy_version 7816 (0.0026) +[2024-11-07 23:45:37,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 32026624. Throughput: 0: 1609.3. Samples: 3000926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:37,934][41694] Avg episode reward: [(0, '4.502')] +[2024-11-07 23:45:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007819_32026624.pth... +[2024-11-07 23:45:38,086][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007419_30388224.pth +[2024-11-07 23:45:41,991][42004] Updated weights for policy 0, policy_version 7826 (0.0032) +[2024-11-07 23:45:42,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.6, 300 sec: 6748.0). 
Total num frames: 32059392. Throughput: 0: 1636.9. Samples: 3011396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:42,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-07 23:45:47,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6417.1, 300 sec: 6720.2). Total num frames: 32088064. Throughput: 0: 1613.7. Samples: 3015938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:45:47,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-07 23:45:48,632][42004] Updated weights for policy 0, policy_version 7836 (0.0033) +[2024-11-07 23:45:52,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6640.2, 300 sec: 6761.9). Total num frames: 32124928. Throughput: 0: 1597.4. Samples: 3025804. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:45:52,933][41694] Avg episode reward: [(0, '4.568')] +[2024-11-07 23:45:54,267][42004] Updated weights for policy 0, policy_version 7846 (0.0028) +[2024-11-07 23:45:57,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 32161792. Throughput: 0: 1629.9. Samples: 3036846. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:45:57,933][41694] Avg episode reward: [(0, '4.341')] +[2024-11-07 23:45:59,794][42004] Updated weights for policy 0, policy_version 7856 (0.0035) +[2024-11-07 23:46:02,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6348.8, 300 sec: 6734.1). Total num frames: 32182272. Throughput: 0: 1658.6. Samples: 3042356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:46:02,934][41694] Avg episode reward: [(0, '4.543')] +[2024-11-07 23:46:07,527][42004] Updated weights for policy 0, policy_version 7866 (0.0028) +[2024-11-07 23:46:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6348.8, 300 sec: 6775.8). Total num frames: 32219136. Throughput: 0: 1595.7. Samples: 3049482. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:46:07,934][41694] Avg episode reward: [(0, '4.320')] +[2024-11-07 23:46:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6417.1, 300 sec: 6775.8). Total num frames: 32251904. Throughput: 0: 1632.4. Samples: 3059774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:46:12,934][41694] Avg episode reward: [(0, '4.403')] +[2024-11-07 23:46:13,665][42004] Updated weights for policy 0, policy_version 7876 (0.0036) +[2024-11-07 23:46:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6789.6). Total num frames: 32288768. Throughput: 0: 1655.6. Samples: 3064800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:46:17,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-07 23:46:19,633][42004] Updated weights for policy 0, policy_version 7886 (0.0034) +[2024-11-07 23:46:22,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6788.7). Total num frames: 32317440. Throughput: 0: 1639.5. Samples: 3074706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:46:22,934][41694] Avg episode reward: [(0, '4.532')] +[2024-11-07 23:46:25,643][42004] Updated weights for policy 0, policy_version 7896 (0.0022) +[2024-11-07 23:46:27,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 32358400. Throughput: 0: 1646.0. Samples: 3085464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:46:27,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-07 23:46:31,087][42004] Updated weights for policy 0, policy_version 7906 (0.0024) +[2024-11-07 23:46:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 32395264. Throughput: 0: 1674.2. Samples: 3091278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:46:32,934][41694] Avg episode reward: [(0, '4.379')] +[2024-11-07 23:46:37,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 32419840. 
Throughput: 0: 1630.1. Samples: 3099158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:46:37,934][41694] Avg episode reward: [(0, '4.242')] +[2024-11-07 23:46:38,389][42004] Updated weights for policy 0, policy_version 7916 (0.0023) +[2024-11-07 23:46:42,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 32456704. Throughput: 0: 1636.4. Samples: 3110484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:46:42,933][41694] Avg episode reward: [(0, '4.490')] +[2024-11-07 23:46:43,769][42004] Updated weights for policy 0, policy_version 7926 (0.0030) +[2024-11-07 23:46:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 32493568. Throughput: 0: 1640.8. Samples: 3116194. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:46:47,934][41694] Avg episode reward: [(0, '4.566')] +[2024-11-07 23:46:49,225][42004] Updated weights for policy 0, policy_version 7936 (0.0027) +[2024-11-07 23:46:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 32530432. Throughput: 0: 1731.4. Samples: 3127394. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:46:52,933][41694] Avg episode reward: [(0, '4.552')] +[2024-11-07 23:46:55,146][42004] Updated weights for policy 0, policy_version 7946 (0.0034) +[2024-11-07 23:46:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6836.8). Total num frames: 32567296. Throughput: 0: 1734.8. Samples: 3137842. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:46:57,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-07 23:47:00,443][42004] Updated weights for policy 0, policy_version 7956 (0.0022) +[2024-11-07 23:47:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 32604160. Throughput: 0: 1754.9. Samples: 3143770. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:02,933][41694] Avg episode reward: [(0, '4.370')] +[2024-11-07 23:47:05,949][42004] Updated weights for policy 0, policy_version 7966 (0.0032) +[2024-11-07 23:47:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 32641024. Throughput: 0: 1780.3. Samples: 3154818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:07,934][41694] Avg episode reward: [(0, '4.354')] +[2024-11-07 23:47:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.6, 300 sec: 6761.9). Total num frames: 32661504. Throughput: 0: 1702.4. Samples: 3162074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:12,934][41694] Avg episode reward: [(0, '4.476')] +[2024-11-07 23:47:13,568][42004] Updated weights for policy 0, policy_version 7976 (0.0023) +[2024-11-07 23:47:17,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6789.6). Total num frames: 32702464. Throughput: 0: 1699.7. Samples: 3167766. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:47:17,936][41694] Avg episode reward: [(0, '4.345')] +[2024-11-07 23:47:18,927][42004] Updated weights for policy 0, policy_version 7986 (0.0030) +[2024-11-07 23:47:22,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7031.5, 300 sec: 6789.7). Total num frames: 32739328. Throughput: 0: 1776.9. Samples: 3179118. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:47:22,933][41694] Avg episode reward: [(0, '4.454')] +[2024-11-07 23:47:24,285][42004] Updated weights for policy 0, policy_version 7996 (0.0031) +[2024-11-07 23:47:27,933][41694] Fps is (10 sec: 6962.5, 60 sec: 6894.8, 300 sec: 6775.7). Total num frames: 32772096. Throughput: 0: 1760.5. Samples: 3189708. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 23:47:27,935][41694] Avg episode reward: [(0, '4.373')] +[2024-11-07 23:47:30,354][42004] Updated weights for policy 0, policy_version 8006 (0.0026) +[2024-11-07 23:47:32,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 32808960. Throughput: 0: 1753.9. Samples: 3195118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:32,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-07 23:47:35,726][42004] Updated weights for policy 0, policy_version 8016 (0.0021) +[2024-11-07 23:47:37,931][41694] Fps is (10 sec: 7783.4, 60 sec: 7168.0, 300 sec: 6831.3). Total num frames: 32849920. Throughput: 0: 1755.6. Samples: 3206394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:37,933][41694] Avg episode reward: [(0, '4.324')] +[2024-11-07 23:47:37,942][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008020_32849920.pth... +[2024-11-07 23:47:38,077][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007623_31223808.pth +[2024-11-07 23:47:41,125][42004] Updated weights for policy 0, policy_version 8026 (0.0017) +[2024-11-07 23:47:44,573][41694] Fps is (10 sec: 6333.1, 60 sec: 6910.6, 300 sec: 6779.7). Total num frames: 32882688. Throughput: 0: 1715.8. Samples: 3217870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:47:44,576][41694] Avg episode reward: [(0, '4.555')] +[2024-11-07 23:47:47,931][41694] Fps is (10 sec: 6143.9, 60 sec: 6963.2, 300 sec: 6775.8). Total num frames: 32911360. Throughput: 0: 1693.4. Samples: 3219972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:47,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-07 23:47:48,321][42004] Updated weights for policy 0, policy_version 8036 (0.0024) +[2024-11-07 23:47:52,932][41694] Fps is (10 sec: 7840.7, 60 sec: 6963.2, 300 sec: 6789.6). 
Total num frames: 32948224. Throughput: 0: 1708.0. Samples: 3231678. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:52,934][41694] Avg episode reward: [(0, '4.508')] +[2024-11-07 23:47:53,779][42004] Updated weights for policy 0, policy_version 8046 (0.0025) +[2024-11-07 23:47:57,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 32976896. Throughput: 0: 1747.5. Samples: 3240712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:47:57,934][41694] Avg episode reward: [(0, '4.339')] +[2024-11-07 23:48:02,475][42004] Updated weights for policy 0, policy_version 8056 (0.0048) +[2024-11-07 23:48:02,942][41694] Fps is (10 sec: 4910.1, 60 sec: 6552.4, 300 sec: 6706.1). Total num frames: 32997376. Throughput: 0: 1694.6. Samples: 3244042. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:48:02,944][41694] Avg episode reward: [(0, '4.196')] +[2024-11-07 23:48:07,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6485.4, 300 sec: 6748.0). Total num frames: 33030144. Throughput: 0: 1626.7. Samples: 3252320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:48:07,933][41694] Avg episode reward: [(0, '4.502')] +[2024-11-07 23:48:08,599][42004] Updated weights for policy 0, policy_version 8066 (0.0037) +[2024-11-07 23:48:12,932][41694] Fps is (10 sec: 7380.6, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 33071104. Throughput: 0: 1645.3. Samples: 3263744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:48:12,934][41694] Avg episode reward: [(0, '4.220')] +[2024-11-07 23:48:14,010][42004] Updated weights for policy 0, policy_version 8076 (0.0028) +[2024-11-07 23:48:18,779][41694] Fps is (10 sec: 6419.1, 60 sec: 6529.6, 300 sec: 6714.8). Total num frames: 33099776. Throughput: 0: 1687.4. Samples: 3272482. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:48:18,781][41694] Avg episode reward: [(0, '4.239')] +[2024-11-07 23:48:21,375][42004] Updated weights for policy 0, policy_version 8086 (0.0027) +[2024-11-07 23:48:22,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6692.5). Total num frames: 33128448. Throughput: 0: 1566.9. Samples: 3276904. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:48:22,933][41694] Avg episode reward: [(0, '4.419')] +[2024-11-07 23:48:26,778][42004] Updated weights for policy 0, policy_version 8096 (0.0037) +[2024-11-07 23:48:27,932][41694] Fps is (10 sec: 7608.0, 60 sec: 6622.0, 300 sec: 6720.2). Total num frames: 33169408. Throughput: 0: 1625.7. Samples: 3288356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:48:27,935][41694] Avg episode reward: [(0, '4.437')] +[2024-11-07 23:48:32,096][42004] Updated weights for policy 0, policy_version 8106 (0.0026) +[2024-11-07 23:48:32,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6621.8, 300 sec: 6720.2). Total num frames: 33206272. Throughput: 0: 1644.8. Samples: 3293990. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:48:32,934][41694] Avg episode reward: [(0, '4.314')] +[2024-11-07 23:48:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 33243136. Throughput: 0: 1630.4. Samples: 3305046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:48:37,933][41694] Avg episode reward: [(0, '4.398')] +[2024-11-07 23:48:37,937][42004] Updated weights for policy 0, policy_version 8116 (0.0033) +[2024-11-07 23:48:42,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6808.2, 300 sec: 6775.8). Total num frames: 33280000. Throughput: 0: 1679.5. Samples: 3316290. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:48:42,933][41694] Avg episode reward: [(0, '4.651')] +[2024-11-07 23:48:43,195][42004] Updated weights for policy 0, policy_version 8126 (0.0024) +[2024-11-07 23:48:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 33316864. Throughput: 0: 1738.2. Samples: 3322242. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:48:47,934][41694] Avg episode reward: [(0, '4.344')] +[2024-11-07 23:48:48,556][42004] Updated weights for policy 0, policy_version 8136 (0.0029) +[2024-11-07 23:48:52,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 33341440. Throughput: 0: 1676.2. Samples: 3327748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:48:52,933][41694] Avg episode reward: [(0, '4.258')] +[2024-11-07 23:48:55,735][42004] Updated weights for policy 0, policy_version 8146 (0.0023) +[2024-11-07 23:48:57,932][41694] Fps is (10 sec: 6553.8, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 33382400. Throughput: 0: 1727.7. Samples: 3341492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:48:57,934][41694] Avg episode reward: [(0, '4.508')] +[2024-11-07 23:49:01,051][42004] Updated weights for policy 0, policy_version 8156 (0.0029) +[2024-11-07 23:49:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6964.4, 300 sec: 6734.1). Total num frames: 33415168. Throughput: 0: 1696.8. Samples: 3347402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:49:02,935][41694] Avg episode reward: [(0, '4.520')] +[2024-11-07 23:49:06,616][42004] Updated weights for policy 0, policy_version 8166 (0.0039) +[2024-11-07 23:49:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 33456128. Throughput: 0: 1809.1. Samples: 3358312. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:49:07,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-07 23:49:12,423][42004] Updated weights for policy 0, policy_version 8176 (0.0033) +[2024-11-07 23:49:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6963.2, 300 sec: 6787.0). Total num frames: 33488896. Throughput: 0: 1793.3. Samples: 3369056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:49:12,933][41694] Avg episode reward: [(0, '4.183')] +[2024-11-07 23:49:17,787][42004] Updated weights for policy 0, policy_version 8186 (0.0032) +[2024-11-07 23:49:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7270.7, 300 sec: 6803.6). Total num frames: 33529856. Throughput: 0: 1793.2. Samples: 3374684. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:49:17,933][41694] Avg episode reward: [(0, '4.487')] +[2024-11-07 23:49:22,932][41694] Fps is (10 sec: 7781.7, 60 sec: 7304.4, 300 sec: 6803.5). Total num frames: 33566720. Throughput: 0: 1803.0. Samples: 3386184. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:49:22,937][41694] Avg episode reward: [(0, '4.369')] +[2024-11-07 23:49:23,190][42004] Updated weights for policy 0, policy_version 8196 (0.0036) +[2024-11-07 23:49:27,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 33587200. Throughput: 0: 1729.6. Samples: 3394122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:49:27,933][41694] Avg episode reward: [(0, '4.492')] +[2024-11-07 23:49:30,632][42004] Updated weights for policy 0, policy_version 8206 (0.0036) +[2024-11-07 23:49:32,932][41694] Fps is (10 sec: 6144.5, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 33628160. Throughput: 0: 1713.2. Samples: 3399334. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:49:32,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-07 23:49:35,974][42004] Updated weights for policy 0, policy_version 8216 (0.0033) +[2024-11-07 23:49:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 33665024. Throughput: 0: 1838.8. Samples: 3410496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:49:37,933][41694] Avg episode reward: [(0, '4.340')] +[2024-11-07 23:49:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008219_33665024.pth... +[2024-11-07 23:49:38,076][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000007819_32026624.pth +[2024-11-07 23:49:41,600][42004] Updated weights for policy 0, policy_version 8226 (0.0024) +[2024-11-07 23:49:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 33701888. Throughput: 0: 1785.7. Samples: 3421850. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:49:42,934][41694] Avg episode reward: [(0, '4.543')] +[2024-11-07 23:49:47,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6758.4, 300 sec: 6765.9). Total num frames: 33722368. Throughput: 0: 1729.1. Samples: 3425210. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:49:47,934][41694] Avg episode reward: [(0, '4.347')] +[2024-11-07 23:49:49,665][42004] Updated weights for policy 0, policy_version 8236 (0.0031) +[2024-11-07 23:49:52,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 33755136. Throughput: 0: 1673.0. Samples: 3433598. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:49:52,935][41694] Avg episode reward: [(0, '4.368')] +[2024-11-07 23:49:55,445][42004] Updated weights for policy 0, policy_version 8246 (0.0022) +[2024-11-07 23:49:57,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.6, 300 sec: 6748.0). 
Total num frames: 33792000. Throughput: 0: 1673.6. Samples: 3444368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:49:57,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-07 23:50:02,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 33812480. Throughput: 0: 1653.2. Samples: 3449080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:50:02,934][41694] Avg episode reward: [(0, '4.281')] +[2024-11-07 23:50:03,116][42004] Updated weights for policy 0, policy_version 8256 (0.0037) +[2024-11-07 23:50:07,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 33849344. Throughput: 0: 1565.7. Samples: 3456638. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:50:07,936][41694] Avg episode reward: [(0, '4.307')] +[2024-11-07 23:50:08,971][42004] Updated weights for policy 0, policy_version 8266 (0.0031) +[2024-11-07 23:50:12,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 33886208. Throughput: 0: 1637.2. Samples: 3467794. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:50:12,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-07 23:50:14,487][42004] Updated weights for policy 0, policy_version 8276 (0.0029) +[2024-11-07 23:50:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 33923072. Throughput: 0: 1640.5. Samples: 3473156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:50:17,934][41694] Avg episode reward: [(0, '4.498')] +[2024-11-07 23:50:20,331][42004] Updated weights for policy 0, policy_version 8286 (0.0021) +[2024-11-07 23:50:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.4, 300 sec: 6789.6). Total num frames: 33955840. Throughput: 0: 1623.2. Samples: 3483538. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:50:22,934][41694] Avg episode reward: [(0, '4.469')] +[2024-11-07 23:50:25,777][42004] Updated weights for policy 0, policy_version 8296 (0.0031) +[2024-11-07 23:50:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 33996800. Throughput: 0: 1628.4. Samples: 3495130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:50:27,933][41694] Avg episode reward: [(0, '4.346')] +[2024-11-07 23:50:31,226][42004] Updated weights for policy 0, policy_version 8306 (0.0034) +[2024-11-07 23:50:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 34033664. Throughput: 0: 1675.9. Samples: 3500626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:50:32,933][41694] Avg episode reward: [(0, '4.319')] +[2024-11-07 23:50:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6485.3, 300 sec: 6761.9). Total num frames: 34054144. Throughput: 0: 1649.6. Samples: 3507830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:50:37,934][41694] Avg episode reward: [(0, '4.381')] +[2024-11-07 23:50:39,027][42004] Updated weights for policy 0, policy_version 8316 (0.0029) +[2024-11-07 23:50:42,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6417.1, 300 sec: 6775.8). Total num frames: 34086912. Throughput: 0: 1647.7. Samples: 3518514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:50:42,934][41694] Avg episode reward: [(0, '4.419')] +[2024-11-07 23:50:44,714][42004] Updated weights for policy 0, policy_version 8326 (0.0027) +[2024-11-07 23:50:47,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.5, 300 sec: 6789.6). Total num frames: 34127872. Throughput: 0: 1663.9. Samples: 3523954. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:50:47,933][41694] Avg episode reward: [(0, '4.488')] +[2024-11-07 23:50:49,928][42004] Updated weights for policy 0, policy_version 8336 (0.0034) +[2024-11-07 23:50:52,936][41694] Fps is (10 sec: 7779.1, 60 sec: 6826.2, 300 sec: 6789.5). Total num frames: 34164736. Throughput: 0: 1759.0. Samples: 3535802. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:50:52,937][41694] Avg episode reward: [(0, '4.480')] +[2024-11-07 23:50:55,555][42004] Updated weights for policy 0, policy_version 8346 (0.0029) +[2024-11-07 23:50:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 34197504. Throughput: 0: 1740.3. Samples: 3546108. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:50:57,934][41694] Avg episode reward: [(0, '4.662')] +[2024-11-07 23:51:02,674][42004] Updated weights for policy 0, policy_version 8356 (0.0024) +[2024-11-07 23:51:02,932][41694] Fps is (10 sec: 6146.6, 60 sec: 6895.0, 300 sec: 6803.5). Total num frames: 34226176. Throughput: 0: 1721.9. Samples: 3550642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:51:02,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-07 23:51:07,932][41694] Fps is (10 sec: 5324.4, 60 sec: 6690.1, 300 sec: 6775.7). Total num frames: 34250752. Throughput: 0: 1653.6. Samples: 3557952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:51:07,934][41694] Avg episode reward: [(0, '4.251')] +[2024-11-07 23:51:12,508][42004] Updated weights for policy 0, policy_version 8366 (0.0047) +[2024-11-07 23:51:12,931][41694] Fps is (10 sec: 4096.0, 60 sec: 6348.8, 300 sec: 6706.3). Total num frames: 34267136. Throughput: 0: 1521.8. Samples: 3563612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:51:12,934][41694] Avg episode reward: [(0, '4.199')] +[2024-11-07 23:51:17,932][41694] Fps is (10 sec: 4505.9, 60 sec: 6212.3, 300 sec: 6706.3). Total num frames: 34295808. 
Throughput: 0: 1503.2. Samples: 3568268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:51:17,933][41694] Avg episode reward: [(0, '4.408')] +[2024-11-07 23:51:19,073][42004] Updated weights for policy 0, policy_version 8376 (0.0039) +[2024-11-07 23:51:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6348.8, 300 sec: 6706.3). Total num frames: 34336768. Throughput: 0: 1568.5. Samples: 3578410. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:51:22,934][41694] Avg episode reward: [(0, '4.528')] +[2024-11-07 23:51:24,338][42004] Updated weights for policy 0, policy_version 8386 (0.0033) +[2024-11-07 23:51:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6280.5, 300 sec: 6706.3). Total num frames: 34373632. Throughput: 0: 1584.0. Samples: 3589796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:51:27,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-07 23:51:30,014][42004] Updated weights for policy 0, policy_version 8396 (0.0037) +[2024-11-07 23:51:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6280.5, 300 sec: 6748.0). Total num frames: 34410496. Throughput: 0: 1580.2. Samples: 3595064. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:51:32,934][41694] Avg episode reward: [(0, '4.577')] +[2024-11-07 23:51:35,229][42004] Updated weights for policy 0, policy_version 8406 (0.0033) +[2024-11-07 23:51:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 34447360. Throughput: 0: 1579.3. Samples: 3606864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:51:37,933][41694] Avg episode reward: [(0, '4.677')] +[2024-11-07 23:51:38,016][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008411_34451456.pth... 
+[2024-11-07 23:51:38,132][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008020_32849920.pth +[2024-11-07 23:51:40,706][42004] Updated weights for policy 0, policy_version 8416 (0.0029) +[2024-11-07 23:51:45,073][41694] Fps is (10 sec: 6072.2, 60 sec: 6393.6, 300 sec: 6699.3). Total num frames: 34484224. Throughput: 0: 1523.6. Samples: 3617934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:51:45,076][41694] Avg episode reward: [(0, '4.521')] +[2024-11-07 23:51:47,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6280.5, 300 sec: 6692.4). Total num frames: 34504704. Throughput: 0: 1529.6. Samples: 3619476. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:51:47,935][41694] Avg episode reward: [(0, '4.507')] +[2024-11-07 23:51:48,513][42004] Updated weights for policy 0, policy_version 8426 (0.0032) +[2024-11-07 23:51:52,932][41694] Fps is (10 sec: 7297.4, 60 sec: 6281.0, 300 sec: 6692.4). Total num frames: 34541568. Throughput: 0: 1602.9. Samples: 3630082. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:51:52,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-07 23:51:54,100][42004] Updated weights for policy 0, policy_version 8436 (0.0033) +[2024-11-07 23:51:57,933][41694] Fps is (10 sec: 7781.8, 60 sec: 6416.9, 300 sec: 6706.3). Total num frames: 34582528. Throughput: 0: 1740.6. Samples: 3641940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:51:57,935][41694] Avg episode reward: [(0, '4.391')] +[2024-11-07 23:51:59,220][42004] Updated weights for policy 0, policy_version 8446 (0.0024) +[2024-11-07 23:52:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6692.5). Total num frames: 34615296. Throughput: 0: 1767.9. Samples: 3647822. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:52:02,933][41694] Avg episode reward: [(0, '4.550')] +[2024-11-07 23:52:05,363][42004] Updated weights for policy 0, policy_version 8456 (0.0031) +[2024-11-07 23:52:07,932][41694] Fps is (10 sec: 7373.7, 60 sec: 6758.5, 300 sec: 6761.9). Total num frames: 34656256. Throughput: 0: 1766.3. Samples: 3657892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:52:07,934][41694] Avg episode reward: [(0, '4.437')] +[2024-11-07 23:52:10,491][42004] Updated weights for policy 0, policy_version 8466 (0.0024) +[2024-11-07 23:52:12,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 34693120. Throughput: 0: 1772.2. Samples: 3669546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:52:12,933][41694] Avg episode reward: [(0, '4.593')] +[2024-11-07 23:52:16,016][42004] Updated weights for policy 0, policy_version 8476 (0.0025) +[2024-11-07 23:52:19,575][41694] Fps is (10 sec: 5980.6, 60 sec: 6976.9, 300 sec: 6696.8). Total num frames: 34725888. Throughput: 0: 1718.4. Samples: 3675216. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:52:19,576][41694] Avg episode reward: [(0, '4.599')] +[2024-11-07 23:52:22,935][41694] Fps is (10 sec: 5323.0, 60 sec: 6826.3, 300 sec: 6692.4). Total num frames: 34746368. Throughput: 0: 1666.9. Samples: 3681882. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:52:22,937][41694] Avg episode reward: [(0, '4.603')] +[2024-11-07 23:52:24,439][42004] Updated weights for policy 0, policy_version 8486 (0.0027) +[2024-11-07 23:52:27,932][41694] Fps is (10 sec: 6371.7, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 34779136. Throughput: 0: 1718.3. Samples: 3691578. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:52:27,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-07 23:52:30,293][42004] Updated weights for policy 0, policy_version 8496 (0.0031) +[2024-11-07 23:52:32,931][41694] Fps is (10 sec: 6965.5, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 34816000. Throughput: 0: 1725.4. Samples: 3697118. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:52:32,933][41694] Avg episode reward: [(0, '4.364')] +[2024-11-07 23:52:36,015][42004] Updated weights for policy 0, policy_version 8506 (0.0028) +[2024-11-07 23:52:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6715.9). Total num frames: 34852864. Throughput: 0: 1722.4. Samples: 3707588. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:52:37,934][41694] Avg episode reward: [(0, '4.527')] +[2024-11-07 23:52:41,580][42004] Updated weights for policy 0, policy_version 8516 (0.0028) +[2024-11-07 23:52:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7008.6, 300 sec: 6706.3). Total num frames: 34889728. Throughput: 0: 1704.4. Samples: 3718636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:52:42,934][41694] Avg episode reward: [(0, '4.329')] +[2024-11-07 23:52:47,416][42004] Updated weights for policy 0, policy_version 8526 (0.0037) +[2024-11-07 23:52:47,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6963.3, 300 sec: 6692.5). Total num frames: 34922496. Throughput: 0: 1693.2. Samples: 3724014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:52:47,935][41694] Avg episode reward: [(0, '4.336')] +[2024-11-07 23:52:54,116][41694] Fps is (10 sec: 5493.6, 60 sec: 6694.6, 300 sec: 6665.7). Total num frames: 34951168. Throughput: 0: 1663.0. Samples: 3734696. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:52:54,119][41694] Avg episode reward: [(0, '4.324')] +[2024-11-07 23:52:55,450][42004] Updated weights for policy 0, policy_version 8536 (0.0025) +[2024-11-07 23:52:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6622.0, 300 sec: 6720.5). Total num frames: 34979840. Throughput: 0: 1587.6. Samples: 3740988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:52:57,936][41694] Avg episode reward: [(0, '4.231')] +[2024-11-07 23:53:01,011][42004] Updated weights for policy 0, policy_version 8546 (0.0034) +[2024-11-07 23:53:02,931][41694] Fps is (10 sec: 7433.8, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 35016704. Throughput: 0: 1648.9. Samples: 3746708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:53:02,936][41694] Avg episode reward: [(0, '4.527')] +[2024-11-07 23:53:06,679][42004] Updated weights for policy 0, policy_version 8556 (0.0032) +[2024-11-07 23:53:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 35053568. Throughput: 0: 1683.5. Samples: 3757634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:53:07,934][41694] Avg episode reward: [(0, '4.481')] +[2024-11-07 23:53:12,576][42004] Updated weights for policy 0, policy_version 8566 (0.0034) +[2024-11-07 23:53:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.6, 300 sec: 6753.5). Total num frames: 35086336. Throughput: 0: 1700.3. Samples: 3768090. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:53:12,933][41694] Avg episode reward: [(0, '4.628')] +[2024-11-07 23:53:17,648][42004] Updated weights for policy 0, policy_version 8576 (0.0026) +[2024-11-07 23:53:17,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6878.4, 300 sec: 6775.7). Total num frames: 35127296. Throughput: 0: 1708.9. Samples: 3774020. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:53:17,934][41694] Avg episode reward: [(0, '4.463')] +[2024-11-07 23:53:22,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.6, 300 sec: 6761.9). Total num frames: 35164160. Throughput: 0: 1732.1. Samples: 3785534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:53:22,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-07 23:53:23,112][42004] Updated weights for policy 0, policy_version 8586 (0.0033) +[2024-11-07 23:53:28,644][41694] Fps is (10 sec: 6118.4, 60 sec: 6814.1, 300 sec: 6717.9). Total num frames: 35192832. Throughput: 0: 1591.4. Samples: 3791382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:53:28,645][41694] Avg episode reward: [(0, '4.221')] +[2024-11-07 23:53:30,721][42004] Updated weights for policy 0, policy_version 8596 (0.0026) +[2024-11-07 23:53:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 35221504. Throughput: 0: 1658.4. Samples: 3798640. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:53:32,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-07 23:53:36,119][42004] Updated weights for policy 0, policy_version 8606 (0.0036) +[2024-11-07 23:53:37,932][41694] Fps is (10 sec: 7497.0, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 35262464. Throughput: 0: 1719.0. Samples: 3810014. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:53:37,934][41694] Avg episode reward: [(0, '4.245')] +[2024-11-07 23:53:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008609_35262464.pth... +[2024-11-07 23:53:38,074][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008219_33665024.pth +[2024-11-07 23:53:41,296][42004] Updated weights for policy 0, policy_version 8616 (0.0026) +[2024-11-07 23:53:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6720.2). 
Total num frames: 35299328. Throughput: 0: 1790.0. Samples: 3821538. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:53:42,933][41694] Avg episode reward: [(0, '4.377')] +[2024-11-07 23:53:47,231][42004] Updated weights for policy 0, policy_version 8626 (0.0035) +[2024-11-07 23:53:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 35336192. Throughput: 0: 1777.4. Samples: 3826692. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:53:47,934][41694] Avg episode reward: [(0, '4.424')] +[2024-11-07 23:53:52,409][42004] Updated weights for policy 0, policy_version 8636 (0.0033) +[2024-11-07 23:53:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7173.0, 300 sec: 6748.0). Total num frames: 35373056. Throughput: 0: 1788.1. Samples: 3838098. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:53:52,934][41694] Avg episode reward: [(0, '4.601')] +[2024-11-07 23:53:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 6761.9). Total num frames: 35409920. Throughput: 0: 1804.2. Samples: 3849278. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:53:57,934][41694] Avg episode reward: [(0, '4.467')] +[2024-11-07 23:53:57,941][42004] Updated weights for policy 0, policy_version 8646 (0.0035) +[2024-11-07 23:54:03,206][41694] Fps is (10 sec: 5980.1, 60 sec: 6931.5, 300 sec: 6700.1). Total num frames: 35434496. Throughput: 0: 1785.4. Samples: 3854850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:54:03,209][41694] Avg episode reward: [(0, '4.378')] +[2024-11-07 23:54:05,943][42004] Updated weights for policy 0, policy_version 8656 (0.0027) +[2024-11-07 23:54:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6706.3). Total num frames: 35467264. Throughput: 0: 1686.8. Samples: 3861442. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:54:07,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-07 23:54:11,524][42004] Updated weights for policy 0, policy_version 8666 (0.0030) +[2024-11-07 23:54:12,932][41694] Fps is (10 sec: 7159.4, 60 sec: 6963.2, 300 sec: 6692.4). Total num frames: 35504128. Throughput: 0: 1835.0. Samples: 3872650. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:54:12,933][41694] Avg episode reward: [(0, '4.275')] +[2024-11-07 23:54:16,912][42004] Updated weights for policy 0, policy_version 8676 (0.0031) +[2024-11-07 23:54:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6692.5). Total num frames: 35540992. Throughput: 0: 1775.2. Samples: 3878526. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:54:17,933][41694] Avg episode reward: [(0, '4.544')] +[2024-11-07 23:54:22,706][42004] Updated weights for policy 0, policy_version 8686 (0.0049) +[2024-11-07 23:54:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 35577856. Throughput: 0: 1755.1. Samples: 3888992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:54:22,934][41694] Avg episode reward: [(0, '4.510')] +[2024-11-07 23:54:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7115.9, 300 sec: 6734.1). Total num frames: 35614720. Throughput: 0: 1742.3. Samples: 3899942. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:54:27,934][41694] Avg episode reward: [(0, '4.481')] +[2024-11-07 23:54:28,220][42004] Updated weights for policy 0, policy_version 8696 (0.0023) +[2024-11-07 23:54:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 6734.1). Total num frames: 35651584. Throughput: 0: 1755.2. Samples: 3905678. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:54:32,933][41694] Avg episode reward: [(0, '4.380')] +[2024-11-07 23:54:33,573][42004] Updated weights for policy 0, policy_version 8706 (0.0023) +[2024-11-07 23:54:37,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 35672064. Throughput: 0: 1739.6. Samples: 3916380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:54:37,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-07 23:54:41,437][42004] Updated weights for policy 0, policy_version 8716 (0.0037) +[2024-11-07 23:54:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 35708928. Throughput: 0: 1658.5. Samples: 3923910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:54:42,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-07 23:54:46,864][42004] Updated weights for policy 0, policy_version 8726 (0.0026) +[2024-11-07 23:54:47,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 35749888. Throughput: 0: 1666.2. Samples: 3929372. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:54:47,934][41694] Avg episode reward: [(0, '4.495')] +[2024-11-07 23:54:52,400][42004] Updated weights for policy 0, policy_version 8736 (0.0035) +[2024-11-07 23:54:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 35782656. Throughput: 0: 1759.6. Samples: 3940626. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:54:52,933][41694] Avg episode reward: [(0, '4.447')] +[2024-11-07 23:54:57,618][42004] Updated weights for policy 0, policy_version 8746 (0.0025) +[2024-11-07 23:54:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6895.0, 300 sec: 6817.4). Total num frames: 35823616. Throughput: 0: 1766.3. Samples: 3952132. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:54:57,933][41694] Avg episode reward: [(0, '4.283')] +[2024-11-07 23:55:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7063.7, 300 sec: 6803.5). Total num frames: 35856384. Throughput: 0: 1760.4. Samples: 3957742. Policy #0 lag: (min: 0.0, avg: 0.8, max: 4.0) +[2024-11-07 23:55:02,935][41694] Avg episode reward: [(0, '4.474')] +[2024-11-07 23:55:03,595][42004] Updated weights for policy 0, policy_version 8756 (0.0023) +[2024-11-07 23:55:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 6817.4). Total num frames: 35897344. Throughput: 0: 1763.6. Samples: 3968356. Policy #0 lag: (min: 0.0, avg: 0.8, max: 4.0) +[2024-11-07 23:55:07,936][41694] Avg episode reward: [(0, '4.357')] +[2024-11-07 23:55:08,816][42004] Updated weights for policy 0, policy_version 8766 (0.0034) +[2024-11-07 23:55:12,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 35913728. Throughput: 0: 1687.8. Samples: 3975892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-07 23:55:12,934][41694] Avg episode reward: [(0, '4.377')] +[2024-11-07 23:55:17,209][42004] Updated weights for policy 0, policy_version 8776 (0.2340) +[2024-11-07 23:55:17,946][41694] Fps is (10 sec: 5317.2, 60 sec: 6825.0, 300 sec: 6761.5). Total num frames: 35950592. Throughput: 0: 1658.7. Samples: 3980342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:55:17,947][41694] Avg episode reward: [(0, '4.459')] +[2024-11-07 23:55:22,801][42004] Updated weights for policy 0, policy_version 8786 (0.0040) +[2024-11-07 23:55:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 35987456. Throughput: 0: 1659.4. Samples: 3991052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:55:22,933][41694] Avg episode reward: [(0, '4.555')] +[2024-11-07 23:55:27,931][41694] Fps is (10 sec: 6562.9, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 36016128. 
Throughput: 0: 1713.8. Samples: 4001032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-07 23:55:27,934][41694] Avg episode reward: [(0, '4.441')] +[2024-11-07 23:55:29,033][42004] Updated weights for policy 0, policy_version 8796 (0.0034) +[2024-11-07 23:55:32,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 36057088. Throughput: 0: 1714.4. Samples: 4006518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:55:32,933][41694] Avg episode reward: [(0, '4.237')] +[2024-11-07 23:55:34,308][42004] Updated weights for policy 0, policy_version 8806 (0.0021) +[2024-11-07 23:55:37,932][41694] Fps is (10 sec: 7781.8, 60 sec: 7031.4, 300 sec: 6803.5). Total num frames: 36093952. Throughput: 0: 1715.6. Samples: 4017830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:55:37,935][41694] Avg episode reward: [(0, '4.574')] +[2024-11-07 23:55:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008812_36093952.pth... +[2024-11-07 23:55:38,142][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008411_34451456.pth +[2024-11-07 23:55:39,774][42004] Updated weights for policy 0, policy_version 8816 (0.0029) +[2024-11-07 23:55:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6789.6). Total num frames: 36130816. Throughput: 0: 1707.2. Samples: 4028958. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:55:42,933][41694] Avg episode reward: [(0, '4.229')] +[2024-11-07 23:55:47,605][42004] Updated weights for policy 0, policy_version 8826 (0.0026) +[2024-11-07 23:55:47,932][41694] Fps is (10 sec: 5734.8, 60 sec: 6690.1, 300 sec: 6734.2). Total num frames: 36151296. Throughput: 0: 1693.5. Samples: 4033950. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:55:47,933][41694] Avg episode reward: [(0, '4.304')] +[2024-11-07 23:55:52,829][42004] Updated weights for policy 0, policy_version 8836 (0.0032) +[2024-11-07 23:55:52,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6826.6, 300 sec: 6761.9). Total num frames: 36192256. Throughput: 0: 1637.1. Samples: 4042028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:55:52,935][41694] Avg episode reward: [(0, '4.270')] +[2024-11-07 23:55:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 36229120. Throughput: 0: 1722.8. Samples: 4053418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:55:57,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-07 23:55:58,394][42004] Updated weights for policy 0, policy_version 8846 (0.0026) +[2024-11-07 23:56:02,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 36261888. Throughput: 0: 1738.8. Samples: 4058562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:56:02,934][41694] Avg episode reward: [(0, '4.267')] +[2024-11-07 23:56:04,311][42004] Updated weights for policy 0, policy_version 8856 (0.0040) +[2024-11-07 23:56:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 36298752. Throughput: 0: 1739.9. Samples: 4069348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-07 23:56:07,933][41694] Avg episode reward: [(0, '4.735')] +[2024-11-07 23:56:10,146][42004] Updated weights for policy 0, policy_version 8866 (0.0031) +[2024-11-07 23:56:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 36331520. Throughput: 0: 1732.3. Samples: 4078986. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:56:12,934][41694] Avg episode reward: [(0, '4.703')] +[2024-11-07 23:56:16,844][42004] Updated weights for policy 0, policy_version 8876 (0.0026) +[2024-11-07 23:56:17,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6896.6, 300 sec: 6872.9). Total num frames: 36364288. Throughput: 0: 1709.1. Samples: 4083426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:56:17,935][41694] Avg episode reward: [(0, '4.344')] +[2024-11-07 23:56:22,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.8, 300 sec: 6817.4). Total num frames: 36384768. Throughput: 0: 1630.2. Samples: 4091186. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:56:22,935][41694] Avg episode reward: [(0, '4.543')] +[2024-11-07 23:56:24,397][42004] Updated weights for policy 0, policy_version 8886 (0.0031) +[2024-11-07 23:56:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 36421632. Throughput: 0: 1619.5. Samples: 4101834. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:56:27,933][41694] Avg episode reward: [(0, '4.364')] +[2024-11-07 23:56:29,619][42004] Updated weights for policy 0, policy_version 8896 (0.0026) +[2024-11-07 23:56:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.2, 300 sec: 6817.4). Total num frames: 36458496. Throughput: 0: 1638.8. Samples: 4107694. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:56:32,936][41694] Avg episode reward: [(0, '4.500')] +[2024-11-07 23:56:35,530][42004] Updated weights for policy 0, policy_version 8906 (0.0032) +[2024-11-07 23:56:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.2, 300 sec: 6867.3). Total num frames: 36495360. Throughput: 0: 1695.6. Samples: 4118328. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:56:37,933][41694] Avg episode reward: [(0, '4.463')] +[2024-11-07 23:56:40,689][42004] Updated weights for policy 0, policy_version 8916 (0.0030) +[2024-11-07 23:56:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 36536320. Throughput: 0: 1705.4. Samples: 4130160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:56:42,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-07 23:56:45,868][42004] Updated weights for policy 0, policy_version 8926 (0.0027) +[2024-11-07 23:56:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 36573184. Throughput: 0: 1721.9. Samples: 4136046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:56:47,934][41694] Avg episode reward: [(0, '4.565')] +[2024-11-07 23:56:51,225][42004] Updated weights for policy 0, policy_version 8936 (0.0030) +[2024-11-07 23:56:52,932][41694] Fps is (10 sec: 7782.0, 60 sec: 7031.5, 300 sec: 6886.9). Total num frames: 36614144. Throughput: 0: 1737.9. Samples: 4147554. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:56:52,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-07 23:56:57,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 36634624. Throughput: 0: 1678.4. Samples: 4154514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:56:57,933][41694] Avg episode reward: [(0, '4.365')] +[2024-11-07 23:56:58,928][42004] Updated weights for policy 0, policy_version 8946 (0.0030) +[2024-11-07 23:57:02,931][41694] Fps is (10 sec: 5325.1, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 36667392. Throughput: 0: 1704.9. Samples: 4160146. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:02,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-07 23:57:04,842][42004] Updated weights for policy 0, policy_version 8956 (0.0026) +[2024-11-07 23:57:07,934][41694] Fps is (10 sec: 6961.6, 60 sec: 6758.1, 300 sec: 6817.4). Total num frames: 36704256. Throughput: 0: 1762.0. Samples: 4170482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:07,937][41694] Avg episode reward: [(0, '4.325')] +[2024-11-07 23:57:10,580][42004] Updated weights for policy 0, policy_version 8966 (0.0031) +[2024-11-07 23:57:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6869.6). Total num frames: 36741120. Throughput: 0: 1774.2. Samples: 4181674. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:57:12,932][41694] Avg episode reward: [(0, '4.236')] +[2024-11-07 23:57:15,557][42004] Updated weights for policy 0, policy_version 8976 (0.0027) +[2024-11-07 23:57:17,931][41694] Fps is (10 sec: 7784.2, 60 sec: 6963.2, 300 sec: 6900.8). Total num frames: 36782080. Throughput: 0: 1782.2. Samples: 4187892. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:57:17,933][41694] Avg episode reward: [(0, '4.538')] +[2024-11-07 23:57:21,067][42004] Updated weights for policy 0, policy_version 8986 (0.0029) +[2024-11-07 23:57:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7236.3, 300 sec: 6914.6). Total num frames: 36818944. Throughput: 0: 1794.6. Samples: 4199084. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:57:22,934][41694] Avg episode reward: [(0, '4.774')] +[2024-11-07 23:57:26,409][42004] Updated weights for policy 0, policy_version 8996 (0.0020) +[2024-11-07 23:57:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7236.3, 300 sec: 6914.6). Total num frames: 36855808. Throughput: 0: 1789.2. Samples: 4210674. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:27,933][41694] Avg episode reward: [(0, '4.475')] +[2024-11-07 23:57:32,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 36876288. Throughput: 0: 1700.6. Samples: 4212572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:32,933][41694] Avg episode reward: [(0, '4.539')] +[2024-11-07 23:57:34,170][42004] Updated weights for policy 0, policy_version 9006 (0.0027) +[2024-11-07 23:57:37,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 36913152. Throughput: 0: 1682.5. Samples: 4223268. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:37,935][41694] Avg episode reward: [(0, '4.357')] +[2024-11-07 23:57:37,978][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009013_36917248.pth... +[2024-11-07 23:57:38,134][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008609_35262464.pth +[2024-11-07 23:57:39,933][42004] Updated weights for policy 0, policy_version 9016 (0.0032) +[2024-11-07 23:57:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 36950016. Throughput: 0: 1753.9. Samples: 4233438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:42,933][41694] Avg episode reward: [(0, '4.248')] +[2024-11-07 23:57:45,664][42004] Updated weights for policy 0, policy_version 9026 (0.0034) +[2024-11-07 23:57:47,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 36982784. Throughput: 0: 1750.6. Samples: 4238924. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:47,934][41694] Avg episode reward: [(0, '4.359')] +[2024-11-07 23:57:51,668][42004] Updated weights for policy 0, policy_version 9036 (0.0039) +[2024-11-07 23:57:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6914.6). 
Total num frames: 37019648. Throughput: 0: 1747.5. Samples: 4249114. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:57:52,934][41694] Avg episode reward: [(0, '4.531')] +[2024-11-07 23:57:57,045][42004] Updated weights for policy 0, policy_version 9046 (0.0029) +[2024-11-07 23:57:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6914.6). Total num frames: 37056512. Throughput: 0: 1754.6. Samples: 4260630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:57:57,934][41694] Avg episode reward: [(0, '4.565')] +[2024-11-07 23:58:02,655][42004] Updated weights for policy 0, policy_version 9056 (0.0022) +[2024-11-07 23:58:04,713][41694] Fps is (10 sec: 6258.2, 60 sec: 6895.0, 300 sec: 6873.1). Total num frames: 37093376. Throughput: 0: 1673.2. Samples: 4266168. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:58:04,715][41694] Avg episode reward: [(0, '4.382')] +[2024-11-07 23:58:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.9, 300 sec: 6873.0). Total num frames: 37113856. Throughput: 0: 1655.8. Samples: 4273594. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:58:07,934][41694] Avg episode reward: [(0, '4.229')] +[2024-11-07 23:58:10,040][42004] Updated weights for policy 0, policy_version 9066 (0.0029) +[2024-11-07 23:58:12,932][41694] Fps is (10 sec: 6977.1, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 37150720. Throughput: 0: 1642.4. Samples: 4284580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:58:12,934][41694] Avg episode reward: [(0, '4.072')] +[2024-11-07 23:58:16,022][42004] Updated weights for policy 0, policy_version 9076 (0.0029) +[2024-11-07 23:58:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 37187584. Throughput: 0: 1712.7. Samples: 4289642. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:58:17,935][41694] Avg episode reward: [(0, '4.253')] +[2024-11-07 23:58:21,602][42004] Updated weights for policy 0, policy_version 9086 (0.0026) +[2024-11-07 23:58:22,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6903.5). Total num frames: 37224448. Throughput: 0: 1718.4. Samples: 4300596. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:58:22,934][41694] Avg episode reward: [(0, '4.341')] +[2024-11-07 23:58:26,789][42004] Updated weights for policy 0, policy_version 9096 (0.0031) +[2024-11-07 23:58:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6928.5). Total num frames: 37265408. Throughput: 0: 1752.4. Samples: 4312296. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:58:27,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-07 23:58:31,901][42004] Updated weights for policy 0, policy_version 9106 (0.0031) +[2024-11-07 23:58:32,932][41694] Fps is (10 sec: 7782.6, 60 sec: 7099.7, 300 sec: 6914.6). Total num frames: 37302272. Throughput: 0: 1761.3. Samples: 4318182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:58:32,934][41694] Avg episode reward: [(0, '4.599')] +[2024-11-07 23:58:39,080][41694] Fps is (10 sec: 6245.9, 60 sec: 6899.5, 300 sec: 6874.0). Total num frames: 37335040. Throughput: 0: 1755.0. Samples: 4330104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:58:39,083][41694] Avg episode reward: [(0, '4.505')] +[2024-11-07 23:58:39,289][42004] Updated weights for policy 0, policy_version 9116 (0.0033) +[2024-11-07 23:58:42,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 37363712. Throughput: 0: 1710.4. Samples: 4337596. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:58:42,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-07 23:58:44,615][42004] Updated weights for policy 0, policy_version 9126 (0.0029) +[2024-11-07 23:58:47,932][41694] Fps is (10 sec: 7866.5, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 37404672. Throughput: 0: 1788.9. Samples: 4343484. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-07 23:58:47,935][41694] Avg episode reward: [(0, '4.231')] +[2024-11-07 23:58:50,433][42004] Updated weights for policy 0, policy_version 9136 (0.0029) +[2024-11-07 23:58:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6873.0). Total num frames: 37437440. Throughput: 0: 1781.8. Samples: 4353774. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:58:52,936][41694] Avg episode reward: [(0, '4.406')] +[2024-11-07 23:58:55,882][42004] Updated weights for policy 0, policy_version 9146 (0.0026) +[2024-11-07 23:58:57,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6963.2, 300 sec: 6921.0). Total num frames: 37474304. Throughput: 0: 1793.2. Samples: 4365276. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:58:57,933][41694] Avg episode reward: [(0, '4.374')] +[2024-11-07 23:59:01,336][42004] Updated weights for policy 0, policy_version 9156 (0.0025) +[2024-11-07 23:59:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7176.2, 300 sec: 6928.5). Total num frames: 37511168. Throughput: 0: 1809.5. Samples: 4371070. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:59:02,934][41694] Avg episode reward: [(0, '4.405')] +[2024-11-07 23:59:06,729][42004] Updated weights for policy 0, policy_version 9166 (0.0027) +[2024-11-07 23:59:07,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 6942.4). Total num frames: 37552128. Throughput: 0: 1815.4. Samples: 4382290. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:07,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-07 23:59:13,461][41694] Fps is (10 sec: 6224.1, 60 sec: 7037.6, 300 sec: 6888.4). Total num frames: 37576704. Throughput: 0: 1662.2. Samples: 4387974. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:13,463][41694] Avg episode reward: [(0, '4.322')] +[2024-11-07 23:59:14,325][42004] Updated weights for policy 0, policy_version 9176 (0.0026) +[2024-11-07 23:59:17,932][41694] Fps is (10 sec: 5734.5, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 37609472. Throughput: 0: 1711.0. Samples: 4395176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:17,938][41694] Avg episode reward: [(0, '4.439')] +[2024-11-07 23:59:19,884][42004] Updated weights for policy 0, policy_version 9186 (0.0024) +[2024-11-07 23:59:22,931][41694] Fps is (10 sec: 6920.0, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 37642240. Throughput: 0: 1732.6. Samples: 4406082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:22,934][41694] Avg episode reward: [(0, '4.282')] +[2024-11-07 23:59:26,210][42004] Updated weights for policy 0, policy_version 9196 (0.0034) +[2024-11-07 23:59:27,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 37675008. Throughput: 0: 1731.0. Samples: 4415490. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:27,933][41694] Avg episode reward: [(0, '4.313')] +[2024-11-07 23:59:32,034][42004] Updated weights for policy 0, policy_version 9206 (0.0054) +[2024-11-07 23:59:32,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 37711872. Throughput: 0: 1716.0. Samples: 4420706. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-07 23:59:32,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-07 23:59:37,585][42004] Updated weights for policy 0, policy_version 9216 (0.0039) +[2024-11-07 23:59:37,934][41694] Fps is (10 sec: 7371.1, 60 sec: 7029.2, 300 sec: 6914.5). Total num frames: 37748736. Throughput: 0: 1733.6. Samples: 4431790. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:37,936][41694] Avg episode reward: [(0, '4.472')] +[2024-11-07 23:59:37,973][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009216_37748736.pth... +[2024-11-07 23:59:38,111][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000008812_36093952.pth +[2024-11-07 23:59:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6900.7). Total num frames: 37785600. Throughput: 0: 1722.7. Samples: 4442796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:42,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-07 23:59:43,163][42004] Updated weights for policy 0, policy_version 9226 (0.0043) +[2024-11-07 23:59:47,931][41694] Fps is (10 sec: 5735.9, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 37806080. Throughput: 0: 1721.4. Samples: 4448532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-07 23:59:47,933][41694] Avg episode reward: [(0, '4.273')] +[2024-11-07 23:59:50,597][42004] Updated weights for policy 0, policy_version 9236 (0.0029) +[2024-11-07 23:59:52,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 37847040. Throughput: 0: 1640.4. Samples: 4456108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:59:52,936][41694] Avg episode reward: [(0, '4.441')] +[2024-11-07 23:59:56,328][42004] Updated weights for policy 0, policy_version 9246 (0.0037) +[2024-11-07 23:59:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6859.1). 
Total num frames: 37879808. Throughput: 0: 1766.3. Samples: 4466524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-07 23:59:57,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 00:00:02,783][42004] Updated weights for policy 0, policy_version 9256 (0.0037) +[2024-11-08 00:00:02,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 37912576. Throughput: 0: 1693.1. Samples: 4471368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:00:02,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 00:00:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6900.7). Total num frames: 37949440. Throughput: 0: 1685.2. Samples: 4481918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:00:07,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 00:00:08,049][42004] Updated weights for policy 0, policy_version 9266 (0.0028) +[2024-11-08 00:00:12,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6887.4, 300 sec: 6901.1). Total num frames: 37986304. Throughput: 0: 1732.0. Samples: 4493428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:00:12,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 00:00:13,673][42004] Updated weights for policy 0, policy_version 9276 (0.0028) +[2024-11-08 00:00:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 38023168. Throughput: 0: 1722.4. Samples: 4498214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:00:17,933][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 00:00:19,486][42004] Updated weights for policy 0, policy_version 9286 (0.0033) +[2024-11-08 00:00:22,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6873.0). Total num frames: 38043648. Throughput: 0: 1678.4. Samples: 4507312. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:00:22,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 00:00:27,359][42004] Updated weights for policy 0, policy_version 9296 (0.0027) +[2024-11-08 00:00:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 38080512. Throughput: 0: 1629.5. Samples: 4516122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:00:27,933][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 00:00:32,533][42004] Updated weights for policy 0, policy_version 9306 (0.0031) +[2024-11-08 00:00:32,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6758.3, 300 sec: 6859.1). Total num frames: 38117376. Throughput: 0: 1630.8. Samples: 4521920. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:00:32,934][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 00:00:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.7, 300 sec: 6859.1). Total num frames: 38154240. Throughput: 0: 1718.1. Samples: 4533424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:00:37,934][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 00:00:38,217][42004] Updated weights for policy 0, policy_version 9316 (0.0034) +[2024-11-08 00:00:42,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6758.4, 300 sec: 6914.6). Total num frames: 38191104. Throughput: 0: 1719.0. Samples: 4543878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:00:42,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 00:00:43,780][42004] Updated weights for policy 0, policy_version 9326 (0.0027) +[2024-11-08 00:00:47,933][41694] Fps is (10 sec: 7371.9, 60 sec: 7031.3, 300 sec: 6900.7). Total num frames: 38227968. Throughput: 0: 1732.9. Samples: 4549352. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:00:47,936][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 00:00:49,424][42004] Updated weights for policy 0, policy_version 9336 (0.0020) +[2024-11-08 00:00:52,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 38264832. Throughput: 0: 1750.3. Samples: 4560684. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:00:52,934][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 00:00:56,855][42004] Updated weights for policy 0, policy_version 9346 (0.0029) +[2024-11-08 00:00:57,931][41694] Fps is (10 sec: 6144.9, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 38289408. Throughput: 0: 1663.4. Samples: 4568282. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:00:57,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 00:01:02,284][42004] Updated weights for policy 0, policy_version 9356 (0.0032) +[2024-11-08 00:01:02,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6895.0, 300 sec: 6872.9). Total num frames: 38326272. Throughput: 0: 1682.8. Samples: 4573938. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:01:02,933][41694] Avg episode reward: [(0, '4.263')] +[2024-11-08 00:01:07,782][42004] Updated weights for policy 0, policy_version 9366 (0.0026) +[2024-11-08 00:01:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 38363136. Throughput: 0: 1725.5. Samples: 4584960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:01:07,933][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 00:01:12,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 38391808. Throughput: 0: 1751.4. Samples: 4594936. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:01:12,937][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 00:01:14,490][42004] Updated weights for policy 0, policy_version 9376 (0.0034) +[2024-11-08 00:01:17,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.1, 300 sec: 6914.6). Total num frames: 38424576. Throughput: 0: 1720.2. Samples: 4599330. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:01:17,935][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 00:01:20,044][42004] Updated weights for policy 0, policy_version 9386 (0.0032) +[2024-11-08 00:01:22,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 38465536. Throughput: 0: 1713.7. Samples: 4610538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:01:22,940][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 00:01:25,708][42004] Updated weights for policy 0, policy_version 9396 (0.0022) +[2024-11-08 00:01:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 38502400. Throughput: 0: 1729.6. Samples: 4621710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:01:27,933][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 00:01:32,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 38522880. Throughput: 0: 1680.1. Samples: 4624956. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:01:32,934][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 00:01:33,382][42004] Updated weights for policy 0, policy_version 9406 (0.0031) +[2024-11-08 00:01:37,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 38559744. Throughput: 0: 1639.3. Samples: 4634452. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:01:37,933][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 00:01:38,053][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009415_38563840.pth... +[2024-11-08 00:01:38,217][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009013_36917248.pth +[2024-11-08 00:01:38,623][42004] Updated weights for policy 0, policy_version 9416 (0.0025) +[2024-11-08 00:01:42,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 38600704. Throughput: 0: 1729.2. Samples: 4646098. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:01:42,933][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 00:01:43,921][42004] Updated weights for policy 0, policy_version 9426 (0.0026) +[2024-11-08 00:01:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.6, 300 sec: 6845.2). Total num frames: 38633472. Throughput: 0: 1722.4. Samples: 4651444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:01:47,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 00:01:49,978][42004] Updated weights for policy 0, policy_version 9436 (0.0033) +[2024-11-08 00:01:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 38670336. Throughput: 0: 1712.7. Samples: 4662032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:01:52,933][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 00:01:55,341][42004] Updated weights for policy 0, policy_version 9446 (0.0028) +[2024-11-08 00:01:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.4, 300 sec: 6928.5). Total num frames: 38711296. Throughput: 0: 1749.0. Samples: 4673640. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:01:57,934][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 00:02:00,740][42004] Updated weights for policy 0, policy_version 9456 (0.0031) +[2024-11-08 00:02:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6914.7). Total num frames: 38744064. Throughput: 0: 1774.8. Samples: 4679198. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:02,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 00:02:07,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 38768640. Throughput: 0: 1682.4. Samples: 4686246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:07,933][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 00:02:08,386][42004] Updated weights for policy 0, policy_version 9466 (0.0032) +[2024-11-08 00:02:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6895.0, 300 sec: 6859.1). Total num frames: 38805504. Throughput: 0: 1681.6. Samples: 4697384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:12,935][41694] Avg episode reward: [(0, '4.681')] +[2024-11-08 00:02:13,838][42004] Updated weights for policy 0, policy_version 9476 (0.0025) +[2024-11-08 00:02:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 38842368. Throughput: 0: 1735.4. Samples: 4703048. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:17,940][41694] Avg episode reward: [(0, '4.264')] +[2024-11-08 00:02:19,568][42004] Updated weights for policy 0, policy_version 9486 (0.0024) +[2024-11-08 00:02:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 38875136. Throughput: 0: 1760.2. Samples: 4713660. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:22,934][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 00:02:25,134][42004] Updated weights for policy 0, policy_version 9496 (0.0032) +[2024-11-08 00:02:27,933][41694] Fps is (10 sec: 7371.6, 60 sec: 6894.7, 300 sec: 6914.6). Total num frames: 38916096. Throughput: 0: 1759.5. Samples: 4725276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:27,935][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 00:02:30,260][42004] Updated weights for policy 0, policy_version 9506 (0.0029) +[2024-11-08 00:02:32,932][41694] Fps is (10 sec: 8191.8, 60 sec: 7236.3, 300 sec: 6928.5). Total num frames: 38957056. Throughput: 0: 1770.9. Samples: 4731136. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:32,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 00:02:35,514][42004] Updated weights for policy 0, policy_version 9516 (0.0031) +[2024-11-08 00:02:39,609][41694] Fps is (10 sec: 6314.5, 60 sec: 6973.0, 300 sec: 6875.5). Total num frames: 38989824. Throughput: 0: 1731.6. Samples: 4742858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:39,613][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 00:02:42,876][42004] Updated weights for policy 0, policy_version 9526 (0.0025) +[2024-11-08 00:02:42,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6963.1, 300 sec: 6900.7). Total num frames: 39018496. Throughput: 0: 1708.9. Samples: 4750540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:02:42,934][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 00:02:47,932][41694] Fps is (10 sec: 7874.3, 60 sec: 7031.4, 300 sec: 6900.7). Total num frames: 39055360. Throughput: 0: 1718.6. Samples: 4756536. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:02:47,936][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 00:02:48,116][42004] Updated weights for policy 0, policy_version 9536 (0.0020) +[2024-11-08 00:02:52,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 39088128. Throughput: 0: 1805.2. Samples: 4767482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:02:52,934][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 00:02:54,187][42004] Updated weights for policy 0, policy_version 9546 (0.0043) +[2024-11-08 00:02:57,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6895.0, 300 sec: 6928.7). Total num frames: 39124992. Throughput: 0: 1786.0. Samples: 4777752. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:02:57,933][41694] Avg episode reward: [(0, '4.247')] +[2024-11-08 00:02:59,719][42004] Updated weights for policy 0, policy_version 9556 (0.0028) +[2024-11-08 00:03:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 39161856. Throughput: 0: 1787.2. Samples: 4783474. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:02,934][41694] Avg episode reward: [(0, '4.185')] +[2024-11-08 00:03:05,567][42004] Updated weights for policy 0, policy_version 9566 (0.0035) +[2024-11-08 00:03:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 6942.4). Total num frames: 39198720. Throughput: 0: 1784.0. Samples: 4793940. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:07,933][41694] Avg episode reward: [(0, '4.215')] +[2024-11-08 00:03:10,993][42004] Updated weights for policy 0, policy_version 9576 (0.0032) +[2024-11-08 00:03:13,947][41694] Fps is (10 sec: 5949.5, 60 sec: 6914.5, 300 sec: 6890.9). Total num frames: 39227392. Throughput: 0: 1615.2. Samples: 4799598. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:13,949][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 00:03:17,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 39256064. Throughput: 0: 1678.9. Samples: 4806686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:17,934][41694] Avg episode reward: [(0, '4.203')] +[2024-11-08 00:03:18,624][42004] Updated weights for policy 0, policy_version 9586 (0.0025) +[2024-11-08 00:03:22,932][41694] Fps is (10 sec: 7750.0, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 39297024. Throughput: 0: 1741.9. Samples: 4818320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:03:22,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 00:03:23,952][42004] Updated weights for policy 0, policy_version 9596 (0.0026) +[2024-11-08 00:03:27,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6895.1, 300 sec: 6873.0). Total num frames: 39329792. Throughput: 0: 1743.0. Samples: 4828972. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:27,933][41694] Avg episode reward: [(0, '4.714')] +[2024-11-08 00:03:30,039][42004] Updated weights for policy 0, policy_version 9606 (0.0028) +[2024-11-08 00:03:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6913.7). Total num frames: 39366656. Throughput: 0: 1722.4. Samples: 4834044. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:32,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 00:03:35,449][42004] Updated weights for policy 0, policy_version 9616 (0.0027) +[2024-11-08 00:03:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7093.2, 300 sec: 6914.6). Total num frames: 39403520. Throughput: 0: 1732.7. Samples: 4845456. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:03:37,933][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 00:03:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009620_39403520.pth... +[2024-11-08 00:03:38,086][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009216_37748736.pth +[2024-11-08 00:03:40,928][42004] Updated weights for policy 0, policy_version 9626 (0.0045) +[2024-11-08 00:03:42,933][41694] Fps is (10 sec: 7371.9, 60 sec: 7031.4, 300 sec: 6900.7). Total num frames: 39440384. Throughput: 0: 1751.6. Samples: 4856578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:42,935][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 00:03:48,378][41694] Fps is (10 sec: 5881.6, 60 sec: 6776.3, 300 sec: 6862.6). Total num frames: 39464960. Throughput: 0: 1738.0. Samples: 4862460. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:03:48,381][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 00:03:48,420][42004] Updated weights for policy 0, policy_version 9636 (0.0028) +[2024-11-08 00:03:52,938][41694] Fps is (10 sec: 5731.6, 60 sec: 6826.0, 300 sec: 6858.9). Total num frames: 39497728. Throughput: 0: 1679.5. Samples: 4869530. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:03:52,945][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 00:03:54,290][42004] Updated weights for policy 0, policy_version 9646 (0.0032) +[2024-11-08 00:03:57,932][41694] Fps is (10 sec: 6859.7, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 39530496. Throughput: 0: 1812.5. Samples: 4879322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:03:57,934][41694] Avg episode reward: [(0, '4.683')] +[2024-11-08 00:04:01,068][42004] Updated weights for policy 0, policy_version 9656 (0.0043) +[2024-11-08 00:04:02,931][41694] Fps is (10 sec: 6147.8, 60 sec: 6621.9, 300 sec: 6803.5). 
Total num frames: 39559168. Throughput: 0: 1717.3. Samples: 4883964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:04:02,934][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 00:04:07,380][42004] Updated weights for policy 0, policy_version 9666 (0.0035) +[2024-11-08 00:04:07,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6553.6, 300 sec: 6843.6). Total num frames: 39591936. Throughput: 0: 1664.2. Samples: 4893210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:04:07,934][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 00:04:12,933][41694] Fps is (10 sec: 6552.8, 60 sec: 6735.7, 300 sec: 6831.3). Total num frames: 39624704. Throughput: 0: 1639.3. Samples: 4902742. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:04:12,936][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 00:04:13,826][42004] Updated weights for policy 0, policy_version 9676 (0.0028) +[2024-11-08 00:04:17,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 39661568. Throughput: 0: 1643.5. Samples: 4908000. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:04:17,935][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 00:04:19,430][42004] Updated weights for policy 0, policy_version 9686 (0.0034) +[2024-11-08 00:04:22,931][41694] Fps is (10 sec: 5735.1, 60 sec: 6417.1, 300 sec: 6803.5). Total num frames: 39682048. Throughput: 0: 1615.4. Samples: 4918150. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:04:22,933][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 00:04:27,147][42004] Updated weights for policy 0, policy_version 9696 (0.0025) +[2024-11-08 00:04:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6803.5). Total num frames: 39718912. Throughput: 0: 1540.8. Samples: 4925914. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:04:27,934][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 00:04:32,760][42004] Updated weights for policy 0, policy_version 9706 (0.0031) +[2024-11-08 00:04:32,933][41694] Fps is (10 sec: 7371.4, 60 sec: 6485.1, 300 sec: 6803.5). Total num frames: 39755776. Throughput: 0: 1550.5. Samples: 4931544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:04:32,938][41694] Avg episode reward: [(0, '4.659')] +[2024-11-08 00:04:37,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6485.3, 300 sec: 6803.5). Total num frames: 39792640. Throughput: 0: 1618.0. Samples: 4942328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:04:37,933][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 00:04:38,476][42004] Updated weights for policy 0, policy_version 9716 (0.0027) +[2024-11-08 00:04:42,932][41694] Fps is (10 sec: 6554.4, 60 sec: 6348.9, 300 sec: 6831.3). Total num frames: 39821312. Throughput: 0: 1626.8. Samples: 4952530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:04:42,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 00:04:45,128][42004] Updated weights for policy 0, policy_version 9726 (0.0040) +[2024-11-08 00:04:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6533.9, 300 sec: 6803.5). Total num frames: 39854080. Throughput: 0: 1615.6. Samples: 4956666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 00:04:47,933][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 00:04:51,556][42004] Updated weights for policy 0, policy_version 9736 (0.0043) +[2024-11-08 00:04:52,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6417.7, 300 sec: 6789.6). Total num frames: 39882752. Throughput: 0: 1622.8. Samples: 4966238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 00:04:52,933][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 00:04:57,932][41694] Fps is (10 sec: 4505.4, 60 sec: 6144.0, 300 sec: 6734.1). Total num frames: 39899136. 
Throughput: 0: 1540.9. Samples: 4972082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:04:57,936][41694] Avg episode reward: [(0, '4.230')] +[2024-11-08 00:05:00,688][42004] Updated weights for policy 0, policy_version 9746 (0.0038) +[2024-11-08 00:05:02,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6212.3, 300 sec: 6720.2). Total num frames: 39931904. Throughput: 0: 1520.9. Samples: 4976442. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:05:02,933][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 00:05:07,599][42004] Updated weights for policy 0, policy_version 9756 (0.0031) +[2024-11-08 00:05:07,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6143.9, 300 sec: 6692.4). Total num frames: 39960576. Throughput: 0: 1490.3. Samples: 4985214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:05:07,935][41694] Avg episode reward: [(0, '4.129')] +[2024-11-08 00:05:12,934][41694] Fps is (10 sec: 5733.1, 60 sec: 6075.6, 300 sec: 6664.6). Total num frames: 39989248. Throughput: 0: 1519.1. Samples: 4994276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:05:12,936][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 00:05:14,610][42004] Updated weights for policy 0, policy_version 9766 (0.0053) +[2024-11-08 00:05:17,932][41694] Fps is (10 sec: 5325.1, 60 sec: 5871.0, 300 sec: 6678.6). Total num frames: 40013824. Throughput: 0: 1484.6. Samples: 4998348. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:05:17,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 00:05:22,937][41694] Fps is (10 sec: 4913.5, 60 sec: 5938.6, 300 sec: 6636.8). Total num frames: 40038400. Throughput: 0: 1401.5. Samples: 5005402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:05:22,941][41694] Avg episode reward: [(0, '4.319')] +[2024-11-08 00:05:22,963][42004] Updated weights for policy 0, policy_version 9776 (0.0026) +[2024-11-08 00:05:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 5939.2, 300 sec: 6636.9). 
Total num frames: 40075264. Throughput: 0: 1383.3. Samples: 5014778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:05:27,934][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 00:05:29,190][42004] Updated weights for policy 0, policy_version 9786 (0.0025) +[2024-11-08 00:05:32,932][41694] Fps is (10 sec: 5327.9, 60 sec: 5598.0, 300 sec: 6567.5). Total num frames: 40091648. Throughput: 0: 1376.1. Samples: 5018592. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:05:32,933][41694] Avg episode reward: [(0, '4.666')] +[2024-11-08 00:05:37,473][42004] Updated weights for policy 0, policy_version 9796 (0.0035) +[2024-11-08 00:05:37,932][41694] Fps is (10 sec: 4915.2, 60 sec: 5529.6, 300 sec: 6553.6). Total num frames: 40124416. Throughput: 0: 1331.2. Samples: 5026144. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:05:37,934][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 00:05:38,045][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009797_40128512.pth... +[2024-11-08 00:05:38,206][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009415_38563840.pth +[2024-11-08 00:05:42,932][41694] Fps is (10 sec: 6963.2, 60 sec: 5666.2, 300 sec: 6553.6). Total num frames: 40161280. Throughput: 0: 1427.1. Samples: 5036302. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:05:42,933][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 00:05:43,487][42004] Updated weights for policy 0, policy_version 9806 (0.0027) +[2024-11-08 00:05:47,934][41694] Fps is (10 sec: 6961.9, 60 sec: 5665.9, 300 sec: 6539.7). Total num frames: 40194048. Throughput: 0: 1443.8. Samples: 5041416. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:05:47,935][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 00:05:49,544][42004] Updated weights for policy 0, policy_version 9816 (0.0049) +[2024-11-08 00:05:52,932][41694] Fps is (10 sec: 6553.6, 60 sec: 5734.4, 300 sec: 6567.5). Total num frames: 40226816. Throughput: 0: 1474.2. Samples: 5051554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:05:52,934][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 00:05:56,432][42004] Updated weights for policy 0, policy_version 9826 (0.0033) +[2024-11-08 00:05:57,932][41694] Fps is (10 sec: 6145.2, 60 sec: 5939.2, 300 sec: 6539.7). Total num frames: 40255488. Throughput: 0: 1464.2. Samples: 5060164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:05:57,935][41694] Avg episode reward: [(0, '4.167')] +[2024-11-08 00:06:02,932][41694] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 6511.9). Total num frames: 40284160. Throughput: 0: 1473.5. Samples: 5064654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:02,934][41694] Avg episode reward: [(0, '4.614')] +[2024-11-08 00:06:03,273][42004] Updated weights for policy 0, policy_version 9836 (0.0026) +[2024-11-08 00:06:07,932][41694] Fps is (10 sec: 4915.2, 60 sec: 5734.4, 300 sec: 6484.2). Total num frames: 40304640. Throughput: 0: 1449.5. Samples: 5070620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:06:07,934][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 00:06:11,268][42004] Updated weights for policy 0, policy_version 9846 (0.0032) +[2024-11-08 00:06:12,931][41694] Fps is (10 sec: 5324.9, 60 sec: 5802.9, 300 sec: 6484.2). Total num frames: 40337408. Throughput: 0: 1467.3. Samples: 5080804. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:06:12,937][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 00:06:17,868][42004] Updated weights for policy 0, policy_version 9856 (0.0032) +[2024-11-08 00:06:17,931][41694] Fps is (10 sec: 6553.8, 60 sec: 5939.2, 300 sec: 6456.4). Total num frames: 40370176. Throughput: 0: 1482.9. Samples: 5085320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:17,933][41694] Avg episode reward: [(0, '4.272')] +[2024-11-08 00:06:22,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6076.3, 300 sec: 6442.5). Total num frames: 40402944. Throughput: 0: 1544.4. Samples: 5095640. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:22,942][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 00:06:23,836][42004] Updated weights for policy 0, policy_version 9866 (0.0033) +[2024-11-08 00:06:27,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6007.5, 300 sec: 6484.2). Total num frames: 40435712. Throughput: 0: 1532.6. Samples: 5105268. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:27,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 00:06:30,621][42004] Updated weights for policy 0, policy_version 9876 (0.0031) +[2024-11-08 00:06:32,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6212.3, 300 sec: 6456.4). Total num frames: 40464384. Throughput: 0: 1514.7. Samples: 5109574. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:32,933][41694] Avg episode reward: [(0, '4.637')] +[2024-11-08 00:06:36,901][42004] Updated weights for policy 0, policy_version 9886 (0.0027) +[2024-11-08 00:06:37,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6212.3, 300 sec: 6428.6). Total num frames: 40497152. Throughput: 0: 1502.8. Samples: 5119180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:37,936][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 00:06:42,932][41694] Fps is (10 sec: 5324.8, 60 sec: 5939.2, 300 sec: 6387.0). Total num frames: 40517632. 
Throughput: 0: 1452.7. Samples: 5125536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:42,933][41694] Avg episode reward: [(0, '4.686')] +[2024-11-08 00:06:45,315][42004] Updated weights for policy 0, policy_version 9896 (0.2151) +[2024-11-08 00:06:47,932][41694] Fps is (10 sec: 5324.7, 60 sec: 5939.4, 300 sec: 6373.1). Total num frames: 40550400. Throughput: 0: 1463.7. Samples: 5130520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:47,933][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 00:06:51,222][42004] Updated weights for policy 0, policy_version 9906 (0.0039) +[2024-11-08 00:06:52,933][41694] Fps is (10 sec: 6552.9, 60 sec: 5939.1, 300 sec: 6345.3). Total num frames: 40583168. Throughput: 0: 1565.4. Samples: 5141064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:06:52,938][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 00:06:57,111][42004] Updated weights for policy 0, policy_version 9916 (0.0027) +[2024-11-08 00:06:57,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6075.7, 300 sec: 6359.2). Total num frames: 40620032. Throughput: 0: 1570.0. Samples: 5151456. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:06:57,934][41694] Avg episode reward: [(0, '4.130')] +[2024-11-08 00:07:02,931][41694] Fps is (10 sec: 6554.3, 60 sec: 6075.7, 300 sec: 6373.1). Total num frames: 40648704. Throughput: 0: 1583.2. Samples: 5156566. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:07:02,934][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 00:07:03,708][42004] Updated weights for policy 0, policy_version 9926 (0.0034) +[2024-11-08 00:07:07,936][41694] Fps is (10 sec: 5732.0, 60 sec: 6211.8, 300 sec: 6345.2). Total num frames: 40677376. Throughput: 0: 1534.8. Samples: 5164712. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:07:07,938][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 00:07:10,326][42004] Updated weights for policy 0, policy_version 9936 (0.0036) +[2024-11-08 00:07:14,596][41694] Fps is (10 sec: 5267.4, 60 sec: 6044.6, 300 sec: 6295.9). Total num frames: 40710144. Throughput: 0: 1493.1. Samples: 5174940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:07:14,597][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 00:07:17,932][41694] Fps is (10 sec: 5736.7, 60 sec: 6075.7, 300 sec: 6303.7). Total num frames: 40734720. Throughput: 0: 1486.2. Samples: 5176452. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:07:17,934][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 00:07:18,341][42004] Updated weights for policy 0, policy_version 9946 (0.0029) +[2024-11-08 00:07:22,932][41694] Fps is (10 sec: 6878.9, 60 sec: 6075.7, 300 sec: 6275.9). Total num frames: 40767488. Throughput: 0: 1492.9. Samples: 5186360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:07:22,935][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 00:07:24,568][42004] Updated weights for policy 0, policy_version 9956 (0.0029) +[2024-11-08 00:07:27,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6075.7, 300 sec: 6248.1). Total num frames: 40800256. Throughput: 0: 1578.7. Samples: 5196578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:07:27,934][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 00:07:30,502][42004] Updated weights for policy 0, policy_version 9966 (0.0044) +[2024-11-08 00:07:32,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6212.2, 300 sec: 6297.8). Total num frames: 40837120. Throughput: 0: 1583.8. Samples: 5201790. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:07:32,934][41694] Avg episode reward: [(0, '4.296')] +[2024-11-08 00:07:36,977][42004] Updated weights for policy 0, policy_version 9976 (0.0066) +[2024-11-08 00:07:37,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6144.0, 300 sec: 6262.0). Total num frames: 40865792. Throughput: 0: 1565.4. Samples: 5211506. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:07:37,937][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 00:07:37,956][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009977_40865792.pth... +[2024-11-08 00:07:38,102][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009620_39403520.pth +[2024-11-08 00:07:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6280.4, 300 sec: 6234.2). Total num frames: 40894464. Throughput: 0: 1533.4. Samples: 5220462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:07:42,935][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 00:07:43,643][42004] Updated weights for policy 0, policy_version 9986 (0.0046) +[2024-11-08 00:07:48,978][41694] Fps is (10 sec: 5191.0, 60 sec: 6105.8, 300 sec: 6198.4). Total num frames: 40923136. Throughput: 0: 1494.9. Samples: 5225400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:07:48,980][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 00:07:51,894][42004] Updated weights for policy 0, policy_version 9996 (0.0036) +[2024-11-08 00:07:52,932][41694] Fps is (10 sec: 5325.2, 60 sec: 6075.8, 300 sec: 6178.7). Total num frames: 40947712. Throughput: 0: 1493.8. Samples: 5231928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:07:52,935][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 00:07:57,770][42004] Updated weights for policy 0, policy_version 10006 (0.0032) +[2024-11-08 00:07:57,933][41694] Fps is (10 sec: 6861.4, 60 sec: 6075.6, 300 sec: 6178.7). 
Total num frames: 40984576. Throughput: 0: 1558.2. Samples: 5242470. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:07:57,937][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 00:08:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6144.0, 300 sec: 6164.8). Total num frames: 41017344. Throughput: 0: 1574.2. Samples: 5247292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:08:02,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 00:08:04,092][42004] Updated weights for policy 0, policy_version 10016 (0.0034) +[2024-11-08 00:08:07,932][41694] Fps is (10 sec: 6144.7, 60 sec: 6144.4, 300 sec: 6186.1). Total num frames: 41046016. Throughput: 0: 1566.1. Samples: 5256834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:08:07,933][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 00:08:10,699][42004] Updated weights for policy 0, policy_version 10026 (0.0043) +[2024-11-08 00:08:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6319.3, 300 sec: 6178.7). Total num frames: 41078784. Throughput: 0: 1543.9. Samples: 5266052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:08:12,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 00:08:17,285][42004] Updated weights for policy 0, policy_version 10036 (0.0047) +[2024-11-08 00:08:17,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6150.9). Total num frames: 41111552. Throughput: 0: 1528.7. Samples: 5270580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:08:17,935][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 00:08:23,369][41694] Fps is (10 sec: 5101.8, 60 sec: 6031.8, 300 sec: 6100.2). Total num frames: 41132032. Throughput: 0: 1523.5. Samples: 5280728. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:08:23,371][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 00:08:25,447][42004] Updated weights for policy 0, policy_version 10046 (0.0038) +[2024-11-08 00:08:27,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6075.7, 300 sec: 6095.4). Total num frames: 41164800. Throughput: 0: 1483.4. Samples: 5287216. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:08:27,934][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 00:08:31,440][42004] Updated weights for policy 0, policy_version 10056 (0.0032) +[2024-11-08 00:08:32,934][41694] Fps is (10 sec: 6851.5, 60 sec: 6007.3, 300 sec: 6081.5). Total num frames: 41197568. Throughput: 0: 1525.4. Samples: 5292448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:08:32,938][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 00:08:37,389][42004] Updated weights for policy 0, policy_version 10066 (0.0038) +[2024-11-08 00:08:37,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6075.7, 300 sec: 6067.7). Total num frames: 41230336. Throughput: 0: 1573.4. Samples: 5302732. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:08:37,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 00:08:42,931][41694] Fps is (10 sec: 6964.8, 60 sec: 6212.4, 300 sec: 6118.5). Total num frames: 41267200. Throughput: 0: 1569.7. Samples: 5313104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:08:42,941][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 00:08:43,357][42004] Updated weights for policy 0, policy_version 10076 (0.0042) +[2024-11-08 00:08:47,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6322.5, 300 sec: 6095.5). Total num frames: 41295872. Throughput: 0: 1566.3. Samples: 5317778. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:08:47,938][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 00:08:50,200][42004] Updated weights for policy 0, policy_version 10086 (0.0041) +[2024-11-08 00:08:52,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6095.4). Total num frames: 41328640. Throughput: 0: 1558.3. Samples: 5326958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:08:52,937][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 00:08:57,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6075.8, 300 sec: 6067.6). Total num frames: 41349120. Throughput: 0: 1530.6. Samples: 5334928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:08:57,937][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 00:08:58,206][42004] Updated weights for policy 0, policy_version 10096 (0.0024) +[2024-11-08 00:09:02,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6075.7, 300 sec: 6067.6). Total num frames: 41381888. Throughput: 0: 1515.6. Samples: 5338780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:09:02,936][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 00:09:04,565][42004] Updated weights for policy 0, policy_version 10106 (0.0029) +[2024-11-08 00:09:07,933][41694] Fps is (10 sec: 6552.8, 60 sec: 6143.9, 300 sec: 6067.6). Total num frames: 41414656. Throughput: 0: 1516.8. Samples: 5348322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:09:07,937][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 00:09:10,773][42004] Updated weights for policy 0, policy_version 10116 (0.0039) +[2024-11-08 00:09:12,932][41694] Fps is (10 sec: 6553.0, 60 sec: 6143.9, 300 sec: 6053.7). Total num frames: 41447424. Throughput: 0: 1580.8. Samples: 5358354. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:09:12,934][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 00:09:17,031][42004] Updated weights for policy 0, policy_version 10126 (0.0030) +[2024-11-08 00:09:17,932][41694] Fps is (10 sec: 6554.6, 60 sec: 6144.0, 300 sec: 6095.4). Total num frames: 41480192. Throughput: 0: 1569.5. Samples: 5363070. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:09:17,933][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 00:09:22,932][41694] Fps is (10 sec: 6144.6, 60 sec: 6326.6, 300 sec: 6067.6). Total num frames: 41508864. Throughput: 0: 1538.2. Samples: 5371952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:09:22,934][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 00:09:24,186][42004] Updated weights for policy 0, policy_version 10136 (0.0030) +[2024-11-08 00:09:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6053.8). Total num frames: 41541632. Throughput: 0: 1530.6. Samples: 5381980. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:09:27,934][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 00:09:29,723][42004] Updated weights for policy 0, policy_version 10146 (0.0029) +[2024-11-08 00:09:32,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6144.2, 300 sec: 6012.1). Total num frames: 41566208. Throughput: 0: 1551.9. Samples: 5387612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:09:32,933][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 00:09:37,214][42004] Updated weights for policy 0, policy_version 10156 (0.0029) +[2024-11-08 00:09:37,933][41694] Fps is (10 sec: 6143.0, 60 sec: 6212.1, 300 sec: 6039.8). Total num frames: 41603072. Throughput: 0: 1513.4. Samples: 5395062. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:09:37,937][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 00:09:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010157_41603072.pth... +[2024-11-08 00:09:38,073][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009797_40128512.pth +[2024-11-08 00:09:42,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6144.0, 300 sec: 6039.9). Total num frames: 41635840. Throughput: 0: 1569.9. Samples: 5405572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:09:42,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 00:09:43,238][42004] Updated weights for policy 0, policy_version 10166 (0.0033) +[2024-11-08 00:09:47,931][41694] Fps is (10 sec: 6964.5, 60 sec: 6280.6, 300 sec: 6067.6). Total num frames: 41672704. Throughput: 0: 1598.2. Samples: 5410698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:09:47,933][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 00:09:48,701][42004] Updated weights for policy 0, policy_version 10176 (0.0032) +[2024-11-08 00:09:52,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6348.8, 300 sec: 6137.1). Total num frames: 41709568. Throughput: 0: 1638.8. Samples: 5422064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:09:52,933][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 00:09:54,658][42004] Updated weights for policy 0, policy_version 10186 (0.0042) +[2024-11-08 00:09:57,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6553.6, 300 sec: 6137.0). Total num frames: 41742336. Throughput: 0: 1637.9. Samples: 5432060. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:09:57,934][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 00:10:00,543][42004] Updated weights for policy 0, policy_version 10196 (0.0038) +[2024-11-08 00:10:02,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6151.0). 
Total num frames: 41775104. Throughput: 0: 1656.1. Samples: 5437594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:10:02,933][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 00:10:07,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6417.2, 300 sec: 6137.1). Total num frames: 41799680. Throughput: 0: 1632.8. Samples: 5445428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:10:07,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 00:10:08,307][42004] Updated weights for policy 0, policy_version 10206 (0.0030) +[2024-11-08 00:10:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.4, 300 sec: 6178.7). Total num frames: 41836544. Throughput: 0: 1633.6. Samples: 5455494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:10:12,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 00:10:13,889][42004] Updated weights for policy 0, policy_version 10216 (0.0026) +[2024-11-08 00:10:17,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6192.7). Total num frames: 41865216. Throughput: 0: 1612.6. Samples: 5460178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:10:17,934][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 00:10:20,702][42004] Updated weights for policy 0, policy_version 10226 (0.0050) +[2024-11-08 00:10:22,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.3, 300 sec: 6178.7). Total num frames: 41897984. Throughput: 0: 1652.7. Samples: 5469430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:10:22,934][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 00:10:26,359][42004] Updated weights for policy 0, policy_version 10236 (0.0028) +[2024-11-08 00:10:27,933][41694] Fps is (10 sec: 6962.5, 60 sec: 6553.5, 300 sec: 6248.1). Total num frames: 41934848. Throughput: 0: 1662.7. Samples: 5480396. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:10:27,934][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 00:10:32,540][42004] Updated weights for policy 0, policy_version 10246 (0.0035) +[2024-11-08 00:10:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6248.1). Total num frames: 41967616. Throughput: 0: 1659.6. Samples: 5485382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:10:32,934][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 00:10:37,801][42004] Updated weights for policy 0, policy_version 10256 (0.0032) +[2024-11-08 00:10:37,931][41694] Fps is (10 sec: 7373.6, 60 sec: 6758.6, 300 sec: 6262.0). Total num frames: 42008576. Throughput: 0: 1647.7. Samples: 5496210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:10:37,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 00:10:42,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6553.6, 300 sec: 6220.4). Total num frames: 42029056. Throughput: 0: 1594.4. Samples: 5503806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:10:42,934][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 00:10:45,297][42004] Updated weights for policy 0, policy_version 10266 (0.0025) +[2024-11-08 00:10:47,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6234.3). Total num frames: 42065920. Throughput: 0: 1595.5. Samples: 5509390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:10:47,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 00:10:50,634][42004] Updated weights for policy 0, policy_version 10276 (0.0028) +[2024-11-08 00:10:52,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6275.9). Total num frames: 42106880. Throughput: 0: 1677.9. Samples: 5520934. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:10:52,934][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 00:10:55,745][42004] Updated weights for policy 0, policy_version 10286 (0.0026) +[2024-11-08 00:10:57,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6690.1, 300 sec: 6303.7). Total num frames: 42143744. Throughput: 0: 1717.7. Samples: 5532792. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:10:57,934][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 00:11:01,647][42004] Updated weights for policy 0, policy_version 10296 (0.0037) +[2024-11-08 00:11:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 6345.3). Total num frames: 42176512. Throughput: 0: 1736.1. Samples: 5538302. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:11:02,935][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 00:11:07,871][42004] Updated weights for policy 0, policy_version 10306 (0.0032) +[2024-11-08 00:11:07,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6894.9, 300 sec: 6359.2). Total num frames: 42213376. Throughput: 0: 1731.2. Samples: 5547334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:11:07,934][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 00:11:12,931][41694] Fps is (10 sec: 6553.5, 60 sec: 6758.4, 300 sec: 6345.3). Total num frames: 42242048. Throughput: 0: 1714.1. Samples: 5557528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:11:12,933][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 00:11:16,375][42004] Updated weights for policy 0, policy_version 10316 (0.0025) +[2024-11-08 00:11:17,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6621.9, 300 sec: 6303.7). Total num frames: 42262528. Throughput: 0: 1649.6. Samples: 5559612. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:11:17,933][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 00:11:22,569][42004] Updated weights for policy 0, policy_version 10326 (0.0025) +[2024-11-08 00:11:22,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6621.8, 300 sec: 6303.7). Total num frames: 42295296. Throughput: 0: 1614.9. Samples: 5568882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:11:22,934][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 00:11:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6622.0, 300 sec: 6331.4). Total num frames: 42332160. Throughput: 0: 1680.7. Samples: 5579438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:11:27,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 00:11:28,301][42004] Updated weights for policy 0, policy_version 10336 (0.0029) +[2024-11-08 00:11:32,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6690.1, 300 sec: 6345.3). Total num frames: 42369024. Throughput: 0: 1670.8. Samples: 5584578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:11:32,933][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 00:11:33,777][42004] Updated weights for policy 0, policy_version 10346 (0.0030) +[2024-11-08 00:11:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6387.0). Total num frames: 42401792. Throughput: 0: 1653.2. Samples: 5595326. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:11:37,934][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 00:11:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010352_42401792.pth... +[2024-11-08 00:11:38,088][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000009977_40865792.pth +[2024-11-08 00:11:39,942][42004] Updated weights for policy 0, policy_version 10356 (0.0027) +[2024-11-08 00:11:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6400.9). 
Total num frames: 42438656. Throughput: 0: 1615.5. Samples: 5605488. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:11:42,933][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 00:11:45,646][42004] Updated weights for policy 0, policy_version 10366 (0.0023) +[2024-11-08 00:11:49,835][41694] Fps is (10 sec: 5849.7, 60 sec: 6550.6, 300 sec: 6359.8). Total num frames: 42471424. Throughput: 0: 1553.1. Samples: 5611148. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:11:49,837][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 00:11:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6359.2). Total num frames: 42496000. Throughput: 0: 1585.6. Samples: 5618684. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:11:52,933][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 00:11:53,008][42004] Updated weights for policy 0, policy_version 10376 (0.0032) +[2024-11-08 00:11:57,932][41694] Fps is (10 sec: 8094.5, 60 sec: 6553.6, 300 sec: 6400.9). Total num frames: 42536960. Throughput: 0: 1614.5. Samples: 5630182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:11:57,934][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 00:11:58,419][42004] Updated weights for policy 0, policy_version 10386 (0.0026) +[2024-11-08 00:12:02,933][41694] Fps is (10 sec: 7781.5, 60 sec: 6621.7, 300 sec: 6428.7). Total num frames: 42573824. Throughput: 0: 1701.6. Samples: 5636186. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:12:02,935][41694] Avg episode reward: [(0, '4.678')] +[2024-11-08 00:12:03,702][42004] Updated weights for policy 0, policy_version 10396 (0.0029) +[2024-11-08 00:12:07,932][41694] Fps is (10 sec: 7372.1, 60 sec: 6621.8, 300 sec: 6479.0). Total num frames: 42610688. Throughput: 0: 1743.6. Samples: 5647346. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:12:07,940][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 00:12:09,344][42004] Updated weights for policy 0, policy_version 10406 (0.0021) +[2024-11-08 00:12:12,932][41694] Fps is (10 sec: 7373.6, 60 sec: 6758.4, 300 sec: 6484.2). Total num frames: 42647552. Throughput: 0: 1744.8. Samples: 5657956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:12:12,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 00:12:15,031][42004] Updated weights for policy 0, policy_version 10416 (0.0028) +[2024-11-08 00:12:17,932][41694] Fps is (10 sec: 7373.5, 60 sec: 7031.5, 300 sec: 6498.1). Total num frames: 42684416. Throughput: 0: 1753.6. Samples: 5663492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:12:17,934][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 00:12:20,287][42004] Updated weights for policy 0, policy_version 10426 (0.0026) +[2024-11-08 00:12:24,181][41694] Fps is (10 sec: 6189.7, 60 sec: 6888.0, 300 sec: 6470.7). Total num frames: 42717184. Throughput: 0: 1729.7. Samples: 5675326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:12:24,185][41694] Avg episode reward: [(0, '4.253')] +[2024-11-08 00:12:27,673][42004] Updated weights for policy 0, policy_version 10436 (0.0023) +[2024-11-08 00:12:27,932][41694] Fps is (10 sec: 6143.5, 60 sec: 6894.8, 300 sec: 6470.3). Total num frames: 42745856. Throughput: 0: 1716.8. Samples: 5682744. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:12:27,935][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 00:12:32,931][41694] Fps is (10 sec: 7489.6, 60 sec: 6894.9, 300 sec: 6498.1). Total num frames: 42782720. Throughput: 0: 1796.0. Samples: 5688548. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:12:32,933][41694] Avg episode reward: [(0, '4.237')] +[2024-11-08 00:12:32,936][42004] Updated weights for policy 0, policy_version 10446 (0.0033) +[2024-11-08 00:12:37,931][41694] Fps is (10 sec: 7783.0, 60 sec: 7031.5, 300 sec: 6539.7). Total num frames: 42823680. Throughput: 0: 1807.6. Samples: 5700028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:12:37,934][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 00:12:38,432][42004] Updated weights for policy 0, policy_version 10456 (0.0038) +[2024-11-08 00:12:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6590.9). Total num frames: 42860544. Throughput: 0: 1798.7. Samples: 5711124. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:12:42,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 00:12:44,155][42004] Updated weights for policy 0, policy_version 10466 (0.0026) +[2024-11-08 00:12:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7261.9, 300 sec: 6595.3). Total num frames: 42893312. Throughput: 0: 1779.2. Samples: 5716250. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:12:47,933][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 00:12:49,832][42004] Updated weights for policy 0, policy_version 10476 (0.0025) +[2024-11-08 00:12:52,933][41694] Fps is (10 sec: 6962.4, 60 sec: 7236.1, 300 sec: 6595.3). Total num frames: 42930176. Throughput: 0: 1777.5. Samples: 5727334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:12:52,935][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 00:12:55,275][42004] Updated weights for policy 0, policy_version 10486 (0.0026) +[2024-11-08 00:12:58,549][41694] Fps is (10 sec: 6172.3, 60 sec: 6959.8, 300 sec: 6567.6). Total num frames: 42958848. Throughput: 0: 1645.3. Samples: 5733010. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:12:58,553][41694] Avg episode reward: [(0, '4.604')] +[2024-11-08 00:13:02,931][41694] Fps is (10 sec: 5735.1, 60 sec: 6895.1, 300 sec: 6581.4). Total num frames: 42987520. Throughput: 0: 1704.4. Samples: 5740190. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:13:02,934][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 00:13:03,132][42004] Updated weights for policy 0, policy_version 10496 (0.0034) +[2024-11-08 00:13:07,932][41694] Fps is (10 sec: 6985.1, 60 sec: 6895.0, 300 sec: 6595.3). Total num frames: 43024384. Throughput: 0: 1730.8. Samples: 5751050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:07,933][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 00:13:08,566][42004] Updated weights for policy 0, policy_version 10506 (0.0043) +[2024-11-08 00:13:12,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6623.0). Total num frames: 43065344. Throughput: 0: 1772.7. Samples: 5762512. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:12,934][41694] Avg episode reward: [(0, '4.728')] +[2024-11-08 00:13:13,748][42004] Updated weights for policy 0, policy_version 10516 (0.0021) +[2024-11-08 00:13:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6674.6). Total num frames: 43098112. Throughput: 0: 1767.6. Samples: 5768092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:13:17,935][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 00:13:19,861][42004] Updated weights for policy 0, policy_version 10526 (0.0036) +[2024-11-08 00:13:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7181.0, 300 sec: 6692.5). Total num frames: 43139072. Throughput: 0: 1746.8. Samples: 5778636. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:13:22,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 00:13:25,146][42004] Updated weights for policy 0, policy_version 10536 (0.0044) +[2024-11-08 00:13:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7168.1, 300 sec: 6706.4). Total num frames: 43175936. Throughput: 0: 1750.7. Samples: 5789906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:27,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 00:13:30,479][42004] Updated weights for policy 0, policy_version 10546 (0.0030) +[2024-11-08 00:13:33,061][41694] Fps is (10 sec: 5661.2, 60 sec: 6880.1, 300 sec: 6661.8). Total num frames: 43196416. Throughput: 0: 1762.0. Samples: 5795768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:33,062][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 00:13:37,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 43233280. Throughput: 0: 1673.1. Samples: 5802622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:13:37,934][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 00:13:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010555_43233280.pth... +[2024-11-08 00:13:38,080][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010157_41603072.pth +[2024-11-08 00:13:38,339][42004] Updated weights for policy 0, policy_version 10556 (0.0033) +[2024-11-08 00:13:42,932][41694] Fps is (10 sec: 7469.4, 60 sec: 6826.7, 300 sec: 6692.5). Total num frames: 43270144. Throughput: 0: 1821.9. Samples: 5813872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:42,933][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 00:13:43,566][42004] Updated weights for policy 0, policy_version 10566 (0.0032) +[2024-11-08 00:13:47,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6720.2). 
Total num frames: 43311104. Throughput: 0: 1773.7. Samples: 5820008. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:47,934][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 00:13:48,669][42004] Updated weights for policy 0, policy_version 10576 (0.0028) +[2024-11-08 00:13:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.3, 300 sec: 6775.8). Total num frames: 43347968. Throughput: 0: 1783.2. Samples: 5831294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:52,935][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 00:13:54,443][42004] Updated weights for policy 0, policy_version 10586 (0.0041) +[2024-11-08 00:13:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7173.6, 300 sec: 6789.6). Total num frames: 43384832. Throughput: 0: 1780.7. Samples: 5842642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:13:57,934][41694] Avg episode reward: [(0, '4.237')] +[2024-11-08 00:13:59,679][42004] Updated weights for policy 0, policy_version 10596 (0.0026) +[2024-11-08 00:14:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 6803.6). Total num frames: 43421696. Throughput: 0: 1782.7. Samples: 5848312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:14:02,933][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 00:14:07,655][42004] Updated weights for policy 0, policy_version 10606 (0.0027) +[2024-11-08 00:14:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 43442176. Throughput: 0: 1763.5. Samples: 5857992. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:14:07,934][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 00:14:12,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 43479040. Throughput: 0: 1682.5. Samples: 5865618. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:14:12,933][41694] Avg episode reward: [(0, '4.628')] +[2024-11-08 00:14:13,482][42004] Updated weights for policy 0, policy_version 10616 (0.0026) +[2024-11-08 00:14:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 43515904. Throughput: 0: 1684.0. Samples: 5871330. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:14:17,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 00:14:18,523][42004] Updated weights for policy 0, policy_version 10626 (0.0043) +[2024-11-08 00:14:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 43556864. Throughput: 0: 1793.1. Samples: 5883310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:14:22,933][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 00:14:24,062][42004] Updated weights for policy 0, policy_version 10636 (0.0024) +[2024-11-08 00:14:27,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 43589632. Throughput: 0: 1780.0. Samples: 5893972. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:14:27,935][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 00:14:29,536][42004] Updated weights for policy 0, policy_version 10646 (0.0034) +[2024-11-08 00:14:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7251.9, 300 sec: 6873.0). Total num frames: 43630592. Throughput: 0: 1776.0. Samples: 5899928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:14:32,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 00:14:34,832][42004] Updated weights for policy 0, policy_version 10656 (0.0028) +[2024-11-08 00:14:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7236.2, 300 sec: 6886.8). Total num frames: 43667456. Throughput: 0: 1785.0. Samples: 5911618. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:14:37,934][41694] Avg episode reward: [(0, '4.104')] +[2024-11-08 00:14:42,514][42004] Updated weights for policy 0, policy_version 10666 (0.0033) +[2024-11-08 00:14:42,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6963.1, 300 sec: 6831.3). Total num frames: 43687936. Throughput: 0: 1689.8. Samples: 5918686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:14:42,943][41694] Avg episode reward: [(0, '4.209')] +[2024-11-08 00:14:47,933][41694] Fps is (10 sec: 5324.3, 60 sec: 6826.5, 300 sec: 6817.4). Total num frames: 43720704. Throughput: 0: 1671.2. Samples: 5923518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:14:47,937][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 00:14:48,517][42004] Updated weights for policy 0, policy_version 10676 (0.0043) +[2024-11-08 00:14:52,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 43761664. Throughput: 0: 1699.6. Samples: 5934472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:14:52,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 00:14:54,002][42004] Updated weights for policy 0, policy_version 10686 (0.0035) +[2024-11-08 00:14:57,931][41694] Fps is (10 sec: 7373.7, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 43794432. Throughput: 0: 1772.0. Samples: 5945356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:14:57,933][41694] Avg episode reward: [(0, '4.565')] +[2024-11-08 00:14:59,904][42004] Updated weights for policy 0, policy_version 10696 (0.0026) +[2024-11-08 00:15:02,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 43827200. Throughput: 0: 1754.6. Samples: 5950286. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 00:15:02,935][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 00:15:05,773][42004] Updated weights for policy 0, policy_version 10706 (0.0033) +[2024-11-08 00:15:07,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7031.5, 300 sec: 6872.9). Total num frames: 43864064. Throughput: 0: 1725.4. Samples: 5960954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 00:15:07,934][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 00:15:11,340][42004] Updated weights for policy 0, policy_version 10716 (0.0024) +[2024-11-08 00:15:12,937][41694] Fps is (10 sec: 7368.9, 60 sec: 7030.8, 300 sec: 6900.6). Total num frames: 43900928. Throughput: 0: 1725.8. Samples: 5971644. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:15:12,941][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 00:15:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 43917312. Throughput: 0: 1664.7. Samples: 5974838. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:15:17,934][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 00:15:20,269][42004] Updated weights for policy 0, policy_version 10726 (0.0048) +[2024-11-08 00:15:22,932][41694] Fps is (10 sec: 4917.8, 60 sec: 6553.6, 300 sec: 6831.3). Total num frames: 43950080. Throughput: 0: 1572.9. Samples: 5982400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:15:22,933][41694] Avg episode reward: [(0, '4.211')] +[2024-11-08 00:15:25,715][42004] Updated weights for policy 0, policy_version 10736 (0.0023) +[2024-11-08 00:15:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 43991040. Throughput: 0: 1665.4. Samples: 5993628. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:15:27,934][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 00:15:31,220][42004] Updated weights for policy 0, policy_version 10746 (0.0033) +[2024-11-08 00:15:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6831.3). Total num frames: 44023808. Throughput: 0: 1685.3. Samples: 5999356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:15:32,933][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 00:15:36,746][42004] Updated weights for policy 0, policy_version 10756 (0.0022) +[2024-11-08 00:15:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6900.7). Total num frames: 44064768. Throughput: 0: 1685.5. Samples: 6010318. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:15:37,933][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 00:15:37,942][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010758_44064768.pth... +[2024-11-08 00:15:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010352_42401792.pth +[2024-11-08 00:15:42,191][42004] Updated weights for policy 0, policy_version 10766 (0.0029) +[2024-11-08 00:15:42,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6895.0, 300 sec: 6900.7). Total num frames: 44101632. Throughput: 0: 1697.6. Samples: 6021748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:15:42,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 00:15:47,619][42004] Updated weights for policy 0, policy_version 10776 (0.0029) +[2024-11-08 00:15:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.3, 300 sec: 6886.8). Total num frames: 44138496. Throughput: 0: 1711.7. Samples: 6027312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:15:47,935][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 00:15:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6831.3). 
Total num frames: 44158976. Throughput: 0: 1631.0. Samples: 6034350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:15:52,934][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 00:15:55,462][42004] Updated weights for policy 0, policy_version 10786 (0.0031) +[2024-11-08 00:15:57,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 44195840. Throughput: 0: 1640.2. Samples: 6045444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:15:57,933][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 00:16:00,768][42004] Updated weights for policy 0, policy_version 10796 (0.0029) +[2024-11-08 00:16:02,933][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 44232704. Throughput: 0: 1701.7. Samples: 6051412. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:16:02,935][41694] Avg episode reward: [(0, '4.227')] +[2024-11-08 00:16:06,738][42004] Updated weights for policy 0, policy_version 10806 (0.0031) +[2024-11-08 00:16:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 44269568. Throughput: 0: 1757.9. Samples: 6061504. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:16:07,934][41694] Avg episode reward: [(0, '4.226')] +[2024-11-08 00:16:12,774][42004] Updated weights for policy 0, policy_version 10816 (0.0036) +[2024-11-08 00:16:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.7, 300 sec: 6914.6). Total num frames: 44302336. Throughput: 0: 1734.1. Samples: 6071662. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:16:12,933][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 00:16:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 44339200. Throughput: 0: 1725.5. Samples: 6077004. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:16:17,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 00:16:18,271][42004] Updated weights for policy 0, policy_version 10826 (0.0033) +[2024-11-08 00:16:22,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7099.7, 300 sec: 6928.5). Total num frames: 44376064. Throughput: 0: 1739.1. Samples: 6088576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:16:22,934][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 00:16:25,877][42004] Updated weights for policy 0, policy_version 10836 (0.0027) +[2024-11-08 00:16:27,932][41694] Fps is (10 sec: 5733.9, 60 sec: 6758.3, 300 sec: 6872.9). Total num frames: 44396544. Throughput: 0: 1640.0. Samples: 6095548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:16:27,935][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 00:16:31,462][42004] Updated weights for policy 0, policy_version 10846 (0.0035) +[2024-11-08 00:16:32,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6826.6, 300 sec: 6886.8). Total num frames: 44433408. Throughput: 0: 1636.8. Samples: 6100970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:16:32,934][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 00:16:36,772][42004] Updated weights for policy 0, policy_version 10856 (0.0025) +[2024-11-08 00:16:37,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 44470272. Throughput: 0: 1741.5. Samples: 6112718. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:16:37,936][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 00:16:42,931][41694] Fps is (10 sec: 6963.8, 60 sec: 6690.2, 300 sec: 6931.6). Total num frames: 44503040. Throughput: 0: 1710.6. Samples: 6122420. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:16:42,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 00:16:43,025][42004] Updated weights for policy 0, policy_version 10866 (0.0037) +[2024-11-08 00:16:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 44544000. Throughput: 0: 1704.0. Samples: 6128090. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:16:47,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 00:16:48,337][42004] Updated weights for policy 0, policy_version 10876 (0.0032) +[2024-11-08 00:16:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 44580864. Throughput: 0: 1730.0. Samples: 6139356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:16:52,935][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 00:16:53,884][42004] Updated weights for policy 0, policy_version 10886 (0.0025) +[2024-11-08 00:16:59,870][41694] Fps is (10 sec: 5832.7, 60 sec: 6745.3, 300 sec: 6869.5). Total num frames: 44613632. Throughput: 0: 1675.4. Samples: 6150302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:16:59,872][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 00:17:01,883][42004] Updated weights for policy 0, policy_version 10896 (0.0026) +[2024-11-08 00:17:02,934][41694] Fps is (10 sec: 5323.7, 60 sec: 6689.9, 300 sec: 6859.0). Total num frames: 44634112. Throughput: 0: 1661.3. Samples: 6151768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:02,936][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 00:17:07,635][42004] Updated weights for policy 0, policy_version 10906 (0.0029) +[2024-11-08 00:17:07,931][41694] Fps is (10 sec: 7113.3, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 44670976. Throughput: 0: 1636.1. Samples: 6162198. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:07,933][41694] Avg episode reward: [(0, '4.587')] +[2024-11-08 00:17:12,933][41694] Fps is (10 sec: 7373.6, 60 sec: 6758.3, 300 sec: 6859.0). Total num frames: 44707840. Throughput: 0: 1735.6. Samples: 6173650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:12,935][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 00:17:13,167][42004] Updated weights for policy 0, policy_version 10916 (0.0029) +[2024-11-08 00:17:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6902.2). Total num frames: 44744704. Throughput: 0: 1731.7. Samples: 6178894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:17,935][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 00:17:18,582][42004] Updated weights for policy 0, policy_version 10926 (0.0026) +[2024-11-08 00:17:22,932][41694] Fps is (10 sec: 7783.1, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 44785664. Throughput: 0: 1731.5. Samples: 6190634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:22,935][41694] Avg episode reward: [(0, '4.273')] +[2024-11-08 00:17:23,713][42004] Updated weights for policy 0, policy_version 10936 (0.0030) +[2024-11-08 00:17:27,931][41694] Fps is (10 sec: 7782.3, 60 sec: 7099.8, 300 sec: 6914.6). Total num frames: 44822528. Throughput: 0: 1769.2. Samples: 6202036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:27,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 00:17:29,294][42004] Updated weights for policy 0, policy_version 10946 (0.0026) +[2024-11-08 00:17:34,273][41694] Fps is (10 sec: 6139.5, 60 sec: 6877.7, 300 sec: 6855.7). Total num frames: 44855296. Throughput: 0: 1720.2. Samples: 6207806. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:34,275][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 00:17:37,166][42004] Updated weights for policy 0, policy_version 10956 (0.0025) +[2024-11-08 00:17:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 44879872. Throughput: 0: 1669.7. Samples: 6214492. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:17:37,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 00:17:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010957_44879872.pth... +[2024-11-08 00:17:38,226][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010555_43233280.pth +[2024-11-08 00:17:42,771][42004] Updated weights for policy 0, policy_version 10966 (0.0023) +[2024-11-08 00:17:42,931][41694] Fps is (10 sec: 7096.2, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 44916736. Throughput: 0: 1741.1. Samples: 6225276. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:17:42,934][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 00:17:47,914][42004] Updated weights for policy 0, policy_version 10976 (0.0027) +[2024-11-08 00:17:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6873.0). Total num frames: 44957696. Throughput: 0: 1762.4. Samples: 6231072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:17:47,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 00:17:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6915.2). Total num frames: 44994560. Throughput: 0: 1795.6. Samples: 6242998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:17:52,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 00:17:53,045][42004] Updated weights for policy 0, policy_version 10986 (0.0031) +[2024-11-08 00:17:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7195.7, 300 sec: 6928.5). 
Total num frames: 45031424. Throughput: 0: 1797.8. Samples: 6254548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:17:57,933][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 00:17:58,666][42004] Updated weights for policy 0, policy_version 10996 (0.0041) +[2024-11-08 00:18:02,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7236.5, 300 sec: 6928.5). Total num frames: 45068288. Throughput: 0: 1793.0. Samples: 6259582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:18:02,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 00:18:04,477][42004] Updated weights for policy 0, policy_version 11006 (0.0029) +[2024-11-08 00:18:08,879][41694] Fps is (10 sec: 5612.2, 60 sec: 6922.1, 300 sec: 6850.9). Total num frames: 45092864. Throughput: 0: 1740.0. Samples: 6270584. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:18:08,881][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 00:18:12,236][42004] Updated weights for policy 0, policy_version 11016 (0.0019) +[2024-11-08 00:18:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.3, 300 sec: 6872.9). Total num frames: 45125632. Throughput: 0: 1678.1. Samples: 6277550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:18:12,935][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 00:18:17,332][42004] Updated weights for policy 0, policy_version 11026 (0.0029) +[2024-11-08 00:18:17,932][41694] Fps is (10 sec: 8144.3, 60 sec: 7031.4, 300 sec: 6872.9). Total num frames: 45166592. Throughput: 0: 1735.2. Samples: 6283564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:18:17,938][41694] Avg episode reward: [(0, '4.230')] +[2024-11-08 00:18:22,560][42004] Updated weights for policy 0, policy_version 11036 (0.0028) +[2024-11-08 00:18:22,932][41694] Fps is (10 sec: 7782.7, 60 sec: 6963.2, 300 sec: 6873.0). Total num frames: 45203456. Throughput: 0: 1795.3. Samples: 6295282. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:18:22,933][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 00:18:27,766][42004] Updated weights for policy 0, policy_version 11046 (0.0030) +[2024-11-08 00:18:27,931][41694] Fps is (10 sec: 7782.7, 60 sec: 7031.5, 300 sec: 6945.4). Total num frames: 45244416. Throughput: 0: 1821.8. Samples: 6307258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:18:27,933][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 00:18:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7262.1, 300 sec: 6942.4). Total num frames: 45281280. Throughput: 0: 1815.2. Samples: 6312754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:18:32,934][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 00:18:33,497][42004] Updated weights for policy 0, policy_version 11056 (0.0041) +[2024-11-08 00:18:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7304.5, 300 sec: 6942.4). Total num frames: 45318144. Throughput: 0: 1791.9. Samples: 6323632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:18:37,933][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 00:18:38,825][42004] Updated weights for policy 0, policy_version 11066 (0.0039) +[2024-11-08 00:18:43,290][41694] Fps is (10 sec: 5931.3, 60 sec: 7057.6, 300 sec: 6878.5). Total num frames: 45342720. Throughput: 0: 1652.4. Samples: 6329498. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:18:43,292][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 00:18:46,233][42004] Updated weights for policy 0, policy_version 11076 (0.0033) +[2024-11-08 00:18:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 45379584. Throughput: 0: 1720.8. Samples: 6337016. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:18:47,933][41694] Avg episode reward: [(0, '4.672')] +[2024-11-08 00:18:51,411][42004] Updated weights for policy 0, policy_version 11086 (0.0026) +[2024-11-08 00:18:52,931][41694] Fps is (10 sec: 7647.1, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 45416448. Throughput: 0: 1778.6. Samples: 6348934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:18:52,933][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 00:18:56,690][42004] Updated weights for policy 0, policy_version 11096 (0.0029) +[2024-11-08 00:18:57,933][41694] Fps is (10 sec: 7781.1, 60 sec: 7099.5, 300 sec: 6900.7). Total num frames: 45457408. Throughput: 0: 1840.7. Samples: 6360382. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:18:57,935][41694] Avg episode reward: [(0, '4.203')] +[2024-11-08 00:19:02,498][42004] Updated weights for policy 0, policy_version 11106 (0.0027) +[2024-11-08 00:19:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6942.4). Total num frames: 45490176. Throughput: 0: 1826.7. Samples: 6365764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:19:02,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 00:19:07,932][41694] Fps is (10 sec: 6964.4, 60 sec: 7352.4, 300 sec: 6942.4). Total num frames: 45527040. Throughput: 0: 1794.1. Samples: 6376018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:19:07,933][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 00:19:08,691][42004] Updated weights for policy 0, policy_version 11116 (0.0027) +[2024-11-08 00:19:12,933][41694] Fps is (10 sec: 6962.4, 60 sec: 7236.2, 300 sec: 6928.5). Total num frames: 45559808. Throughput: 0: 1758.9. Samples: 6386412. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:19:12,936][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 00:19:14,220][42004] Updated weights for policy 0, policy_version 11126 (0.0032) +[2024-11-08 00:19:17,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6895.0, 300 sec: 6859.1). Total num frames: 45580288. Throughput: 0: 1763.2. Samples: 6392098. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:19:17,934][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 00:19:21,847][42004] Updated weights for policy 0, policy_version 11136 (0.0025) +[2024-11-08 00:19:22,931][41694] Fps is (10 sec: 6144.7, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 45621248. Throughput: 0: 1680.6. Samples: 6399260. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:19:22,933][41694] Avg episode reward: [(0, '4.243')] +[2024-11-08 00:19:27,067][42004] Updated weights for policy 0, policy_version 11146 (0.0024) +[2024-11-08 00:19:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6873.0). Total num frames: 45658112. Throughput: 0: 1822.2. Samples: 6410844. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:19:27,933][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 00:19:32,782][42004] Updated weights for policy 0, policy_version 11156 (0.0039) +[2024-11-08 00:19:32,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 45694976. Throughput: 0: 1763.2. Samples: 6416360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:19:32,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 00:19:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 45727744. Throughput: 0: 1726.5. Samples: 6426626. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:19:37,933][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 00:19:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011164_45727744.pth... +[2024-11-08 00:19:38,075][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010758_44064768.pth +[2024-11-08 00:19:39,188][42004] Updated weights for policy 0, policy_version 11166 (0.0032) +[2024-11-08 00:19:42,932][41694] Fps is (10 sec: 6553.8, 60 sec: 7005.1, 300 sec: 6914.6). Total num frames: 45760512. Throughput: 0: 1681.4. Samples: 6436044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:19:42,935][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 00:19:45,346][42004] Updated weights for policy 0, policy_version 11176 (0.0033) +[2024-11-08 00:19:47,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 45793280. Throughput: 0: 1676.9. Samples: 6441226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:19:47,933][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 00:19:52,904][42004] Updated weights for policy 0, policy_version 11186 (0.0030) +[2024-11-08 00:19:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 45817856. Throughput: 0: 1649.9. Samples: 6450264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:19:52,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 00:19:57,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6622.1, 300 sec: 6873.0). Total num frames: 45854720. Throughput: 0: 1633.1. Samples: 6459898. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:19:57,935][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 00:19:58,236][42004] Updated weights for policy 0, policy_version 11196 (0.0045) +[2024-11-08 00:20:02,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6690.1, 300 sec: 6872.9). 
Total num frames: 45891584. Throughput: 0: 1632.1. Samples: 6465542. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:02,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 00:20:03,885][42004] Updated weights for policy 0, policy_version 11206 (0.0026) +[2024-11-08 00:20:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6873.1). Total num frames: 45928448. Throughput: 0: 1716.3. Samples: 6476494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:20:07,933][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 00:20:09,196][42004] Updated weights for policy 0, policy_version 11216 (0.0027) +[2024-11-08 00:20:12,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6690.3, 300 sec: 6928.5). Total num frames: 45961216. Throughput: 0: 1700.7. Samples: 6487376. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:20:12,933][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 00:20:16,102][42004] Updated weights for policy 0, policy_version 11226 (0.0035) +[2024-11-08 00:20:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 45993984. Throughput: 0: 1666.3. Samples: 6491344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:17,933][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 00:20:21,609][42004] Updated weights for policy 0, policy_version 11236 (0.0031) +[2024-11-08 00:20:22,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6826.6, 300 sec: 6914.6). Total num frames: 46030848. Throughput: 0: 1675.3. Samples: 6502014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:22,934][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 00:20:27,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6621.8, 300 sec: 6886.8). Total num frames: 46055424. Throughput: 0: 1633.3. Samples: 6509544. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:27,934][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 00:20:29,136][42004] Updated weights for policy 0, policy_version 11246 (0.0030) +[2024-11-08 00:20:32,931][41694] Fps is (10 sec: 6144.5, 60 sec: 6621.9, 300 sec: 6873.0). Total num frames: 46092288. Throughput: 0: 1644.2. Samples: 6515214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:32,933][41694] Avg episode reward: [(0, '4.222')] +[2024-11-08 00:20:34,453][42004] Updated weights for policy 0, policy_version 11256 (0.0030) +[2024-11-08 00:20:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6859.1). Total num frames: 46125056. Throughput: 0: 1693.3. Samples: 6526464. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:20:37,934][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 00:20:40,306][42004] Updated weights for policy 0, policy_version 11266 (0.0025) +[2024-11-08 00:20:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 46161920. Throughput: 0: 1715.2. Samples: 6537082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:20:42,936][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 00:20:46,348][42004] Updated weights for policy 0, policy_version 11276 (0.0025) +[2024-11-08 00:20:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6900.7). Total num frames: 46194688. Throughput: 0: 1694.3. Samples: 6541784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:47,934][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 00:20:51,995][42004] Updated weights for policy 0, policy_version 11286 (0.0028) +[2024-11-08 00:20:52,939][41694] Fps is (10 sec: 6958.2, 60 sec: 6894.1, 300 sec: 6900.6). Total num frames: 46231552. Throughput: 0: 1686.9. Samples: 6552416. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:52,941][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 00:20:57,091][42004] Updated weights for policy 0, policy_version 11296 (0.0018) +[2024-11-08 00:20:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6914.6). Total num frames: 46272512. Throughput: 0: 1713.0. Samples: 6564462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:20:57,933][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 00:21:02,932][41694] Fps is (10 sec: 6148.3, 60 sec: 6690.2, 300 sec: 6859.1). Total num frames: 46292992. Throughput: 0: 1693.7. Samples: 6567562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:21:02,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 00:21:04,847][42004] Updated weights for policy 0, policy_version 11306 (0.0032) +[2024-11-08 00:21:07,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6690.1, 300 sec: 6872.9). Total num frames: 46329856. Throughput: 0: 1665.4. Samples: 6576958. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:21:07,934][41694] Avg episode reward: [(0, '4.243')] +[2024-11-08 00:21:10,362][42004] Updated weights for policy 0, policy_version 11316 (0.0029) +[2024-11-08 00:21:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6873.0). Total num frames: 46366720. Throughput: 0: 1736.5. Samples: 6587686. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:21:12,933][41694] Avg episode reward: [(0, '4.258')] +[2024-11-08 00:21:16,242][42004] Updated weights for policy 0, policy_version 11326 (0.0034) +[2024-11-08 00:21:17,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 46399488. Throughput: 0: 1729.8. Samples: 6593056. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:21:17,934][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 00:21:22,265][42004] Updated weights for policy 0, policy_version 11336 (0.0024) +[2024-11-08 00:21:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.5, 300 sec: 6914.6). Total num frames: 46436352. Throughput: 0: 1710.7. Samples: 6603446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:21:22,935][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 00:21:27,367][42004] Updated weights for policy 0, policy_version 11346 (0.0027) +[2024-11-08 00:21:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 46477312. Throughput: 0: 1735.2. Samples: 6615168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:21:27,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 00:21:32,578][42004] Updated weights for policy 0, policy_version 11356 (0.0031) +[2024-11-08 00:21:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.4, 300 sec: 6928.5). Total num frames: 46514176. Throughput: 0: 1759.4. Samples: 6620958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:21:32,933][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 00:21:37,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 46538752. Throughput: 0: 1693.8. Samples: 6628626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:21:37,933][41694] Avg episode reward: [(0, '4.239')] +[2024-11-08 00:21:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011362_46538752.pth... +[2024-11-08 00:21:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000010957_44879872.pth +[2024-11-08 00:21:40,008][42004] Updated weights for policy 0, policy_version 11366 (0.0028) +[2024-11-08 00:21:42,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6886.8). 
Total num frames: 46575616. Throughput: 0: 1679.7. Samples: 6640048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:21:42,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 00:21:45,337][42004] Updated weights for policy 0, policy_version 11376 (0.0042) +[2024-11-08 00:21:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 46612480. Throughput: 0: 1740.7. Samples: 6645894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:21:47,933][41694] Avg episode reward: [(0, '4.604')] +[2024-11-08 00:21:50,913][42004] Updated weights for policy 0, policy_version 11386 (0.0032) +[2024-11-08 00:21:52,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6964.0, 300 sec: 6946.4). Total num frames: 46649344. Throughput: 0: 1777.7. Samples: 6656956. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:21:52,934][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 00:21:56,736][42004] Updated weights for policy 0, policy_version 11396 (0.0035) +[2024-11-08 00:21:57,935][41694] Fps is (10 sec: 7370.0, 60 sec: 6894.5, 300 sec: 6956.2). Total num frames: 46686208. Throughput: 0: 1776.4. Samples: 6667632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:21:57,938][41694] Avg episode reward: [(0, '4.231')] +[2024-11-08 00:22:02,276][42004] Updated weights for policy 0, policy_version 11406 (0.0042) +[2024-11-08 00:22:02,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7168.0, 300 sec: 6956.3). Total num frames: 46723072. Throughput: 0: 1783.3. Samples: 6673306. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:22:02,934][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 00:22:09,456][41694] Fps is (10 sec: 6044.1, 60 sec: 6923.9, 300 sec: 6906.7). Total num frames: 46755840. Throughput: 0: 1734.9. Samples: 6684162. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:09,460][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 00:22:09,784][42004] Updated weights for policy 0, policy_version 11416 (0.0033) +[2024-11-08 00:22:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 46780416. Throughput: 0: 1700.4. Samples: 6691688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:12,933][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 00:22:15,109][42004] Updated weights for policy 0, policy_version 11426 (0.0021) +[2024-11-08 00:22:17,931][41694] Fps is (10 sec: 7732.3, 60 sec: 7031.5, 300 sec: 6900.7). Total num frames: 46821376. Throughput: 0: 1702.9. Samples: 6697590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:17,933][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 00:22:20,412][42004] Updated weights for policy 0, policy_version 11436 (0.0024) +[2024-11-08 00:22:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6900.7). Total num frames: 46858240. Throughput: 0: 1787.7. Samples: 6709074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:22,938][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 00:22:26,202][42004] Updated weights for policy 0, policy_version 11446 (0.0029) +[2024-11-08 00:22:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6932.2). Total num frames: 46891008. Throughput: 0: 1771.2. Samples: 6719750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:27,934][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 00:22:31,716][42004] Updated weights for policy 0, policy_version 11456 (0.0031) +[2024-11-08 00:22:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 46931968. Throughput: 0: 1760.7. Samples: 6725128. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:22:32,933][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 00:22:37,011][42004] Updated weights for policy 0, policy_version 11466 (0.0029) +[2024-11-08 00:22:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6956.3). Total num frames: 46968832. Throughput: 0: 1774.0. Samples: 6736786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:22:37,937][41694] Avg episode reward: [(0, '4.289')] +[2024-11-08 00:22:43,832][41694] Fps is (10 sec: 6388.2, 60 sec: 6994.8, 300 sec: 6907.4). Total num frames: 47001600. Throughput: 0: 1633.5. Samples: 6742606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:43,833][41694] Avg episode reward: [(0, '4.610')] +[2024-11-08 00:22:44,320][42004] Updated weights for policy 0, policy_version 11476 (0.0024) +[2024-11-08 00:22:47,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 47030272. Throughput: 0: 1711.8. Samples: 6750338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:47,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 00:22:49,615][42004] Updated weights for policy 0, policy_version 11486 (0.0025) +[2024-11-08 00:22:52,931][41694] Fps is (10 sec: 7652.1, 60 sec: 7031.5, 300 sec: 6914.6). Total num frames: 47071232. Throughput: 0: 1788.8. Samples: 6761932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:52,934][41694] Avg episode reward: [(0, '4.555')] +[2024-11-08 00:22:54,943][42004] Updated weights for policy 0, policy_version 11496 (0.0021) +[2024-11-08 00:22:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.9, 300 sec: 6914.6). Total num frames: 47108096. Throughput: 0: 1819.6. Samples: 6773568. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:22:57,935][41694] Avg episode reward: [(0, '4.289')] +[2024-11-08 00:23:00,705][42004] Updated weights for policy 0, policy_version 11506 (0.0028) +[2024-11-08 00:23:02,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6963.2, 300 sec: 6964.7). Total num frames: 47140864. Throughput: 0: 1801.8. Samples: 6778670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:23:02,934][41694] Avg episode reward: [(0, '4.301')] +[2024-11-08 00:23:06,408][42004] Updated weights for policy 0, policy_version 11516 (0.0033) +[2024-11-08 00:23:07,933][41694] Fps is (10 sec: 6962.5, 60 sec: 7214.6, 300 sec: 6956.2). Total num frames: 47177728. Throughput: 0: 1779.8. Samples: 6789168. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:23:07,934][41694] Avg episode reward: [(0, '4.264')] +[2024-11-08 00:23:12,081][42004] Updated weights for policy 0, policy_version 11526 (0.0032) +[2024-11-08 00:23:12,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7236.3, 300 sec: 6942.4). Total num frames: 47214592. Throughput: 0: 1781.9. Samples: 6799936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:23:12,934][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 00:23:18,211][41694] Fps is (10 sec: 5977.3, 60 sec: 6930.9, 300 sec: 6894.2). Total num frames: 47239168. Throughput: 0: 1782.7. Samples: 6805846. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:23:18,214][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 00:23:19,449][42004] Updated weights for policy 0, policy_version 11536 (0.0030) +[2024-11-08 00:23:22,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 47276032. Throughput: 0: 1696.3. Samples: 6813118. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:23:22,933][41694] Avg episode reward: [(0, '4.635')] +[2024-11-08 00:23:25,219][42004] Updated weights for policy 0, policy_version 11546 (0.0027) +[2024-11-08 00:23:27,932][41694] Fps is (10 sec: 7163.6, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 47308800. Throughput: 0: 1845.4. Samples: 6823988. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:23:27,934][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 00:23:30,584][42004] Updated weights for policy 0, policy_version 11556 (0.0021) +[2024-11-08 00:23:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 47345664. Throughput: 0: 1765.2. Samples: 6829770. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:23:32,934][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 00:23:36,526][42004] Updated weights for policy 0, policy_version 11566 (0.0030) +[2024-11-08 00:23:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6923.0). Total num frames: 47382528. Throughput: 0: 1738.0. Samples: 6840140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:23:37,934][41694] Avg episode reward: [(0, '4.555')] +[2024-11-08 00:23:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011568_47382528.pth... +[2024-11-08 00:23:38,089][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011164_45727744.pth +[2024-11-08 00:23:41,874][42004] Updated weights for policy 0, policy_version 11576 (0.0022) +[2024-11-08 00:23:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7138.6, 300 sec: 6928.5). Total num frames: 47423488. Throughput: 0: 1736.8. Samples: 6851722. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:23:42,934][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 00:23:47,039][42004] Updated weights for policy 0, policy_version 11586 (0.0025) +[2024-11-08 00:23:47,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7168.0, 300 sec: 6928.5). Total num frames: 47460352. Throughput: 0: 1754.2. Samples: 6857608. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:23:47,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 00:23:52,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6894.9, 300 sec: 6873.0). Total num frames: 47484928. Throughput: 0: 1753.8. Samples: 6868088. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:23:52,934][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 00:23:54,615][42004] Updated weights for policy 0, policy_version 11596 (0.0030) +[2024-11-08 00:23:57,931][41694] Fps is (10 sec: 6144.3, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 47521792. Throughput: 0: 1706.3. Samples: 6876720. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:23:57,934][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 00:23:59,778][42004] Updated weights for policy 0, policy_version 11606 (0.0027) +[2024-11-08 00:24:02,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6963.3, 300 sec: 6886.8). Total num frames: 47558656. Throughput: 0: 1716.1. Samples: 6882588. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:24:02,934][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 00:24:05,792][42004] Updated weights for policy 0, policy_version 11616 (0.0030) +[2024-11-08 00:24:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6895.0, 300 sec: 6886.9). Total num frames: 47591424. Throughput: 0: 1771.4. Samples: 6892830. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:24:07,934][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 00:24:11,544][42004] Updated weights for policy 0, policy_version 11626 (0.0032) +[2024-11-08 00:24:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 47628288. Throughput: 0: 1768.7. Samples: 6903578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:24:12,933][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 00:24:16,941][42004] Updated weights for policy 0, policy_version 11636 (0.0027) +[2024-11-08 00:24:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7133.0, 300 sec: 6928.5). Total num frames: 47665152. Throughput: 0: 1765.9. Samples: 6909234. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:24:17,934][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 00:24:22,227][42004] Updated weights for policy 0, policy_version 11646 (0.0033) +[2024-11-08 00:24:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6942.4). Total num frames: 47706112. Throughput: 0: 1791.4. Samples: 6920754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:24:22,933][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 00:24:27,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 47726592. Throughput: 0: 1701.6. Samples: 6928296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:24:27,933][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 00:24:29,678][42004] Updated weights for policy 0, policy_version 11656 (0.0025) +[2024-11-08 00:24:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 7031.5, 300 sec: 6914.6). Total num frames: 47767552. Throughput: 0: 1698.0. Samples: 6934018. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:24:32,933][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 00:24:34,967][42004] Updated weights for policy 0, policy_version 11666 (0.0025) +[2024-11-08 00:24:37,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 47804416. Throughput: 0: 1718.8. Samples: 6945434. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:24:37,934][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 00:24:40,850][42004] Updated weights for policy 0, policy_version 11676 (0.0032) +[2024-11-08 00:24:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 47837184. Throughput: 0: 1759.0. Samples: 6955876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:24:42,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 00:24:46,339][42004] Updated weights for policy 0, policy_version 11686 (0.0026) +[2024-11-08 00:24:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6895.0, 300 sec: 6970.1). Total num frames: 47874048. Throughput: 0: 1749.6. Samples: 6961320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:24:47,936][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 00:24:51,878][42004] Updated weights for policy 0, policy_version 11696 (0.0045) +[2024-11-08 00:24:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.8, 300 sec: 6970.1). Total num frames: 47910912. Throughput: 0: 1775.6. Samples: 6972732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:24:52,933][41694] Avg episode reward: [(0, '4.284')] +[2024-11-08 00:24:57,264][42004] Updated weights for policy 0, policy_version 11706 (0.0029) +[2024-11-08 00:24:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6984.0). Total num frames: 47951872. Throughput: 0: 1790.4. Samples: 6984144. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:24:57,934][41694] Avg episode reward: [(0, '4.251')] +[2024-11-08 00:25:02,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 47972352. Throughput: 0: 1758.3. Samples: 6988360. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:25:02,935][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 00:25:04,894][42004] Updated weights for policy 0, policy_version 11716 (0.0027) +[2024-11-08 00:25:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 48009216. Throughput: 0: 1694.0. Samples: 6996986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:25:07,933][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 00:25:10,261][42004] Updated weights for policy 0, policy_version 11726 (0.0036) +[2024-11-08 00:25:12,933][41694] Fps is (10 sec: 7372.2, 60 sec: 6963.1, 300 sec: 6956.2). Total num frames: 48046080. Throughput: 0: 1769.3. Samples: 7007916. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:25:12,934][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 00:25:16,751][42004] Updated weights for policy 0, policy_version 11736 (0.0043) +[2024-11-08 00:25:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 48078848. Throughput: 0: 1744.9. Samples: 7012540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:25:17,935][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 00:25:22,193][42004] Updated weights for policy 0, policy_version 11746 (0.0026) +[2024-11-08 00:25:22,931][41694] Fps is (10 sec: 6963.9, 60 sec: 6826.7, 300 sec: 6984.0). Total num frames: 48115712. Throughput: 0: 1725.2. Samples: 7023066. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:25:22,933][41694] Avg episode reward: [(0, '4.207')] +[2024-11-08 00:25:27,579][42004] Updated weights for policy 0, policy_version 11756 (0.0039) +[2024-11-08 00:25:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.8, 300 sec: 6984.0). Total num frames: 48152576. Throughput: 0: 1750.3. Samples: 7034640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:25:27,934][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 00:25:32,786][42004] Updated weights for policy 0, policy_version 11766 (0.0036) +[2024-11-08 00:25:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7099.7, 300 sec: 7011.8). Total num frames: 48193536. Throughput: 0: 1757.9. Samples: 7040424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:25:32,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 00:25:37,934][41694] Fps is (10 sec: 6142.4, 60 sec: 6826.4, 300 sec: 6956.2). Total num frames: 48214016. Throughput: 0: 1667.5. Samples: 7047776. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:25:37,938][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 00:25:37,954][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011771_48214016.pth... +[2024-11-08 00:25:38,096][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011362_46538752.pth +[2024-11-08 00:25:40,433][42004] Updated weights for policy 0, policy_version 11776 (0.0024) +[2024-11-08 00:25:42,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6970.1). Total num frames: 48250880. Throughput: 0: 1667.9. Samples: 7059200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:25:42,936][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 00:25:45,915][42004] Updated weights for policy 0, policy_version 11786 (0.0028) +[2024-11-08 00:25:47,931][41694] Fps is (10 sec: 7374.7, 60 sec: 6894.9, 300 sec: 6970.3). 
Total num frames: 48287744. Throughput: 0: 1702.1. Samples: 7064954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:25:47,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 00:25:51,713][42004] Updated weights for policy 0, policy_version 11796 (0.0033) +[2024-11-08 00:25:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6956.3). Total num frames: 48324608. Throughput: 0: 1739.4. Samples: 7075260. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:25:52,935][41694] Avg episode reward: [(0, '4.198')] +[2024-11-08 00:25:56,788][42004] Updated weights for policy 0, policy_version 11806 (0.0024) +[2024-11-08 00:25:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 7025.7). Total num frames: 48365568. Throughput: 0: 1765.9. Samples: 7087378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:25:57,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 00:26:02,269][42004] Updated weights for policy 0, policy_version 11816 (0.0026) +[2024-11-08 00:26:02,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 7025.7). Total num frames: 48402432. Throughput: 0: 1789.0. Samples: 7093044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:26:02,934][41694] Avg episode reward: [(0, '4.316')] +[2024-11-08 00:26:07,783][42004] Updated weights for policy 0, policy_version 11826 (0.0032) +[2024-11-08 00:26:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 7025.7). Total num frames: 48439296. Throughput: 0: 1801.5. Samples: 7104136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:26:07,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 00:26:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.8, 300 sec: 6970.1). Total num frames: 48455680. Throughput: 0: 1676.9. Samples: 7110102. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:26:12,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 00:26:16,215][42004] Updated weights for policy 0, policy_version 11836 (0.0040) +[2024-11-08 00:26:17,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6894.9, 300 sec: 6970.1). Total num frames: 48492544. Throughput: 0: 1662.8. Samples: 7115252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:26:17,934][41694] Avg episode reward: [(0, '4.652')] +[2024-11-08 00:26:21,882][42004] Updated weights for policy 0, policy_version 11846 (0.0023) +[2024-11-08 00:26:22,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6826.6, 300 sec: 6942.4). Total num frames: 48525312. Throughput: 0: 1742.4. Samples: 7126180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:26:22,936][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 00:26:27,892][42004] Updated weights for policy 0, policy_version 11856 (0.0031) +[2024-11-08 00:26:27,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6826.6, 300 sec: 6942.4). Total num frames: 48562176. Throughput: 0: 1715.9. Samples: 7136414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:26:27,935][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 00:26:32,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6758.4, 300 sec: 6984.0). Total num frames: 48599040. Throughput: 0: 1707.8. Samples: 7141804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:26:32,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 00:26:33,239][42004] Updated weights for policy 0, policy_version 11866 (0.0030) +[2024-11-08 00:26:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.7, 300 sec: 6984.0). Total num frames: 48635904. Throughput: 0: 1733.1. Samples: 7153250. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:26:37,934][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 00:26:38,700][42004] Updated weights for policy 0, policy_version 11876 (0.0027) +[2024-11-08 00:26:44,521][41694] Fps is (10 sec: 6361.9, 60 sec: 6850.1, 300 sec: 6946.6). Total num frames: 48672768. Throughput: 0: 1661.7. Samples: 7164794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:26:44,523][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 00:26:45,897][42004] Updated weights for policy 0, policy_version 11886 (0.0028) +[2024-11-08 00:26:47,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.6, 300 sec: 6942.4). Total num frames: 48697344. Throughput: 0: 1644.8. Samples: 7167062. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:26:47,934][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 00:26:51,198][42004] Updated weights for policy 0, policy_version 11896 (0.0029) +[2024-11-08 00:26:52,931][41694] Fps is (10 sec: 7791.8, 60 sec: 6895.0, 300 sec: 6956.3). Total num frames: 48738304. Throughput: 0: 1656.0. Samples: 7178656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:26:52,933][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 00:26:57,084][42004] Updated weights for policy 0, policy_version 11906 (0.0032) +[2024-11-08 00:26:57,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 48771072. Throughput: 0: 1757.2. Samples: 7189176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:26:57,933][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 00:27:02,637][42004] Updated weights for policy 0, policy_version 11916 (0.0036) +[2024-11-08 00:27:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6992.4). Total num frames: 48807936. Throughput: 0: 1771.3. Samples: 7194962. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:27:02,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 00:27:07,874][42004] Updated weights for policy 0, policy_version 11926 (0.0031) +[2024-11-08 00:27:07,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 7011.8). Total num frames: 48848896. Throughput: 0: 1774.0. Samples: 7206010. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:27:07,935][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 00:27:12,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7168.0, 300 sec: 6997.9). Total num frames: 48885760. Throughput: 0: 1809.0. Samples: 7217820. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:27:12,934][41694] Avg episode reward: [(0, '4.229')] +[2024-11-08 00:27:13,133][42004] Updated weights for policy 0, policy_version 11936 (0.0029) +[2024-11-08 00:27:18,915][41694] Fps is (10 sec: 5966.9, 60 sec: 6918.1, 300 sec: 6947.0). Total num frames: 48914432. Throughput: 0: 1775.9. Samples: 7223466. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:27:18,917][41694] Avg episode reward: [(0, '4.096')] +[2024-11-08 00:27:20,672][42004] Updated weights for policy 0, policy_version 11946 (0.0031) +[2024-11-08 00:27:22,932][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 6970.1). Total num frames: 48947200. Throughput: 0: 1727.7. Samples: 7230994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:27:22,933][41694] Avg episode reward: [(0, '4.289')] +[2024-11-08 00:27:25,937][42004] Updated weights for policy 0, policy_version 11956 (0.0019) +[2024-11-08 00:27:27,934][41694] Fps is (10 sec: 7720.5, 60 sec: 7031.2, 300 sec: 6956.2). Total num frames: 48984064. Throughput: 0: 1783.8. Samples: 7242236. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:27:27,937][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 00:27:31,997][42004] Updated weights for policy 0, policy_version 11966 (0.0028) +[2024-11-08 00:27:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 49016832. Throughput: 0: 1780.4. Samples: 7247180. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:27:32,934][41694] Avg episode reward: [(0, '4.258')] +[2024-11-08 00:27:37,234][42004] Updated weights for policy 0, policy_version 11976 (0.0029) +[2024-11-08 00:27:37,934][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.2, 300 sec: 6991.4). Total num frames: 49057792. Throughput: 0: 1778.1. Samples: 7258676. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:27:37,936][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 00:27:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011977_49057792.pth... +[2024-11-08 00:27:38,082][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011568_47382528.pth +[2024-11-08 00:27:42,442][42004] Updated weights for policy 0, policy_version 11986 (0.0032) +[2024-11-08 00:27:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7222.8, 300 sec: 6997.9). Total num frames: 49094656. Throughput: 0: 1801.9. Samples: 7270260. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:27:42,933][41694] Avg episode reward: [(0, '4.118')] +[2024-11-08 00:27:47,762][42004] Updated weights for policy 0, policy_version 11996 (0.0035) +[2024-11-08 00:27:47,933][41694] Fps is (10 sec: 7783.6, 60 sec: 7304.5, 300 sec: 6997.9). Total num frames: 49135616. Throughput: 0: 1802.8. Samples: 7276090. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:27:47,934][41694] Avg episode reward: [(0, '4.248')] +[2024-11-08 00:27:53,286][41694] Fps is (10 sec: 6329.4, 60 sec: 6990.2, 300 sec: 6947.9). 
Total num frames: 49160192. Throughput: 0: 1805.0. Samples: 7287876. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:27:53,287][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 00:27:55,090][42004] Updated weights for policy 0, policy_version 12006 (0.0028) +[2024-11-08 00:27:57,931][41694] Fps is (10 sec: 6144.7, 60 sec: 7099.7, 300 sec: 6970.2). Total num frames: 49197056. Throughput: 0: 1721.2. Samples: 7295272. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:27:57,933][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 00:28:00,567][42004] Updated weights for policy 0, policy_version 12016 (0.0047) +[2024-11-08 00:28:02,931][41694] Fps is (10 sec: 7218.9, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 49229824. Throughput: 0: 1762.6. Samples: 7301050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:28:02,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 00:28:06,790][42004] Updated weights for policy 0, policy_version 12026 (0.0022) +[2024-11-08 00:28:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 49266688. Throughput: 0: 1770.8. Samples: 7310680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:28:07,934][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 00:28:12,059][42004] Updated weights for policy 0, policy_version 12036 (0.0024) +[2024-11-08 00:28:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 7004.6). Total num frames: 49303552. Throughput: 0: 1780.4. Samples: 7322348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:28:12,933][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 00:28:17,136][42004] Updated weights for policy 0, policy_version 12046 (0.0030) +[2024-11-08 00:28:17,933][41694] Fps is (10 sec: 7781.7, 60 sec: 7287.3, 300 sec: 7011.8). Total num frames: 49344512. Throughput: 0: 1803.9. Samples: 7328356. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:28:17,935][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 00:28:22,339][42004] Updated weights for policy 0, policy_version 12056 (0.0025) +[2024-11-08 00:28:22,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7304.6, 300 sec: 7039.6). Total num frames: 49385472. Throughput: 0: 1811.6. Samples: 7340194. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:28:22,934][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 00:28:27,932][41694] Fps is (10 sec: 6144.6, 60 sec: 7031.8, 300 sec: 6984.0). Total num frames: 49405952. Throughput: 0: 1750.8. Samples: 7349046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:28:27,934][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 00:28:29,741][42004] Updated weights for policy 0, policy_version 12066 (0.0032) +[2024-11-08 00:28:32,932][41694] Fps is (10 sec: 6143.9, 60 sec: 7168.0, 300 sec: 6997.9). Total num frames: 49446912. Throughput: 0: 1723.8. Samples: 7353658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:28:32,933][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 00:28:35,245][42004] Updated weights for policy 0, policy_version 12076 (0.0025) +[2024-11-08 00:28:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.8, 300 sec: 6970.1). Total num frames: 49479680. Throughput: 0: 1723.1. Samples: 7364806. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:28:37,936][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 00:28:41,094][42004] Updated weights for policy 0, policy_version 12086 (0.0035) +[2024-11-08 00:28:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.4, 300 sec: 6970.1). Total num frames: 49516544. Throughput: 0: 1781.2. Samples: 7375428. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:28:42,933][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 00:28:46,344][42004] Updated weights for policy 0, policy_version 12096 (0.0024) +[2024-11-08 00:28:47,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6963.2, 300 sec: 7011.8). Total num frames: 49553408. Throughput: 0: 1779.7. Samples: 7381140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:28:47,935][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 00:28:51,766][42004] Updated weights for policy 0, policy_version 12106 (0.0031) +[2024-11-08 00:28:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7279.2, 300 sec: 7025.7). Total num frames: 49594368. Throughput: 0: 1823.8. Samples: 7392750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:28:52,933][41694] Avg episode reward: [(0, '4.226')] +[2024-11-08 00:28:56,975][42004] Updated weights for policy 0, policy_version 12116 (0.0029) +[2024-11-08 00:28:57,931][41694] Fps is (10 sec: 7783.2, 60 sec: 7236.3, 300 sec: 7025.7). Total num frames: 49631232. Throughput: 0: 1824.1. Samples: 7404432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:28:57,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 00:29:02,937][41694] Fps is (10 sec: 5731.1, 60 sec: 7030.8, 300 sec: 6983.9). Total num frames: 49651712. Throughput: 0: 1811.5. Samples: 7409880. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:29:02,940][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 00:29:04,700][42004] Updated weights for policy 0, policy_version 12126 (0.0026) +[2024-11-08 00:29:07,934][41694] Fps is (10 sec: 5733.1, 60 sec: 7031.2, 300 sec: 6984.0). Total num frames: 49688576. Throughput: 0: 1705.9. Samples: 7416964. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:29:07,938][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 00:29:10,555][42004] Updated weights for policy 0, policy_version 12136 (0.0023) +[2024-11-08 00:29:12,932][41694] Fps is (10 sec: 6967.0, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 49721344. Throughput: 0: 1737.6. Samples: 7427240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:29:12,933][41694] Avg episode reward: [(0, '4.565')] +[2024-11-08 00:29:16,046][42004] Updated weights for policy 0, policy_version 12146 (0.0026) +[2024-11-08 00:29:17,931][41694] Fps is (10 sec: 7374.4, 60 sec: 6963.3, 300 sec: 6970.1). Total num frames: 49762304. Throughput: 0: 1762.7. Samples: 7432978. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:29:17,933][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 00:29:21,252][42004] Updated weights for policy 0, policy_version 12156 (0.0023) +[2024-11-08 00:29:22,931][41694] Fps is (10 sec: 8192.2, 60 sec: 6963.2, 300 sec: 7039.6). Total num frames: 49803264. Throughput: 0: 1776.1. Samples: 7444730. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:29:22,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 00:29:26,574][42004] Updated weights for policy 0, policy_version 12166 (0.0031) +[2024-11-08 00:29:27,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7236.2, 300 sec: 7025.7). Total num frames: 49840128. Throughput: 0: 1798.7. Samples: 7456368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:29:27,934][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 00:29:31,965][42004] Updated weights for policy 0, policy_version 12176 (0.0026) +[2024-11-08 00:29:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 7025.7). Total num frames: 49876992. Throughput: 0: 1796.8. Samples: 7461996. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:29:32,933][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 00:29:37,932][41694] Fps is (10 sec: 6144.1, 60 sec: 7031.5, 300 sec: 6997.9). Total num frames: 49901568. Throughput: 0: 1726.3. Samples: 7470434. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:29:37,933][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 00:29:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012183_49901568.pth... +[2024-11-08 00:29:38,079][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011771_48214016.pth +[2024-11-08 00:29:39,412][42004] Updated weights for policy 0, policy_version 12186 (0.0044) +[2024-11-08 00:29:42,934][41694] Fps is (10 sec: 5733.2, 60 sec: 6963.0, 300 sec: 6984.0). Total num frames: 49934336. Throughput: 0: 1686.9. Samples: 7480344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:29:42,935][41694] Avg episode reward: [(0, '4.286')] +[2024-11-08 00:29:45,832][42004] Updated weights for policy 0, policy_version 12196 (0.0036) +[2024-11-08 00:29:47,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6895.0, 300 sec: 6970.1). Total num frames: 49967104. Throughput: 0: 1669.1. Samples: 7484982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:29:47,933][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 00:29:51,131][42004] Updated weights for policy 0, policy_version 12206 (0.0028) +[2024-11-08 00:29:52,932][41694] Fps is (10 sec: 6964.6, 60 sec: 6826.7, 300 sec: 6956.3). Total num frames: 50003968. Throughput: 0: 1765.4. Samples: 7496402. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:29:52,934][41694] Avg episode reward: [(0, '4.271')] +[2024-11-08 00:29:56,860][42004] Updated weights for policy 0, policy_version 12216 (0.0041) +[2024-11-08 00:29:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 7025.7). 
Total num frames: 50044928. Throughput: 0: 1773.4. Samples: 7507042. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:29:57,934][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 00:30:02,446][42004] Updated weights for policy 0, policy_version 12226 (0.0028) +[2024-11-08 00:30:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7100.4, 300 sec: 7011.8). Total num frames: 50077696. Throughput: 0: 1769.0. Samples: 7512582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:02,933][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 00:30:07,925][42004] Updated weights for policy 0, policy_version 12236 (0.0030) +[2024-11-08 00:30:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.3, 300 sec: 7025.7). Total num frames: 50118656. Throughput: 0: 1753.6. Samples: 7523644. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:07,933][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 00:30:12,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6963.2, 300 sec: 6984.0). Total num frames: 50139136. Throughput: 0: 1656.1. Samples: 7530890. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:12,933][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 00:30:16,123][42004] Updated weights for policy 0, policy_version 12246 (0.0025) +[2024-11-08 00:30:17,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6956.3). Total num frames: 50167808. Throughput: 0: 1642.7. Samples: 7535918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:17,933][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 00:30:22,631][42004] Updated weights for policy 0, policy_version 12256 (0.0032) +[2024-11-08 00:30:22,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6621.8, 300 sec: 6942.4). Total num frames: 50200576. Throughput: 0: 1647.4. Samples: 7544566. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:22,934][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 00:30:27,668][42004] Updated weights for policy 0, policy_version 12266 (0.0033) +[2024-11-08 00:30:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6942.4). Total num frames: 50241536. Throughput: 0: 1695.5. Samples: 7556636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:27,933][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 00:30:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6998.0). Total num frames: 50278400. Throughput: 0: 1714.9. Samples: 7562152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:32,934][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 00:30:33,089][42004] Updated weights for policy 0, policy_version 12276 (0.0022) +[2024-11-08 00:30:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 7011.8). Total num frames: 50319360. Throughput: 0: 1725.4. Samples: 7574046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:30:37,935][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 00:30:38,195][42004] Updated weights for policy 0, policy_version 12286 (0.0026) +[2024-11-08 00:30:42,937][41694] Fps is (10 sec: 7778.4, 60 sec: 7031.1, 300 sec: 7011.7). Total num frames: 50356224. Throughput: 0: 1751.0. Samples: 7585848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:30:42,938][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 00:30:45,668][42004] Updated weights for policy 0, policy_version 12296 (0.0030) +[2024-11-08 00:30:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6970.1). Total num frames: 50380800. Throughput: 0: 1669.9. Samples: 7587726. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:30:47,934][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 00:30:51,153][42004] Updated weights for policy 0, policy_version 12306 (0.0030) +[2024-11-08 00:30:52,932][41694] Fps is (10 sec: 5737.4, 60 sec: 6826.7, 300 sec: 6942.4). Total num frames: 50413568. Throughput: 0: 1667.2. Samples: 7598670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:30:52,935][41694] Avg episode reward: [(0, '4.359')] +[2024-11-08 00:30:56,702][42004] Updated weights for policy 0, policy_version 12316 (0.0023) +[2024-11-08 00:30:57,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6826.6, 300 sec: 6956.2). Total num frames: 50454528. Throughput: 0: 1753.6. Samples: 7609802. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:30:57,933][41694] Avg episode reward: [(0, '4.286')] +[2024-11-08 00:31:02,133][42004] Updated weights for policy 0, policy_version 12326 (0.0022) +[2024-11-08 00:31:02,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6894.9, 300 sec: 6956.3). Total num frames: 50491392. Throughput: 0: 1773.7. Samples: 7615734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:31:02,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 00:31:07,339][42004] Updated weights for policy 0, policy_version 12336 (0.0023) +[2024-11-08 00:31:07,932][41694] Fps is (10 sec: 7782.8, 60 sec: 6894.9, 300 sec: 7039.6). Total num frames: 50532352. Throughput: 0: 1833.4. Samples: 7627068. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:31:07,934][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 00:31:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 7025.7). Total num frames: 50565120. Throughput: 0: 1800.3. Samples: 7637648. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:31:12,935][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 00:31:13,326][42004] Updated weights for policy 0, policy_version 12346 (0.0035) +[2024-11-08 00:31:19,571][41694] Fps is (10 sec: 5630.5, 60 sec: 6977.4, 300 sec: 6986.9). Total num frames: 50597888. Throughput: 0: 1733.7. Samples: 7643012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:31:19,576][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 00:31:20,796][42004] Updated weights for policy 0, policy_version 12356 (0.0032) +[2024-11-08 00:31:22,932][41694] Fps is (10 sec: 5734.4, 60 sec: 7031.5, 300 sec: 6984.0). Total num frames: 50622464. Throughput: 0: 1701.6. Samples: 7650618. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:31:22,934][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 00:31:26,648][42004] Updated weights for policy 0, policy_version 12366 (0.0027) +[2024-11-08 00:31:27,931][41694] Fps is (10 sec: 7348.8, 60 sec: 6963.2, 300 sec: 6984.0). Total num frames: 50659328. Throughput: 0: 1668.0. Samples: 7660900. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:31:27,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 00:31:32,031][42004] Updated weights for policy 0, policy_version 12376 (0.0030) +[2024-11-08 00:31:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6984.0). Total num frames: 50696192. Throughput: 0: 1751.7. Samples: 7666552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:31:32,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 00:31:37,096][42004] Updated weights for policy 0, policy_version 12386 (0.0024) +[2024-11-08 00:31:37,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6963.2, 300 sec: 7035.8). Total num frames: 50737152. Throughput: 0: 1774.8. Samples: 7678536. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:31:37,933][41694] Avg episode reward: [(0, '4.199')] +[2024-11-08 00:31:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012387_50737152.pth... +[2024-11-08 00:31:38,076][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000011977_49057792.pth +[2024-11-08 00:31:42,341][42004] Updated weights for policy 0, policy_version 12396 (0.0027) +[2024-11-08 00:31:42,932][41694] Fps is (10 sec: 8192.1, 60 sec: 7032.1, 300 sec: 7053.5). Total num frames: 50778112. Throughput: 0: 1791.7. Samples: 7690428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:31:42,934][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 00:31:47,458][42004] Updated weights for policy 0, policy_version 12406 (0.0031) +[2024-11-08 00:31:47,931][41694] Fps is (10 sec: 7782.7, 60 sec: 7236.3, 300 sec: 7039.6). Total num frames: 50814976. Throughput: 0: 1792.0. Samples: 7696374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:31:47,935][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 00:31:53,942][41694] Fps is (10 sec: 5951.9, 60 sec: 7049.2, 300 sec: 7001.7). Total num frames: 50843648. Throughput: 0: 1756.5. Samples: 7707886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:31:53,944][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 00:31:55,064][42004] Updated weights for policy 0, policy_version 12416 (0.0029) +[2024-11-08 00:31:57,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.3, 300 sec: 6997.9). Total num frames: 50872320. Throughput: 0: 1714.3. Samples: 7714792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:31:57,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 00:32:01,009][42004] Updated weights for policy 0, policy_version 12426 (0.0030) +[2024-11-08 00:32:02,932][41694] Fps is (10 sec: 6834.9, 60 sec: 6894.9, 300 sec: 6970.1). 
Total num frames: 50905088. Throughput: 0: 1776.6. Samples: 7720046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:32:02,934][41694] Avg episode reward: [(0, '4.649')] +[2024-11-08 00:32:06,647][42004] Updated weights for policy 0, policy_version 12436 (0.0029) +[2024-11-08 00:32:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6984.0). Total num frames: 50946048. Throughput: 0: 1780.6. Samples: 7730744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:32:07,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 00:32:11,857][42004] Updated weights for policy 0, policy_version 12446 (0.0024) +[2024-11-08 00:32:12,931][41694] Fps is (10 sec: 8192.1, 60 sec: 7031.5, 300 sec: 7049.2). Total num frames: 50987008. Throughput: 0: 1813.3. Samples: 7742500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:32:12,934][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 00:32:17,025][42004] Updated weights for policy 0, policy_version 12456 (0.0024) +[2024-11-08 00:32:17,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7299.2, 300 sec: 7039.6). Total num frames: 51023872. Throughput: 0: 1819.5. Samples: 7748430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:32:17,934][41694] Avg episode reward: [(0, '4.250')] +[2024-11-08 00:32:22,366][42004] Updated weights for policy 0, policy_version 12466 (0.0022) +[2024-11-08 00:32:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7372.8, 300 sec: 7053.5). Total num frames: 51064832. Throughput: 0: 1813.5. Samples: 7760142. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:32:22,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 00:32:28,355][41694] Fps is (10 sec: 5894.4, 60 sec: 7050.0, 300 sec: 7001.7). Total num frames: 51085312. Throughput: 0: 1665.2. Samples: 7766066. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:32:28,356][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 00:32:30,000][42004] Updated weights for policy 0, policy_version 12476 (0.0029) +[2024-11-08 00:32:32,931][41694] Fps is (10 sec: 5324.8, 60 sec: 7031.5, 300 sec: 6984.1). Total num frames: 51118080. Throughput: 0: 1699.5. Samples: 7772852. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:32:32,933][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 00:32:36,109][42004] Updated weights for policy 0, policy_version 12486 (0.0038) +[2024-11-08 00:32:37,932][41694] Fps is (10 sec: 7271.1, 60 sec: 6963.2, 300 sec: 6984.0). Total num frames: 51154944. Throughput: 0: 1709.3. Samples: 7783078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:32:37,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 00:32:41,571][42004] Updated weights for policy 0, policy_version 12496 (0.0036) +[2024-11-08 00:32:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6970.2). Total num frames: 51191808. Throughput: 0: 1770.8. Samples: 7794480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:32:42,935][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 00:32:46,732][42004] Updated weights for policy 0, policy_version 12506 (0.0027) +[2024-11-08 00:32:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 7034.1). Total num frames: 51232768. Throughput: 0: 1783.9. Samples: 7800322. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:32:47,934][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 00:32:51,957][42004] Updated weights for policy 0, policy_version 12516 (0.0027) +[2024-11-08 00:32:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7221.4, 300 sec: 7025.7). Total num frames: 51269632. Throughput: 0: 1807.4. Samples: 7812076. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:32:52,934][41694] Avg episode reward: [(0, '4.281')] +[2024-11-08 00:32:57,295][42004] Updated weights for policy 0, policy_version 12526 (0.0023) +[2024-11-08 00:32:57,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 7053.5). Total num frames: 51310592. Throughput: 0: 1806.3. Samples: 7823782. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:32:57,934][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 00:33:02,957][41694] Fps is (10 sec: 6128.5, 60 sec: 7096.7, 300 sec: 6997.3). Total num frames: 51331072. Throughput: 0: 1796.8. Samples: 7829332. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:33:02,960][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 00:33:05,521][42004] Updated weights for policy 0, policy_version 12536 (0.0044) +[2024-11-08 00:33:07,939][41694] Fps is (10 sec: 5324.4, 60 sec: 6963.1, 300 sec: 6984.0). Total num frames: 51363840. Throughput: 0: 1667.7. Samples: 7835188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:07,942][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 00:33:11,305][42004] Updated weights for policy 0, policy_version 12546 (0.0025) +[2024-11-08 00:33:12,931][41694] Fps is (10 sec: 6980.9, 60 sec: 6894.9, 300 sec: 6970.2). Total num frames: 51400704. Throughput: 0: 1795.5. Samples: 7846102. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:12,933][41694] Avg episode reward: [(0, '4.289')] +[2024-11-08 00:33:16,274][42004] Updated weights for policy 0, policy_version 12556 (0.0031) +[2024-11-08 00:33:17,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 51441664. Throughput: 0: 1765.8. Samples: 7852314. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:17,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 00:33:21,462][42004] Updated weights for policy 0, policy_version 12566 (0.0031) +[2024-11-08 00:33:22,932][41694] Fps is (10 sec: 7781.8, 60 sec: 6894.9, 300 sec: 7025.7). Total num frames: 51478528. Throughput: 0: 1802.9. Samples: 7864208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:22,934][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 00:33:26,634][42004] Updated weights for policy 0, policy_version 12576 (0.0030) +[2024-11-08 00:33:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7287.7, 300 sec: 7025.7). Total num frames: 51519488. Throughput: 0: 1814.4. Samples: 7876126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:27,934][41694] Avg episode reward: [(0, '4.231')] +[2024-11-08 00:33:31,789][42004] Updated weights for policy 0, policy_version 12586 (0.0027) +[2024-11-08 00:33:32,932][41694] Fps is (10 sec: 8192.4, 60 sec: 7372.8, 300 sec: 7053.4). Total num frames: 51560448. Throughput: 0: 1814.8. Samples: 7881990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:32,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 00:33:37,932][41694] Fps is (10 sec: 6143.5, 60 sec: 7099.6, 300 sec: 6997.9). Total num frames: 51580928. Throughput: 0: 1774.7. Samples: 7891940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:37,934][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 00:33:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012593_51580928.pth... +[2024-11-08 00:33:38,122][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012183_49901568.pth +[2024-11-08 00:33:39,747][42004] Updated weights for policy 0, policy_version 12596 (0.0025) +[2024-11-08 00:33:42,931][41694] Fps is (10 sec: 5324.9, 60 sec: 7031.5, 300 sec: 6984.1). 
Total num frames: 51613696. Throughput: 0: 1688.5. Samples: 7899766. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:42,933][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 00:33:45,236][42004] Updated weights for policy 0, policy_version 12606 (0.0032) +[2024-11-08 00:33:47,931][41694] Fps is (10 sec: 7373.3, 60 sec: 7031.5, 300 sec: 6984.0). Total num frames: 51654656. Throughput: 0: 1699.1. Samples: 7905748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:47,933][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 00:33:50,296][42004] Updated weights for policy 0, policy_version 12616 (0.0032) +[2024-11-08 00:33:52,932][41694] Fps is (10 sec: 8191.9, 60 sec: 7099.7, 300 sec: 6997.9). Total num frames: 51695616. Throughput: 0: 1838.3. Samples: 7917910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:33:52,933][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 00:33:55,450][42004] Updated weights for policy 0, policy_version 12626 (0.0024) +[2024-11-08 00:33:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 7053.6). Total num frames: 51732480. Throughput: 0: 1855.5. Samples: 7929600. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:33:57,934][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 00:34:01,008][42004] Updated weights for policy 0, policy_version 12636 (0.0031) +[2024-11-08 00:34:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7307.6, 300 sec: 7053.5). Total num frames: 51769344. Throughput: 0: 1843.9. Samples: 7935288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:34:02,935][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 00:34:06,660][42004] Updated weights for policy 0, policy_version 12646 (0.0034) +[2024-11-08 00:34:07,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7372.8, 300 sec: 7067.3). Total num frames: 51806208. Throughput: 0: 1816.1. Samples: 7945932. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:07,934][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 00:34:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 7031.5, 300 sec: 6984.0). Total num frames: 51822592. Throughput: 0: 1699.8. Samples: 7952616. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:12,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 00:34:14,774][42004] Updated weights for policy 0, policy_version 12656 (0.0034) +[2024-11-08 00:34:17,933][41694] Fps is (10 sec: 5324.3, 60 sec: 6963.0, 300 sec: 6970.1). Total num frames: 51859456. Throughput: 0: 1681.0. Samples: 7957636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:17,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 00:34:20,235][42004] Updated weights for policy 0, policy_version 12666 (0.0026) +[2024-11-08 00:34:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6984.0). Total num frames: 51900416. Throughput: 0: 1717.3. Samples: 7969216. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:22,933][41694] Avg episode reward: [(0, '4.206')] +[2024-11-08 00:34:25,090][42004] Updated weights for policy 0, policy_version 12676 (0.0032) +[2024-11-08 00:34:27,932][41694] Fps is (10 sec: 8192.8, 60 sec: 7031.4, 300 sec: 6997.9). Total num frames: 51941376. Throughput: 0: 1812.6. Samples: 7981334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:34:27,934][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 00:34:30,448][42004] Updated weights for policy 0, policy_version 12686 (0.0024) +[2024-11-08 00:34:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 7039.6). Total num frames: 51978240. Throughput: 0: 1810.4. Samples: 7987216. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:34:32,933][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 00:34:35,777][42004] Updated weights for policy 0, policy_version 12696 (0.0030) +[2024-11-08 00:34:37,931][41694] Fps is (10 sec: 7782.9, 60 sec: 7304.6, 300 sec: 7067.4). Total num frames: 52019200. Throughput: 0: 1797.7. Samples: 7998804. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:37,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 00:34:41,119][42004] Updated weights for policy 0, policy_version 12706 (0.0020) +[2024-11-08 00:34:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7304.5, 300 sec: 7067.3). Total num frames: 52051968. Throughput: 0: 1783.0. Samples: 8009836. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:42,933][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 00:34:47,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6963.2, 300 sec: 7011.8). Total num frames: 52072448. Throughput: 0: 1721.3. Samples: 8012748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:47,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 00:34:49,547][42004] Updated weights for policy 0, policy_version 12716 (0.0043) +[2024-11-08 00:34:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6895.0, 300 sec: 6997.9). Total num frames: 52109312. Throughput: 0: 1677.5. Samples: 8021420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:34:52,933][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 00:34:54,980][42004] Updated weights for policy 0, policy_version 12726 (0.0024) +[2024-11-08 00:34:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 7011.8). Total num frames: 52146176. Throughput: 0: 1786.8. Samples: 8033020. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:34:57,935][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 00:35:00,152][42004] Updated weights for policy 0, policy_version 12736 (0.0032) +[2024-11-08 00:35:02,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6894.9, 300 sec: 6997.9). Total num frames: 52183040. Throughput: 0: 1808.1. Samples: 8038998. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:35:02,934][41694] Avg episode reward: [(0, '4.296')] +[2024-11-08 00:35:05,799][42004] Updated weights for policy 0, policy_version 12746 (0.0024) +[2024-11-08 00:35:07,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.3, 300 sec: 7067.3). Total num frames: 52224000. Throughput: 0: 1794.4. Samples: 8049964. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:35:07,935][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 00:35:11,288][42004] Updated weights for policy 0, policy_version 12756 (0.0024) +[2024-11-08 00:35:12,931][41694] Fps is (10 sec: 7373.1, 60 sec: 7236.3, 300 sec: 7081.2). Total num frames: 52256768. Throughput: 0: 1767.6. Samples: 8060874. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:35:12,933][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 00:35:17,850][42004] Updated weights for policy 0, policy_version 12766 (0.0036) +[2024-11-08 00:35:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 7168.2, 300 sec: 7081.2). Total num frames: 52289536. Throughput: 0: 1739.6. Samples: 8065500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:17,935][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 00:35:22,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6997.9). Total num frames: 52305920. Throughput: 0: 1609.6. Samples: 8071236. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:22,933][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 00:35:26,006][42004] Updated weights for policy 0, policy_version 12776 (0.0034) +[2024-11-08 00:35:27,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.2, 300 sec: 6997.9). Total num frames: 52342784. Throughput: 0: 1602.0. Samples: 8081926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:27,934][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 00:35:31,569][42004] Updated weights for policy 0, policy_version 12786 (0.0027) +[2024-11-08 00:35:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6984.0). Total num frames: 52379648. Throughput: 0: 1661.7. Samples: 8087524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:32,934][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 00:35:36,616][42004] Updated weights for policy 0, policy_version 12796 (0.0032) +[2024-11-08 00:35:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6998.0). Total num frames: 52420608. Throughput: 0: 1734.0. Samples: 8099450. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:37,933][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 00:35:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012798_52420608.pth... +[2024-11-08 00:35:38,104][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012387_50737152.pth +[2024-11-08 00:35:41,859][42004] Updated weights for policy 0, policy_version 12806 (0.0032) +[2024-11-08 00:35:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 7039.6). Total num frames: 52457472. Throughput: 0: 1736.3. Samples: 8111152. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:42,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 00:35:47,124][42004] Updated weights for policy 0, policy_version 12816 (0.0026) +[2024-11-08 00:35:47,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7099.7, 300 sec: 7067.3). Total num frames: 52498432. Throughput: 0: 1733.1. Samples: 8116986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:47,935][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 00:35:55,130][41694] Fps is (10 sec: 6043.8, 60 sec: 6782.9, 300 sec: 6987.5). Total num frames: 52531200. Throughput: 0: 1644.2. Samples: 8127566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:35:55,131][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 00:35:55,468][42004] Updated weights for policy 0, policy_version 12826 (0.0033) +[2024-11-08 00:35:57,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6758.4, 300 sec: 6984.0). Total num frames: 52551680. Throughput: 0: 1619.9. Samples: 8133770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:35:57,933][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 00:36:01,305][42004] Updated weights for policy 0, policy_version 12836 (0.0038) +[2024-11-08 00:36:02,932][41694] Fps is (10 sec: 6825.3, 60 sec: 6690.1, 300 sec: 6956.3). Total num frames: 52584448. Throughput: 0: 1641.0. Samples: 8139346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:36:02,935][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 00:36:07,003][42004] Updated weights for policy 0, policy_version 12846 (0.0027) +[2024-11-08 00:36:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6970.1). Total num frames: 52621312. Throughput: 0: 1749.7. Samples: 8149972. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:36:07,933][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 00:36:12,586][42004] Updated weights for policy 0, policy_version 12856 (0.0032) +[2024-11-08 00:36:12,933][41694] Fps is (10 sec: 7371.9, 60 sec: 6690.0, 300 sec: 7023.0). Total num frames: 52658176. Throughput: 0: 1753.9. Samples: 8160854. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:36:12,937][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 00:36:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 7025.7). Total num frames: 52695040. Throughput: 0: 1754.1. Samples: 8166458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:36:17,934][41694] Avg episode reward: [(0, '4.781')] +[2024-11-08 00:36:18,010][42004] Updated weights for policy 0, policy_version 12866 (0.0031) +[2024-11-08 00:36:22,932][41694] Fps is (10 sec: 7373.6, 60 sec: 7099.7, 300 sec: 7025.7). Total num frames: 52731904. Throughput: 0: 1744.6. Samples: 8177956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:36:22,934][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 00:36:23,535][42004] Updated weights for policy 0, policy_version 12876 (0.0026) +[2024-11-08 00:36:29,714][41694] Fps is (10 sec: 5909.7, 60 sec: 6828.6, 300 sec: 6969.7). Total num frames: 52764672. Throughput: 0: 1650.6. Samples: 8188370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:36:29,716][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 00:36:31,837][42004] Updated weights for policy 0, policy_version 12886 (0.0029) +[2024-11-08 00:36:32,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 52785152. Throughput: 0: 1611.6. Samples: 8189506. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:36:32,933][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 00:36:37,401][42004] Updated weights for policy 0, policy_version 12896 (0.0026) +[2024-11-08 00:36:37,932][41694] Fps is (10 sec: 7476.8, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 52826112. Throughput: 0: 1699.9. Samples: 8200322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:36:37,933][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 00:36:42,631][42004] Updated weights for policy 0, policy_version 12906 (0.0028) +[2024-11-08 00:36:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 52862976. Throughput: 0: 1742.4. Samples: 8212178. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:36:42,933][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 00:36:47,717][42004] Updated weights for policy 0, policy_version 12916 (0.0022) +[2024-11-08 00:36:47,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6758.4, 300 sec: 7008.0). Total num frames: 52903936. Throughput: 0: 1747.9. Samples: 8218002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:36:47,935][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 00:36:52,713][42004] Updated weights for policy 0, policy_version 12926 (0.0026) +[2024-11-08 00:36:52,933][41694] Fps is (10 sec: 8191.0, 60 sec: 7157.1, 300 sec: 7025.6). Total num frames: 52944896. Throughput: 0: 1783.1. Samples: 8230212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:36:52,935][41694] Avg episode reward: [(0, '4.252')] +[2024-11-08 00:36:57,932][41694] Fps is (10 sec: 7782.6, 60 sec: 7168.0, 300 sec: 7039.6). Total num frames: 52981760. Throughput: 0: 1794.2. Samples: 8241590. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:36:57,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 00:36:58,519][42004] Updated weights for policy 0, policy_version 12936 (0.0042) +[2024-11-08 00:37:04,045][41694] Fps is (10 sec: 5529.0, 60 sec: 6903.4, 300 sec: 6957.8). Total num frames: 53006336. Throughput: 0: 1737.6. Samples: 8246584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:37:04,047][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 00:37:06,848][42004] Updated weights for policy 0, policy_version 12946 (0.0046) +[2024-11-08 00:37:07,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 53035008. Throughput: 0: 1659.9. Samples: 8252650. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:37:07,933][41694] Avg episode reward: [(0, '4.330')] +[2024-11-08 00:37:12,191][42004] Updated weights for policy 0, policy_version 12956 (0.0029) +[2024-11-08 00:37:12,931][41694] Fps is (10 sec: 7375.0, 60 sec: 6895.1, 300 sec: 6942.4). Total num frames: 53071872. Throughput: 0: 1754.3. Samples: 8264184. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:37:12,933][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 00:37:17,117][42004] Updated weights for policy 0, policy_version 12966 (0.0027) +[2024-11-08 00:37:17,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 53112832. Throughput: 0: 1796.8. Samples: 8270362. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:37:17,934][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 00:37:22,133][42004] Updated weights for policy 0, policy_version 12976 (0.0020) +[2024-11-08 00:37:22,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7031.5, 300 sec: 7021.9). Total num frames: 53153792. Throughput: 0: 1827.9. Samples: 8282578. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:37:22,933][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 00:37:27,267][42004] Updated weights for policy 0, policy_version 12986 (0.0024) +[2024-11-08 00:37:27,932][41694] Fps is (10 sec: 8191.9, 60 sec: 7387.5, 300 sec: 7039.6). Total num frames: 53194752. Throughput: 0: 1835.5. Samples: 8294774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:37:27,933][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 00:37:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7372.8, 300 sec: 7025.7). Total num frames: 53227520. Throughput: 0: 1825.4. Samples: 8300142. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:37:32,933][41694] Avg episode reward: [(0, '4.614')] +[2024-11-08 00:37:33,240][42004] Updated weights for policy 0, policy_version 12996 (0.0032) +[2024-11-08 00:37:38,608][41694] Fps is (10 sec: 5371.0, 60 sec: 7020.6, 300 sec: 6968.0). Total num frames: 53252096. Throughput: 0: 1757.7. Samples: 8310494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:37:38,611][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 00:37:38,650][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013001_53252096.pth... +[2024-11-08 00:37:38,780][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012593_51580928.pth +[2024-11-08 00:37:41,311][42004] Updated weights for policy 0, policy_version 13006 (0.0028) +[2024-11-08 00:37:42,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 53280768. Throughput: 0: 1673.7. Samples: 8316906. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:37:42,933][41694] Avg episode reward: [(0, '4.697')] +[2024-11-08 00:37:46,833][42004] Updated weights for policy 0, policy_version 13016 (0.0019) +[2024-11-08 00:37:47,931][41694] Fps is (10 sec: 7468.6, 60 sec: 6963.3, 300 sec: 6956.3). 
Total num frames: 53321728. Throughput: 0: 1727.9. Samples: 8322414. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:37:47,934][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 00:37:51,823][42004] Updated weights for policy 0, policy_version 13026 (0.0028) +[2024-11-08 00:37:52,931][41694] Fps is (10 sec: 8192.1, 60 sec: 6963.4, 300 sec: 6956.3). Total num frames: 53362688. Throughput: 0: 1821.5. Samples: 8334616. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:37:52,933][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 00:37:56,786][42004] Updated weights for policy 0, policy_version 13036 (0.0031) +[2024-11-08 00:37:57,932][41694] Fps is (10 sec: 8191.5, 60 sec: 7031.4, 300 sec: 7026.3). Total num frames: 53403648. Throughput: 0: 1842.1. Samples: 8347080. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:37:57,937][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 00:38:02,080][42004] Updated weights for policy 0, policy_version 13046 (0.0025) +[2024-11-08 00:38:02,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7373.1, 300 sec: 7039.6). Total num frames: 53440512. Throughput: 0: 1837.0. Samples: 8353028. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:38:02,935][41694] Avg episode reward: [(0, '4.752')] +[2024-11-08 00:38:07,932][41694] Fps is (10 sec: 6963.6, 60 sec: 7304.5, 300 sec: 7025.7). Total num frames: 53473280. Throughput: 0: 1793.6. Samples: 8363292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:38:07,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 00:38:08,085][42004] Updated weights for policy 0, policy_version 13056 (0.0033) +[2024-11-08 00:38:13,280][41694] Fps is (10 sec: 5541.4, 60 sec: 7058.7, 300 sec: 6961.9). Total num frames: 53497856. Throughput: 0: 1622.9. Samples: 8368368. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:38:13,281][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 00:38:15,926][42004] Updated weights for policy 0, policy_version 13066 (0.0032) +[2024-11-08 00:38:17,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 53530624. Throughput: 0: 1673.3. Samples: 8375442. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:38:17,935][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 00:38:21,186][42004] Updated weights for policy 0, policy_version 13076 (0.0025) +[2024-11-08 00:38:22,932][41694] Fps is (10 sec: 7638.8, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 53571584. Throughput: 0: 1727.3. Samples: 8387052. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:38:22,933][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 00:38:26,339][42004] Updated weights for policy 0, policy_version 13086 (0.0023) +[2024-11-08 00:38:27,931][41694] Fps is (10 sec: 8192.2, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 53612544. Throughput: 0: 1826.3. Samples: 8399090. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:38:27,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 00:38:31,247][42004] Updated weights for policy 0, policy_version 13096 (0.0029) +[2024-11-08 00:38:32,932][41694] Fps is (10 sec: 8192.1, 60 sec: 7099.7, 300 sec: 7025.7). Total num frames: 53653504. Throughput: 0: 1841.3. Samples: 8405274. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:38:32,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 00:38:36,370][42004] Updated weights for policy 0, policy_version 13106 (0.0020) +[2024-11-08 00:38:37,934][41694] Fps is (10 sec: 7780.5, 60 sec: 7387.6, 300 sec: 7039.5). Total num frames: 53690368. Throughput: 0: 1841.5. Samples: 8417488. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:38:37,938][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 00:38:42,271][42004] Updated weights for policy 0, policy_version 13116 (0.0031) +[2024-11-08 00:38:42,933][41694] Fps is (10 sec: 7372.0, 60 sec: 7440.9, 300 sec: 7025.7). Total num frames: 53727232. Throughput: 0: 1794.3. Samples: 8427824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:38:42,934][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 00:38:47,932][41694] Fps is (10 sec: 5735.7, 60 sec: 7099.7, 300 sec: 6956.3). Total num frames: 53747712. Throughput: 0: 1790.3. Samples: 8433592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:38:47,934][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 00:38:49,902][42004] Updated weights for policy 0, policy_version 13126 (0.0030) +[2024-11-08 00:38:52,931][41694] Fps is (10 sec: 5735.1, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 53784576. Throughput: 0: 1718.7. Samples: 8440632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:38:52,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 00:38:55,114][42004] Updated weights for policy 0, policy_version 13136 (0.0024) +[2024-11-08 00:38:57,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6970.1). Total num frames: 53825536. Throughput: 0: 1884.6. Samples: 8452520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:38:57,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 00:39:00,221][42004] Updated weights for policy 0, policy_version 13146 (0.0032) +[2024-11-08 00:39:02,933][41694] Fps is (10 sec: 7781.3, 60 sec: 7031.3, 300 sec: 6970.1). Total num frames: 53862400. Throughput: 0: 1849.6. Samples: 8458674. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:39:02,936][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 00:39:05,818][42004] Updated weights for policy 0, policy_version 13156 (0.0036) +[2024-11-08 00:39:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 7039.6). Total num frames: 53899264. Throughput: 0: 1836.4. Samples: 8469688. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:39:07,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 00:39:11,242][42004] Updated weights for policy 0, policy_version 13166 (0.0020) +[2024-11-08 00:39:12,933][41694] Fps is (10 sec: 7373.0, 60 sec: 7347.1, 300 sec: 7039.6). Total num frames: 53936128. Throughput: 0: 1815.7. Samples: 8480798. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:39:12,934][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 00:39:16,902][42004] Updated weights for policy 0, policy_version 13176 (0.0027) +[2024-11-08 00:39:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7372.8, 300 sec: 7025.7). Total num frames: 53972992. Throughput: 0: 1793.3. Samples: 8485972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:39:17,933][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 00:39:22,932][41694] Fps is (10 sec: 5734.9, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 53993472. Throughput: 0: 1738.6. Samples: 8495720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:39:22,933][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 00:39:24,829][42004] Updated weights for policy 0, policy_version 13186 (0.0031) +[2024-11-08 00:39:27,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 54030336. Throughput: 0: 1701.4. Samples: 8504386. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:39:27,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 00:39:30,048][42004] Updated weights for policy 0, policy_version 13196 (0.0030) +[2024-11-08 00:39:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 54071296. Throughput: 0: 1701.7. Samples: 8510166. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:39:32,933][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 00:39:35,243][42004] Updated weights for policy 0, policy_version 13206 (0.0031) +[2024-11-08 00:39:37,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7031.8, 300 sec: 6984.0). Total num frames: 54112256. Throughput: 0: 1810.7. Samples: 8522112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:39:37,933][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 00:39:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013211_54112256.pth... +[2024-11-08 00:39:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000012798_52420608.pth +[2024-11-08 00:39:40,435][42004] Updated weights for policy 0, policy_version 13216 (0.0022) +[2024-11-08 00:39:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.6, 300 sec: 7039.6). Total num frames: 54149120. Throughput: 0: 1801.7. Samples: 8533598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:39:42,934][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 00:39:46,163][42004] Updated weights for policy 0, policy_version 13226 (0.0031) +[2024-11-08 00:39:47,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7236.3, 300 sec: 7025.7). Total num frames: 54181888. Throughput: 0: 1786.7. Samples: 8539074. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:39:47,936][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 00:39:51,840][42004] Updated weights for policy 0, policy_version 13236 (0.0031) +[2024-11-08 00:39:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7304.5, 300 sec: 7039.6). Total num frames: 54222848. Throughput: 0: 1776.3. Samples: 8549622. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:39:52,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 00:39:57,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6963.2, 300 sec: 6984.0). Total num frames: 54243328. Throughput: 0: 1689.8. Samples: 8556836. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:39:57,934][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 00:39:59,445][42004] Updated weights for policy 0, policy_version 13246 (0.0030) +[2024-11-08 00:40:02,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.4, 300 sec: 6970.1). Total num frames: 54280192. Throughput: 0: 1703.0. Samples: 8562606. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:40:02,933][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 00:40:05,101][42004] Updated weights for policy 0, policy_version 13256 (0.0028) +[2024-11-08 00:40:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6984.0). Total num frames: 54317056. Throughput: 0: 1727.0. Samples: 8573434. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:40:07,934][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 00:40:10,382][42004] Updated weights for policy 0, policy_version 13266 (0.0024) +[2024-11-08 00:40:12,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.3, 300 sec: 6997.9). Total num frames: 54353920. Throughput: 0: 1790.8. Samples: 8584972. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:40:12,933][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 00:40:16,226][42004] Updated weights for policy 0, policy_version 13276 (0.0031) +[2024-11-08 00:40:17,933][41694] Fps is (10 sec: 6962.6, 60 sec: 6894.8, 300 sec: 7053.4). Total num frames: 54386688. Throughput: 0: 1772.2. Samples: 8589918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:40:17,935][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 00:40:22,417][42004] Updated weights for policy 0, policy_version 13286 (0.0032) +[2024-11-08 00:40:22,932][41694] Fps is (10 sec: 6553.2, 60 sec: 7099.7, 300 sec: 7039.5). Total num frames: 54419456. Throughput: 0: 1731.3. Samples: 8600022. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:40:22,936][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 00:40:27,932][41694] Fps is (10 sec: 6963.9, 60 sec: 7099.7, 300 sec: 7039.6). Total num frames: 54456320. Throughput: 0: 1717.4. Samples: 8610882. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:40:27,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 00:40:27,946][42004] Updated weights for policy 0, policy_version 13296 (0.0024) +[2024-11-08 00:40:32,932][41694] Fps is (10 sec: 6144.4, 60 sec: 6826.7, 300 sec: 6984.0). Total num frames: 54480896. Throughput: 0: 1683.9. Samples: 8614848. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:40:32,934][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 00:40:35,471][42004] Updated weights for policy 0, policy_version 13306 (0.0028) +[2024-11-08 00:40:37,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6758.4, 300 sec: 6984.0). Total num frames: 54517760. Throughput: 0: 1650.0. Samples: 8623872. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:40:37,933][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 00:40:40,807][42004] Updated weights for policy 0, policy_version 13316 (0.0026) +[2024-11-08 00:40:42,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6826.6, 300 sec: 6984.0). Total num frames: 54558720. Throughput: 0: 1750.9. Samples: 8635626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:40:42,935][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 00:40:46,034][42004] Updated weights for policy 0, policy_version 13326 (0.0023) +[2024-11-08 00:40:47,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6894.9, 300 sec: 7050.5). Total num frames: 54595584. Throughput: 0: 1747.8. Samples: 8641258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:40:47,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 00:40:51,283][42004] Updated weights for policy 0, policy_version 13336 (0.0032) +[2024-11-08 00:40:52,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6826.7, 300 sec: 7053.5). Total num frames: 54632448. Throughput: 0: 1771.3. Samples: 8653144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:40:52,933][41694] Avg episode reward: [(0, '4.161')] +[2024-11-08 00:40:57,210][42004] Updated weights for policy 0, policy_version 13346 (0.0030) +[2024-11-08 00:40:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.8, 300 sec: 7067.3). Total num frames: 54669312. Throughput: 0: 1746.0. Samples: 8663542. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:40:57,933][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 00:41:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.5, 300 sec: 7053.5). Total num frames: 54702080. Throughput: 0: 1755.7. Samples: 8668922. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:41:02,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 00:41:02,977][42004] Updated weights for policy 0, policy_version 13356 (0.0025) +[2024-11-08 00:41:07,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6997.9). Total num frames: 54722560. Throughput: 0: 1686.6. Samples: 8675920. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:41:07,934][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 00:41:10,902][42004] Updated weights for policy 0, policy_version 13366 (0.0037) +[2024-11-08 00:41:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6997.9). Total num frames: 54759424. Throughput: 0: 1671.5. Samples: 8686100. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:41:12,933][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 00:41:16,608][42004] Updated weights for policy 0, policy_version 13376 (0.0031) +[2024-11-08 00:41:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.8, 300 sec: 6997.9). Total num frames: 54796288. Throughput: 0: 1701.8. Samples: 8691430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:41:17,935][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 00:41:21,884][42004] Updated weights for policy 0, policy_version 13386 (0.0032) +[2024-11-08 00:41:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6895.0, 300 sec: 7054.4). Total num frames: 54833152. Throughput: 0: 1763.8. Samples: 8703244. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:41:22,934][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 00:41:27,845][42004] Updated weights for policy 0, policy_version 13396 (0.0034) +[2024-11-08 00:41:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 7067.3). Total num frames: 54870016. Throughput: 0: 1734.2. Samples: 8713666. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:41:27,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 00:41:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 7039.6). Total num frames: 54902784. Throughput: 0: 1715.9. Samples: 8718472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:41:32,933][41694] Avg episode reward: [(0, '4.144')] +[2024-11-08 00:41:34,044][42004] Updated weights for policy 0, policy_version 13406 (0.0035) +[2024-11-08 00:41:39,989][41694] Fps is (10 sec: 5775.0, 60 sec: 6798.3, 300 sec: 6990.8). Total num frames: 54939648. Throughput: 0: 1606.5. Samples: 8728740. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:41:39,991][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 00:41:40,005][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013413_54939648.pth... +[2024-11-08 00:41:40,142][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013001_53252096.pth +[2024-11-08 00:41:41,792][42004] Updated weights for policy 0, policy_version 13416 (0.0036) +[2024-11-08 00:41:42,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6690.2, 300 sec: 6970.2). Total num frames: 54960128. Throughput: 0: 1609.7. Samples: 8735980. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:41:42,935][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 00:41:47,144][42004] Updated weights for policy 0, policy_version 13426 (0.0023) +[2024-11-08 00:41:47,932][41694] Fps is (10 sec: 7219.7, 60 sec: 6690.1, 300 sec: 6956.3). Total num frames: 54996992. Throughput: 0: 1607.3. Samples: 8741250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:41:47,935][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 00:41:52,456][42004] Updated weights for policy 0, policy_version 13436 (0.0022) +[2024-11-08 00:41:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6758.4, 300 sec: 6970.2). 
Total num frames: 55037952. Throughput: 0: 1716.6. Samples: 8753168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:41:52,933][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 00:41:57,449][42004] Updated weights for policy 0, policy_version 13446 (0.0030) +[2024-11-08 00:41:57,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6758.4, 300 sec: 7038.4). Total num frames: 55074816. Throughput: 0: 1762.2. Samples: 8765400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:41:57,933][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 00:42:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 7025.7). Total num frames: 55107584. Throughput: 0: 1757.6. Samples: 8770520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:42:02,940][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 00:42:03,714][42004] Updated weights for policy 0, policy_version 13456 (0.0029) +[2024-11-08 00:42:07,931][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.5, 300 sec: 7025.7). Total num frames: 55144448. Throughput: 0: 1718.5. Samples: 8780578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:42:07,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 00:42:09,114][42004] Updated weights for policy 0, policy_version 13466 (0.0029) +[2024-11-08 00:42:14,389][41694] Fps is (10 sec: 6077.5, 60 sec: 6798.1, 300 sec: 6963.5). Total num frames: 55177216. Throughput: 0: 1679.9. Samples: 8791708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:42:14,392][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 00:42:16,960][42004] Updated weights for policy 0, policy_version 13476 (0.0029) +[2024-11-08 00:42:17,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 55201792. Throughput: 0: 1664.6. Samples: 8793380. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:42:17,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 00:42:22,068][42004] Updated weights for policy 0, policy_version 13486 (0.0028) +[2024-11-08 00:42:22,932][41694] Fps is (10 sec: 7671.6, 60 sec: 6826.7, 300 sec: 6942.4). Total num frames: 55242752. Throughput: 0: 1774.4. Samples: 8804938. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:22,933][41694] Avg episode reward: [(0, '4.286')] +[2024-11-08 00:42:27,064][42004] Updated weights for policy 0, policy_version 13496 (0.0031) +[2024-11-08 00:42:27,932][41694] Fps is (10 sec: 8191.4, 60 sec: 6894.9, 300 sec: 6970.1). Total num frames: 55283712. Throughput: 0: 1809.4. Samples: 8817402. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:27,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 00:42:32,214][42004] Updated weights for policy 0, policy_version 13506 (0.0032) +[2024-11-08 00:42:32,931][41694] Fps is (10 sec: 8192.1, 60 sec: 7031.5, 300 sec: 7041.8). Total num frames: 55324672. Throughput: 0: 1825.7. Samples: 8823406. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:32,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 00:42:37,932][41694] Fps is (10 sec: 7373.2, 60 sec: 7210.4, 300 sec: 7039.6). Total num frames: 55357440. Throughput: 0: 1801.6. Samples: 8834242. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:37,934][41694] Avg episode reward: [(0, '4.750')] +[2024-11-08 00:42:38,117][42004] Updated weights for policy 0, policy_version 13516 (0.0033) +[2024-11-08 00:42:42,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7304.5, 300 sec: 7039.6). Total num frames: 55398400. Throughput: 0: 1780.2. Samples: 8845508. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:42,933][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 00:42:43,409][42004] Updated weights for policy 0, policy_version 13526 (0.0025) +[2024-11-08 00:42:48,879][41694] Fps is (10 sec: 5986.3, 60 sec: 6989.4, 300 sec: 6961.7). Total num frames: 55422976. Throughput: 0: 1757.4. Samples: 8851268. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:48,881][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 00:42:51,211][42004] Updated weights for policy 0, policy_version 13536 (0.0026) +[2024-11-08 00:42:52,933][41694] Fps is (10 sec: 5323.9, 60 sec: 6894.7, 300 sec: 6942.3). Total num frames: 55451648. Throughput: 0: 1722.7. Samples: 8858104. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:52,937][41694] Avg episode reward: [(0, '4.802')] +[2024-11-08 00:42:56,742][42004] Updated weights for policy 0, policy_version 13546 (0.0022) +[2024-11-08 00:42:57,932][41694] Fps is (10 sec: 7692.2, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 55492608. Throughput: 0: 1780.3. Samples: 8869228. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:42:57,936][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 00:43:02,094][42004] Updated weights for policy 0, policy_version 13556 (0.0028) +[2024-11-08 00:43:02,931][41694] Fps is (10 sec: 7783.9, 60 sec: 7031.5, 300 sec: 6970.1). Total num frames: 55529472. Throughput: 0: 1819.7. Samples: 8875266. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:43:02,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 00:43:07,392][42004] Updated weights for policy 0, policy_version 13566 (0.0024) +[2024-11-08 00:43:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 7020.1). Total num frames: 55566336. Throughput: 0: 1814.8. Samples: 8886606. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:43:07,935][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 00:43:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7276.5, 300 sec: 7025.7). Total num frames: 55603200. Throughput: 0: 1769.0. Samples: 8897004. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:43:12,932][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 00:43:13,198][42004] Updated weights for policy 0, policy_version 13576 (0.0025) +[2024-11-08 00:43:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7372.8, 300 sec: 7025.7). Total num frames: 55644160. Throughput: 0: 1767.5. Samples: 8902946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:43:17,934][41694] Avg episode reward: [(0, '4.313')] +[2024-11-08 00:43:18,406][42004] Updated weights for policy 0, policy_version 13586 (0.0023) +[2024-11-08 00:43:23,379][41694] Fps is (10 sec: 5880.8, 60 sec: 6979.4, 300 sec: 6945.7). Total num frames: 55664640. Throughput: 0: 1759.0. Samples: 8914184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:43:23,383][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 00:43:26,407][42004] Updated weights for policy 0, policy_version 13596 (0.0028) +[2024-11-08 00:43:27,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 55697408. Throughput: 0: 1679.1. Samples: 8921066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:43:27,934][41694] Avg episode reward: [(0, '4.645')] +[2024-11-08 00:43:31,573][42004] Updated weights for policy 0, policy_version 13606 (0.0027) +[2024-11-08 00:43:32,933][41694] Fps is (10 sec: 7717.4, 60 sec: 6894.8, 300 sec: 6942.4). Total num frames: 55738368. Throughput: 0: 1717.7. Samples: 8926940. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:43:32,935][41694] Avg episode reward: [(0, '4.502')] +[2024-11-08 00:43:36,578][42004] Updated weights for policy 0, policy_version 13616 (0.0019) +[2024-11-08 00:43:37,932][41694] Fps is (10 sec: 8192.0, 60 sec: 7031.4, 300 sec: 6956.3). Total num frames: 55779328. Throughput: 0: 1799.8. Samples: 8939092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:43:37,935][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 00:43:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013618_55779328.pth... +[2024-11-08 00:43:38,082][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013211_54112256.pth +[2024-11-08 00:43:42,192][42004] Updated weights for policy 0, policy_version 13626 (0.0037) +[2024-11-08 00:43:42,932][41694] Fps is (10 sec: 7783.0, 60 sec: 6963.2, 300 sec: 7011.8). Total num frames: 55816192. Throughput: 0: 1801.7. Samples: 8950304. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:43:42,935][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 00:43:47,362][42004] Updated weights for policy 0, policy_version 13636 (0.0018) +[2024-11-08 00:43:47,932][41694] Fps is (10 sec: 7782.9, 60 sec: 7352.4, 300 sec: 7025.7). Total num frames: 55857152. Throughput: 0: 1794.8. Samples: 8956030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:43:47,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 00:43:52,691][42004] Updated weights for policy 0, policy_version 13646 (0.0031) +[2024-11-08 00:43:52,939][41694] Fps is (10 sec: 7776.8, 60 sec: 7372.1, 300 sec: 7011.6). Total num frames: 55894016. Throughput: 0: 1803.0. Samples: 8967754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:43:52,941][41694] Avg episode reward: [(0, '4.724')] +[2024-11-08 00:43:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 7031.5, 300 sec: 6956.3). 
Total num frames: 55914496. Throughput: 0: 1760.3. Samples: 8976220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:43:57,934][41694] Avg episode reward: [(0, '4.701')] +[2024-11-08 00:44:00,380][42004] Updated weights for policy 0, policy_version 13656 (0.0032) +[2024-11-08 00:44:02,931][41694] Fps is (10 sec: 5738.6, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 55951360. Throughput: 0: 1726.7. Samples: 8980646. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:44:02,933][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 00:44:05,939][42004] Updated weights for policy 0, policy_version 13666 (0.0025) +[2024-11-08 00:44:07,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7099.8, 300 sec: 6970.2). Total num frames: 55992320. Throughput: 0: 1741.5. Samples: 8991772. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:44:07,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 00:44:11,117][42004] Updated weights for policy 0, policy_version 13676 (0.0030) +[2024-11-08 00:44:12,934][41694] Fps is (10 sec: 7780.4, 60 sec: 7099.4, 300 sec: 6970.1). Total num frames: 56029184. Throughput: 0: 1833.4. Samples: 9003572. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:44:12,938][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 00:44:16,685][42004] Updated weights for policy 0, policy_version 13686 (0.0025) +[2024-11-08 00:44:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7031.5, 300 sec: 7025.7). Total num frames: 56066048. Throughput: 0: 1817.5. Samples: 9008726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:44:17,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 00:44:21,960][42004] Updated weights for policy 0, policy_version 13696 (0.0024) +[2024-11-08 00:44:22,932][41694] Fps is (10 sec: 7374.6, 60 sec: 7359.4, 300 sec: 7025.7). Total num frames: 56102912. Throughput: 0: 1804.5. Samples: 9020292. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:44:22,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 00:44:27,211][42004] Updated weights for policy 0, policy_version 13706 (0.0026) +[2024-11-08 00:44:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7441.1, 300 sec: 7025.7). Total num frames: 56143872. Throughput: 0: 1821.0. Samples: 9032250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:44:27,934][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 00:44:32,931][41694] Fps is (10 sec: 5734.5, 60 sec: 7031.6, 300 sec: 6942.4). Total num frames: 56160256. Throughput: 0: 1818.5. Samples: 9037864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:44:32,933][41694] Avg episode reward: [(0, '4.272')] +[2024-11-08 00:44:35,049][42004] Updated weights for policy 0, policy_version 13716 (0.0031) +[2024-11-08 00:44:37,932][41694] Fps is (10 sec: 5734.5, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 56201216. Throughput: 0: 1712.0. Samples: 9044782. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:44:37,934][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 00:44:40,214][42004] Updated weights for policy 0, policy_version 13726 (0.0025) +[2024-11-08 00:44:42,932][41694] Fps is (10 sec: 8191.9, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 56242176. Throughput: 0: 1781.6. Samples: 9056394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:44:42,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 00:44:45,694][42004] Updated weights for policy 0, policy_version 13736 (0.0024) +[2024-11-08 00:44:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 56274944. Throughput: 0: 1806.9. Samples: 9061956. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:44:47,933][41694] Avg episode reward: [(0, '4.241')] +[2024-11-08 00:44:51,669][42004] Updated weights for policy 0, policy_version 13746 (0.0027) +[2024-11-08 00:44:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6964.1, 300 sec: 7011.8). Total num frames: 56311808. Throughput: 0: 1791.0. Samples: 9072366. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:44:52,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 00:44:57,155][42004] Updated weights for policy 0, policy_version 13756 (0.0029) +[2024-11-08 00:44:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 7011.8). Total num frames: 56348672. Throughput: 0: 1775.2. Samples: 9083452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:44:57,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 00:45:02,613][42004] Updated weights for policy 0, policy_version 13766 (0.0033) +[2024-11-08 00:45:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7236.3, 300 sec: 7011.8). Total num frames: 56385536. Throughput: 0: 1786.3. Samples: 9089110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:45:02,935][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 00:45:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6894.9, 300 sec: 6956.3). Total num frames: 56406016. Throughput: 0: 1727.5. Samples: 9098030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:45:07,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 00:45:10,398][42004] Updated weights for policy 0, policy_version 13776 (0.0028) +[2024-11-08 00:45:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6895.2, 300 sec: 6970.2). Total num frames: 56442880. Throughput: 0: 1663.2. Samples: 9107094. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:45:12,933][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 00:45:16,233][42004] Updated weights for policy 0, policy_version 13786 (0.0027) +[2024-11-08 00:45:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6970.2). Total num frames: 56475648. Throughput: 0: 1653.9. Samples: 9112290. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:45:17,934][41694] Avg episode reward: [(0, '4.686')] +[2024-11-08 00:45:22,163][42004] Updated weights for policy 0, policy_version 13796 (0.0045) +[2024-11-08 00:45:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6970.1). Total num frames: 56512512. Throughput: 0: 1728.3. Samples: 9122554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:45:22,935][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 00:45:27,776][42004] Updated weights for policy 0, policy_version 13806 (0.0034) +[2024-11-08 00:45:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 7011.8). Total num frames: 56549376. Throughput: 0: 1708.9. Samples: 9133296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:45:27,934][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 00:45:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 7011.8). Total num frames: 56586240. Throughput: 0: 1712.4. Samples: 9139012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:45:32,934][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 00:45:33,289][42004] Updated weights for policy 0, policy_version 13816 (0.0028) +[2024-11-08 00:45:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6997.9). Total num frames: 56623104. Throughput: 0: 1734.9. Samples: 9150438. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:45:37,933][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 00:45:37,951][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013825_56627200.pth... +[2024-11-08 00:45:38,059][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013413_54939648.pth +[2024-11-08 00:45:38,484][42004] Updated weights for policy 0, policy_version 13826 (0.0038) +[2024-11-08 00:45:42,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6956.3). Total num frames: 56647680. Throughput: 0: 1655.5. Samples: 9157948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:45:42,933][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 00:45:46,003][42004] Updated weights for policy 0, policy_version 13836 (0.0023) +[2024-11-08 00:45:47,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6956.3). Total num frames: 56684544. Throughput: 0: 1655.7. Samples: 9163618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:45:47,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 00:45:51,360][42004] Updated weights for policy 0, policy_version 13846 (0.0033) +[2024-11-08 00:45:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6956.3). Total num frames: 56721408. Throughput: 0: 1715.3. Samples: 9175218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:45:52,933][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 00:45:57,212][42004] Updated weights for policy 0, policy_version 13856 (0.0024) +[2024-11-08 00:45:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6970.1). Total num frames: 56758272. Throughput: 0: 1745.5. Samples: 9185642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:45:57,934][41694] Avg episode reward: [(0, '4.205')] +[2024-11-08 00:46:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 7011.8). 
Total num frames: 56791040. Throughput: 0: 1747.7. Samples: 9190938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:46:02,933][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 00:46:03,013][42004] Updated weights for policy 0, policy_version 13866 (0.0031) +[2024-11-08 00:46:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 7025.7). Total num frames: 56832000. Throughput: 0: 1764.5. Samples: 9201956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:46:07,935][41694] Avg episode reward: [(0, '4.719')] +[2024-11-08 00:46:08,322][42004] Updated weights for policy 0, policy_version 13876 (0.0036) +[2024-11-08 00:46:12,931][41694] Fps is (10 sec: 7373.1, 60 sec: 7031.5, 300 sec: 7011.8). Total num frames: 56864768. Throughput: 0: 1762.6. Samples: 9212612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:46:12,933][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 00:46:16,495][42004] Updated weights for policy 0, policy_version 13886 (0.0024) +[2024-11-08 00:46:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.6, 300 sec: 6956.3). Total num frames: 56885248. Throughput: 0: 1697.7. Samples: 9215410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:46:17,934][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 00:46:21,885][42004] Updated weights for policy 0, policy_version 13896 (0.0030) +[2024-11-08 00:46:22,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6826.6, 300 sec: 6956.2). Total num frames: 56922112. Throughput: 0: 1666.6. Samples: 9225436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:46:22,934][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 00:46:27,321][42004] Updated weights for policy 0, policy_version 13906 (0.0036) +[2024-11-08 00:46:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6970.1). Total num frames: 56958976. Throughput: 0: 1750.6. Samples: 9236724. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:46:27,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 00:46:32,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6826.7, 300 sec: 7019.1). Total num frames: 56995840. Throughput: 0: 1732.6. Samples: 9241584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:46:32,933][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 00:46:33,247][42004] Updated weights for policy 0, policy_version 13916 (0.0024) +[2024-11-08 00:46:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 7039.6). Total num frames: 57036800. Throughput: 0: 1725.5. Samples: 9252864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:46:37,933][41694] Avg episode reward: [(0, '4.656')] +[2024-11-08 00:46:38,484][42004] Updated weights for policy 0, policy_version 13926 (0.0033) +[2024-11-08 00:46:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7099.7, 300 sec: 7039.6). Total num frames: 57073664. Throughput: 0: 1744.4. Samples: 9264142. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:46:42,933][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 00:46:44,027][42004] Updated weights for policy 0, policy_version 13936 (0.0031) +[2024-11-08 00:46:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.7, 300 sec: 7025.7). Total num frames: 57110528. Throughput: 0: 1754.6. Samples: 9269896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:46:47,933][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 00:46:51,619][42004] Updated weights for policy 0, policy_version 13946 (0.0035) +[2024-11-08 00:46:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6970.1). Total num frames: 57131008. Throughput: 0: 1672.4. Samples: 9277214. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:46:52,933][41694] Avg episode reward: [(0, '4.232')] +[2024-11-08 00:46:57,103][42004] Updated weights for policy 0, policy_version 13956 (0.0033) +[2024-11-08 00:46:57,931][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6984.0). Total num frames: 57167872. Throughput: 0: 1678.2. Samples: 9288132. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:46:57,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 00:47:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6970.1). Total num frames: 57200640. Throughput: 0: 1741.7. Samples: 9293788. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 00:47:02,935][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 00:47:03,106][42004] Updated weights for policy 0, policy_version 13966 (0.0033) +[2024-11-08 00:47:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 7018.7). Total num frames: 57237504. Throughput: 0: 1733.3. Samples: 9303434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 00:47:07,934][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 00:47:08,851][42004] Updated weights for policy 0, policy_version 13976 (0.0032) +[2024-11-08 00:47:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 7025.7). Total num frames: 57274368. Throughput: 0: 1734.6. Samples: 9314780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:47:12,933][41694] Avg episode reward: [(0, '4.745')] +[2024-11-08 00:47:14,238][42004] Updated weights for policy 0, policy_version 13986 (0.0034) +[2024-11-08 00:47:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 7011.8). Total num frames: 57311232. Throughput: 0: 1756.3. Samples: 9320618. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:47:17,934][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 00:47:19,752][42004] Updated weights for policy 0, policy_version 13996 (0.0027) +[2024-11-08 00:47:24,493][41694] Fps is (10 sec: 6022.9, 60 sec: 6853.2, 300 sec: 6947.3). Total num frames: 57344000. Throughput: 0: 1695.9. Samples: 9331828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:47:24,495][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 00:47:27,219][42004] Updated weights for policy 0, policy_version 14006 (0.0022) +[2024-11-08 00:47:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 57372672. Throughput: 0: 1669.0. Samples: 9339246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:47:27,934][41694] Avg episode reward: [(0, '4.267')] +[2024-11-08 00:47:32,694][42004] Updated weights for policy 0, policy_version 14016 (0.0026) +[2024-11-08 00:47:32,931][41694] Fps is (10 sec: 7766.1, 60 sec: 6894.9, 300 sec: 6956.3). Total num frames: 57409536. Throughput: 0: 1665.1. Samples: 9344826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:47:32,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 00:47:37,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6826.6, 300 sec: 6942.4). Total num frames: 57446400. Throughput: 0: 1758.4. Samples: 9356346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:47:37,934][41694] Avg episode reward: [(0, '4.719')] +[2024-11-08 00:47:37,961][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014026_57450496.pth... +[2024-11-08 00:47:37,960][42004] Updated weights for policy 0, policy_version 14026 (0.0025) +[2024-11-08 00:47:38,067][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013618_55779328.pth +[2024-11-08 00:47:42,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 7020.5). 
Total num frames: 57487360. Throughput: 0: 1771.3. Samples: 9367842. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:47:42,935][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 00:47:43,340][42004] Updated weights for policy 0, policy_version 14036 (0.0029) +[2024-11-08 00:47:47,932][41694] Fps is (10 sec: 7783.2, 60 sec: 6894.9, 300 sec: 7025.7). Total num frames: 57524224. Throughput: 0: 1775.4. Samples: 9373682. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:47:47,933][41694] Avg episode reward: [(0, '4.227')] +[2024-11-08 00:47:48,553][42004] Updated weights for policy 0, policy_version 14046 (0.0027) +[2024-11-08 00:47:52,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7236.2, 300 sec: 7025.7). Total num frames: 57565184. Throughput: 0: 1820.8. Samples: 9385372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:47:52,935][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 00:47:53,870][42004] Updated weights for policy 0, policy_version 14056 (0.0029) +[2024-11-08 00:47:58,874][41694] Fps is (10 sec: 5989.3, 60 sec: 6922.8, 300 sec: 6961.8). Total num frames: 57589760. Throughput: 0: 1662.8. Samples: 9391172. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:47:58,876][41694] Avg episode reward: [(0, '4.614')] +[2024-11-08 00:48:02,108][42004] Updated weights for policy 0, policy_version 14066 (0.0026) +[2024-11-08 00:48:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 57618432. Throughput: 0: 1710.5. Samples: 9397590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:48:02,935][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 00:48:07,875][42004] Updated weights for policy 0, policy_version 14076 (0.0023) +[2024-11-08 00:48:07,932][41694] Fps is (10 sec: 7235.1, 60 sec: 6963.2, 300 sec: 6956.2). Total num frames: 57655296. Throughput: 0: 1749.5. Samples: 9407826. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:48:07,933][41694] Avg episode reward: [(0, '4.730')] +[2024-11-08 00:48:12,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 57692160. Throughput: 0: 1770.1. Samples: 9418898. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:48:12,933][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 00:48:13,391][42004] Updated weights for policy 0, policy_version 14086 (0.0030) +[2024-11-08 00:48:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 7008.5). Total num frames: 57729024. Throughput: 0: 1775.5. Samples: 9424724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:48:17,934][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 00:48:18,632][42004] Updated weights for policy 0, policy_version 14096 (0.0030) +[2024-11-08 00:48:22,931][41694] Fps is (10 sec: 7782.3, 60 sec: 7289.4, 300 sec: 7025.7). Total num frames: 57769984. Throughput: 0: 1783.9. Samples: 9436620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:48:22,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 00:48:23,971][42004] Updated weights for policy 0, policy_version 14106 (0.0027) +[2024-11-08 00:48:27,932][41694] Fps is (10 sec: 7782.7, 60 sec: 7236.3, 300 sec: 7011.8). Total num frames: 57806848. Throughput: 0: 1776.8. Samples: 9447798. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:48:27,934][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 00:48:29,440][42004] Updated weights for policy 0, policy_version 14116 (0.0033) +[2024-11-08 00:48:33,239][41694] Fps is (10 sec: 5563.3, 60 sec: 6927.7, 300 sec: 6935.2). Total num frames: 57827328. Throughput: 0: 1755.3. Samples: 9453208. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:48:33,241][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 00:48:37,537][42004] Updated weights for policy 0, policy_version 14126 (0.0047) +[2024-11-08 00:48:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6895.0, 300 sec: 6928.5). Total num frames: 57860096. Throughput: 0: 1651.5. Samples: 9459688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:48:37,935][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 00:48:42,819][42004] Updated weights for policy 0, policy_version 14136 (0.0042) +[2024-11-08 00:48:42,931][41694] Fps is (10 sec: 7606.7, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 57901056. Throughput: 0: 1820.1. Samples: 9471362. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:48:42,933][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 00:48:47,921][42004] Updated weights for policy 0, policy_version 14146 (0.0036) +[2024-11-08 00:48:47,932][41694] Fps is (10 sec: 8192.1, 60 sec: 6963.2, 300 sec: 6942.5). Total num frames: 57942016. Throughput: 0: 1770.8. Samples: 9477274. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:48:47,933][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 00:48:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6895.0, 300 sec: 6997.9). Total num frames: 57978880. Throughput: 0: 1807.6. Samples: 9489166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:48:52,934][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 00:48:53,096][42004] Updated weights for policy 0, policy_version 14156 (0.0027) +[2024-11-08 00:48:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7282.3, 300 sec: 7011.8). Total num frames: 58019840. Throughput: 0: 1820.5. Samples: 9500820. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:48:57,939][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 00:48:58,460][42004] Updated weights for policy 0, policy_version 14166 (0.0033) +[2024-11-08 00:49:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7236.3, 300 sec: 6984.0). Total num frames: 58052608. Throughput: 0: 1811.5. Samples: 9506240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:49:02,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 00:49:04,733][42004] Updated weights for policy 0, policy_version 14176 (0.0035) +[2024-11-08 00:49:07,932][41694] Fps is (10 sec: 4914.8, 60 sec: 6894.8, 300 sec: 6914.6). Total num frames: 58068992. Throughput: 0: 1742.2. Samples: 9515022. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:49:07,934][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 00:49:12,502][42004] Updated weights for policy 0, policy_version 14186 (0.0037) +[2024-11-08 00:49:12,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6894.9, 300 sec: 6914.6). Total num frames: 58105856. Throughput: 0: 1674.8. Samples: 9523164. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:49:12,934][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 00:49:17,626][42004] Updated weights for policy 0, policy_version 14196 (0.0030) +[2024-11-08 00:49:17,931][41694] Fps is (10 sec: 7783.2, 60 sec: 6963.3, 300 sec: 6928.5). Total num frames: 58146816. Throughput: 0: 1697.4. Samples: 9529070. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:49:17,933][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 00:49:22,840][42004] Updated weights for policy 0, policy_version 14206 (0.0027) +[2024-11-08 00:49:22,932][41694] Fps is (10 sec: 8192.2, 60 sec: 6963.2, 300 sec: 6928.5). Total num frames: 58187776. Throughput: 0: 1803.4. Samples: 9540842. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:49:22,934][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 00:49:27,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6997.9). Total num frames: 58224640. Throughput: 0: 1808.9. Samples: 9552764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:49:27,933][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 00:49:27,996][42004] Updated weights for policy 0, policy_version 14216 (0.0033) +[2024-11-08 00:49:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7342.2, 300 sec: 6997.9). Total num frames: 58265600. Throughput: 0: 1806.1. Samples: 9558548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:49:32,933][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 00:49:33,282][42004] Updated weights for policy 0, policy_version 14226 (0.0027) +[2024-11-08 00:49:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7372.8, 300 sec: 6984.0). Total num frames: 58302464. Throughput: 0: 1804.2. Samples: 9570356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:49:37,933][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 00:49:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014234_58302464.pth... +[2024-11-08 00:49:38,078][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000013825_56627200.pth +[2024-11-08 00:49:38,911][42004] Updated weights for policy 0, policy_version 14236 (0.0035) +[2024-11-08 00:49:42,931][41694] Fps is (10 sec: 5324.7, 60 sec: 6963.2, 300 sec: 6928.5). Total num frames: 58318848. Throughput: 0: 1685.3. Samples: 9576660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:49:42,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 00:49:47,186][42004] Updated weights for policy 0, policy_version 14246 (0.0033) +[2024-11-08 00:49:47,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6895.0, 300 sec: 6928.5). 
Total num frames: 58355712. Throughput: 0: 1672.4. Samples: 9581500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:49:47,933][41694] Avg episode reward: [(0, '4.261')] +[2024-11-08 00:49:52,754][42004] Updated weights for policy 0, policy_version 14256 (0.0021) +[2024-11-08 00:49:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 58392576. Throughput: 0: 1721.7. Samples: 9592496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:49:52,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 00:49:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6928.5). Total num frames: 58429440. Throughput: 0: 1796.1. Samples: 9603986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:49:57,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 00:49:58,028][42004] Updated weights for policy 0, policy_version 14266 (0.0027) +[2024-11-08 00:50:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6984.0). Total num frames: 58466304. Throughput: 0: 1794.6. Samples: 9609826. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:02,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 00:50:03,556][42004] Updated weights for policy 0, policy_version 14276 (0.0035) +[2024-11-08 00:50:07,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7304.6, 300 sec: 6997.9). Total num frames: 58507264. Throughput: 0: 1779.1. Samples: 9620900. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:50:07,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 00:50:08,809][42004] Updated weights for policy 0, policy_version 14286 (0.0033) +[2024-11-08 00:50:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7168.0, 300 sec: 6984.0). Total num frames: 58535936. Throughput: 0: 1738.8. Samples: 9631008. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:50:12,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 00:50:17,936][41694] Fps is (10 sec: 4503.6, 60 sec: 6757.9, 300 sec: 6914.5). Total num frames: 58552320. Throughput: 0: 1680.5. Samples: 9634178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 00:50:17,950][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 00:50:18,098][42004] Updated weights for policy 0, policy_version 14296 (0.0033) +[2024-11-08 00:50:22,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6690.1, 300 sec: 6914.6). Total num frames: 58589184. Throughput: 0: 1585.9. Samples: 9641720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 00:50:22,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 00:50:23,620][42004] Updated weights for policy 0, policy_version 14306 (0.0024) +[2024-11-08 00:50:27,932][41694] Fps is (10 sec: 7785.5, 60 sec: 6758.3, 300 sec: 6928.5). Total num frames: 58630144. Throughput: 0: 1707.3. Samples: 9653488. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:27,935][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 00:50:28,915][42004] Updated weights for policy 0, policy_version 14316 (0.0026) +[2024-11-08 00:50:32,931][41694] Fps is (10 sec: 7782.9, 60 sec: 6690.1, 300 sec: 6928.5). Total num frames: 58667008. Throughput: 0: 1726.4. Samples: 9659188. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:32,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 00:50:34,094][42004] Updated weights for policy 0, policy_version 14326 (0.0031) +[2024-11-08 00:50:37,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6758.4, 300 sec: 6984.0). Total num frames: 58707968. Throughput: 0: 1742.6. Samples: 9670912. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:37,933][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 00:50:39,486][42004] Updated weights for policy 0, policy_version 14336 (0.0030) +[2024-11-08 00:50:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 58744832. Throughput: 0: 1747.7. Samples: 9682634. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:42,941][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 00:50:44,916][42004] Updated weights for policy 0, policy_version 14346 (0.0021) +[2024-11-08 00:50:47,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 58781696. Throughput: 0: 1737.3. Samples: 9688006. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:47,933][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 00:50:52,847][42004] Updated weights for policy 0, policy_version 14356 (0.0036) +[2024-11-08 00:50:52,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6928.5). Total num frames: 58802176. Throughput: 0: 1638.6. Samples: 9694638. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:52,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 00:50:57,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6826.6, 300 sec: 6942.4). Total num frames: 58839040. Throughput: 0: 1672.8. Samples: 9706284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:50:57,934][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 00:50:58,031][42004] Updated weights for policy 0, policy_version 14366 (0.0026) +[2024-11-08 00:51:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6928.5). Total num frames: 58875904. Throughput: 0: 1723.2. Samples: 9711712. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:51:02,935][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 00:51:03,750][42004] Updated weights for policy 0, policy_version 14376 (0.0032) +[2024-11-08 00:51:07,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 58912768. Throughput: 0: 1799.7. Samples: 9722706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:51:07,937][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 00:51:09,347][42004] Updated weights for policy 0, policy_version 14386 (0.0036) +[2024-11-08 00:51:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6984.0). Total num frames: 58945536. Throughput: 0: 1765.9. Samples: 9732952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:51:12,933][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 00:51:15,356][42004] Updated weights for policy 0, policy_version 14396 (0.0037) +[2024-11-08 00:51:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7168.5, 300 sec: 6984.0). Total num frames: 58982400. Throughput: 0: 1762.4. Samples: 9738494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:51:17,934][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 00:51:21,059][42004] Updated weights for policy 0, policy_version 14406 (0.0034) +[2024-11-08 00:51:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.1, 300 sec: 6984.0). Total num frames: 59019264. Throughput: 0: 1737.7. Samples: 9749110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:51:22,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 00:51:27,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6826.7, 300 sec: 6928.5). Total num frames: 59039744. Throughput: 0: 1636.9. Samples: 9756296. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:51:27,933][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 00:51:28,585][42004] Updated weights for policy 0, policy_version 14416 (0.0028) +[2024-11-08 00:51:32,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 59080704. Throughput: 0: 1652.1. Samples: 9762348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:51:32,934][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 00:51:33,579][42004] Updated weights for policy 0, policy_version 14426 (0.0025) +[2024-11-08 00:51:37,932][41694] Fps is (10 sec: 8192.1, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 59121664. Throughput: 0: 1779.1. Samples: 9774698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:51:37,935][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 00:51:37,965][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014434_59121664.pth... +[2024-11-08 00:51:38,086][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014026_57450496.pth +[2024-11-08 00:51:38,651][42004] Updated weights for policy 0, policy_version 14436 (0.0026) +[2024-11-08 00:51:42,931][41694] Fps is (10 sec: 8192.0, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 59162624. Throughput: 0: 1782.7. Samples: 9786504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:51:42,933][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 00:51:43,862][42004] Updated weights for policy 0, policy_version 14446 (0.0030) +[2024-11-08 00:51:47,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6963.2, 300 sec: 7011.8). Total num frames: 59199488. Throughput: 0: 1795.7. Samples: 9792520. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:51:47,934][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 00:51:49,065][42004] Updated weights for policy 0, policy_version 14456 (0.0023) +[2024-11-08 00:51:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 7011.8). Total num frames: 59236352. Throughput: 0: 1811.6. Samples: 9804226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:51:52,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 00:51:54,719][42004] Updated weights for policy 0, policy_version 14466 (0.0023) +[2024-11-08 00:51:59,473][41694] Fps is (10 sec: 6033.3, 60 sec: 6988.6, 300 sec: 6975.4). Total num frames: 59269120. Throughput: 0: 1756.0. Samples: 9814680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:51:59,475][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 00:52:02,431][42004] Updated weights for policy 0, policy_version 14476 (0.0031) +[2024-11-08 00:52:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 59293696. Throughput: 0: 1729.7. Samples: 9816330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:52:02,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 00:52:07,932][41694] Fps is (10 sec: 7263.4, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 59330560. Throughput: 0: 1736.2. Samples: 9827240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:52:07,937][41694] Avg episode reward: [(0, '4.502')] +[2024-11-08 00:52:07,992][42004] Updated weights for policy 0, policy_version 14486 (0.0027) +[2024-11-08 00:52:12,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 59371520. Throughput: 0: 1843.6. Samples: 9839256. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:52:12,934][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 00:52:13,095][42004] Updated weights for policy 0, policy_version 14496 (0.0027) +[2024-11-08 00:52:17,932][41694] Fps is (10 sec: 8191.8, 60 sec: 7168.0, 300 sec: 7049.1). Total num frames: 59412480. Throughput: 0: 1840.4. Samples: 9845166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:52:17,937][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 00:52:18,268][42004] Updated weights for policy 0, policy_version 14506 (0.0024) +[2024-11-08 00:52:22,932][41694] Fps is (10 sec: 8192.0, 60 sec: 7236.3, 300 sec: 7053.5). Total num frames: 59453440. Throughput: 0: 1839.9. Samples: 9857494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:52:22,933][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 00:52:23,287][42004] Updated weights for policy 0, policy_version 14516 (0.0023) +[2024-11-08 00:52:27,931][41694] Fps is (10 sec: 7782.7, 60 sec: 7509.4, 300 sec: 7053.5). Total num frames: 59490304. Throughput: 0: 1827.0. Samples: 9868720. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:52:27,933][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 00:52:28,856][42004] Updated weights for policy 0, policy_version 14526 (0.0030) +[2024-11-08 00:52:33,834][41694] Fps is (10 sec: 6011.0, 60 sec: 7196.3, 300 sec: 7004.3). Total num frames: 59518976. Throughput: 0: 1776.9. Samples: 9874086. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:52:33,835][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 00:52:36,416][42004] Updated weights for policy 0, policy_version 14536 (0.0033) +[2024-11-08 00:52:37,932][41694] Fps is (10 sec: 6143.9, 60 sec: 7168.0, 300 sec: 6997.9). Total num frames: 59551744. Throughput: 0: 1719.8. Samples: 9881618. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:52:37,933][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 00:52:41,797][42004] Updated weights for policy 0, policy_version 14546 (0.0022) +[2024-11-08 00:52:42,931][41694] Fps is (10 sec: 7654.2, 60 sec: 7099.7, 300 sec: 6997.9). Total num frames: 59588608. Throughput: 0: 1808.1. Samples: 9893256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:52:42,934][41694] Avg episode reward: [(0, '4.652')] +[2024-11-08 00:52:46,788][42004] Updated weights for policy 0, policy_version 14556 (0.0029) +[2024-11-08 00:52:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6997.9). Total num frames: 59629568. Throughput: 0: 1844.3. Samples: 9899322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:52:47,935][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 00:52:51,794][42004] Updated weights for policy 0, policy_version 14566 (0.0026) +[2024-11-08 00:52:52,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7236.3, 300 sec: 7076.1). Total num frames: 59670528. Throughput: 0: 1873.2. Samples: 9911532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:52:52,934][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 00:52:57,005][42004] Updated weights for policy 0, policy_version 14576 (0.0029) +[2024-11-08 00:52:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7497.1, 300 sec: 7081.2). Total num frames: 59707392. Throughput: 0: 1871.2. Samples: 9923462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:52:57,933][41694] Avg episode reward: [(0, '4.315')] +[2024-11-08 00:53:02,883][42004] Updated weights for policy 0, policy_version 14586 (0.0027) +[2024-11-08 00:53:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7509.3, 300 sec: 7081.2). Total num frames: 59744256. Throughput: 0: 1859.2. Samples: 9928830. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:53:02,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 00:53:08,225][41694] Fps is (10 sec: 5571.0, 60 sec: 7201.1, 300 sec: 7018.7). Total num frames: 59764736. Throughput: 0: 1810.5. Samples: 9939496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:53:08,227][41694] Avg episode reward: [(0, '4.212')] +[2024-11-08 00:53:10,386][42004] Updated weights for policy 0, policy_version 14596 (0.0032) +[2024-11-08 00:53:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 7168.0, 300 sec: 7025.7). Total num frames: 59801600. Throughput: 0: 1734.7. Samples: 9946780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:53:12,933][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 00:53:15,552][42004] Updated weights for policy 0, policy_version 14606 (0.0025) +[2024-11-08 00:53:17,932][41694] Fps is (10 sec: 8017.6, 60 sec: 7168.0, 300 sec: 7025.7). Total num frames: 59842560. Throughput: 0: 1787.5. Samples: 9952912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:53:17,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 00:53:20,669][42004] Updated weights for policy 0, policy_version 14616 (0.0023) +[2024-11-08 00:53:22,932][41694] Fps is (10 sec: 8191.9, 60 sec: 7168.0, 300 sec: 7039.6). Total num frames: 59883520. Throughput: 0: 1851.6. Samples: 9964940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:53:22,934][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 00:53:25,785][42004] Updated weights for policy 0, policy_version 14626 (0.0023) +[2024-11-08 00:53:27,932][41694] Fps is (10 sec: 8191.3, 60 sec: 7236.1, 300 sec: 7116.4). Total num frames: 59924480. Throughput: 0: 1861.6. Samples: 9977030. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:53:27,935][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 00:53:30,882][42004] Updated weights for policy 0, policy_version 14636 (0.0030) +[2024-11-08 00:53:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7485.4, 300 sec: 7122.9). Total num frames: 59961344. Throughput: 0: 1856.8. Samples: 9982878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:53:32,935][41694] Avg episode reward: [(0, '4.279')] +[2024-11-08 00:53:36,756][42004] Updated weights for policy 0, policy_version 14646 (0.0030) +[2024-11-08 00:53:37,932][41694] Fps is (10 sec: 7373.4, 60 sec: 7441.1, 300 sec: 7109.0). Total num frames: 59998208. Throughput: 0: 1826.8. Samples: 9993738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:53:37,933][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 00:53:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014648_59998208.pth... +[2024-11-08 00:53:38,182][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014234_58302464.pth +[2024-11-08 00:53:42,932][41694] Fps is (10 sec: 5734.4, 60 sec: 7168.0, 300 sec: 7039.6). Total num frames: 60018688. Throughput: 0: 1744.4. Samples: 10001958. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 00:53:42,934][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 00:53:44,341][42004] Updated weights for policy 0, policy_version 14656 (0.0040) +[2024-11-08 00:53:47,931][41694] Fps is (10 sec: 5734.4, 60 sec: 7099.7, 300 sec: 7039.6). Total num frames: 60055552. Throughput: 0: 1727.0. Samples: 10006544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:53:47,934][41694] Avg episode reward: [(0, '4.308')] +[2024-11-08 00:53:49,553][42004] Updated weights for policy 0, policy_version 14666 (0.0026) +[2024-11-08 00:53:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 7039.6). 
Total num frames: 60096512. Throughput: 0: 1764.7. Samples: 10018392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:53:52,934][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 00:53:54,713][42004] Updated weights for policy 0, policy_version 14676 (0.0029) +[2024-11-08 00:53:57,932][41694] Fps is (10 sec: 8192.0, 60 sec: 7168.0, 300 sec: 7067.3). Total num frames: 60137472. Throughput: 0: 1859.3. Samples: 10030450. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:53:57,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 00:53:59,744][42004] Updated weights for policy 0, policy_version 14686 (0.0030) +[2024-11-08 00:54:02,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 7136.8). Total num frames: 60174336. Throughput: 0: 1856.5. Samples: 10036456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:54:02,933][41694] Avg episode reward: [(0, '4.252')] +[2024-11-08 00:54:05,515][42004] Updated weights for policy 0, policy_version 14696 (0.0026) +[2024-11-08 00:54:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7409.0, 300 sec: 7122.9). Total num frames: 60207104. Throughput: 0: 1826.9. Samples: 10047150. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:54:07,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 00:54:11,697][42004] Updated weights for policy 0, policy_version 14706 (0.0032) +[2024-11-08 00:54:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7372.8, 300 sec: 7109.0). Total num frames: 60243968. Throughput: 0: 1780.9. Samples: 10057168. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:54:12,934][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 00:54:17,932][41694] Fps is (10 sec: 5734.3, 60 sec: 7031.4, 300 sec: 7039.6). Total num frames: 60264448. Throughput: 0: 1772.5. Samples: 10062640. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:54:17,933][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 00:54:19,058][42004] Updated weights for policy 0, policy_version 14716 (0.0028) +[2024-11-08 00:54:22,931][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 7053.5). Total num frames: 60305408. Throughput: 0: 1706.0. Samples: 10070510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:54:22,935][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 00:54:24,175][42004] Updated weights for policy 0, policy_version 14726 (0.0025) +[2024-11-08 00:54:27,932][41694] Fps is (10 sec: 8192.1, 60 sec: 7031.6, 300 sec: 7053.4). Total num frames: 60346368. Throughput: 0: 1790.0. Samples: 10082506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:54:27,934][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 00:54:29,269][42004] Updated weights for policy 0, policy_version 14736 (0.0032) +[2024-11-08 00:54:32,932][41694] Fps is (10 sec: 8192.0, 60 sec: 7099.7, 300 sec: 7067.3). Total num frames: 60387328. Throughput: 0: 1826.8. Samples: 10088748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:54:32,933][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 00:54:34,354][42004] Updated weights for policy 0, policy_version 14746 (0.0030) +[2024-11-08 00:54:37,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7099.7, 300 sec: 7136.8). Total num frames: 60424192. Throughput: 0: 1827.6. Samples: 10100634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:54:37,934][41694] Avg episode reward: [(0, '4.296')] +[2024-11-08 00:54:39,806][42004] Updated weights for policy 0, policy_version 14756 (0.0032) +[2024-11-08 00:54:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7304.5, 300 sec: 7122.9). Total num frames: 60456960. Throughput: 0: 1795.2. Samples: 10111234. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:54:42,933][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 00:54:45,780][42004] Updated weights for policy 0, policy_version 14766 (0.0027) +[2024-11-08 00:54:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7372.8, 300 sec: 7136.8). Total num frames: 60497920. Throughput: 0: 1776.0. Samples: 10116374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:54:47,935][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 00:54:52,931][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 7081.2). Total num frames: 60518400. Throughput: 0: 1724.4. Samples: 10124748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:54:52,934][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 00:54:53,134][42004] Updated weights for policy 0, policy_version 14776 (0.0029) +[2024-11-08 00:54:57,931][41694] Fps is (10 sec: 6144.1, 60 sec: 7031.5, 300 sec: 7095.1). Total num frames: 60559360. Throughput: 0: 1749.6. Samples: 10135902. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:54:57,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 00:54:58,317][42004] Updated weights for policy 0, policy_version 14786 (0.0033) +[2024-11-08 00:55:02,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 7081.2). Total num frames: 60596224. Throughput: 0: 1758.9. Samples: 10141788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:02,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 00:55:03,581][42004] Updated weights for policy 0, policy_version 14796 (0.0027) +[2024-11-08 00:55:07,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7168.0, 300 sec: 7122.9). Total num frames: 60637184. Throughput: 0: 1846.9. Samples: 10153620. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:07,933][41694] Avg episode reward: [(0, '4.570')] +[2024-11-08 00:55:08,625][42004] Updated weights for policy 0, policy_version 14806 (0.0031) +[2024-11-08 00:55:12,932][41694] Fps is (10 sec: 8191.9, 60 sec: 7236.3, 300 sec: 7206.3). Total num frames: 60678144. Throughput: 0: 1844.1. Samples: 10165490. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:12,934][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 00:55:14,403][42004] Updated weights for policy 0, policy_version 14816 (0.0029) +[2024-11-08 00:55:17,932][41694] Fps is (10 sec: 6553.5, 60 sec: 7304.5, 300 sec: 7164.5). Total num frames: 60702720. Throughput: 0: 1801.9. Samples: 10169834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:55:17,935][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 00:55:21,033][42004] Updated weights for policy 0, policy_version 14826 (0.0028) +[2024-11-08 00:55:22,932][41694] Fps is (10 sec: 6144.0, 60 sec: 7236.3, 300 sec: 7150.7). Total num frames: 60739584. Throughput: 0: 1749.2. Samples: 10179346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:55:22,934][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 00:55:27,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6895.0, 300 sec: 7095.1). Total num frames: 60760064. Throughput: 0: 1668.0. Samples: 10186296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:27,934][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 00:55:28,855][42004] Updated weights for policy 0, policy_version 14836 (0.0028) +[2024-11-08 00:55:32,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6826.6, 300 sec: 7081.2). Total num frames: 60796928. Throughput: 0: 1677.4. Samples: 10191858. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:32,935][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 00:55:34,139][42004] Updated weights for policy 0, policy_version 14846 (0.0022) +[2024-11-08 00:55:37,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 7095.1). Total num frames: 60837888. Throughput: 0: 1755.8. Samples: 10203758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:37,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 00:55:37,972][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014854_60841984.pth... +[2024-11-08 00:55:38,083][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014434_59121664.pth +[2024-11-08 00:55:39,018][42004] Updated weights for policy 0, policy_version 14856 (0.0023) +[2024-11-08 00:55:42,932][41694] Fps is (10 sec: 8191.8, 60 sec: 7031.4, 300 sec: 7109.0). Total num frames: 60878848. Throughput: 0: 1783.2. Samples: 10216148. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:42,936][41694] Avg episode reward: [(0, '4.713')] +[2024-11-08 00:55:44,195][42004] Updated weights for policy 0, policy_version 14866 (0.0024) +[2024-11-08 00:55:47,932][41694] Fps is (10 sec: 8191.9, 60 sec: 7031.5, 300 sec: 7178.4). Total num frames: 60919808. Throughput: 0: 1783.7. Samples: 10222056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:55:47,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 00:55:49,496][42004] Updated weights for policy 0, policy_version 14876 (0.0033) +[2024-11-08 00:55:52,932][41694] Fps is (10 sec: 7373.3, 60 sec: 7236.3, 300 sec: 7164.5). Total num frames: 60952576. Throughput: 0: 1757.6. Samples: 10232714. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:55:52,937][41694] Avg episode reward: [(0, '4.235')] +[2024-11-08 00:55:55,284][42004] Updated weights for policy 0, policy_version 14886 (0.0031) +[2024-11-08 00:55:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7168.0, 300 sec: 7164.5). Total num frames: 60989440. Throughput: 0: 1750.0. Samples: 10244240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:55:57,933][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 00:56:00,706][42004] Updated weights for policy 0, policy_version 14896 (0.0037) +[2024-11-08 00:56:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 7164.5). Total num frames: 61026304. Throughput: 0: 1780.0. Samples: 10249936. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:56:02,933][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 00:56:06,117][42004] Updated weights for policy 0, policy_version 14906 (0.0031) +[2024-11-08 00:56:07,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 7192.3). Total num frames: 61067264. Throughput: 0: 1816.7. Samples: 10261096. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 00:56:07,940][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 00:56:11,971][42004] Updated weights for policy 0, policy_version 14916 (0.0030) +[2024-11-08 00:56:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 7178.4). Total num frames: 61100032. Throughput: 0: 1891.7. Samples: 10271424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:56:12,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 00:56:17,932][41694] Fps is (10 sec: 4914.8, 60 sec: 6894.9, 300 sec: 7109.0). Total num frames: 61116416. Throughput: 0: 1815.7. Samples: 10273566. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:56:17,937][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 00:56:20,716][42004] Updated weights for policy 0, policy_version 14926 (0.0026) +[2024-11-08 00:56:22,932][41694] Fps is (10 sec: 4914.8, 60 sec: 6826.6, 300 sec: 7150.6). Total num frames: 61149184. Throughput: 0: 1745.3. Samples: 10282298. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:56:22,934][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 00:56:27,347][42004] Updated weights for policy 0, policy_version 14936 (0.0038) +[2024-11-08 00:56:27,933][41694] Fps is (10 sec: 6553.4, 60 sec: 7031.3, 300 sec: 7122.8). Total num frames: 61181952. Throughput: 0: 1673.9. Samples: 10291474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:56:27,936][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 00:56:32,932][41694] Fps is (10 sec: 6553.9, 60 sec: 6963.2, 300 sec: 7095.1). Total num frames: 61214720. Throughput: 0: 1644.3. Samples: 10296052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:56:32,934][41694] Avg episode reward: [(0, '4.166')] +[2024-11-08 00:56:33,573][42004] Updated weights for policy 0, policy_version 14946 (0.0034) +[2024-11-08 00:56:37,932][41694] Fps is (10 sec: 6963.8, 60 sec: 6894.9, 300 sec: 7081.2). Total num frames: 61251584. Throughput: 0: 1657.5. Samples: 10307300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:56:37,934][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 00:56:38,678][42004] Updated weights for policy 0, policy_version 14956 (0.0027) +[2024-11-08 00:56:42,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 7081.2). Total num frames: 61288448. Throughput: 0: 1648.3. Samples: 10318414. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:56:42,934][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 00:56:44,247][42004] Updated weights for policy 0, policy_version 14966 (0.0026) +[2024-11-08 00:56:47,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6826.7, 300 sec: 7095.1). Total num frames: 61329408. Throughput: 0: 1648.6. Samples: 10324122. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:56:47,934][41694] Avg episode reward: [(0, '4.684')] +[2024-11-08 00:56:49,554][42004] Updated weights for policy 0, policy_version 14976 (0.0028) +[2024-11-08 00:56:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6895.0, 300 sec: 7146.3). Total num frames: 61366272. Throughput: 0: 1662.8. Samples: 10335922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:56:52,934][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 00:56:54,700][42004] Updated weights for policy 0, policy_version 14986 (0.0028) +[2024-11-08 00:56:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 7150.6). Total num frames: 61403136. Throughput: 0: 1686.2. Samples: 10347304. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:56:57,934][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 00:57:00,717][42004] Updated weights for policy 0, policy_version 14996 (0.0024) +[2024-11-08 00:57:02,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 7136.8). Total num frames: 61435904. Throughput: 0: 1748.6. Samples: 10352252. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:57:02,934][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 00:57:06,703][42004] Updated weights for policy 0, policy_version 15006 (0.0030) +[2024-11-08 00:57:07,932][41694] Fps is (10 sec: 6553.0, 60 sec: 6690.0, 300 sec: 7109.0). Total num frames: 61468672. Throughput: 0: 1781.5. Samples: 10362466. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:57:07,935][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 00:57:12,870][42004] Updated weights for policy 0, policy_version 15016 (0.0028) +[2024-11-08 00:57:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 7095.1). Total num frames: 61505536. Throughput: 0: 1798.1. Samples: 10372386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:57:12,934][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 00:57:17,932][41694] Fps is (10 sec: 6963.7, 60 sec: 7031.5, 300 sec: 7067.3). Total num frames: 61538304. Throughput: 0: 1808.0. Samples: 10377412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:57:17,934][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 00:57:18,740][42004] Updated weights for policy 0, policy_version 15026 (0.0028) +[2024-11-08 00:57:22,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6963.3, 300 sec: 7039.6). Total num frames: 61566976. Throughput: 0: 1775.7. Samples: 10387206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:57:22,934][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 00:57:25,877][42004] Updated weights for policy 0, policy_version 15036 (0.0027) +[2024-11-08 00:57:27,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6963.3, 300 sec: 7075.1). Total num frames: 61599744. Throughput: 0: 1733.8. Samples: 10396436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:57:27,935][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 00:57:31,853][42004] Updated weights for policy 0, policy_version 15046 (0.0023) +[2024-11-08 00:57:32,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6963.3, 300 sec: 7053.5). Total num frames: 61632512. Throughput: 0: 1722.3. Samples: 10401624. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:57:32,934][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 00:57:37,395][42004] Updated weights for policy 0, policy_version 15056 (0.0032) +[2024-11-08 00:57:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 7067.3). Total num frames: 61673472. Throughput: 0: 1695.6. Samples: 10412224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:57:37,934][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 00:57:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015057_61673472.pth... +[2024-11-08 00:57:38,062][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014648_59998208.pth +[2024-11-08 00:57:42,602][42004] Updated weights for policy 0, policy_version 15066 (0.0022) +[2024-11-08 00:57:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 7053.5). Total num frames: 61710336. Throughput: 0: 1703.0. Samples: 10423940. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:57:42,933][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 00:57:47,699][42004] Updated weights for policy 0, policy_version 15076 (0.0026) +[2024-11-08 00:57:47,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7031.5, 300 sec: 7053.5). Total num frames: 61751296. Throughput: 0: 1727.4. Samples: 10429986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:57:47,933][41694] Avg episode reward: [(0, '4.212')] +[2024-11-08 00:57:52,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7031.4, 300 sec: 7053.5). Total num frames: 61788160. Throughput: 0: 1754.4. Samples: 10441412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:57:52,933][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 00:57:53,209][42004] Updated weights for policy 0, policy_version 15086 (0.0031) +[2024-11-08 00:57:57,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6826.7, 300 sec: 7011.8). 
Total num frames: 61812736. Throughput: 0: 1717.8. Samples: 10449688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 00:57:57,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 00:58:00,466][42004] Updated weights for policy 0, policy_version 15096 (0.0024) +[2024-11-08 00:58:02,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.6, 300 sec: 7060.5). Total num frames: 61845504. Throughput: 0: 1725.0. Samples: 10455036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:58:02,936][41694] Avg episode reward: [(0, '4.244')] +[2024-11-08 00:58:06,806][42004] Updated weights for policy 0, policy_version 15106 (0.0034) +[2024-11-08 00:58:07,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6895.0, 300 sec: 7053.5). Total num frames: 61882368. Throughput: 0: 1722.0. Samples: 10464696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:58:07,936][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 00:58:12,243][42004] Updated weights for policy 0, policy_version 15116 (0.0026) +[2024-11-08 00:58:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6895.0, 300 sec: 7039.6). Total num frames: 61919232. Throughput: 0: 1766.4. Samples: 10475922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:58:12,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 00:58:17,349][42004] Updated weights for policy 0, policy_version 15126 (0.0028) +[2024-11-08 00:58:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 7039.6). Total num frames: 61960192. Throughput: 0: 1783.3. Samples: 10481874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:58:17,933][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 00:58:22,750][42004] Updated weights for policy 0, policy_version 15136 (0.0024) +[2024-11-08 00:58:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 7025.7). Total num frames: 61997056. Throughput: 0: 1809.5. Samples: 10493652. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:58:22,933][41694] Avg episode reward: [(0, '4.627')] +[2024-11-08 00:58:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 7025.7). Total num frames: 62033920. Throughput: 0: 1800.9. Samples: 10504982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:58:27,934][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 00:58:28,099][42004] Updated weights for policy 0, policy_version 15146 (0.0024) +[2024-11-08 00:58:32,932][41694] Fps is (10 sec: 6143.8, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 62058496. Throughput: 0: 1727.6. Samples: 10507730. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:58:32,934][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 00:58:35,249][42004] Updated weights for policy 0, policy_version 15156 (0.0029) +[2024-11-08 00:58:37,932][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 7039.6). Total num frames: 62095360. Throughput: 0: 1718.5. Samples: 10518744. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:58:37,934][41694] Avg episode reward: [(0, '4.198')] +[2024-11-08 00:58:41,297][42004] Updated weights for policy 0, policy_version 15166 (0.0025) +[2024-11-08 00:58:42,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.4, 300 sec: 7039.6). Total num frames: 62132224. Throughput: 0: 1761.3. Samples: 10528946. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:58:42,934][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 00:58:46,478][42004] Updated weights for policy 0, policy_version 15176 (0.0024) +[2024-11-08 00:58:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 62169088. Throughput: 0: 1776.3. Samples: 10534970. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:58:47,943][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 00:58:51,542][42004] Updated weights for policy 0, policy_version 15186 (0.0026) +[2024-11-08 00:58:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 7025.7). Total num frames: 62210048. Throughput: 0: 1827.2. Samples: 10546920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:58:52,935][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 00:58:56,843][42004] Updated weights for policy 0, policy_version 15196 (0.0038) +[2024-11-08 00:58:57,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7304.6, 300 sec: 7039.6). Total num frames: 62251008. Throughput: 0: 1836.7. Samples: 10558574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:58:57,934][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 00:59:03,759][41694] Fps is (10 sec: 6052.8, 60 sec: 7070.5, 300 sec: 6992.2). Total num frames: 62275584. Throughput: 0: 1796.1. Samples: 10564184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:59:03,761][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 00:59:04,364][42004] Updated weights for policy 0, policy_version 15206 (0.0037) +[2024-11-08 00:59:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 7099.7, 300 sec: 6997.9). Total num frames: 62308352. Throughput: 0: 1736.5. Samples: 10571796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:59:07,934][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 00:59:09,761][42004] Updated weights for policy 0, policy_version 15216 (0.0032) +[2024-11-08 00:59:12,931][41694] Fps is (10 sec: 7144.9, 60 sec: 7031.5, 300 sec: 7039.6). Total num frames: 62341120. Throughput: 0: 1720.1. Samples: 10582388. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:59:12,933][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 00:59:15,960][42004] Updated weights for policy 0, policy_version 15226 (0.0034) +[2024-11-08 00:59:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 62377984. Throughput: 0: 1770.0. Samples: 10587380. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:59:17,933][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 00:59:21,337][42004] Updated weights for policy 0, policy_version 15236 (0.0024) +[2024-11-08 00:59:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.4, 300 sec: 7025.7). Total num frames: 62418944. Throughput: 0: 1777.3. Samples: 10598724. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:59:22,934][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 00:59:26,614][42004] Updated weights for policy 0, policy_version 15246 (0.0029) +[2024-11-08 00:59:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 7011.8). Total num frames: 62455808. Throughput: 0: 1805.1. Samples: 10610176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:59:27,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 00:59:31,899][42004] Updated weights for policy 0, policy_version 15256 (0.0036) +[2024-11-08 00:59:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7236.3, 300 sec: 7011.8). Total num frames: 62492672. Throughput: 0: 1795.6. Samples: 10615772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:59:32,933][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 00:59:37,931][41694] Fps is (10 sec: 6553.6, 60 sec: 7099.8, 300 sec: 6997.9). Total num frames: 62521344. Throughput: 0: 1792.3. Samples: 10627572. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 00:59:37,934][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 00:59:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015264_62521344.pth... +[2024-11-08 00:59:38,061][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000014854_60841984.pth +[2024-11-08 00:59:38,907][42004] Updated weights for policy 0, policy_version 15266 (0.0026) +[2024-11-08 00:59:42,932][41694] Fps is (10 sec: 6553.3, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 62558208. Throughput: 0: 1713.2. Samples: 10635670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:59:42,935][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 00:59:44,763][42004] Updated weights for policy 0, policy_version 15276 (0.0033) +[2024-11-08 00:59:47,932][41694] Fps is (10 sec: 6962.9, 60 sec: 7031.4, 300 sec: 7025.7). Total num frames: 62590976. Throughput: 0: 1733.5. Samples: 10640756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 00:59:47,934][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 00:59:50,568][42004] Updated weights for policy 0, policy_version 15286 (0.0029) +[2024-11-08 00:59:52,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6963.2, 300 sec: 7011.8). Total num frames: 62627840. Throughput: 0: 1767.2. Samples: 10651322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:59:52,935][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 00:59:56,011][42004] Updated weights for policy 0, policy_version 15296 (0.0027) +[2024-11-08 00:59:57,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 7011.8). Total num frames: 62664704. Throughput: 0: 1780.3. Samples: 10662502. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 00:59:57,934][41694] Avg episode reward: [(0, '4.245')] +[2024-11-08 01:00:01,756][42004] Updated weights for policy 0, policy_version 15306 (0.0035) +[2024-11-08 01:00:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7199.0, 300 sec: 6997.9). Total num frames: 62701568. Throughput: 0: 1792.6. Samples: 10668046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:00:02,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 01:00:06,975][42004] Updated weights for policy 0, policy_version 15316 (0.0024) +[2024-11-08 01:00:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 6984.0). Total num frames: 62738432. Throughput: 0: 1787.5. Samples: 10679162. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:00:07,933][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 01:00:12,931][41694] Fps is (10 sec: 6553.7, 60 sec: 7099.7, 300 sec: 6997.9). Total num frames: 62767104. Throughput: 0: 1723.9. Samples: 10687752. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:00:12,933][41694] Avg episode reward: [(0, '4.652')] +[2024-11-08 01:00:14,039][42004] Updated weights for policy 0, policy_version 15326 (0.0029) +[2024-11-08 01:00:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 7031.4, 300 sec: 6984.0). Total num frames: 62799872. Throughput: 0: 1713.0. Samples: 10692856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:00:17,933][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 01:00:20,845][42004] Updated weights for policy 0, policy_version 15336 (0.0037) +[2024-11-08 01:00:22,935][41694] Fps is (10 sec: 6141.6, 60 sec: 6826.2, 300 sec: 7011.7). Total num frames: 62828544. Throughput: 0: 1650.4. Samples: 10701846. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:00:22,939][41694] Avg episode reward: [(0, '4.588')] +[2024-11-08 01:00:26,504][42004] Updated weights for policy 0, policy_version 15346 (0.0026) +[2024-11-08 01:00:27,934][41694] Fps is (10 sec: 6552.1, 60 sec: 6826.4, 300 sec: 7011.7). Total num frames: 62865408. Throughput: 0: 1716.8. Samples: 10712930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:00:27,939][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 01:00:31,661][42004] Updated weights for policy 0, policy_version 15356 (0.0028) +[2024-11-08 01:00:32,931][41694] Fps is (10 sec: 7785.5, 60 sec: 6894.9, 300 sec: 7011.8). Total num frames: 62906368. Throughput: 0: 1734.3. Samples: 10718800. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:00:32,933][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 01:00:36,979][42004] Updated weights for policy 0, policy_version 15366 (0.0023) +[2024-11-08 01:00:37,932][41694] Fps is (10 sec: 7784.1, 60 sec: 7031.4, 300 sec: 6997.9). Total num frames: 62943232. Throughput: 0: 1760.6. Samples: 10730548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:00:37,938][41694] Avg episode reward: [(0, '4.281')] +[2024-11-08 01:00:42,299][42004] Updated weights for policy 0, policy_version 15376 (0.0030) +[2024-11-08 01:00:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.8, 300 sec: 6997.9). Total num frames: 62984192. Throughput: 0: 1766.4. Samples: 10741988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:00:42,933][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 01:00:47,932][41694] Fps is (10 sec: 6553.9, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 63008768. Throughput: 0: 1715.6. Samples: 10745248. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:00:47,934][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 01:00:49,358][42004] Updated weights for policy 0, policy_version 15386 (0.0037) +[2024-11-08 01:00:52,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 63045632. Throughput: 0: 1710.0. Samples: 10756114. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:00:52,934][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 01:00:55,127][42004] Updated weights for policy 0, policy_version 15396 (0.0030) +[2024-11-08 01:00:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 63082496. Throughput: 0: 1753.3. Samples: 10766652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:00:57,934][41694] Avg episode reward: [(0, '4.319')] +[2024-11-08 01:01:00,403][42004] Updated weights for policy 0, policy_version 15406 (0.0028) +[2024-11-08 01:01:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 63119360. Throughput: 0: 1772.2. Samples: 10772606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:01:02,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 01:01:05,967][42004] Updated weights for policy 0, policy_version 15416 (0.0025) +[2024-11-08 01:01:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 63156224. Throughput: 0: 1815.7. Samples: 10783544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:01:07,935][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 01:01:11,848][42004] Updated weights for policy 0, policy_version 15426 (0.0024) +[2024-11-08 01:01:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 7039.6). Total num frames: 63193088. Throughput: 0: 1806.2. Samples: 10794204. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:01:12,934][41694] Avg episode reward: [(0, '4.184')] +[2024-11-08 01:01:17,232][42004] Updated weights for policy 0, policy_version 15436 (0.0025) +[2024-11-08 01:01:19,376][41694] Fps is (10 sec: 6084.2, 60 sec: 6932.8, 300 sec: 7005.3). Total num frames: 63225856. Throughput: 0: 1741.2. Samples: 10799672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:01:19,378][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 01:01:22,932][41694] Fps is (10 sec: 6144.0, 60 sec: 7100.2, 300 sec: 7025.7). Total num frames: 63254528. Throughput: 0: 1721.1. Samples: 10807996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:01:22,934][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 01:01:24,664][42004] Updated weights for policy 0, policy_version 15446 (0.0032) +[2024-11-08 01:01:27,932][41694] Fps is (10 sec: 7180.9, 60 sec: 7031.7, 300 sec: 7025.7). Total num frames: 63287296. Throughput: 0: 1690.6. Samples: 10818068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:01:27,937][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 01:01:30,678][42004] Updated weights for policy 0, policy_version 15456 (0.0051) +[2024-11-08 01:01:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 63324160. Throughput: 0: 1732.3. Samples: 10823202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:01:32,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 01:01:35,652][42004] Updated weights for policy 0, policy_version 15466 (0.0029) +[2024-11-08 01:01:37,931][41694] Fps is (10 sec: 7783.1, 60 sec: 7031.5, 300 sec: 7039.6). Total num frames: 63365120. Throughput: 0: 1761.5. Samples: 10835380. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:01:37,934][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 01:01:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015470_63365120.pth... +[2024-11-08 01:01:38,067][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015057_61673472.pth +[2024-11-08 01:01:40,913][42004] Updated weights for policy 0, policy_version 15476 (0.0030) +[2024-11-08 01:01:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 63401984. Throughput: 0: 1784.4. Samples: 10846952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:01:42,935][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 01:01:46,315][42004] Updated weights for policy 0, policy_version 15486 (0.0026) +[2024-11-08 01:01:47,932][41694] Fps is (10 sec: 7781.7, 60 sec: 7236.2, 300 sec: 7039.5). Total num frames: 63442944. Throughput: 0: 1779.7. Samples: 10852696. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:01:47,934][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 01:01:51,584][42004] Updated weights for policy 0, policy_version 15496 (0.0027) +[2024-11-08 01:01:53,358][41694] Fps is (10 sec: 6678.3, 60 sec: 7049.6, 300 sec: 7001.7). Total num frames: 63471616. Throughput: 0: 1777.5. Samples: 10864292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:01:53,360][41694] Avg episode reward: [(0, '4.747')] +[2024-11-08 01:01:57,931][41694] Fps is (10 sec: 6144.6, 60 sec: 7031.5, 300 sec: 7011.8). Total num frames: 63504384. Throughput: 0: 1738.5. Samples: 10872434. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:01:57,934][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 01:01:58,674][42004] Updated weights for policy 0, policy_version 15506 (0.0035) +[2024-11-08 01:02:02,932][41694] Fps is (10 sec: 6845.7, 60 sec: 6963.2, 300 sec: 7011.8). 
Total num frames: 63537152. Throughput: 0: 1791.0. Samples: 10877680. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:02:02,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 01:02:05,108][42004] Updated weights for policy 0, policy_version 15516 (0.0021) +[2024-11-08 01:02:07,932][41694] Fps is (10 sec: 6553.2, 60 sec: 6894.9, 300 sec: 6997.9). Total num frames: 63569920. Throughput: 0: 1770.3. Samples: 10887658. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:02:07,935][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 01:02:10,633][42004] Updated weights for policy 0, policy_version 15526 (0.0040) +[2024-11-08 01:02:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 63610880. Throughput: 0: 1796.6. Samples: 10898914. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:02:12,933][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 01:02:15,827][42004] Updated weights for policy 0, policy_version 15536 (0.0023) +[2024-11-08 01:02:17,931][41694] Fps is (10 sec: 7782.8, 60 sec: 7205.0, 300 sec: 7053.5). Total num frames: 63647744. Throughput: 0: 1813.2. Samples: 10904794. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:02:17,933][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 01:02:21,082][42004] Updated weights for policy 0, policy_version 15546 (0.0026) +[2024-11-08 01:02:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7236.3, 300 sec: 7081.2). Total num frames: 63688704. Throughput: 0: 1801.2. Samples: 10916432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:02:22,934][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 01:02:27,932][41694] Fps is (10 sec: 6553.4, 60 sec: 7099.8, 300 sec: 7053.4). Total num frames: 63713280. Throughput: 0: 1735.7. Samples: 10925058. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:02:27,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 01:02:28,262][42004] Updated weights for policy 0, policy_version 15556 (0.0027) +[2024-11-08 01:02:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 7099.7, 300 sec: 7039.6). Total num frames: 63750144. Throughput: 0: 1718.9. Samples: 10930044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:02:32,936][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 01:02:33,673][42004] Updated weights for policy 0, policy_version 15566 (0.0037) +[2024-11-08 01:02:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 63782912. Throughput: 0: 1721.4. Samples: 10941020. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:02:37,933][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 01:02:39,849][42004] Updated weights for policy 0, policy_version 15576 (0.0033) +[2024-11-08 01:02:42,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6895.0, 300 sec: 6997.9). Total num frames: 63815680. Throughput: 0: 1736.7. Samples: 10950586. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:02:42,933][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 01:02:45,778][42004] Updated weights for policy 0, policy_version 15586 (0.0030) +[2024-11-08 01:02:47,937][41694] Fps is (10 sec: 6549.8, 60 sec: 6757.8, 300 sec: 6983.9). Total num frames: 63848448. Throughput: 0: 1742.2. Samples: 10956090. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:02:47,939][41694] Avg episode reward: [(0, '4.192')] +[2024-11-08 01:02:52,308][42004] Updated weights for policy 0, policy_version 15596 (0.0045) +[2024-11-08 01:02:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6944.3, 300 sec: 7025.7). Total num frames: 63885312. Throughput: 0: 1731.1. Samples: 10965556. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:02:52,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 01:02:57,747][42004] Updated weights for policy 0, policy_version 15606 (0.0031) +[2024-11-08 01:02:57,932][41694] Fps is (10 sec: 7376.7, 60 sec: 6963.1, 300 sec: 7039.6). Total num frames: 63922176. Throughput: 0: 1730.9. Samples: 10976806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:02:57,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 01:03:02,931][41694] Fps is (10 sec: 6143.9, 60 sec: 6826.7, 300 sec: 6997.9). Total num frames: 63946752. Throughput: 0: 1707.6. Samples: 10981634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:03:02,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 01:03:05,159][42004] Updated weights for policy 0, policy_version 15616 (0.0032) +[2024-11-08 01:03:07,931][41694] Fps is (10 sec: 5734.7, 60 sec: 6826.7, 300 sec: 6984.0). Total num frames: 63979520. Throughput: 0: 1632.3. Samples: 10989884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:03:07,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 01:03:10,949][42004] Updated weights for policy 0, policy_version 15626 (0.0045) +[2024-11-08 01:03:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6970.1). Total num frames: 64016384. Throughput: 0: 1671.5. Samples: 11000274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:03:12,936][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 01:03:16,719][42004] Updated weights for policy 0, policy_version 15636 (0.0041) +[2024-11-08 01:03:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6970.1). Total num frames: 64053248. Throughput: 0: 1677.1. Samples: 11005514. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:03:17,934][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 01:03:22,091][42004] Updated weights for policy 0, policy_version 15646 (0.0031) +[2024-11-08 01:03:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6970.1). Total num frames: 64090112. Throughput: 0: 1692.8. Samples: 11017196. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:03:22,934][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 01:03:27,316][42004] Updated weights for policy 0, policy_version 15656 (0.0031) +[2024-11-08 01:03:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 64131072. Throughput: 0: 1736.8. Samples: 11028740. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:03:27,934][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 01:03:32,625][42004] Updated weights for policy 0, policy_version 15666 (0.0031) +[2024-11-08 01:03:32,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 64167936. Throughput: 0: 1740.1. Samples: 11034386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:03:32,934][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 01:03:37,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6826.6, 300 sec: 6984.0). Total num frames: 64192512. Throughput: 0: 1714.5. Samples: 11042710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:03:37,934][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 01:03:37,990][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015673_64196608.pth... +[2024-11-08 01:03:38,086][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015264_62521344.pth +[2024-11-08 01:03:39,662][42004] Updated weights for policy 0, policy_version 15676 (0.0029) +[2024-11-08 01:03:42,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6894.9, 300 sec: 6984.0). 
Total num frames: 64229376. Throughput: 0: 1713.1. Samples: 11053896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:03:42,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 01:03:45,405][42004] Updated weights for policy 0, policy_version 15686 (0.0034) +[2024-11-08 01:03:47,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6963.9, 300 sec: 6970.1). Total num frames: 64266240. Throughput: 0: 1727.1. Samples: 11059354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:03:47,933][41694] Avg episode reward: [(0, '4.696')] +[2024-11-08 01:03:50,931][42004] Updated weights for policy 0, policy_version 15696 (0.0030) +[2024-11-08 01:03:52,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 64299008. Throughput: 0: 1787.2. Samples: 11070308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:03:52,933][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 01:03:57,725][42004] Updated weights for policy 0, policy_version 15706 (0.0041) +[2024-11-08 01:03:57,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6826.7, 300 sec: 6989.7). Total num frames: 64331776. Throughput: 0: 1756.4. Samples: 11079314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:03:57,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 01:04:02,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 64364544. Throughput: 0: 1758.8. Samples: 11084660. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:02,935][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 01:04:03,865][42004] Updated weights for policy 0, policy_version 15716 (0.0033) +[2024-11-08 01:04:09,024][41694] Fps is (10 sec: 5539.2, 60 sec: 6771.7, 300 sec: 6930.6). Total num frames: 64393216. Throughput: 0: 1662.8. Samples: 11093836. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:09,025][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 01:04:11,949][42004] Updated weights for policy 0, policy_version 15726 (0.0043) +[2024-11-08 01:04:12,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6914.6). Total num frames: 64417792. Throughput: 0: 1609.3. Samples: 11101160. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:12,933][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 01:04:17,931][41694] Fps is (10 sec: 6437.4, 60 sec: 6621.9, 300 sec: 6886.8). Total num frames: 64450560. Throughput: 0: 1584.0. Samples: 11105666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:17,933][41694] Avg episode reward: [(0, '4.267')] +[2024-11-08 01:04:18,223][42004] Updated weights for policy 0, policy_version 15736 (0.0028) +[2024-11-08 01:04:22,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6872.9). Total num frames: 64483328. Throughput: 0: 1619.9. Samples: 11115604. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:22,934][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 01:04:24,548][42004] Updated weights for policy 0, policy_version 15746 (0.0028) +[2024-11-08 01:04:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6485.3, 300 sec: 6872.9). Total num frames: 64520192. Throughput: 0: 1604.4. Samples: 11126096. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:27,940][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 01:04:30,090][42004] Updated weights for policy 0, policy_version 15756 (0.0036) +[2024-11-08 01:04:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6900.7). Total num frames: 64557056. Throughput: 0: 1603.3. Samples: 11131502. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:32,933][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 01:04:35,812][42004] Updated weights for policy 0, policy_version 15766 (0.0029) +[2024-11-08 01:04:37,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6886.8). Total num frames: 64589824. Throughput: 0: 1601.5. Samples: 11142374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:37,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 01:04:43,059][41694] Fps is (10 sec: 5662.5, 60 sec: 6403.5, 300 sec: 6856.1). Total num frames: 64614400. Throughput: 0: 1505.1. Samples: 11147236. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:04:43,060][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 01:04:43,695][42004] Updated weights for policy 0, policy_version 15776 (0.0032) +[2024-11-08 01:04:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6348.8, 300 sec: 6845.2). Total num frames: 64647168. Throughput: 0: 1548.2. Samples: 11154328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:04:47,933][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 01:04:49,818][42004] Updated weights for policy 0, policy_version 15786 (0.0029) +[2024-11-08 01:04:52,932][41694] Fps is (10 sec: 6223.0, 60 sec: 6280.5, 300 sec: 6817.4). Total num frames: 64675840. Throughput: 0: 1603.2. Samples: 11164228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:04:52,933][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 01:04:56,619][42004] Updated weights for policy 0, policy_version 15796 (0.0048) +[2024-11-08 01:04:57,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6280.6, 300 sec: 6803.5). Total num frames: 64708608. Throughput: 0: 1603.3. Samples: 11173310. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:04:57,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 01:05:02,607][42004] Updated weights for policy 0, policy_version 15806 (0.0043) +[2024-11-08 01:05:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6789.6). Total num frames: 64741376. Throughput: 0: 1621.8. Samples: 11178648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:05:02,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 01:05:07,932][41694] Fps is (10 sec: 6962.6, 60 sec: 6535.9, 300 sec: 6817.4). Total num frames: 64778240. Throughput: 0: 1627.3. Samples: 11188832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:05:07,934][41694] Avg episode reward: [(0, '4.121')] +[2024-11-08 01:05:08,483][42004] Updated weights for policy 0, policy_version 15816 (0.0028) +[2024-11-08 01:05:12,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6553.6, 300 sec: 6817.4). Total num frames: 64811008. Throughput: 0: 1622.4. Samples: 11199104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:05:12,933][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 01:05:14,768][42004] Updated weights for policy 0, policy_version 15826 (0.0040) +[2024-11-08 01:05:17,932][41694] Fps is (10 sec: 4915.5, 60 sec: 6280.5, 300 sec: 6775.8). Total num frames: 64827392. Throughput: 0: 1605.6. Samples: 11203752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:05:17,933][41694] Avg episode reward: [(0, '4.668')] +[2024-11-08 01:05:22,931][41694] Fps is (10 sec: 4505.7, 60 sec: 6212.3, 300 sec: 6748.0). Total num frames: 64856064. Throughput: 0: 1490.1. Samples: 11209430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:05:22,934][41694] Avg episode reward: [(0, '4.666')] +[2024-11-08 01:05:23,564][42004] Updated weights for policy 0, policy_version 15836 (0.0045) +[2024-11-08 01:05:27,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6212.3, 300 sec: 6734.1). Total num frames: 64892928. 
Throughput: 0: 1613.2. Samples: 11219626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:05:27,934][41694] Avg episode reward: [(0, '4.502')] +[2024-11-08 01:05:29,675][42004] Updated weights for policy 0, policy_version 15846 (0.0022) +[2024-11-08 01:05:32,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6143.9, 300 sec: 6720.2). Total num frames: 64925696. Throughput: 0: 1557.5. Samples: 11224416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:05:32,936][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 01:05:35,486][42004] Updated weights for policy 0, policy_version 15856 (0.0041) +[2024-11-08 01:05:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6212.3, 300 sec: 6706.3). Total num frames: 64962560. Throughput: 0: 1579.4. Samples: 11235302. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:05:37,933][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 01:05:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015860_64962560.pth... +[2024-11-08 01:05:38,127][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015470_63365120.pth +[2024-11-08 01:05:41,253][42004] Updated weights for policy 0, policy_version 15866 (0.0039) +[2024-11-08 01:05:42,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6362.2, 300 sec: 6734.1). Total num frames: 64995328. Throughput: 0: 1613.7. Samples: 11245928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:05:42,938][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 01:05:47,194][42004] Updated weights for policy 0, policy_version 15876 (0.0036) +[2024-11-08 01:05:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6417.1, 300 sec: 6734.1). Total num frames: 65032192. Throughput: 0: 1602.8. Samples: 11250776. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:05:47,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:05:52,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6280.4, 300 sec: 6678.5). Total num frames: 65052672. Throughput: 0: 1544.6. Samples: 11258340. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:05:52,937][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 01:05:55,051][42004] Updated weights for policy 0, policy_version 15886 (0.0029) +[2024-11-08 01:05:57,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6280.5, 300 sec: 6664.7). Total num frames: 65085440. Throughput: 0: 1528.8. Samples: 11267898. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:05:57,933][41694] Avg episode reward: [(0, '4.145')] +[2024-11-08 01:06:00,975][42004] Updated weights for policy 0, policy_version 15896 (0.0039) +[2024-11-08 01:06:02,932][41694] Fps is (10 sec: 6554.0, 60 sec: 6280.5, 300 sec: 6650.8). Total num frames: 65118208. Throughput: 0: 1546.4. Samples: 11273340. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:06:02,933][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 01:06:07,800][42004] Updated weights for policy 0, policy_version 15906 (0.0027) +[2024-11-08 01:06:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6212.3, 300 sec: 6636.9). Total num frames: 65150976. Throughput: 0: 1618.6. Samples: 11282266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:06:07,933][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 01:06:12,932][41694] Fps is (10 sec: 6553.8, 60 sec: 6212.3, 300 sec: 6669.6). Total num frames: 65183744. Throughput: 0: 1619.5. Samples: 11292502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:06:12,933][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 01:06:13,712][42004] Updated weights for policy 0, policy_version 15916 (0.0031) +[2024-11-08 01:06:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 65220608. 
Throughput: 0: 1622.5. Samples: 11297428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:06:17,933][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 01:06:19,590][42004] Updated weights for policy 0, policy_version 15926 (0.0034) +[2024-11-08 01:06:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 65257472. Throughput: 0: 1623.3. Samples: 11308352. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:06:22,933][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 01:06:26,916][42004] Updated weights for policy 0, policy_version 15936 (0.0028) +[2024-11-08 01:06:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 65277952. Throughput: 0: 1559.6. Samples: 11316108. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:06:27,934][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 01:06:32,570][42004] Updated weights for policy 0, policy_version 15946 (0.0025) +[2024-11-08 01:06:32,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6485.4, 300 sec: 6609.1). Total num frames: 65314816. Throughput: 0: 1570.4. Samples: 11321444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:06:32,933][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 01:06:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 65351680. Throughput: 0: 1647.2. Samples: 11332464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:06:37,933][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 01:06:38,478][42004] Updated weights for policy 0, policy_version 15956 (0.0029) +[2024-11-08 01:06:42,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.4, 300 sec: 6581.4). Total num frames: 65384448. Throughput: 0: 1666.9. Samples: 11342910. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:06:42,933][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 01:06:44,072][42004] Updated weights for policy 0, policy_version 15966 (0.0049) +[2024-11-08 01:06:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6632.6). Total num frames: 65425408. Throughput: 0: 1674.0. Samples: 11348668. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:06:47,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 01:06:49,605][42004] Updated weights for policy 0, policy_version 15976 (0.0035) +[2024-11-08 01:06:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.5, 300 sec: 6623.0). Total num frames: 65458176. Throughput: 0: 1719.2. Samples: 11359632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:06:52,933][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 01:06:55,519][42004] Updated weights for policy 0, policy_version 15986 (0.0029) +[2024-11-08 01:06:59,380][41694] Fps is (10 sec: 5366.5, 60 sec: 6532.4, 300 sec: 6576.8). Total num frames: 65486848. Throughput: 0: 1656.3. Samples: 11369436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:06:59,384][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 01:07:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 65511424. Throughput: 0: 1638.9. Samples: 11371180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:07:02,936][41694] Avg episode reward: [(0, '4.252')] +[2024-11-08 01:07:04,406][42004] Updated weights for policy 0, policy_version 15996 (0.0036) +[2024-11-08 01:07:07,932][41694] Fps is (10 sec: 6705.8, 60 sec: 6553.6, 300 sec: 6553.6). Total num frames: 65544192. Throughput: 0: 1599.3. Samples: 11380322. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:07:07,937][41694] Avg episode reward: [(0, '4.246')] +[2024-11-08 01:07:10,214][42004] Updated weights for policy 0, policy_version 16006 (0.0026) +[2024-11-08 01:07:12,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6485.3, 300 sec: 6525.8). Total num frames: 65572864. Throughput: 0: 1643.2. Samples: 11390052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:07:12,935][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 01:07:16,347][42004] Updated weights for policy 0, policy_version 16016 (0.0036) +[2024-11-08 01:07:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6511.9). Total num frames: 65609728. Throughput: 0: 1640.7. Samples: 11395274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:07:17,934][41694] Avg episode reward: [(0, '4.629')] +[2024-11-08 01:07:22,077][42004] Updated weights for policy 0, policy_version 16026 (0.0029) +[2024-11-08 01:07:22,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6485.3, 300 sec: 6553.6). Total num frames: 65646592. Throughput: 0: 1638.3. Samples: 11406188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:07:22,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 01:07:27,412][42004] Updated weights for policy 0, policy_version 16036 (0.0025) +[2024-11-08 01:07:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6553.6). Total num frames: 65683456. Throughput: 0: 1660.3. Samples: 11417622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:07:27,934][41694] Avg episode reward: [(0, '4.702')] +[2024-11-08 01:07:33,373][41694] Fps is (10 sec: 6276.5, 60 sec: 6573.5, 300 sec: 6529.9). Total num frames: 65712128. Throughput: 0: 1636.8. Samples: 11423048. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:07:33,374][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 01:07:34,663][42004] Updated weights for policy 0, policy_version 16046 (0.0025) +[2024-11-08 01:07:37,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6553.6, 300 sec: 6539.7). Total num frames: 65744896. Throughput: 0: 1587.4. Samples: 11431064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:07:37,934][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 01:07:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016051_65744896.pth... +[2024-11-08 01:07:38,099][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015673_64196608.pth +[2024-11-08 01:07:40,396][42004] Updated weights for policy 0, policy_version 16056 (0.0026) +[2024-11-08 01:07:42,932][41694] Fps is (10 sec: 7284.6, 60 sec: 6621.8, 300 sec: 6553.7). Total num frames: 65781760. Throughput: 0: 1659.3. Samples: 11441702. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:07:42,937][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 01:07:46,562][42004] Updated weights for policy 0, policy_version 16066 (0.0025) +[2024-11-08 01:07:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 6539.7). Total num frames: 65814528. Throughput: 0: 1673.6. Samples: 11446492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:07:47,933][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 01:07:52,407][42004] Updated weights for policy 0, policy_version 16076 (0.0032) +[2024-11-08 01:07:52,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6485.3, 300 sec: 6525.8). Total num frames: 65847296. Throughput: 0: 1700.6. Samples: 11456848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:07:52,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 01:07:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6785.7, 300 sec: 6567.5). 
Total num frames: 65884160. Throughput: 0: 1724.8. Samples: 11467668. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:07:57,934][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 01:07:57,976][42004] Updated weights for policy 0, policy_version 16086 (0.0042) +[2024-11-08 01:08:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6567.5). Total num frames: 65916928. Throughput: 0: 1729.5. Samples: 11473100. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:08:02,933][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 01:08:04,033][42004] Updated weights for policy 0, policy_version 16096 (0.0047) +[2024-11-08 01:08:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6621.9, 300 sec: 6525.8). Total num frames: 65941504. Throughput: 0: 1691.2. Samples: 11482294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:08:07,933][41694] Avg episode reward: [(0, '4.215')] +[2024-11-08 01:08:11,656][42004] Updated weights for policy 0, policy_version 16106 (0.0034) +[2024-11-08 01:08:12,940][41694] Fps is (10 sec: 6138.6, 60 sec: 6757.4, 300 sec: 6525.6). Total num frames: 65978368. Throughput: 0: 1621.2. Samples: 11490592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:08:12,942][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 01:08:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6498.1). Total num frames: 66007040. Throughput: 0: 1626.7. Samples: 11495532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:17,934][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 01:08:17,995][42004] Updated weights for policy 0, policy_version 16116 (0.0029) +[2024-11-08 01:08:22,932][41694] Fps is (10 sec: 6149.4, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 66039808. Throughput: 0: 1641.7. Samples: 11504940. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:22,933][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 01:08:24,358][42004] Updated weights for policy 0, policy_version 16126 (0.0031) +[2024-11-08 01:08:27,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6553.5, 300 sec: 6470.3). Total num frames: 66076672. Throughput: 0: 1639.1. Samples: 11515460. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:27,938][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 01:08:30,142][42004] Updated weights for policy 0, policy_version 16136 (0.0034) +[2024-11-08 01:08:32,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6739.7, 300 sec: 6511.9). Total num frames: 66113536. Throughput: 0: 1646.7. Samples: 11520596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:32,934][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 01:08:35,608][42004] Updated weights for policy 0, policy_version 16146 (0.0032) +[2024-11-08 01:08:37,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6758.4, 300 sec: 6511.9). Total num frames: 66150400. Throughput: 0: 1664.2. Samples: 11531738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:37,933][41694] Avg episode reward: [(0, '4.274')] +[2024-11-08 01:08:42,816][42004] Updated weights for policy 0, policy_version 16156 (0.0030) +[2024-11-08 01:08:42,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 66174976. Throughput: 0: 1603.2. Samples: 11539812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:42,933][41694] Avg episode reward: [(0, '4.281')] +[2024-11-08 01:08:47,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6621.9, 300 sec: 6484.2). Total num frames: 66211840. Throughput: 0: 1602.3. Samples: 11545202. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:47,934][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 01:08:48,360][42004] Updated weights for policy 0, policy_version 16166 (0.0026) +[2024-11-08 01:08:52,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6621.8, 300 sec: 6484.2). Total num frames: 66244608. Throughput: 0: 1638.6. Samples: 11556030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:52,934][41694] Avg episode reward: [(0, '4.530')] +[2024-11-08 01:08:54,474][42004] Updated weights for policy 0, policy_version 16176 (0.0031) +[2024-11-08 01:08:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6498.1). Total num frames: 66281472. Throughput: 0: 1682.1. Samples: 11566274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:08:57,936][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 01:09:00,267][42004] Updated weights for policy 0, policy_version 16186 (0.0039) +[2024-11-08 01:09:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.8, 300 sec: 6536.1). Total num frames: 66314240. Throughput: 0: 1691.3. Samples: 11571642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:02,934][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 01:09:06,095][42004] Updated weights for policy 0, policy_version 16196 (0.0033) +[2024-11-08 01:09:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6553.6). Total num frames: 66351104. Throughput: 0: 1715.8. Samples: 11582150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:07,934][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 01:09:11,639][42004] Updated weights for policy 0, policy_version 16206 (0.0028) +[2024-11-08 01:09:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6827.6, 300 sec: 6567.5). Total num frames: 66387968. Throughput: 0: 1725.6. Samples: 11593110. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:12,935][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 01:09:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6539.7). Total num frames: 66412544. Throughput: 0: 1667.3. Samples: 11595626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:17,934][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 01:09:18,949][42004] Updated weights for policy 0, policy_version 16216 (0.0028) +[2024-11-08 01:09:22,931][41694] Fps is (10 sec: 6144.3, 60 sec: 6826.7, 300 sec: 6539.7). Total num frames: 66449408. Throughput: 0: 1658.6. Samples: 11606376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:22,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 01:09:24,789][42004] Updated weights for policy 0, policy_version 16226 (0.0035) +[2024-11-08 01:09:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6525.8). Total num frames: 66482176. Throughput: 0: 1705.0. Samples: 11616538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:27,934][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 01:09:30,794][42004] Updated weights for policy 0, policy_version 16236 (0.0034) +[2024-11-08 01:09:32,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.2, 300 sec: 6525.8). Total num frames: 66514944. Throughput: 0: 1702.0. Samples: 11621794. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:32,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 01:09:36,210][42004] Updated weights for policy 0, policy_version 16246 (0.0031) +[2024-11-08 01:09:37,933][41694] Fps is (10 sec: 6963.0, 60 sec: 6690.1, 300 sec: 6570.3). Total num frames: 66551808. Throughput: 0: 1710.8. Samples: 11633018. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:09:37,935][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 01:09:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016249_66555904.pth... +[2024-11-08 01:09:38,057][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000015860_64962560.pth +[2024-11-08 01:09:41,913][42004] Updated weights for policy 0, policy_version 16256 (0.0044) +[2024-11-08 01:09:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6581.4). Total num frames: 66588672. Throughput: 0: 1724.9. Samples: 11643896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:09:42,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 01:09:49,066][41694] Fps is (10 sec: 6253.9, 60 sec: 6700.0, 300 sec: 6570.0). Total num frames: 66621440. Throughput: 0: 1681.2. Samples: 11649202. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:09:49,068][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 01:09:49,249][42004] Updated weights for policy 0, policy_version 16266 (0.0030) +[2024-11-08 01:09:52,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6581.4). Total num frames: 66650112. Throughput: 0: 1671.2. Samples: 11657354. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:09:52,933][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 01:09:54,771][42004] Updated weights for policy 0, policy_version 16276 (0.0036) +[2024-11-08 01:09:57,932][41694] Fps is (10 sec: 7392.3, 60 sec: 6758.4, 300 sec: 6595.2). Total num frames: 66686976. Throughput: 0: 1673.7. Samples: 11668424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:09:57,935][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 01:10:01,114][42004] Updated weights for policy 0, policy_version 16286 (0.0032) +[2024-11-08 01:10:02,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6690.1, 300 sec: 6567.5). 
Total num frames: 66715648. Throughput: 0: 1713.5. Samples: 11672736. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:10:02,935][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 01:10:07,222][42004] Updated weights for policy 0, policy_version 16296 (0.0029) +[2024-11-08 01:10:07,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6690.1, 300 sec: 6581.4). Total num frames: 66752512. Throughput: 0: 1687.2. Samples: 11682298. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:10:07,933][41694] Avg episode reward: [(0, '4.164')] +[2024-11-08 01:10:12,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 66785280. Throughput: 0: 1700.4. Samples: 11693056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:10:12,933][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 01:10:13,116][42004] Updated weights for policy 0, policy_version 16306 (0.0024) +[2024-11-08 01:10:17,932][41694] Fps is (10 sec: 6553.1, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 66818048. Throughput: 0: 1684.6. Samples: 11697602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:10:17,934][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 01:10:19,352][42004] Updated weights for policy 0, policy_version 16316 (0.0024) +[2024-11-08 01:10:23,024][41694] Fps is (10 sec: 5682.1, 60 sec: 6543.5, 300 sec: 6607.1). Total num frames: 66842624. Throughput: 0: 1661.0. Samples: 11707916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:10:23,026][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 01:10:26,594][42004] Updated weights for policy 0, policy_version 16326 (0.0027) +[2024-11-08 01:10:27,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 66879488. Throughput: 0: 1601.9. Samples: 11715984. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:10:27,936][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 01:10:32,932][41694] Fps is (10 sec: 6614.3, 60 sec: 6553.5, 300 sec: 6595.2). Total num frames: 66908160. Throughput: 0: 1628.6. Samples: 11720642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:10:32,934][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 01:10:33,243][42004] Updated weights for policy 0, policy_version 16336 (0.0026) +[2024-11-08 01:10:37,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6485.3, 300 sec: 6595.2). Total num frames: 66940928. Throughput: 0: 1612.2. Samples: 11729906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:10:37,934][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 01:10:39,510][42004] Updated weights for policy 0, policy_version 16346 (0.0048) +[2024-11-08 01:10:42,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 66977792. Throughput: 0: 1598.9. Samples: 11740376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:10:42,933][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 01:10:45,367][42004] Updated weights for policy 0, policy_version 16356 (0.0041) +[2024-11-08 01:10:47,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6610.3, 300 sec: 6636.9). Total num frames: 67010560. Throughput: 0: 1617.8. Samples: 11745534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:10:47,935][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 01:10:50,887][42004] Updated weights for policy 0, policy_version 16366 (0.0021) +[2024-11-08 01:10:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 67047424. Throughput: 0: 1652.8. Samples: 11756674. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:10:52,935][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 01:10:57,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6417.0, 300 sec: 6623.0). Total num frames: 67072000. 
Throughput: 0: 1592.3. Samples: 11764708. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:10:57,934][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 01:10:58,018][42004] Updated weights for policy 0, policy_version 16376 (0.0037) +[2024-11-08 01:11:02,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6553.7, 300 sec: 6636.9). Total num frames: 67108864. Throughput: 0: 1615.8. Samples: 11770310. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:02,934][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 01:11:03,825][42004] Updated weights for policy 0, policy_version 16386 (0.0033) +[2024-11-08 01:11:07,940][41694] Fps is (10 sec: 6957.8, 60 sec: 6484.4, 300 sec: 6636.7). Total num frames: 67141632. Throughput: 0: 1620.7. Samples: 11780712. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:07,944][41694] Avg episode reward: [(0, '4.282')] +[2024-11-08 01:11:10,053][42004] Updated weights for policy 0, policy_version 16396 (0.0055) +[2024-11-08 01:11:12,937][41694] Fps is (10 sec: 6550.1, 60 sec: 6484.8, 300 sec: 6622.9). Total num frames: 67174400. Throughput: 0: 1640.0. Samples: 11789794. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:12,939][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 01:11:16,204][42004] Updated weights for policy 0, policy_version 16406 (0.0044) +[2024-11-08 01:11:17,932][41694] Fps is (10 sec: 6558.8, 60 sec: 6485.4, 300 sec: 6609.1). Total num frames: 67207168. Throughput: 0: 1652.4. Samples: 11794998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:11:17,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 01:11:21,862][42004] Updated weights for policy 0, policy_version 16416 (0.0028) +[2024-11-08 01:11:22,932][41694] Fps is (10 sec: 7376.4, 60 sec: 6768.8, 300 sec: 6678.6). Total num frames: 67248128. Throughput: 0: 1691.8. Samples: 11806036. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:11:22,935][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 01:11:27,610][42004] Updated weights for policy 0, policy_version 16426 (0.0029) +[2024-11-08 01:11:27,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6690.0, 300 sec: 6664.7). Total num frames: 67280896. Throughput: 0: 1699.7. Samples: 11816864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:27,934][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 01:11:32,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 67305472. Throughput: 0: 1670.5. Samples: 11820706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:32,934][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 01:11:34,846][42004] Updated weights for policy 0, policy_version 16436 (0.0029) +[2024-11-08 01:11:37,931][41694] Fps is (10 sec: 6144.5, 60 sec: 6690.2, 300 sec: 6636.9). Total num frames: 67342336. Throughput: 0: 1635.9. Samples: 11830288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:37,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 01:11:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016441_67342336.pth... +[2024-11-08 01:11:38,064][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016051_65744896.pth +[2024-11-08 01:11:40,738][42004] Updated weights for policy 0, policy_version 16446 (0.0028) +[2024-11-08 01:11:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.8, 300 sec: 6609.1). Total num frames: 67375104. Throughput: 0: 1674.4. Samples: 11840056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:42,934][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 01:11:47,255][42004] Updated weights for policy 0, policy_version 16456 (0.0034) +[2024-11-08 01:11:47,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6609.1). 
Total num frames: 67407872. Throughput: 0: 1647.4. Samples: 11844442. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:11:47,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 01:11:52,576][42004] Updated weights for policy 0, policy_version 16466 (0.0034) +[2024-11-08 01:11:52,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6669.7). Total num frames: 67444736. Throughput: 0: 1667.1. Samples: 11855720. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:11:52,934][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 01:11:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 67481600. Throughput: 0: 1710.0. Samples: 11866734. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:11:57,934][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 01:11:58,101][42004] Updated weights for policy 0, policy_version 16476 (0.0032) +[2024-11-08 01:12:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.6, 300 sec: 6692.4). Total num frames: 67518464. Throughput: 0: 1722.6. Samples: 11872516. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:12:02,934][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 01:12:05,525][42004] Updated weights for policy 0, policy_version 16486 (0.0031) +[2024-11-08 01:12:07,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6691.1, 300 sec: 6678.6). Total num frames: 67543040. Throughput: 0: 1642.4. Samples: 11879944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:12:07,933][41694] Avg episode reward: [(0, '4.725')] +[2024-11-08 01:12:11,058][42004] Updated weights for policy 0, policy_version 16496 (0.0027) +[2024-11-08 01:12:12,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6758.9, 300 sec: 6678.6). Total num frames: 67579904. Throughput: 0: 1651.0. Samples: 11891158. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:12:12,935][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 01:12:17,058][42004] Updated weights for policy 0, policy_version 16506 (0.0047) +[2024-11-08 01:12:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 67612672. Throughput: 0: 1674.6. Samples: 11896064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:12:17,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 01:12:22,800][42004] Updated weights for policy 0, policy_version 16516 (0.0037) +[2024-11-08 01:12:22,931][41694] Fps is (10 sec: 6963.7, 60 sec: 6690.2, 300 sec: 6664.7). Total num frames: 67649536. Throughput: 0: 1696.0. Samples: 11906610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:12:22,933][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 01:12:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.5, 300 sec: 6702.5). Total num frames: 67686400. Throughput: 0: 1725.9. Samples: 11917722. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:12:27,934][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 01:12:28,412][42004] Updated weights for policy 0, policy_version 16526 (0.0024) +[2024-11-08 01:12:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6706.3). Total num frames: 67723264. Throughput: 0: 1748.9. Samples: 11923142. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:12:32,933][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 01:12:34,000][42004] Updated weights for policy 0, policy_version 16536 (0.0036) +[2024-11-08 01:12:38,822][41694] Fps is (10 sec: 6017.5, 60 sec: 6726.8, 300 sec: 6658.5). Total num frames: 67751936. Throughput: 0: 1711.4. Samples: 11934256. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:12:38,824][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 01:12:41,102][42004] Updated weights for policy 0, policy_version 16546 (0.0025) +[2024-11-08 01:12:42,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 67784704. Throughput: 0: 1676.5. Samples: 11942174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:12:42,933][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 01:12:46,801][42004] Updated weights for policy 0, policy_version 16556 (0.0034) +[2024-11-08 01:12:47,932][41694] Fps is (10 sec: 7194.4, 60 sec: 6826.6, 300 sec: 6678.6). Total num frames: 67817472. Throughput: 0: 1668.2. Samples: 11947586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:12:47,934][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 01:12:52,932][41694] Fps is (10 sec: 6553.1, 60 sec: 6758.3, 300 sec: 6664.7). Total num frames: 67850240. Throughput: 0: 1725.8. Samples: 11957606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:12:52,934][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 01:12:53,180][42004] Updated weights for policy 0, policy_version 16566 (0.0028) +[2024-11-08 01:12:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 67891200. Throughput: 0: 1722.0. Samples: 11968648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:12:57,933][41694] Avg episode reward: [(0, '4.276')] +[2024-11-08 01:12:58,420][42004] Updated weights for policy 0, policy_version 16576 (0.0024) +[2024-11-08 01:13:02,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 67923968. Throughput: 0: 1742.9. Samples: 11974496. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:13:02,933][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 01:13:04,509][42004] Updated weights for policy 0, policy_version 16586 (0.0029) +[2024-11-08 01:13:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6720.4). Total num frames: 67960832. Throughput: 0: 1726.5. Samples: 11984304. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:07,935][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 01:13:10,030][42004] Updated weights for policy 0, policy_version 16596 (0.0035) +[2024-11-08 01:13:12,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.5, 300 sec: 6706.3). Total num frames: 67985408. Throughput: 0: 1689.6. Samples: 11993752. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:12,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 01:13:17,235][42004] Updated weights for policy 0, policy_version 16606 (0.0043) +[2024-11-08 01:13:17,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 68022272. Throughput: 0: 1657.9. Samples: 11997746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:13:17,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 01:13:22,860][42004] Updated weights for policy 0, policy_version 16616 (0.0027) +[2024-11-08 01:13:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 68059136. Throughput: 0: 1697.9. Samples: 12009150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:13:22,933][41694] Avg episode reward: [(0, '4.273')] +[2024-11-08 01:13:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 68096000. Throughput: 0: 1727.2. Samples: 12019900. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:27,933][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 01:13:28,367][42004] Updated weights for policy 0, policy_version 16626 (0.0039) +[2024-11-08 01:13:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 68132864. Throughput: 0: 1732.9. Samples: 12025568. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:32,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 01:13:33,731][42004] Updated weights for policy 0, policy_version 16636 (0.0029) +[2024-11-08 01:13:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7068.1, 300 sec: 6761.9). Total num frames: 68169728. Throughput: 0: 1760.3. Samples: 12036820. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:37,934][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 01:13:37,951][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016643_68169728.pth... +[2024-11-08 01:13:38,061][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016249_66555904.pth +[2024-11-08 01:13:39,284][42004] Updated weights for policy 0, policy_version 16646 (0.0024) +[2024-11-08 01:13:42,933][41694] Fps is (10 sec: 7371.9, 60 sec: 7031.3, 300 sec: 6761.8). Total num frames: 68206592. Throughput: 0: 1765.3. Samples: 12048090. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:42,936][41694] Avg episode reward: [(0, '4.624')] +[2024-11-08 01:13:44,816][42004] Updated weights for policy 0, policy_version 16656 (0.0032) +[2024-11-08 01:13:47,934][41694] Fps is (10 sec: 6552.7, 60 sec: 6963.0, 300 sec: 6748.0). Total num frames: 68235264. Throughput: 0: 1752.6. Samples: 12053366. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:47,937][41694] Avg episode reward: [(0, '4.251')] +[2024-11-08 01:13:51,591][42004] Updated weights for policy 0, policy_version 16666 (0.0029) +[2024-11-08 01:13:52,932][41694] Fps is (10 sec: 6554.3, 60 sec: 7031.5, 300 sec: 6748.0). Total num frames: 68272128. Throughput: 0: 1737.0. Samples: 12062468. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:52,934][41694] Avg episode reward: [(0, '4.195')] +[2024-11-08 01:13:57,246][42004] Updated weights for policy 0, policy_version 16676 (0.0029) +[2024-11-08 01:13:57,932][41694] Fps is (10 sec: 7373.8, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 68308992. Throughput: 0: 1766.3. Samples: 12073234. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:13:57,934][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 01:14:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 68341760. Throughput: 0: 1795.1. Samples: 12078524. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:02,933][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 01:14:03,238][42004] Updated weights for policy 0, policy_version 16686 (0.0027) +[2024-11-08 01:14:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 68382720. Throughput: 0: 1788.0. Samples: 12089612. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:07,934][41694] Avg episode reward: [(0, '4.676')] +[2024-11-08 01:14:08,277][42004] Updated weights for policy 0, policy_version 16696 (0.0021) +[2024-11-08 01:14:12,932][41694] Fps is (10 sec: 7781.8, 60 sec: 7236.2, 300 sec: 6803.5). Total num frames: 68419584. Throughput: 0: 1816.1. Samples: 12101628. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:12,936][41694] Avg episode reward: [(0, '4.646')] +[2024-11-08 01:14:13,448][42004] Updated weights for policy 0, policy_version 16706 (0.0030) +[2024-11-08 01:14:17,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7304.5, 300 sec: 6817.4). Total num frames: 68460544. Throughput: 0: 1820.8. Samples: 12107502. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:17,933][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 01:14:20,692][42004] Updated weights for policy 0, policy_version 16716 (0.0030) +[2024-11-08 01:14:22,932][41694] Fps is (10 sec: 6554.0, 60 sec: 7099.7, 300 sec: 6789.6). Total num frames: 68485120. Throughput: 0: 1742.2. Samples: 12115220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:22,934][41694] Avg episode reward: [(0, '4.227')] +[2024-11-08 01:14:26,109][42004] Updated weights for policy 0, policy_version 16726 (0.0033) +[2024-11-08 01:14:27,932][41694] Fps is (10 sec: 6143.9, 60 sec: 7099.7, 300 sec: 6803.5). Total num frames: 68521984. Throughput: 0: 1748.4. Samples: 12126766. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:27,935][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 01:14:32,259][42004] Updated weights for policy 0, policy_version 16736 (0.0045) +[2024-11-08 01:14:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6789.7). Total num frames: 68554752. Throughput: 0: 1729.5. Samples: 12131192. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:32,933][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 01:14:37,491][42004] Updated weights for policy 0, policy_version 16746 (0.0025) +[2024-11-08 01:14:37,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6789.6). Total num frames: 68591616. Throughput: 0: 1780.6. Samples: 12142596. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:37,934][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:14:42,934][41694] Fps is (10 sec: 7371.3, 60 sec: 7031.4, 300 sec: 6829.7). Total num frames: 68628480. Throughput: 0: 1795.7. Samples: 12154044. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:42,936][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 01:14:43,002][42004] Updated weights for policy 0, policy_version 16756 (0.0039) +[2024-11-08 01:14:47,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7168.1, 300 sec: 6831.3). Total num frames: 68665344. Throughput: 0: 1780.9. Samples: 12158666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:14:47,933][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 01:14:48,988][42004] Updated weights for policy 0, policy_version 16766 (0.0034) +[2024-11-08 01:14:54,217][41694] Fps is (10 sec: 5807.9, 60 sec: 6883.9, 300 sec: 6774.0). Total num frames: 68694016. Throughput: 0: 1729.9. Samples: 12169682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:14:54,219][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 01:14:56,672][42004] Updated weights for policy 0, policy_version 16776 (0.0028) +[2024-11-08 01:14:57,931][41694] Fps is (10 sec: 5734.7, 60 sec: 6895.0, 300 sec: 6803.5). Total num frames: 68722688. Throughput: 0: 1674.9. Samples: 12176998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:14:57,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 01:15:02,847][42004] Updated weights for policy 0, policy_version 16786 (0.0033) +[2024-11-08 01:15:02,931][41694] Fps is (10 sec: 7050.7, 60 sec: 6894.9, 300 sec: 6789.6). Total num frames: 68755456. Throughput: 0: 1661.6. Samples: 12182272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:15:02,933][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 01:15:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 68788224. 
Throughput: 0: 1693.8. Samples: 12191440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:15:07,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 01:15:08,861][42004] Updated weights for policy 0, policy_version 16796 (0.0031) +[2024-11-08 01:15:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.5, 300 sec: 6803.5). Total num frames: 68825088. Throughput: 0: 1678.8. Samples: 12202310. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:15:12,934][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 01:15:14,980][42004] Updated weights for policy 0, policy_version 16806 (0.0037) +[2024-11-08 01:15:17,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6819.5). Total num frames: 68853760. Throughput: 0: 1682.8. Samples: 12206918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:15:17,935][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 01:15:20,971][42004] Updated weights for policy 0, policy_version 16816 (0.0037) +[2024-11-08 01:15:22,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 68890624. Throughput: 0: 1657.8. Samples: 12217198. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:15:22,935][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 01:15:28,164][42004] Updated weights for policy 0, policy_version 16826 (0.0032) +[2024-11-08 01:15:28,165][41694] Fps is (10 sec: 6404.2, 60 sec: 6596.2, 300 sec: 6812.0). Total num frames: 68919296. Throughput: 0: 1522.9. Samples: 12222928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:15:28,172][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 01:15:32,934][41694] Fps is (10 sec: 6142.5, 60 sec: 6621.6, 300 sec: 6817.4). Total num frames: 68952064. Throughput: 0: 1602.9. Samples: 12230802. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:15:32,936][41694] Avg episode reward: [(0, '4.604')] +[2024-11-08 01:15:33,537][42004] Updated weights for policy 0, policy_version 16836 (0.0034) +[2024-11-08 01:15:37,932][41694] Fps is (10 sec: 7129.5, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 68988928. Throughput: 0: 1651.5. Samples: 12241878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:15:37,934][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 01:15:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016843_68988928.pth... +[2024-11-08 01:15:38,164][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016441_67342336.pth +[2024-11-08 01:15:39,930][42004] Updated weights for policy 0, policy_version 16846 (0.0030) +[2024-11-08 01:15:42,932][41694] Fps is (10 sec: 6964.9, 60 sec: 6553.8, 300 sec: 6817.4). Total num frames: 69021696. Throughput: 0: 1661.2. Samples: 12251750. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:15:42,935][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 01:15:45,311][42004] Updated weights for policy 0, policy_version 16856 (0.0033) +[2024-11-08 01:15:47,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6553.6, 300 sec: 6817.4). Total num frames: 69058560. Throughput: 0: 1672.8. Samples: 12257548. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:15:47,935][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 01:15:50,785][42004] Updated weights for policy 0, policy_version 16866 (0.0029) +[2024-11-08 01:15:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6836.7, 300 sec: 6859.1). Total num frames: 69095424. Throughput: 0: 1717.9. Samples: 12268744. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:15:52,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 01:15:56,178][42004] Updated weights for policy 0, policy_version 16876 (0.0030) +[2024-11-08 01:15:57,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 69132288. Throughput: 0: 1725.2. Samples: 12279944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:15:57,934][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 01:16:02,932][41694] Fps is (10 sec: 6143.4, 60 sec: 6690.0, 300 sec: 6831.5). Total num frames: 69156864. Throughput: 0: 1745.8. Samples: 12285480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:16:02,935][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 01:16:03,779][42004] Updated weights for policy 0, policy_version 16886 (0.0032) +[2024-11-08 01:16:07,933][41694] Fps is (10 sec: 6143.3, 60 sec: 6758.3, 300 sec: 6845.3). Total num frames: 69193728. Throughput: 0: 1681.7. Samples: 12292878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:16:07,935][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 01:16:09,367][42004] Updated weights for policy 0, policy_version 16896 (0.0031) +[2024-11-08 01:16:12,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.8, 300 sec: 6831.3). Total num frames: 69222400. Throughput: 0: 1778.7. Samples: 12302554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:16:12,936][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 01:16:15,994][42004] Updated weights for policy 0, policy_version 16906 (0.0035) +[2024-11-08 01:16:17,932][41694] Fps is (10 sec: 6554.3, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 69259264. Throughput: 0: 1701.3. Samples: 12307358. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:16:17,935][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 01:16:21,462][42004] Updated weights for policy 0, policy_version 16916 (0.0037) +[2024-11-08 01:16:22,931][41694] Fps is (10 sec: 7373.4, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 69296128. Throughput: 0: 1707.3. Samples: 12318708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:16:22,933][41694] Avg episode reward: [(0, '4.301')] +[2024-11-08 01:16:26,655][42004] Updated weights for policy 0, policy_version 16926 (0.0032) +[2024-11-08 01:16:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6990.3, 300 sec: 6886.8). Total num frames: 69337088. Throughput: 0: 1745.4. Samples: 12330292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:16:27,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 01:16:32,129][42004] Updated weights for policy 0, policy_version 16936 (0.0034) +[2024-11-08 01:16:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.7, 300 sec: 6886.8). Total num frames: 69373952. Throughput: 0: 1740.5. Samples: 12335868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:16:32,933][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 01:16:37,931][41694] Fps is (10 sec: 6144.4, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 69398528. Throughput: 0: 1690.7. Samples: 12344826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:16:37,934][41694] Avg episode reward: [(0, '4.588')] +[2024-11-08 01:16:39,346][42004] Updated weights for policy 0, policy_version 16946 (0.0036) +[2024-11-08 01:16:42,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 69435392. Throughput: 0: 1676.6. Samples: 12355390. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:16:42,933][41694] Avg episode reward: [(0, '4.645')] +[2024-11-08 01:16:45,055][42004] Updated weights for policy 0, policy_version 16956 (0.0033) +[2024-11-08 01:16:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 69468160. Throughput: 0: 1666.4. Samples: 12360468. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:16:47,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 01:16:50,594][42004] Updated weights for policy 0, policy_version 16966 (0.0024) +[2024-11-08 01:16:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 69509120. Throughput: 0: 1747.4. Samples: 12371508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:16:52,934][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 01:16:56,074][42004] Updated weights for policy 0, policy_version 16976 (0.0030) +[2024-11-08 01:16:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 69545984. Throughput: 0: 1775.5. Samples: 12382450. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:16:57,933][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 01:17:02,070][42004] Updated weights for policy 0, policy_version 16986 (0.0029) +[2024-11-08 01:17:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.6, 300 sec: 6900.7). Total num frames: 69578752. Throughput: 0: 1796.1. Samples: 12388182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:17:02,933][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 01:17:07,423][42004] Updated weights for policy 0, policy_version 16996 (0.0033) +[2024-11-08 01:17:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.6, 300 sec: 6900.7). Total num frames: 69615616. Throughput: 0: 1781.0. Samples: 12398852. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:17:07,933][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 01:17:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.3, 300 sec: 6872.9). Total num frames: 69640192. Throughput: 0: 1696.4. Samples: 12406630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:17:12,934][41694] Avg episode reward: [(0, '4.276')] +[2024-11-08 01:17:14,615][42004] Updated weights for policy 0, policy_version 17006 (0.0034) +[2024-11-08 01:17:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 69677056. Throughput: 0: 1704.0. Samples: 12412550. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:17:17,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 01:17:20,554][42004] Updated weights for policy 0, policy_version 17016 (0.0041) +[2024-11-08 01:17:22,933][41694] Fps is (10 sec: 7371.4, 60 sec: 6963.0, 300 sec: 6872.9). Total num frames: 69713920. Throughput: 0: 1728.7. Samples: 12422622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:17:22,938][41694] Avg episode reward: [(0, '4.530')] +[2024-11-08 01:17:26,007][42004] Updated weights for policy 0, policy_version 17026 (0.0023) +[2024-11-08 01:17:27,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6895.0, 300 sec: 6873.0). Total num frames: 69750784. Throughput: 0: 1744.4. Samples: 12433890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:17:27,933][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 01:17:31,430][42004] Updated weights for policy 0, policy_version 17036 (0.0036) +[2024-11-08 01:17:32,932][41694] Fps is (10 sec: 7374.2, 60 sec: 6894.9, 300 sec: 6921.6). Total num frames: 69787648. Throughput: 0: 1757.6. Samples: 12439562. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:17:32,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 01:17:36,837][42004] Updated weights for policy 0, policy_version 17046 (0.0033) +[2024-11-08 01:17:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7168.0, 300 sec: 6928.5). Total num frames: 69828608. Throughput: 0: 1768.6. Samples: 12451094. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:17:37,933][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 01:17:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017048_69828608.pth... +[2024-11-08 01:17:38,051][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016643_68169728.pth +[2024-11-08 01:17:42,313][42004] Updated weights for policy 0, policy_version 17056 (0.0034) +[2024-11-08 01:17:44,322][41694] Fps is (10 sec: 6473.1, 60 sec: 6939.0, 300 sec: 6896.0). Total num frames: 69861376. Throughput: 0: 1722.7. Samples: 12462366. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:17:44,323][41694] Avg episode reward: [(0, '4.218')] +[2024-11-08 01:17:47,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 69885952. Throughput: 0: 1694.6. Samples: 12464438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:17:47,934][41694] Avg episode reward: [(0, '4.199')] +[2024-11-08 01:17:49,774][42004] Updated weights for policy 0, policy_version 17066 (0.0042) +[2024-11-08 01:17:52,931][41694] Fps is (10 sec: 7135.9, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 69922816. Throughput: 0: 1705.6. Samples: 12475606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:17:52,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 01:17:55,108][42004] Updated weights for policy 0, policy_version 17076 (0.0028) +[2024-11-08 01:17:57,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6963.1, 300 sec: 6914.6). 
Total num frames: 69963776. Throughput: 0: 1787.0. Samples: 12487044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:17:57,935][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 01:18:01,000][42004] Updated weights for policy 0, policy_version 17086 (0.0032) +[2024-11-08 01:18:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 69992448. Throughput: 0: 1761.4. Samples: 12491812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:18:02,934][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 01:18:06,795][42004] Updated weights for policy 0, policy_version 17096 (0.0025) +[2024-11-08 01:18:07,931][41694] Fps is (10 sec: 6963.7, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 70033408. Throughput: 0: 1771.7. Samples: 12502344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:18:07,934][41694] Avg episode reward: [(0, '4.289')] +[2024-11-08 01:18:12,269][42004] Updated weights for policy 0, policy_version 17106 (0.0030) +[2024-11-08 01:18:12,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7168.0, 300 sec: 6942.4). Total num frames: 70070272. Throughput: 0: 1771.6. Samples: 12513614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:18:12,934][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 01:18:18,275][41694] Fps is (10 sec: 5939.7, 60 sec: 6923.5, 300 sec: 6892.7). Total num frames: 70094848. Throughput: 0: 1754.5. Samples: 12519118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:18:18,277][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 01:18:19,536][42004] Updated weights for policy 0, policy_version 17116 (0.0036) +[2024-11-08 01:18:22,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6963.4, 300 sec: 6900.7). Total num frames: 70131712. Throughput: 0: 1691.6. Samples: 12527216. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:18:22,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 01:18:24,811][42004] Updated weights for policy 0, policy_version 17126 (0.0027) +[2024-11-08 01:18:27,931][41694] Fps is (10 sec: 7635.4, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 70168576. Throughput: 0: 1749.0. Samples: 12538640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:18:27,933][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 01:18:30,189][42004] Updated weights for policy 0, policy_version 17136 (0.0033) +[2024-11-08 01:18:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 70205440. Throughput: 0: 1780.0. Samples: 12544538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:18:32,933][41694] Avg episode reward: [(0, '4.628')] +[2024-11-08 01:18:36,365][42004] Updated weights for policy 0, policy_version 17146 (0.0026) +[2024-11-08 01:18:37,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6886.9). Total num frames: 70238208. Throughput: 0: 1751.3. Samples: 12554416. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:18:37,933][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 01:18:41,668][42004] Updated weights for policy 0, policy_version 17156 (0.0032) +[2024-11-08 01:18:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7128.3, 300 sec: 6928.5). Total num frames: 70279168. Throughput: 0: 1753.8. Samples: 12565966. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:18:42,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 01:18:47,163][42004] Updated weights for policy 0, policy_version 17166 (0.0030) +[2024-11-08 01:18:47,934][41694] Fps is (10 sec: 7780.5, 60 sec: 7167.7, 300 sec: 6928.4). Total num frames: 70316032. Throughput: 0: 1769.6. Samples: 12571448. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:18:47,936][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 01:18:52,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 70340608. Throughput: 0: 1766.4. Samples: 12581834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:18:52,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 01:18:54,339][42004] Updated weights for policy 0, policy_version 17176 (0.0025) +[2024-11-08 01:18:57,932][41694] Fps is (10 sec: 6555.1, 60 sec: 6963.3, 300 sec: 6914.6). Total num frames: 70381568. Throughput: 0: 1723.2. Samples: 12591156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:18:57,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 01:18:59,827][42004] Updated weights for policy 0, policy_version 17186 (0.0032) +[2024-11-08 01:19:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 70414336. Throughput: 0: 1727.2. Samples: 12596246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:19:02,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 01:19:05,873][42004] Updated weights for policy 0, policy_version 17196 (0.0030) +[2024-11-08 01:19:07,934][41694] Fps is (10 sec: 6552.3, 60 sec: 6894.7, 300 sec: 6872.9). Total num frames: 70447104. Throughput: 0: 1760.1. Samples: 12606426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:19:07,936][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 01:19:12,227][42004] Updated weights for policy 0, policy_version 17206 (0.0044) +[2024-11-08 01:19:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 70479872. Throughput: 0: 1724.4. Samples: 12616238. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:19:12,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 01:19:17,911][42004] Updated weights for policy 0, policy_version 17216 (0.0039) +[2024-11-08 01:19:17,931][41694] Fps is (10 sec: 6964.7, 60 sec: 7072.0, 300 sec: 6886.8). Total num frames: 70516736. Throughput: 0: 1711.5. Samples: 12621554. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:19:17,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 01:19:22,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7031.4, 300 sec: 6886.8). Total num frames: 70553600. Throughput: 0: 1735.7. Samples: 12632522. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:19:22,934][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 01:19:23,478][42004] Updated weights for policy 0, policy_version 17226 (0.0033) +[2024-11-08 01:19:27,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 70578176. Throughput: 0: 1657.2. Samples: 12640542. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:19:27,933][41694] Avg episode reward: [(0, '4.223')] +[2024-11-08 01:19:30,543][42004] Updated weights for policy 0, policy_version 17236 (0.0024) +[2024-11-08 01:19:32,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6826.6, 300 sec: 6859.0). Total num frames: 70615040. Throughput: 0: 1663.4. Samples: 12646300. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:19:32,935][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 01:19:36,075][42004] Updated weights for policy 0, policy_version 17246 (0.0028) +[2024-11-08 01:19:37,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 70651904. Throughput: 0: 1680.8. Samples: 12657472. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:19:37,934][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 01:19:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017249_70651904.pth... +[2024-11-08 01:19:38,136][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000016843_68988928.pth +[2024-11-08 01:19:42,288][42004] Updated weights for policy 0, policy_version 17256 (0.0046) +[2024-11-08 01:19:42,933][41694] Fps is (10 sec: 6553.1, 60 sec: 6690.0, 300 sec: 6831.3). Total num frames: 70680576. Throughput: 0: 1691.8. Samples: 12667288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:19:42,935][41694] Avg episode reward: [(0, '4.236')] +[2024-11-08 01:19:47,790][42004] Updated weights for policy 0, policy_version 17266 (0.0038) +[2024-11-08 01:19:47,931][41694] Fps is (10 sec: 6963.6, 60 sec: 6758.7, 300 sec: 6903.0). Total num frames: 70721536. Throughput: 0: 1696.3. Samples: 12672580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:19:47,934][41694] Avg episode reward: [(0, '4.283')] +[2024-11-08 01:19:52,932][41694] Fps is (10 sec: 7783.2, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 70758400. Throughput: 0: 1716.2. Samples: 12683652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:19:52,939][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 01:19:53,410][42004] Updated weights for policy 0, policy_version 17276 (0.0031) +[2024-11-08 01:19:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6914.6). Total num frames: 70795264. Throughput: 0: 1749.9. Samples: 12694982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:19:57,938][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 01:20:00,617][42004] Updated weights for policy 0, policy_version 17286 (0.0022) +[2024-11-08 01:20:02,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6690.1, 300 sec: 6872.9). 
Total num frames: 70815744. Throughput: 0: 1697.9. Samples: 12697960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:20:02,936][41694] Avg episode reward: [(0, '4.246')] +[2024-11-08 01:20:06,367][42004] Updated weights for policy 0, policy_version 17296 (0.0025) +[2024-11-08 01:20:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.6, 300 sec: 6872.9). Total num frames: 70852608. Throughput: 0: 1676.7. Samples: 12707974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:20:07,935][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 01:20:12,094][42004] Updated weights for policy 0, policy_version 17306 (0.0057) +[2024-11-08 01:20:12,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6826.6, 300 sec: 6900.7). Total num frames: 70889472. Throughput: 0: 1738.3. Samples: 12718766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:20:12,934][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 01:20:17,932][41694] Fps is (10 sec: 6553.1, 60 sec: 6690.0, 300 sec: 6872.9). Total num frames: 70918144. Throughput: 0: 1706.7. Samples: 12723104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:20:17,936][41694] Avg episode reward: [(0, '4.555')] +[2024-11-08 01:20:19,103][42004] Updated weights for policy 0, policy_version 17316 (0.0024) +[2024-11-08 01:20:22,932][41694] Fps is (10 sec: 6144.2, 60 sec: 6621.9, 300 sec: 6892.3). Total num frames: 70950912. Throughput: 0: 1661.4. Samples: 12732234. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:20:22,936][41694] Avg episode reward: [(0, '4.279')] +[2024-11-08 01:20:24,894][42004] Updated weights for policy 0, policy_version 17326 (0.0031) +[2024-11-08 01:20:27,932][41694] Fps is (10 sec: 6554.2, 60 sec: 6758.4, 300 sec: 6886.9). Total num frames: 70983680. Throughput: 0: 1676.3. Samples: 12742718. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:20:27,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 01:20:30,868][42004] Updated weights for policy 0, policy_version 17336 (0.0040) +[2024-11-08 01:20:34,771][41694] Fps is (10 sec: 5881.5, 60 sec: 6557.4, 300 sec: 6844.2). Total num frames: 71020544. Throughput: 0: 1614.5. Samples: 12748200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:20:34,772][41694] Avg episode reward: [(0, '4.292')] +[2024-11-08 01:20:37,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.4, 300 sec: 6845.2). Total num frames: 71041024. Throughput: 0: 1584.5. Samples: 12754952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:20:37,933][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 01:20:38,820][42004] Updated weights for policy 0, policy_version 17346 (0.0035) +[2024-11-08 01:20:42,932][41694] Fps is (10 sec: 6524.9, 60 sec: 6553.8, 300 sec: 6831.3). Total num frames: 71073792. Throughput: 0: 1566.2. Samples: 12765460. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:20:42,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 01:20:44,902][42004] Updated weights for policy 0, policy_version 17356 (0.0030) +[2024-11-08 01:20:47,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6485.3, 300 sec: 6831.3). Total num frames: 71110656. Throughput: 0: 1607.7. Samples: 12770308. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:20:47,934][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 01:20:51,126][42004] Updated weights for policy 0, policy_version 17366 (0.0050) +[2024-11-08 01:20:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 6817.4). Total num frames: 71143424. Throughput: 0: 1605.2. Samples: 12780210. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:20:52,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:20:56,540][42004] Updated weights for policy 0, policy_version 17376 (0.0039) +[2024-11-08 01:20:57,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6417.1, 300 sec: 6859.1). Total num frames: 71180288. Throughput: 0: 1618.6. Samples: 12791604. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:20:57,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 01:21:02,165][42004] Updated weights for policy 0, policy_version 17386 (0.0026) +[2024-11-08 01:21:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 71217152. Throughput: 0: 1643.9. Samples: 12797076. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:21:02,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 01:21:08,880][41694] Fps is (10 sec: 6359.9, 60 sec: 6518.8, 300 sec: 6850.9). Total num frames: 71249920. Throughput: 0: 1647.1. Samples: 12807918. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 01:21:08,882][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 01:21:09,417][42004] Updated weights for policy 0, policy_version 17396 (0.0026) +[2024-11-08 01:21:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6831.3). Total num frames: 71274496. Throughput: 0: 1605.1. Samples: 12814950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 01:21:12,933][41694] Avg episode reward: [(0, '4.279')] +[2024-11-08 01:21:15,738][42004] Updated weights for policy 0, policy_version 17406 (0.0041) +[2024-11-08 01:21:17,931][41694] Fps is (10 sec: 6335.4, 60 sec: 6485.4, 300 sec: 6817.4). Total num frames: 71307264. Throughput: 0: 1662.3. Samples: 12819946. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 01:21:17,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 01:21:21,375][42004] Updated weights for policy 0, policy_version 17416 (0.0043) +[2024-11-08 01:21:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6803.5). Total num frames: 71344128. Throughput: 0: 1685.9. Samples: 12830818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:21:22,934][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 01:21:27,346][42004] Updated weights for policy 0, policy_version 17426 (0.0028) +[2024-11-08 01:21:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6803.5). Total num frames: 71380992. Throughput: 0: 1684.1. Samples: 12841246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:21:27,933][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 01:21:32,672][42004] Updated weights for policy 0, policy_version 17436 (0.0027) +[2024-11-08 01:21:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6831.2, 300 sec: 6845.2). Total num frames: 71417856. Throughput: 0: 1705.4. Samples: 12847050. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:21:32,933][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 01:21:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 71454720. Throughput: 0: 1740.0. Samples: 12858508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:21:37,934][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 01:21:37,999][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017446_71458816.pth... +[2024-11-08 01:21:38,001][42004] Updated weights for policy 0, policy_version 17446 (0.0029) +[2024-11-08 01:21:38,185][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017048_69828608.pth +[2024-11-08 01:21:42,959][41694] Fps is (10 sec: 6127.2, 60 sec: 6755.3, 300 sec: 6816.8). 
Total num frames: 71479296. Throughput: 0: 1606.5. Samples: 12863940. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:21:42,961][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 01:21:45,650][42004] Updated weights for policy 0, policy_version 17456 (0.0024) +[2024-11-08 01:21:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 71516160. Throughput: 0: 1651.3. Samples: 12871384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:21:47,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 01:21:51,306][42004] Updated weights for policy 0, policy_version 17466 (0.0031) +[2024-11-08 01:21:52,931][41694] Fps is (10 sec: 7393.4, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 71553024. Throughput: 0: 1686.8. Samples: 12882222. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:21:52,933][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 01:21:57,167][42004] Updated weights for policy 0, policy_version 17476 (0.0029) +[2024-11-08 01:21:57,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 71585792. Throughput: 0: 1726.5. Samples: 12892642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:21:57,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 01:22:02,790][42004] Updated weights for policy 0, policy_version 17486 (0.0024) +[2024-11-08 01:22:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 71622656. Throughput: 0: 1737.8. Samples: 12898146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:22:02,933][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 01:22:07,723][42004] Updated weights for policy 0, policy_version 17496 (0.0036) +[2024-11-08 01:22:07,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7005.7, 300 sec: 6859.1). Total num frames: 71663616. Throughput: 0: 1760.5. Samples: 12910042. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:22:07,934][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 01:22:12,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 6859.1). Total num frames: 71700480. Throughput: 0: 1787.6. Samples: 12921690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:22:12,934][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 01:22:13,067][42004] Updated weights for policy 0, policy_version 17506 (0.0029) +[2024-11-08 01:22:17,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6817.5). Total num frames: 71725056. Throughput: 0: 1786.9. Samples: 12927458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:22:17,934][41694] Avg episode reward: [(0, '4.219')] +[2024-11-08 01:22:20,743][42004] Updated weights for policy 0, policy_version 17516 (0.0027) +[2024-11-08 01:22:22,933][41694] Fps is (10 sec: 6143.0, 60 sec: 6963.1, 300 sec: 6817.4). Total num frames: 71761920. Throughput: 0: 1687.2. Samples: 12934434. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:22:22,935][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 01:22:26,122][42004] Updated weights for policy 0, policy_version 17526 (0.0026) +[2024-11-08 01:22:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 71798784. Throughput: 0: 1826.0. Samples: 12946060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:22:27,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 01:22:31,848][42004] Updated weights for policy 0, policy_version 17536 (0.0025) +[2024-11-08 01:22:32,932][41694] Fps is (10 sec: 6964.1, 60 sec: 6894.9, 300 sec: 6789.6). Total num frames: 71831552. Throughput: 0: 1769.7. Samples: 12951020. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:22:32,934][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 01:22:37,209][42004] Updated weights for policy 0, policy_version 17546 (0.0033) +[2024-11-08 01:22:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6849.7). Total num frames: 71872512. Throughput: 0: 1780.0. Samples: 12962320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:22:37,934][41694] Avg episode reward: [(0, '4.831')] +[2024-11-08 01:22:42,708][42004] Updated weights for policy 0, policy_version 17556 (0.0034) +[2024-11-08 01:22:42,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7171.3, 300 sec: 6859.1). Total num frames: 71909376. Throughput: 0: 1800.0. Samples: 12973642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:22:42,934][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 01:22:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 6859.1). Total num frames: 71946240. Throughput: 0: 1797.7. Samples: 12979042. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:22:47,936][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 01:22:48,083][42004] Updated weights for policy 0, policy_version 17566 (0.0030) +[2024-11-08 01:22:52,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 71970816. Throughput: 0: 1737.9. Samples: 12988246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:22:52,933][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 01:22:55,673][42004] Updated weights for policy 0, policy_version 17576 (0.0026) +[2024-11-08 01:22:57,931][41694] Fps is (10 sec: 6144.1, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 72007680. Throughput: 0: 1693.1. Samples: 12997878. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:22:57,933][41694] Avg episode reward: [(0, '4.734')] +[2024-11-08 01:23:01,083][42004] Updated weights for policy 0, policy_version 17586 (0.0029) +[2024-11-08 01:23:02,935][41694] Fps is (10 sec: 6960.7, 60 sec: 6962.8, 300 sec: 6803.4). Total num frames: 72040448. Throughput: 0: 1692.9. Samples: 13003642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:02,937][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 01:23:07,237][42004] Updated weights for policy 0, policy_version 17596 (0.0030) +[2024-11-08 01:23:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 72077312. Throughput: 0: 1756.6. Samples: 13013480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:07,933][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 01:23:12,499][42004] Updated weights for policy 0, policy_version 17606 (0.0033) +[2024-11-08 01:23:12,931][41694] Fps is (10 sec: 7375.4, 60 sec: 6894.9, 300 sec: 6853.2). Total num frames: 72114176. Throughput: 0: 1753.1. Samples: 13024948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:23:12,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 01:23:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6845.2). Total num frames: 72151040. Throughput: 0: 1763.6. Samples: 13030382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:23:17,934][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 01:23:18,067][42004] Updated weights for policy 0, policy_version 17616 (0.0029) +[2024-11-08 01:23:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.2, 300 sec: 6859.1). Total num frames: 72192000. Throughput: 0: 1772.6. Samples: 13042088. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:23:22,934][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 01:23:23,361][42004] Updated weights for policy 0, policy_version 17626 (0.0028) +[2024-11-08 01:23:27,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 72212480. Throughput: 0: 1682.0. Samples: 13049332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:27,937][41694] Avg episode reward: [(0, '4.189')] +[2024-11-08 01:23:31,055][42004] Updated weights for policy 0, policy_version 17636 (0.0029) +[2024-11-08 01:23:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 72249344. Throughput: 0: 1683.6. Samples: 13054802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:32,933][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 01:23:36,965][42004] Updated weights for policy 0, policy_version 17646 (0.0030) +[2024-11-08 01:23:37,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 72282112. Throughput: 0: 1716.9. Samples: 13065508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:37,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 01:23:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017647_72282112.pth... +[2024-11-08 01:23:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017249_70651904.pth +[2024-11-08 01:23:42,580][42004] Updated weights for policy 0, policy_version 17656 (0.0029) +[2024-11-08 01:23:42,933][41694] Fps is (10 sec: 6962.2, 60 sec: 6826.5, 300 sec: 6789.7). Total num frames: 72318976. Throughput: 0: 1742.8. Samples: 13076306. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:42,936][41694] Avg episode reward: [(0, '4.502')] +[2024-11-08 01:23:47,585][42004] Updated weights for policy 0, policy_version 17666 (0.0024) +[2024-11-08 01:23:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 72359936. Throughput: 0: 1747.0. Samples: 13082252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:47,935][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:23:52,854][42004] Updated weights for policy 0, policy_version 17676 (0.0022) +[2024-11-08 01:23:52,932][41694] Fps is (10 sec: 8193.0, 60 sec: 7168.0, 300 sec: 6845.2). Total num frames: 72400896. Throughput: 0: 1793.7. Samples: 13094196. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:52,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 01:23:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6845.2). Total num frames: 72433664. Throughput: 0: 1787.1. Samples: 13105368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:23:57,933][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 01:23:58,568][42004] Updated weights for policy 0, policy_version 17686 (0.0034) +[2024-11-08 01:24:02,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6895.3, 300 sec: 6803.6). Total num frames: 72454144. Throughput: 0: 1734.0. Samples: 13108412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:02,934][41694] Avg episode reward: [(0, '4.530')] +[2024-11-08 01:24:06,957][42004] Updated weights for policy 0, policy_version 17696 (0.0034) +[2024-11-08 01:24:07,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 72486912. Throughput: 0: 1653.4. Samples: 13116490. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:24:07,933][41694] Avg episode reward: [(0, '4.095')] +[2024-11-08 01:24:12,713][42004] Updated weights for policy 0, policy_version 17706 (0.0025) +[2024-11-08 01:24:12,931][41694] Fps is (10 sec: 6963.6, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 72523776. Throughput: 0: 1733.3. Samples: 13127328. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:24:12,934][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 01:24:17,791][42004] Updated weights for policy 0, policy_version 17716 (0.0028) +[2024-11-08 01:24:17,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 72564736. Throughput: 0: 1730.9. Samples: 13132694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:17,934][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 01:24:22,844][42004] Updated weights for policy 0, policy_version 17726 (0.0030) +[2024-11-08 01:24:22,931][41694] Fps is (10 sec: 8192.0, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 72605696. Throughput: 0: 1772.1. Samples: 13145254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:22,932][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 01:24:27,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7168.0, 300 sec: 6873.0). Total num frames: 72642560. Throughput: 0: 1792.8. Samples: 13156982. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:27,934][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 01:24:28,190][42004] Updated weights for policy 0, policy_version 17736 (0.0033) +[2024-11-08 01:24:32,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7168.0, 300 sec: 6873.0). Total num frames: 72679424. Throughput: 0: 1789.8. Samples: 13162792. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:24:32,934][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 01:24:35,679][42004] Updated weights for policy 0, policy_version 17746 (0.0033) +[2024-11-08 01:24:37,932][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.4, 300 sec: 6859.1). Total num frames: 72704000. Throughput: 0: 1689.2. Samples: 13170210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:24:37,934][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 01:24:41,195][42004] Updated weights for policy 0, policy_version 17756 (0.0027) +[2024-11-08 01:24:42,934][41694] Fps is (10 sec: 5733.1, 60 sec: 6963.1, 300 sec: 6831.2). Total num frames: 72736768. Throughput: 0: 1686.6. Samples: 13181270. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:42,936][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:24:47,443][42004] Updated weights for policy 0, policy_version 17766 (0.0032) +[2024-11-08 01:24:47,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 72769536. Throughput: 0: 1717.7. Samples: 13185710. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:47,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 01:24:52,731][42004] Updated weights for policy 0, policy_version 17776 (0.0028) +[2024-11-08 01:24:52,933][41694] Fps is (10 sec: 7373.8, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 72810496. Throughput: 0: 1781.1. Samples: 13196640. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:52,935][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 01:24:57,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 72847360. Throughput: 0: 1797.1. Samples: 13208198. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:24:57,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 01:24:58,233][42004] Updated weights for policy 0, policy_version 17786 (0.0027) +[2024-11-08 01:25:02,932][41694] Fps is (10 sec: 6963.9, 60 sec: 7099.8, 300 sec: 6872.9). Total num frames: 72880128. Throughput: 0: 1790.3. Samples: 13213258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:25:02,935][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 01:25:04,285][42004] Updated weights for policy 0, policy_version 17796 (0.0029) +[2024-11-08 01:25:09,662][41694] Fps is (10 sec: 5586.9, 60 sec: 6900.7, 300 sec: 6819.1). Total num frames: 72912896. Throughput: 0: 1676.5. Samples: 13223600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:25:09,666][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 01:25:12,077][42004] Updated weights for policy 0, policy_version 17806 (0.0037) +[2024-11-08 01:25:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 72937472. Throughput: 0: 1642.0. Samples: 13230874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:25:12,936][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 01:25:17,932][41694] Fps is (10 sec: 6439.2, 60 sec: 6690.2, 300 sec: 6831.3). Total num frames: 72966144. Throughput: 0: 1619.5. Samples: 13235670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:25:17,934][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 01:25:18,659][42004] Updated weights for policy 0, policy_version 17816 (0.0048) +[2024-11-08 01:25:22,932][41694] Fps is (10 sec: 6144.2, 60 sec: 6553.6, 300 sec: 6831.3). Total num frames: 72998912. Throughput: 0: 1652.7. Samples: 13244582. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:25:22,934][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 01:25:24,626][42004] Updated weights for policy 0, policy_version 17826 (0.0024) +[2024-11-08 01:25:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6888.1). Total num frames: 73039872. Throughput: 0: 1656.5. Samples: 13255808. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:25:27,933][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 01:25:30,082][42004] Updated weights for policy 0, policy_version 17836 (0.0033) +[2024-11-08 01:25:32,933][41694] Fps is (10 sec: 7781.7, 60 sec: 6621.8, 300 sec: 6900.7). Total num frames: 73076736. Throughput: 0: 1682.8. Samples: 13261438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:25:32,935][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 01:25:35,533][42004] Updated weights for policy 0, policy_version 17846 (0.0040) +[2024-11-08 01:25:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 73113600. Throughput: 0: 1689.9. Samples: 13272686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:25:37,934][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 01:25:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017850_73113600.pth... +[2024-11-08 01:25:38,064][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017446_71458816.pth +[2024-11-08 01:25:41,157][42004] Updated weights for policy 0, policy_version 17856 (0.0032) +[2024-11-08 01:25:43,609][41694] Fps is (10 sec: 6138.5, 60 sec: 6683.3, 300 sec: 6871.1). Total num frames: 73142272. Throughput: 0: 1534.5. Samples: 13278290. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:25:43,610][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 01:25:47,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6758.4, 300 sec: 6886.8). 
Total num frames: 73175040. Throughput: 0: 1612.0. Samples: 13285796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:25:47,933][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 01:25:48,405][42004] Updated weights for policy 0, policy_version 17866 (0.0039) +[2024-11-08 01:25:52,932][41694] Fps is (10 sec: 7029.6, 60 sec: 6622.0, 300 sec: 6872.9). Total num frames: 73207808. Throughput: 0: 1688.6. Samples: 13296664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:25:52,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 01:25:54,556][42004] Updated weights for policy 0, policy_version 17876 (0.0025) +[2024-11-08 01:25:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6872.9). Total num frames: 73244672. Throughput: 0: 1692.1. Samples: 13307016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:25:57,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 01:26:00,087][42004] Updated weights for policy 0, policy_version 17886 (0.0035) +[2024-11-08 01:26:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6895.1). Total num frames: 73277440. Throughput: 0: 1710.9. Samples: 13312660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:26:02,933][41694] Avg episode reward: [(0, '4.232')] +[2024-11-08 01:26:05,824][42004] Updated weights for policy 0, policy_version 17896 (0.0027) +[2024-11-08 01:26:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6888.8, 300 sec: 6914.6). Total num frames: 73314304. Throughput: 0: 1750.8. Samples: 13323368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:26:07,933][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 01:26:12,106][42004] Updated weights for policy 0, policy_version 17906 (0.0046) +[2024-11-08 01:26:12,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 73347072. Throughput: 0: 1719.3. Samples: 13333176. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:26:12,933][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 01:26:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 73367552. Throughput: 0: 1709.4. Samples: 13338358. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:26:17,936][41694] Avg episode reward: [(0, '4.716')] +[2024-11-08 01:26:19,666][42004] Updated weights for policy 0, policy_version 17916 (0.0024) +[2024-11-08 01:26:22,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 73404416. Throughput: 0: 1630.6. Samples: 13346064. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:26:22,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 01:26:25,573][42004] Updated weights for policy 0, policy_version 17926 (0.0030) +[2024-11-08 01:26:27,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6621.8, 300 sec: 6845.2). Total num frames: 73437184. Throughput: 0: 1755.9. Samples: 13356116. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:26:27,934][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 01:26:31,331][42004] Updated weights for policy 0, policy_version 17936 (0.0032) +[2024-11-08 01:26:32,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6622.0, 300 sec: 6845.2). Total num frames: 73474048. Throughput: 0: 1683.2. Samples: 13361538. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:26:32,933][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 01:26:36,754][42004] Updated weights for policy 0, policy_version 17946 (0.0027) +[2024-11-08 01:26:37,932][41694] Fps is (10 sec: 7782.7, 60 sec: 6690.1, 300 sec: 6901.4). Total num frames: 73515008. Throughput: 0: 1692.6. Samples: 13372830. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:26:37,934][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 01:26:42,544][42004] Updated weights for policy 0, policy_version 17956 (0.0027) +[2024-11-08 01:26:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6835.5, 300 sec: 6886.8). Total num frames: 73547776. Throughput: 0: 1701.8. Samples: 13383598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:26:42,934][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 01:26:47,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 73584640. Throughput: 0: 1695.7. Samples: 13388966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:26:47,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 01:26:48,512][42004] Updated weights for policy 0, policy_version 17966 (0.0034) +[2024-11-08 01:26:52,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 73605120. Throughput: 0: 1640.6. Samples: 13397196. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:26:52,934][41694] Avg episode reward: [(0, '4.561')] +[2024-11-08 01:26:56,204][42004] Updated weights for policy 0, policy_version 17976 (0.0032) +[2024-11-08 01:26:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 73641984. Throughput: 0: 1630.1. Samples: 13406532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:26:57,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 01:27:02,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6485.3, 300 sec: 6789.6). Total num frames: 73666560. Throughput: 0: 1619.2. Samples: 13411222. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:27:02,935][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 01:27:03,051][42004] Updated weights for policy 0, policy_version 17986 (0.0037) +[2024-11-08 01:27:07,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6485.3, 300 sec: 6789.6). Total num frames: 73703424. 
Throughput: 0: 1664.2. Samples: 13420956. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:07,935][41694] Avg episode reward: [(0, '4.160')] +[2024-11-08 01:27:08,640][42004] Updated weights for policy 0, policy_version 17996 (0.0027) +[2024-11-08 01:27:12,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 73744384. Throughput: 0: 1686.5. Samples: 13432008. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:12,935][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 01:27:13,886][42004] Updated weights for policy 0, policy_version 18006 (0.0025) +[2024-11-08 01:27:17,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6895.0, 300 sec: 6845.2). Total num frames: 73781248. Throughput: 0: 1695.8. Samples: 13437848. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:17,934][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 01:27:19,345][42004] Updated weights for policy 0, policy_version 18016 (0.0031) +[2024-11-08 01:27:22,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 73818112. Throughput: 0: 1692.0. Samples: 13448970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:27:22,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 01:27:26,755][42004] Updated weights for policy 0, policy_version 18026 (0.0031) +[2024-11-08 01:27:27,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.5, 300 sec: 6817.4). Total num frames: 73842688. Throughput: 0: 1627.3. Samples: 13456826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:27:27,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 01:27:32,701][42004] Updated weights for policy 0, policy_version 18036 (0.0034) +[2024-11-08 01:27:32,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 73875456. Throughput: 0: 1625.9. Samples: 13462132. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:32,933][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 01:27:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 73912320. Throughput: 0: 1659.0. Samples: 13471850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:37,934][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 01:27:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018045_73912320.pth... +[2024-11-08 01:27:38,065][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017647_72282112.pth +[2024-11-08 01:27:38,468][42004] Updated weights for policy 0, policy_version 18046 (0.0028) +[2024-11-08 01:27:42,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 73949184. Throughput: 0: 1708.8. Samples: 13483428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:42,934][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 01:27:43,816][42004] Updated weights for policy 0, policy_version 18056 (0.0023) +[2024-11-08 01:27:47,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 73986048. Throughput: 0: 1729.8. Samples: 13489066. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:47,935][41694] Avg episode reward: [(0, '4.304')] +[2024-11-08 01:27:49,322][42004] Updated weights for policy 0, policy_version 18066 (0.0025) +[2024-11-08 01:27:52,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 74022912. Throughput: 0: 1765.4. Samples: 13500400. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:52,935][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 01:27:54,726][42004] Updated weights for policy 0, policy_version 18076 (0.0031) +[2024-11-08 01:27:57,932][41694] Fps is (10 sec: 7782.7, 60 sec: 7031.4, 300 sec: 6859.1). 
Total num frames: 74063872. Throughput: 0: 1770.9. Samples: 13511698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:27:57,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 01:28:02,151][42004] Updated weights for policy 0, policy_version 18086 (0.0028) +[2024-11-08 01:28:02,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 74084352. Throughput: 0: 1696.3. Samples: 13514180. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:28:02,934][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 01:28:07,933][41694] Fps is (10 sec: 5324.2, 60 sec: 6894.8, 300 sec: 6789.6). Total num frames: 74117120. Throughput: 0: 1663.9. Samples: 13523846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:28:07,935][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 01:28:08,372][42004] Updated weights for policy 0, policy_version 18096 (0.0023) +[2024-11-08 01:28:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 74153984. Throughput: 0: 1725.5. Samples: 13534472. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:28:12,934][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 01:28:13,991][42004] Updated weights for policy 0, policy_version 18106 (0.0032) +[2024-11-08 01:28:17,932][41694] Fps is (10 sec: 6963.9, 60 sec: 6758.3, 300 sec: 6761.9). Total num frames: 74186752. Throughput: 0: 1718.7. Samples: 13539476. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:28:17,935][41694] Avg episode reward: [(0, '4.256')] +[2024-11-08 01:28:19,878][42004] Updated weights for policy 0, policy_version 18116 (0.0022) +[2024-11-08 01:28:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 74223616. Throughput: 0: 1747.8. Samples: 13550502. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:28:22,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 01:28:25,301][42004] Updated weights for policy 0, policy_version 18126 (0.0028) +[2024-11-08 01:28:27,931][41694] Fps is (10 sec: 7782.9, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 74264576. Throughput: 0: 1743.3. Samples: 13561876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:28:27,933][41694] Avg episode reward: [(0, '4.258')] +[2024-11-08 01:28:30,608][42004] Updated weights for policy 0, policy_version 18136 (0.0024) +[2024-11-08 01:28:34,397][41694] Fps is (10 sec: 6430.3, 60 sec: 6863.8, 300 sec: 6797.5). Total num frames: 74297344. Throughput: 0: 1688.1. Samples: 13567502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:28:34,399][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 01:28:37,863][42004] Updated weights for policy 0, policy_version 18146 (0.0033) +[2024-11-08 01:28:37,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6895.0, 300 sec: 6803.6). Total num frames: 74326016. Throughput: 0: 1664.3. Samples: 13575292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:28:37,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 01:28:42,933][41694] Fps is (10 sec: 7197.8, 60 sec: 6826.6, 300 sec: 6775.7). Total num frames: 74358784. Throughput: 0: 1649.3. Samples: 13585920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:28:42,935][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 01:28:43,892][42004] Updated weights for policy 0, policy_version 18156 (0.0036) +[2024-11-08 01:28:47,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 74395648. Throughput: 0: 1717.0. Samples: 13591444. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:28:47,934][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 01:28:49,316][42004] Updated weights for policy 0, policy_version 18166 (0.0028) +[2024-11-08 01:28:52,932][41694] Fps is (10 sec: 7373.9, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 74432512. Throughput: 0: 1752.6. Samples: 13602710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 01:28:52,933][41694] Avg episode reward: [(0, '4.263')] +[2024-11-08 01:28:54,618][42004] Updated weights for policy 0, policy_version 18176 (0.0034) +[2024-11-08 01:28:57,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 74473472. Throughput: 0: 1773.6. Samples: 13614284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 01:28:57,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 01:28:59,987][42004] Updated weights for policy 0, policy_version 18186 (0.0040) +[2024-11-08 01:29:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 74506240. Throughput: 0: 1790.4. Samples: 13620042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:29:02,934][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 01:29:05,968][42004] Updated weights for policy 0, policy_version 18196 (0.0054) +[2024-11-08 01:29:08,564][41694] Fps is (10 sec: 5778.4, 60 sec: 6890.7, 300 sec: 6802.8). Total num frames: 74534912. Throughput: 0: 1749.5. Samples: 13630338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:29:08,568][41694] Avg episode reward: [(0, '4.227')] +[2024-11-08 01:29:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 74563584. Throughput: 0: 1678.8. Samples: 13637420. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:29:12,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 01:29:14,041][42004] Updated weights for policy 0, policy_version 18206 (0.0026) +[2024-11-08 01:29:17,933][41694] Fps is (10 sec: 6558.1, 60 sec: 6826.6, 300 sec: 6748.0). Total num frames: 74596352. Throughput: 0: 1709.1. Samples: 13641910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:29:17,935][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 01:29:20,041][42004] Updated weights for policy 0, policy_version 18216 (0.0033) +[2024-11-08 01:29:22,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 74629120. Throughput: 0: 1710.5. Samples: 13652264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:29:22,934][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 01:29:25,843][42004] Updated weights for policy 0, policy_version 18226 (0.0026) +[2024-11-08 01:29:27,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6690.0, 300 sec: 6734.1). Total num frames: 74665984. Throughput: 0: 1712.3. Samples: 13662972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:29:27,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 01:29:31,833][42004] Updated weights for policy 0, policy_version 18236 (0.0032) +[2024-11-08 01:29:32,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6927.6, 300 sec: 6775.8). Total num frames: 74702848. Throughput: 0: 1699.2. Samples: 13667906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:29:32,934][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 01:29:37,106][42004] Updated weights for policy 0, policy_version 18246 (0.0032) +[2024-11-08 01:29:37,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6894.9, 300 sec: 6789.7). Total num frames: 74739712. Throughput: 0: 1702.9. Samples: 13679342. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:29:37,934][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 01:29:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018247_74739712.pth... +[2024-11-08 01:29:38,114][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000017850_73113600.pth +[2024-11-08 01:29:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.3, 300 sec: 6748.0). Total num frames: 74760192. Throughput: 0: 1623.5. Samples: 13687342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:29:42,933][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 01:29:45,569][42004] Updated weights for policy 0, policy_version 18256 (0.0024) +[2024-11-08 01:29:47,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 74788864. Throughput: 0: 1571.4. Samples: 13690756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:29:47,934][41694] Avg episode reward: [(0, '4.229')] +[2024-11-08 01:29:52,898][42004] Updated weights for policy 0, policy_version 18266 (0.0031) +[2024-11-08 01:29:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6678.6). Total num frames: 74817536. Throughput: 0: 1556.0. Samples: 13699372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:29:52,935][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 01:29:57,933][41694] Fps is (10 sec: 5733.9, 60 sec: 6212.1, 300 sec: 6664.7). Total num frames: 74846208. Throughput: 0: 1578.7. Samples: 13708462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:29:57,935][41694] Avg episode reward: [(0, '4.661')] +[2024-11-08 01:29:59,276][42004] Updated weights for policy 0, policy_version 18276 (0.0045) +[2024-11-08 01:30:02,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6690.0). Total num frames: 74874880. Throughput: 0: 1588.2. Samples: 13713376. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:30:02,933][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 01:30:06,200][42004] Updated weights for policy 0, policy_version 18286 (0.0035) +[2024-11-08 01:30:07,932][41694] Fps is (10 sec: 6144.7, 60 sec: 6278.5, 300 sec: 6678.6). Total num frames: 74907648. Throughput: 0: 1553.9. Samples: 13722190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:30:07,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 01:30:12,000][42004] Updated weights for policy 0, policy_version 18296 (0.0034) +[2024-11-08 01:30:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6348.8, 300 sec: 6706.3). Total num frames: 74944512. Throughput: 0: 1545.2. Samples: 13732506. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:30:12,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 01:30:17,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6075.8, 300 sec: 6650.8). Total num frames: 74960896. Throughput: 0: 1529.9. Samples: 13736754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:30:17,934][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 01:30:20,481][42004] Updated weights for policy 0, policy_version 18306 (0.0029) +[2024-11-08 01:30:22,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6075.7, 300 sec: 6623.0). Total num frames: 74993664. Throughput: 0: 1432.0. Samples: 13743782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:30:22,934][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 01:30:26,624][42004] Updated weights for policy 0, policy_version 18316 (0.0024) +[2024-11-08 01:30:27,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6075.7, 300 sec: 6623.0). Total num frames: 75030528. Throughput: 0: 1479.3. Samples: 13753912. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:30:27,936][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 01:30:32,093][42004] Updated weights for policy 0, policy_version 18326 (0.0026) +[2024-11-08 01:30:32,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6075.7, 300 sec: 6623.0). Total num frames: 75067392. Throughput: 0: 1523.6. Samples: 13759318. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:30:32,934][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 01:30:37,656][42004] Updated weights for policy 0, policy_version 18336 (0.0023) +[2024-11-08 01:30:37,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6075.7, 300 sec: 6666.1). Total num frames: 75104256. Throughput: 0: 1581.8. Samples: 13770552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:30:37,934][41694] Avg episode reward: [(0, '4.238')] +[2024-11-08 01:30:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6348.8, 300 sec: 6664.7). Total num frames: 75141120. Throughput: 0: 1635.7. Samples: 13782066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:30:42,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 01:30:43,092][42004] Updated weights for policy 0, policy_version 18346 (0.0029) +[2024-11-08 01:30:47,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6485.4, 300 sec: 6678.6). Total num frames: 75177984. Throughput: 0: 1643.4. Samples: 13787330. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:30:47,937][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 01:30:48,509][42004] Updated weights for policy 0, policy_version 18356 (0.0030) +[2024-11-08 01:30:52,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6417.0, 300 sec: 6636.9). Total num frames: 75202560. Throughput: 0: 1622.0. Samples: 13795180. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:30:52,934][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 01:30:56,047][42004] Updated weights for policy 0, policy_version 18366 (0.0037) +[2024-11-08 01:30:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6485.5, 300 sec: 6636.9). Total num frames: 75235328. Throughput: 0: 1631.5. Samples: 13805922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:30:57,933][41694] Avg episode reward: [(0, '4.265')] +[2024-11-08 01:31:01,896][42004] Updated weights for policy 0, policy_version 18376 (0.0025) +[2024-11-08 01:31:02,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6621.8, 300 sec: 6636.9). Total num frames: 75272192. Throughput: 0: 1649.3. Samples: 13810972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:31:02,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 01:31:07,609][42004] Updated weights for policy 0, policy_version 18386 (0.0032) +[2024-11-08 01:31:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 75309056. Throughput: 0: 1733.5. Samples: 13821788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:31:07,933][41694] Avg episode reward: [(0, '4.296')] +[2024-11-08 01:31:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 75345920. Throughput: 0: 1739.9. Samples: 13832204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:31:12,933][41694] Avg episode reward: [(0, '4.216')] +[2024-11-08 01:31:13,536][42004] Updated weights for policy 0, policy_version 18396 (0.0040) +[2024-11-08 01:31:17,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6963.3, 300 sec: 6692.4). Total num frames: 75378688. Throughput: 0: 1738.9. Samples: 13837568. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:31:17,934][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 01:31:19,173][42004] Updated weights for policy 0, policy_version 18406 (0.0037) +[2024-11-08 01:31:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.5, 300 sec: 6706.3). Total num frames: 75415552. Throughput: 0: 1732.3. Samples: 13848504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:31:22,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:31:27,413][42004] Updated weights for policy 0, policy_version 18416 (0.0037) +[2024-11-08 01:31:27,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.2, 300 sec: 6636.9). Total num frames: 75431936. Throughput: 0: 1618.4. Samples: 13854892. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:31:27,938][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 01:31:32,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 75464704. Throughput: 0: 1598.0. Samples: 13859240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:31:32,934][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 01:31:34,194][42004] Updated weights for policy 0, policy_version 18426 (0.0027) +[2024-11-08 01:31:37,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6609.1). Total num frames: 75497472. Throughput: 0: 1645.4. Samples: 13869224. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:31:37,934][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 01:31:37,996][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018433_75501568.pth... +[2024-11-08 01:31:38,113][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018045_73912320.pth +[2024-11-08 01:31:39,717][42004] Updated weights for policy 0, policy_version 18436 (0.0033) +[2024-11-08 01:31:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6623.0). 
Total num frames: 75538432. Throughput: 0: 1655.7. Samples: 13880430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:31:42,934][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 01:31:44,981][42004] Updated weights for policy 0, policy_version 18446 (0.0027) +[2024-11-08 01:31:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 75575296. Throughput: 0: 1671.3. Samples: 13886182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:31:47,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:31:50,554][42004] Updated weights for policy 0, policy_version 18456 (0.0032) +[2024-11-08 01:31:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 75612160. Throughput: 0: 1680.4. Samples: 13897408. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:31:52,933][41694] Avg episode reward: [(0, '4.219')] +[2024-11-08 01:31:56,065][42004] Updated weights for policy 0, policy_version 18466 (0.0031) +[2024-11-08 01:31:59,382][41694] Fps is (10 sec: 6081.0, 60 sec: 6665.5, 300 sec: 6673.5). Total num frames: 75644928. Throughput: 0: 1644.5. Samples: 13908594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:31:59,384][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 01:32:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 75669504. Throughput: 0: 1621.1. Samples: 13910520. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:32:02,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 01:32:04,016][42004] Updated weights for policy 0, policy_version 18476 (0.0042) +[2024-11-08 01:32:07,932][41694] Fps is (10 sec: 6707.5, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 75702272. Throughput: 0: 1589.7. Samples: 13920042. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:32:07,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 01:32:10,002][42004] Updated weights for policy 0, policy_version 18486 (0.0040) +[2024-11-08 01:32:12,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 75739136. Throughput: 0: 1694.0. Samples: 13931120. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:32:12,933][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 01:32:15,363][42004] Updated weights for policy 0, policy_version 18496 (0.0032) +[2024-11-08 01:32:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 75776000. Throughput: 0: 1723.4. Samples: 13936794. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:32:17,933][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 01:32:20,864][42004] Updated weights for policy 0, policy_version 18506 (0.0050) +[2024-11-08 01:32:22,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6621.8, 300 sec: 6678.6). Total num frames: 75812864. Throughput: 0: 1752.2. Samples: 13948074. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:32:22,937][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 01:32:26,341][42004] Updated weights for policy 0, policy_version 18516 (0.0023) +[2024-11-08 01:32:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6706.3). Total num frames: 75853824. Throughput: 0: 1753.0. Samples: 13959316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:32:27,933][41694] Avg episode reward: [(0, '4.676')] +[2024-11-08 01:32:33,366][41694] Fps is (10 sec: 6281.1, 60 sec: 6845.4, 300 sec: 6654.9). Total num frames: 75878400. Throughput: 0: 1734.9. Samples: 13965006. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:32:33,367][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 01:32:33,471][42004] Updated weights for policy 0, policy_version 18526 (0.0033) +[2024-11-08 01:32:37,932][41694] Fps is (10 sec: 5733.9, 60 sec: 6894.9, 300 sec: 6650.8). Total num frames: 75911168. Throughput: 0: 1678.2. Samples: 13972928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:32:37,934][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 01:32:39,408][42004] Updated weights for policy 0, policy_version 18536 (0.0026) +[2024-11-08 01:32:42,931][41694] Fps is (10 sec: 7279.3, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 75948032. Throughput: 0: 1707.1. Samples: 13982938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:32:42,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 01:32:45,202][42004] Updated weights for policy 0, policy_version 18546 (0.0028) +[2024-11-08 01:32:47,931][41694] Fps is (10 sec: 6963.9, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 75980800. Throughput: 0: 1729.1. Samples: 13988330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:32:47,933][41694] Avg episode reward: [(0, '4.134')] +[2024-11-08 01:32:50,995][42004] Updated weights for policy 0, policy_version 18556 (0.0036) +[2024-11-08 01:32:52,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 76017664. Throughput: 0: 1750.5. Samples: 13998814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:32:52,934][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 01:32:56,476][42004] Updated weights for policy 0, policy_version 18566 (0.0033) +[2024-11-08 01:32:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6995.8, 300 sec: 6678.6). Total num frames: 76054528. Throughput: 0: 1757.0. Samples: 14010184. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:32:57,933][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 01:33:02,102][42004] Updated weights for policy 0, policy_version 18576 (0.0036) +[2024-11-08 01:33:02,931][41694] Fps is (10 sec: 7373.2, 60 sec: 7031.5, 300 sec: 6692.5). Total num frames: 76091392. Throughput: 0: 1756.1. Samples: 14015818. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:33:02,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 01:33:08,006][41694] Fps is (10 sec: 5692.0, 60 sec: 6818.2, 300 sec: 6635.2). Total num frames: 76111872. Throughput: 0: 1625.2. Samples: 14021326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:33:08,007][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 01:33:09,787][42004] Updated weights for policy 0, policy_version 18586 (0.0025) +[2024-11-08 01:33:12,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 76144640. Throughput: 0: 1636.7. Samples: 14032966. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:33:12,934][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 01:33:16,080][42004] Updated weights for policy 0, policy_version 18596 (0.0038) +[2024-11-08 01:33:17,931][41694] Fps is (10 sec: 7015.4, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 76181504. Throughput: 0: 1638.3. Samples: 14038018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:33:17,933][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 01:33:21,454][42004] Updated weights for policy 0, policy_version 18606 (0.0027) +[2024-11-08 01:33:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 76218368. Throughput: 0: 1695.0. Samples: 14049200. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:33:22,935][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 01:33:26,922][42004] Updated weights for policy 0, policy_version 18616 (0.0030) +[2024-11-08 01:33:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6684.0). Total num frames: 76259328. Throughput: 0: 1725.6. Samples: 14060592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:33:27,933][41694] Avg episode reward: [(0, '4.241')] +[2024-11-08 01:33:32,242][42004] Updated weights for policy 0, policy_version 18626 (0.0027) +[2024-11-08 01:33:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7014.0, 300 sec: 6678.6). Total num frames: 76296192. Throughput: 0: 1732.1. Samples: 14066276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:33:32,934][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 01:33:37,793][42004] Updated weights for policy 0, policy_version 18636 (0.0029) +[2024-11-08 01:33:37,932][41694] Fps is (10 sec: 7372.1, 60 sec: 7031.5, 300 sec: 6692.5). Total num frames: 76333056. Throughput: 0: 1749.9. Samples: 14077560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:33:37,936][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 01:33:37,954][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018636_76333056.pth... +[2024-11-08 01:33:38,109][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018247_74739712.pth +[2024-11-08 01:33:42,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 76353536. Throughput: 0: 1663.2. Samples: 14085030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:33:42,934][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 01:33:46,045][42004] Updated weights for policy 0, policy_version 18646 (0.0031) +[2024-11-08 01:33:47,931][41694] Fps is (10 sec: 4915.6, 60 sec: 6690.1, 300 sec: 6609.1). 
Total num frames: 76382208. Throughput: 0: 1637.0. Samples: 14089482. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:33:47,933][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 01:33:51,879][42004] Updated weights for policy 0, policy_version 18656 (0.0029) +[2024-11-08 01:33:52,932][41694] Fps is (10 sec: 6962.5, 60 sec: 6758.3, 300 sec: 6609.1). Total num frames: 76423168. Throughput: 0: 1744.6. Samples: 14099704. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:33:52,934][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 01:33:57,144][42004] Updated weights for policy 0, policy_version 18666 (0.0026) +[2024-11-08 01:33:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 76460032. Throughput: 0: 1741.7. Samples: 14111344. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:33:57,933][41694] Avg episode reward: [(0, '4.252')] +[2024-11-08 01:34:02,699][42004] Updated weights for policy 0, policy_version 18676 (0.0027) +[2024-11-08 01:34:02,931][41694] Fps is (10 sec: 7373.5, 60 sec: 6758.4, 300 sec: 6665.1). Total num frames: 76496896. Throughput: 0: 1755.9. Samples: 14117032. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:34:02,935][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 01:34:07,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7040.1, 300 sec: 6678.6). Total num frames: 76533760. Throughput: 0: 1748.6. Samples: 14127886. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:34:07,935][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 01:34:08,214][42004] Updated weights for policy 0, policy_version 18686 (0.0026) +[2024-11-08 01:34:12,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6706.4). Total num frames: 76574720. Throughput: 0: 1754.5. Samples: 14139544. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:34:12,934][41694] Avg episode reward: [(0, '4.640')] +[2024-11-08 01:34:13,385][42004] Updated weights for policy 0, policy_version 18696 (0.0027) +[2024-11-08 01:34:17,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 76595200. Throughput: 0: 1737.0. Samples: 14144440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:34:17,942][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 01:34:21,423][42004] Updated weights for policy 0, policy_version 18706 (0.0032) +[2024-11-08 01:34:22,932][41694] Fps is (10 sec: 5324.3, 60 sec: 6826.6, 300 sec: 6650.8). Total num frames: 76627968. Throughput: 0: 1650.2. Samples: 14151818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:34:22,935][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 01:34:27,144][42004] Updated weights for policy 0, policy_version 18716 (0.0040) +[2024-11-08 01:34:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 76664832. Throughput: 0: 1724.3. Samples: 14162626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:34:27,935][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 01:34:32,505][42004] Updated weights for policy 0, policy_version 18726 (0.0030) +[2024-11-08 01:34:32,931][41694] Fps is (10 sec: 7373.5, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 76701696. Throughput: 0: 1752.7. Samples: 14168354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:34:32,933][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 01:34:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.5, 300 sec: 6706.3). Total num frames: 76738560. Throughput: 0: 1768.5. Samples: 14179284. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:34:37,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 01:34:38,028][42004] Updated weights for policy 0, policy_version 18736 (0.0049) +[2024-11-08 01:34:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6734.1). Total num frames: 76775424. Throughput: 0: 1760.8. Samples: 14190578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:34:42,935][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 01:34:43,574][42004] Updated weights for policy 0, policy_version 18746 (0.0026) +[2024-11-08 01:34:47,931][41694] Fps is (10 sec: 7373.0, 60 sec: 7168.0, 300 sec: 6761.9). Total num frames: 76812288. Throughput: 0: 1758.8. Samples: 14196180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:34:47,933][41694] Avg episode reward: [(0, '4.217')] +[2024-11-08 01:34:50,995][42004] Updated weights for policy 0, policy_version 18756 (0.0026) +[2024-11-08 01:34:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.8, 300 sec: 6734.1). Total num frames: 76832768. Throughput: 0: 1686.4. Samples: 14203772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:34:52,936][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 01:34:57,119][42004] Updated weights for policy 0, policy_version 18766 (0.0028) +[2024-11-08 01:34:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 76869632. Throughput: 0: 1650.8. Samples: 14213832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:34:57,934][41694] Avg episode reward: [(0, '4.284')] +[2024-11-08 01:35:02,545][42004] Updated weights for policy 0, policy_version 18776 (0.0030) +[2024-11-08 01:35:02,933][41694] Fps is (10 sec: 7371.9, 60 sec: 6826.5, 300 sec: 6775.7). Total num frames: 76906496. Throughput: 0: 1671.2. Samples: 14219646. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:35:02,935][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 01:35:07,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6826.7, 300 sec: 6775.7). Total num frames: 76943360. Throughput: 0: 1748.8. Samples: 14230512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:35:07,935][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 01:35:08,099][42004] Updated weights for policy 0, policy_version 18786 (0.0033) +[2024-11-08 01:35:12,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6758.3, 300 sec: 6845.2). Total num frames: 76980224. Throughput: 0: 1758.6. Samples: 14241764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:35:12,936][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 01:35:13,714][42004] Updated weights for policy 0, policy_version 18796 (0.0030) +[2024-11-08 01:35:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 77012992. Throughput: 0: 1738.5. Samples: 14246586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:35:17,935][41694] Avg episode reward: [(0, '4.200')] +[2024-11-08 01:35:19,654][42004] Updated weights for policy 0, policy_version 18806 (0.0032) +[2024-11-08 01:35:22,931][41694] Fps is (10 sec: 6963.7, 60 sec: 7031.6, 300 sec: 6845.2). Total num frames: 77049856. Throughput: 0: 1737.4. Samples: 14257468. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:35:22,934][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 01:35:27,587][42004] Updated weights for policy 0, policy_version 18816 (0.2051) +[2024-11-08 01:35:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 77070336. Throughput: 0: 1639.3. Samples: 14264346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:35:27,937][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 01:35:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 77107200. 
Throughput: 0: 1627.8. Samples: 14269432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:35:32,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 01:35:33,055][42004] Updated weights for policy 0, policy_version 18826 (0.0027) +[2024-11-08 01:35:37,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 77144064. Throughput: 0: 1717.3. Samples: 14281050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:35:37,933][41694] Avg episode reward: [(0, '4.292')] +[2024-11-08 01:35:37,987][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018835_77148160.pth... +[2024-11-08 01:35:38,116][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018433_75501568.pth +[2024-11-08 01:35:38,637][42004] Updated weights for policy 0, policy_version 18836 (0.0034) +[2024-11-08 01:35:42,933][41694] Fps is (10 sec: 7372.0, 60 sec: 6758.3, 300 sec: 6789.6). Total num frames: 77180928. Throughput: 0: 1731.4. Samples: 14291746. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:35:42,934][41694] Avg episode reward: [(0, '4.210')] +[2024-11-08 01:35:44,305][42004] Updated weights for policy 0, policy_version 18846 (0.0029) +[2024-11-08 01:35:47,939][41694] Fps is (10 sec: 7371.3, 60 sec: 6758.2, 300 sec: 6831.3). Total num frames: 77217792. Throughput: 0: 1719.8. Samples: 14297038. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:35:47,942][41694] Avg episode reward: [(0, '4.173')] +[2024-11-08 01:35:49,859][42004] Updated weights for policy 0, policy_version 18856 (0.0025) +[2024-11-08 01:35:52,932][41694] Fps is (10 sec: 7373.4, 60 sec: 7031.4, 300 sec: 6845.2). Total num frames: 77254656. Throughput: 0: 1726.1. Samples: 14308188. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:35:52,933][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 01:35:55,373][42004] Updated weights for policy 0, policy_version 18866 (0.0024) +[2024-11-08 01:35:59,234][41694] Fps is (10 sec: 6161.9, 60 sec: 6815.3, 300 sec: 6801.3). Total num frames: 77287424. Throughput: 0: 1673.3. Samples: 14319240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:35:59,238][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 01:36:02,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6690.3, 300 sec: 6775.8). Total num frames: 77307904. Throughput: 0: 1649.6. Samples: 14320818. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:36:02,934][41694] Avg episode reward: [(0, '4.317')] +[2024-11-08 01:36:03,671][42004] Updated weights for policy 0, policy_version 18876 (0.0022) +[2024-11-08 01:36:07,931][41694] Fps is (10 sec: 6593.2, 60 sec: 6690.2, 300 sec: 6775.8). Total num frames: 77344768. Throughput: 0: 1631.4. Samples: 14330880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:36:07,934][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 01:36:09,342][42004] Updated weights for policy 0, policy_version 18886 (0.0027) +[2024-11-08 01:36:12,935][41694] Fps is (10 sec: 6960.9, 60 sec: 6621.6, 300 sec: 6775.7). Total num frames: 77377536. Throughput: 0: 1692.4. Samples: 14340508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:36:12,937][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 01:36:15,858][42004] Updated weights for policy 0, policy_version 18896 (0.0044) +[2024-11-08 01:36:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 77410304. Throughput: 0: 1689.8. Samples: 14345472. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:36:17,935][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 01:36:21,427][42004] Updated weights for policy 0, policy_version 18906 (0.0026) +[2024-11-08 01:36:22,932][41694] Fps is (10 sec: 6965.4, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 77447168. Throughput: 0: 1676.5. Samples: 14356492. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:36:22,936][41694] Avg episode reward: [(0, '4.281')] +[2024-11-08 01:36:26,719][42004] Updated weights for policy 0, policy_version 18916 (0.0041) +[2024-11-08 01:36:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 77488128. Throughput: 0: 1695.5. Samples: 14368040. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:36:27,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 01:36:33,465][41694] Fps is (10 sec: 6221.6, 60 sec: 6698.8, 300 sec: 6819.0). Total num frames: 77512704. Throughput: 0: 1680.4. Samples: 14373550. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:36:33,467][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 01:36:34,539][42004] Updated weights for policy 0, policy_version 18926 (0.0035) +[2024-11-08 01:36:37,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 77541376. Throughput: 0: 1599.3. Samples: 14380154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:36:37,934][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 01:36:40,237][42004] Updated weights for policy 0, policy_version 18936 (0.0028) +[2024-11-08 01:36:42,931][41694] Fps is (10 sec: 6923.1, 60 sec: 6622.0, 300 sec: 6789.6). Total num frames: 77578240. Throughput: 0: 1650.3. Samples: 14391356. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:36:42,935][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 01:36:45,573][42004] Updated weights for policy 0, policy_version 18946 (0.0044) +[2024-11-08 01:36:47,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.4, 300 sec: 6803.5). Total num frames: 77619200. Throughput: 0: 1697.1. Samples: 14397188. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:36:47,933][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 01:36:50,748][42004] Updated weights for policy 0, policy_version 18956 (0.0025) +[2024-11-08 01:36:52,933][41694] Fps is (10 sec: 8191.0, 60 sec: 6758.3, 300 sec: 6865.0). Total num frames: 77660160. Throughput: 0: 1738.5. Samples: 14409116. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:36:52,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 01:36:55,996][42004] Updated weights for policy 0, policy_version 18966 (0.0031) +[2024-11-08 01:36:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6978.1, 300 sec: 6873.0). Total num frames: 77697024. Throughput: 0: 1785.1. Samples: 14420832. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:36:57,934][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 01:37:01,557][42004] Updated weights for policy 0, policy_version 18976 (0.0036) +[2024-11-08 01:37:02,932][41694] Fps is (10 sec: 7373.6, 60 sec: 7099.7, 300 sec: 6886.8). Total num frames: 77733888. Throughput: 0: 1797.4. Samples: 14426354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:02,934][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 01:37:07,971][41694] Fps is (10 sec: 5304.0, 60 sec: 6754.0, 300 sec: 6816.5). Total num frames: 77750272. Throughput: 0: 1666.1. Samples: 14431532. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:07,973][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 01:37:09,805][42004] Updated weights for policy 0, policy_version 18986 (0.0037) +[2024-11-08 01:37:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6827.0, 300 sec: 6817.4). Total num frames: 77787136. Throughput: 0: 1666.1. Samples: 14443014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:12,935][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:37:15,409][42004] Updated weights for policy 0, policy_version 18996 (0.0031) +[2024-11-08 01:37:17,931][41694] Fps is (10 sec: 7401.8, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 77824000. Throughput: 0: 1690.3. Samples: 14448712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:37:17,933][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 01:37:20,802][42004] Updated weights for policy 0, policy_version 19006 (0.0025) +[2024-11-08 01:37:22,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 77864960. Throughput: 0: 1777.8. Samples: 14460156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:37:22,933][41694] Avg episode reward: [(0, '4.565')] +[2024-11-08 01:37:26,112][42004] Updated weights for policy 0, policy_version 19016 (0.0022) +[2024-11-08 01:37:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6869.2). Total num frames: 77901824. Throughput: 0: 1787.3. Samples: 14471784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:27,933][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 01:37:31,533][42004] Updated weights for policy 0, policy_version 19026 (0.0031) +[2024-11-08 01:37:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7163.5, 300 sec: 6873.0). Total num frames: 77938688. Throughput: 0: 1776.6. Samples: 14477136. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:32,936][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 01:37:37,090][42004] Updated weights for policy 0, policy_version 19036 (0.0024) +[2024-11-08 01:37:37,939][41694] Fps is (10 sec: 7367.3, 60 sec: 7235.4, 300 sec: 6872.8). Total num frames: 77975552. Throughput: 0: 1765.6. Samples: 14488580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:37,941][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 01:37:37,963][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019037_77975552.pth... +[2024-11-08 01:37:38,119][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018636_76333056.pth +[2024-11-08 01:37:42,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 77991936. Throughput: 0: 1652.9. Samples: 14495214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:42,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 01:37:45,374][42004] Updated weights for policy 0, policy_version 19046 (0.0041) +[2024-11-08 01:37:47,932][41694] Fps is (10 sec: 5328.7, 60 sec: 6826.6, 300 sec: 6817.4). Total num frames: 78028800. Throughput: 0: 1637.4. Samples: 14500038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:47,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 01:37:50,798][42004] Updated weights for policy 0, policy_version 19056 (0.0029) +[2024-11-08 01:37:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.8, 300 sec: 6831.3). Total num frames: 78069760. Throughput: 0: 1780.6. Samples: 14511590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:37:52,933][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 01:37:55,912][42004] Updated weights for policy 0, policy_version 19066 (0.0024) +[2024-11-08 01:37:57,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6831.3). 
Total num frames: 78106624. Throughput: 0: 1786.1. Samples: 14523388. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:37:57,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 01:38:01,596][42004] Updated weights for policy 0, policy_version 19076 (0.0026) +[2024-11-08 01:38:02,935][41694] Fps is (10 sec: 7370.5, 60 sec: 6826.3, 300 sec: 6888.5). Total num frames: 78143488. Throughput: 0: 1778.2. Samples: 14528736. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:38:02,948][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 01:38:06,921][42004] Updated weights for policy 0, policy_version 19086 (0.0026) +[2024-11-08 01:38:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7172.7, 300 sec: 6900.7). Total num frames: 78180352. Throughput: 0: 1770.3. Samples: 14539820. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:38:07,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 01:38:12,846][42004] Updated weights for policy 0, policy_version 19096 (0.0036) +[2024-11-08 01:38:12,931][41694] Fps is (10 sec: 7375.1, 60 sec: 7168.0, 300 sec: 6900.7). Total num frames: 78217216. Throughput: 0: 1747.0. Samples: 14550398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:38:12,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 01:38:17,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 78233600. Throughput: 0: 1710.9. Samples: 14554126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:38:17,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:38:21,085][42004] Updated weights for policy 0, policy_version 19106 (0.0042) +[2024-11-08 01:38:22,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 78270464. Throughput: 0: 1633.3. Samples: 14562066. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:38:22,933][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 01:38:26,501][42004] Updated weights for policy 0, policy_version 19116 (0.0022) +[2024-11-08 01:38:27,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 78307328. Throughput: 0: 1740.4. Samples: 14573534. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:38:27,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 01:38:31,688][42004] Updated weights for policy 0, policy_version 19126 (0.0024) +[2024-11-08 01:38:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 78348288. Throughput: 0: 1761.9. Samples: 14579324. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:38:32,935][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 01:38:36,990][42004] Updated weights for policy 0, policy_version 19136 (0.0026) +[2024-11-08 01:38:37,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6827.5, 300 sec: 6886.8). Total num frames: 78385152. Throughput: 0: 1763.8. Samples: 14590962. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:38:37,933][41694] Avg episode reward: [(0, '4.621')] +[2024-11-08 01:38:42,447][42004] Updated weights for policy 0, policy_version 19146 (0.0029) +[2024-11-08 01:38:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 6914.6). Total num frames: 78422016. Throughput: 0: 1752.9. Samples: 14602270. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:38:42,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 01:38:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 6900.7). Total num frames: 78458880. Throughput: 0: 1758.1. Samples: 14607846. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:38:47,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 01:38:48,273][42004] Updated weights for policy 0, policy_version 19156 (0.0031) +[2024-11-08 01:38:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 78479360. Throughput: 0: 1652.8. Samples: 14614194. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:38:52,932][41694] Avg episode reward: [(0, '4.570')] +[2024-11-08 01:38:56,218][42004] Updated weights for policy 0, policy_version 19166 (0.0027) +[2024-11-08 01:38:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 78516224. Throughput: 0: 1660.7. Samples: 14625130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:38:57,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 01:39:01,841][42004] Updated weights for policy 0, policy_version 19176 (0.0026) +[2024-11-08 01:39:02,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.7, 300 sec: 6831.3). Total num frames: 78548992. Throughput: 0: 1696.5. Samples: 14630468. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:39:02,933][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 01:39:07,231][42004] Updated weights for policy 0, policy_version 19186 (0.0029) +[2024-11-08 01:39:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 78589952. Throughput: 0: 1769.0. Samples: 14641674. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:39:07,935][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 01:39:12,813][42004] Updated weights for policy 0, policy_version 19196 (0.0035) +[2024-11-08 01:39:12,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 78626816. Throughput: 0: 1763.3. Samples: 14652880. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:39:12,933][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 01:39:17,932][41694] Fps is (10 sec: 7373.1, 60 sec: 7168.0, 300 sec: 6900.7). Total num frames: 78663680. Throughput: 0: 1755.3. Samples: 14658312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:39:17,933][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 01:39:18,440][42004] Updated weights for policy 0, policy_version 19206 (0.0035) +[2024-11-08 01:39:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7099.7, 300 sec: 6886.8). Total num frames: 78696448. Throughput: 0: 1720.5. Samples: 14668386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:39:22,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 01:39:26,599][42004] Updated weights for policy 0, policy_version 19216 (0.0030) +[2024-11-08 01:39:27,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 78716928. Throughput: 0: 1626.6. Samples: 14675468. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:39:27,934][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 01:39:32,199][42004] Updated weights for policy 0, policy_version 19226 (0.0032) +[2024-11-08 01:39:32,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 78753792. Throughput: 0: 1620.8. Samples: 14680782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:39:32,934][41694] Avg episode reward: [(0, '4.555')] +[2024-11-08 01:39:37,603][42004] Updated weights for policy 0, policy_version 19236 (0.0020) +[2024-11-08 01:39:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 78790656. Throughput: 0: 1733.0. Samples: 14692180. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:39:37,933][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 01:39:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019236_78790656.pth... +[2024-11-08 01:39:38,065][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000018835_77148160.pth +[2024-11-08 01:39:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 78827520. Throughput: 0: 1741.9. Samples: 14703516. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:39:42,935][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 01:39:43,070][42004] Updated weights for policy 0, policy_version 19246 (0.0029) +[2024-11-08 01:39:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6900.7). Total num frames: 78868480. Throughput: 0: 1752.1. Samples: 14709312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:39:47,933][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 01:39:48,215][42004] Updated weights for policy 0, policy_version 19256 (0.0019) +[2024-11-08 01:39:52,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7031.4, 300 sec: 6886.8). Total num frames: 78901248. Throughput: 0: 1755.2. Samples: 14720656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:39:52,935][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 01:39:54,223][42004] Updated weights for policy 0, policy_version 19266 (0.0038) +[2024-11-08 01:39:59,967][41694] Fps is (10 sec: 5445.1, 60 sec: 6734.7, 300 sec: 6825.9). Total num frames: 78934016. Throughput: 0: 1652.6. Samples: 14730612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:39:59,972][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 01:40:02,550][42004] Updated weights for policy 0, policy_version 19276 (0.0028) +[2024-11-08 01:40:02,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6817.4). 
Total num frames: 78954496. Throughput: 0: 1634.5. Samples: 14731864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:40:02,933][41694] Avg episode reward: [(0, '4.283')] +[2024-11-08 01:40:07,932][41694] Fps is (10 sec: 7200.2, 60 sec: 6690.2, 300 sec: 6817.4). Total num frames: 78991360. Throughput: 0: 1639.4. Samples: 14742160. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:40:07,935][41694] Avg episode reward: [(0, '4.669')] +[2024-11-08 01:40:08,341][42004] Updated weights for policy 0, policy_version 19286 (0.0029) +[2024-11-08 01:40:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 79028224. Throughput: 0: 1723.1. Samples: 14753008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:40:12,933][41694] Avg episode reward: [(0, '4.746')] +[2024-11-08 01:40:14,176][42004] Updated weights for policy 0, policy_version 19296 (0.0033) +[2024-11-08 01:40:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 79060992. Throughput: 0: 1717.7. Samples: 14758078. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:40:17,934][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 01:40:20,277][42004] Updated weights for policy 0, policy_version 19306 (0.0045) +[2024-11-08 01:40:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6873.0). Total num frames: 79097856. Throughput: 0: 1693.9. Samples: 14768404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:40:22,933][41694] Avg episode reward: [(0, '4.222')] +[2024-11-08 01:40:25,730][42004] Updated weights for policy 0, policy_version 19316 (0.0030) +[2024-11-08 01:40:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 79130624. Throughput: 0: 1681.2. Samples: 14779170. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:40:27,934][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 01:40:31,559][42004] Updated weights for policy 0, policy_version 19326 (0.0030) +[2024-11-08 01:40:34,582][41694] Fps is (10 sec: 5625.4, 60 sec: 6643.9, 300 sec: 6807.1). Total num frames: 79163392. Throughput: 0: 1608.3. Samples: 14784338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:40:34,585][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 01:40:37,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6803.5). Total num frames: 79187968. Throughput: 0: 1565.1. Samples: 14791086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:40:37,934][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 01:40:39,696][42004] Updated weights for policy 0, policy_version 19336 (0.0034) +[2024-11-08 01:40:42,932][41694] Fps is (10 sec: 6867.6, 60 sec: 6553.6, 300 sec: 6789.7). Total num frames: 79220736. Throughput: 0: 1657.0. Samples: 14801802. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:40:42,933][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 01:40:45,221][42004] Updated weights for policy 0, policy_version 19346 (0.0027) +[2024-11-08 01:40:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6803.5). Total num frames: 79261696. Throughput: 0: 1681.0. Samples: 14807508. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:40:47,934][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 01:40:50,395][42004] Updated weights for policy 0, policy_version 19356 (0.0028) +[2024-11-08 01:40:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6847.6). Total num frames: 79298560. Throughput: 0: 1715.9. Samples: 14819374. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:40:52,933][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 01:40:55,831][42004] Updated weights for policy 0, policy_version 19366 (0.0040) +[2024-11-08 01:40:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6925.1, 300 sec: 6872.9). Total num frames: 79335424. Throughput: 0: 1727.5. Samples: 14830746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:40:57,934][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 01:41:01,931][42004] Updated weights for policy 0, policy_version 19376 (0.0041) +[2024-11-08 01:41:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 79368192. Throughput: 0: 1724.2. Samples: 14835666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:41:02,935][41694] Avg episode reward: [(0, '4.593')] +[2024-11-08 01:41:09,141][41694] Fps is (10 sec: 5481.3, 60 sec: 6624.9, 300 sec: 6817.3). Total num frames: 79396864. Throughput: 0: 1680.7. Samples: 14846066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 01:41:09,143][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 01:41:10,090][42004] Updated weights for policy 0, policy_version 19386 (0.0039) +[2024-11-08 01:41:12,935][41694] Fps is (10 sec: 5322.8, 60 sec: 6553.2, 300 sec: 6817.3). Total num frames: 79421440. Throughput: 0: 1611.2. Samples: 14851680. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:41:12,939][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 01:41:16,683][42004] Updated weights for policy 0, policy_version 19396 (0.0029) +[2024-11-08 01:41:17,931][41694] Fps is (10 sec: 6523.3, 60 sec: 6553.6, 300 sec: 6803.5). Total num frames: 79454208. Throughput: 0: 1665.9. Samples: 14856556. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:41:17,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 01:41:21,938][42004] Updated weights for policy 0, policy_version 19406 (0.0032) +[2024-11-08 01:41:22,931][41694] Fps is (10 sec: 6965.9, 60 sec: 6553.6, 300 sec: 6789.6). Total num frames: 79491072. Throughput: 0: 1698.1. Samples: 14867502. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:41:22,933][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 01:41:27,059][42004] Updated weights for policy 0, policy_version 19416 (0.0035) +[2024-11-08 01:41:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6690.1, 300 sec: 6857.6). Total num frames: 79532032. Throughput: 0: 1724.8. Samples: 14879416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:41:27,933][41694] Avg episode reward: [(0, '4.653')] +[2024-11-08 01:41:32,255][42004] Updated weights for policy 0, policy_version 19426 (0.0026) +[2024-11-08 01:41:32,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7019.7, 300 sec: 6886.8). Total num frames: 79572992. Throughput: 0: 1731.4. Samples: 14885420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:41:32,935][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 01:41:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 79605760. Throughput: 0: 1703.6. Samples: 14896034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:41:37,933][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 01:41:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019435_79605760.pth... +[2024-11-08 01:41:38,048][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019037_77975552.pth +[2024-11-08 01:41:38,267][42004] Updated weights for policy 0, policy_version 19436 (0.0031) +[2024-11-08 01:41:43,818][41694] Fps is (10 sec: 5267.3, 60 sec: 6727.3, 300 sec: 6797.0). 
Total num frames: 79630336. Throughput: 0: 1539.8. Samples: 14901402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:41:43,820][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 01:41:46,273][42004] Updated weights for policy 0, policy_version 19446 (0.0023) +[2024-11-08 01:41:47,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 79659008. Throughput: 0: 1609.2. Samples: 14908082. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:41:47,933][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 01:41:52,059][42004] Updated weights for policy 0, policy_version 19456 (0.0030) +[2024-11-08 01:41:52,932][41694] Fps is (10 sec: 7191.1, 60 sec: 6621.8, 300 sec: 6775.8). Total num frames: 79695872. Throughput: 0: 1657.9. Samples: 14918666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:41:52,934][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 01:41:57,434][42004] Updated weights for policy 0, policy_version 19466 (0.0034) +[2024-11-08 01:41:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 79732736. Throughput: 0: 1738.8. Samples: 14929918. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:41:57,937][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 01:42:02,823][42004] Updated weights for policy 0, policy_version 19476 (0.0024) +[2024-11-08 01:42:02,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6860.0). Total num frames: 79773696. Throughput: 0: 1759.9. Samples: 14935752. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:42:02,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 01:42:07,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7036.8, 300 sec: 6859.1). Total num frames: 79810560. Throughput: 0: 1765.4. Samples: 14946944. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:42:07,935][41694] Avg episode reward: [(0, '4.263')] +[2024-11-08 01:42:08,441][42004] Updated weights for policy 0, policy_version 19486 (0.0036) +[2024-11-08 01:42:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.9, 300 sec: 6845.2). Total num frames: 79843328. Throughput: 0: 1729.1. Samples: 14957226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:42:12,932][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 01:42:14,263][42004] Updated weights for policy 0, policy_version 19496 (0.0034) +[2024-11-08 01:42:18,380][41694] Fps is (10 sec: 5488.2, 60 sec: 6843.8, 300 sec: 6779.3). Total num frames: 79867904. Throughput: 0: 1702.3. Samples: 14962788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:42:18,384][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 01:42:22,220][42004] Updated weights for policy 0, policy_version 19506 (0.0029) +[2024-11-08 01:42:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 79900672. Throughput: 0: 1630.7. Samples: 14969416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:42:22,935][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 01:42:27,668][42004] Updated weights for policy 0, policy_version 19516 (0.0024) +[2024-11-08 01:42:27,932][41694] Fps is (10 sec: 7290.1, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 79937536. Throughput: 0: 1798.3. Samples: 14980730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:42:27,933][41694] Avg episode reward: [(0, '4.670')] +[2024-11-08 01:42:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6775.9). Total num frames: 79974400. Throughput: 0: 1737.8. Samples: 14986284. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:42:32,936][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 01:42:33,079][42004] Updated weights for policy 0, policy_version 19526 (0.0033) +[2024-11-08 01:42:37,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 80015360. Throughput: 0: 1764.2. Samples: 14998054. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:42:37,933][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 01:42:38,421][42004] Updated weights for policy 0, policy_version 19536 (0.0024) +[2024-11-08 01:42:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7067.6, 300 sec: 6845.2). Total num frames: 80048128. Throughput: 0: 1748.1. Samples: 15008584. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:42:42,934][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 01:42:44,397][42004] Updated weights for policy 0, policy_version 19546 (0.0025) +[2024-11-08 01:42:47,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7099.7, 300 sec: 6831.3). Total num frames: 80084992. Throughput: 0: 1733.5. Samples: 15013760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:42:47,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 01:42:49,778][42004] Updated weights for policy 0, policy_version 19556 (0.0040) +[2024-11-08 01:42:52,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6895.0, 300 sec: 6789.6). Total num frames: 80109568. Throughput: 0: 1723.4. Samples: 15024498. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:42:52,933][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 01:42:57,023][42004] Updated weights for policy 0, policy_version 19566 (0.0024) +[2024-11-08 01:42:57,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6789.7). Total num frames: 80146432. Throughput: 0: 1686.7. Samples: 15033126. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:42:57,933][41694] Avg episode reward: [(0, '4.741')] +[2024-11-08 01:43:02,409][42004] Updated weights for policy 0, policy_version 19576 (0.0037) +[2024-11-08 01:43:02,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 80183296. Throughput: 0: 1707.7. Samples: 15038870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:43:02,933][41694] Avg episode reward: [(0, '4.570')] +[2024-11-08 01:43:07,428][42004] Updated weights for policy 0, policy_version 19586 (0.0034) +[2024-11-08 01:43:07,931][41694] Fps is (10 sec: 8192.0, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 80228352. Throughput: 0: 1805.2. Samples: 15050648. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:43:07,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 01:43:12,761][42004] Updated weights for policy 0, policy_version 19596 (0.0032) +[2024-11-08 01:43:12,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 80265216. Throughput: 0: 1818.0. Samples: 15062540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:43:12,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 01:43:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7222.0, 300 sec: 6872.9). Total num frames: 80297984. Throughput: 0: 1802.7. Samples: 15067404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:43:17,933][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 01:43:18,767][42004] Updated weights for policy 0, policy_version 19606 (0.0040) +[2024-11-08 01:43:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7236.3, 300 sec: 6873.0). Total num frames: 80334848. Throughput: 0: 1783.5. Samples: 15078310. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:43:22,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 01:43:24,205][42004] Updated weights for policy 0, policy_version 19616 (0.0024) +[2024-11-08 01:43:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 80355328. Throughput: 0: 1704.6. Samples: 15085290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:43:27,936][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 01:43:32,214][42004] Updated weights for policy 0, policy_version 19626 (0.0030) +[2024-11-08 01:43:32,931][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 80392192. Throughput: 0: 1700.0. Samples: 15090262. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:43:32,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 01:43:37,813][42004] Updated weights for policy 0, policy_version 19636 (0.0024) +[2024-11-08 01:43:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 80429056. Throughput: 0: 1706.6. Samples: 15101294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:43:37,933][41694] Avg episode reward: [(0, '4.335')] +[2024-11-08 01:43:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019636_80429056.pth... +[2024-11-08 01:43:38,106][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019236_78790656.pth +[2024-11-08 01:43:42,760][42004] Updated weights for policy 0, policy_version 19646 (0.0024) +[2024-11-08 01:43:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6817.4). Total num frames: 80470016. Throughput: 0: 1784.8. Samples: 15113444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:43:42,933][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 01:43:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.4, 300 sec: 6872.9). 
Total num frames: 80506880. Throughput: 0: 1789.9. Samples: 15119418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:43:47,933][41694] Avg episode reward: [(0, '4.646')] +[2024-11-08 01:43:48,066][42004] Updated weights for policy 0, policy_version 19656 (0.0025) +[2024-11-08 01:43:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.2, 300 sec: 6873.0). Total num frames: 80543744. Throughput: 0: 1770.8. Samples: 15130334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:43:52,933][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 01:43:53,674][42004] Updated weights for policy 0, policy_version 19666 (0.0033) +[2024-11-08 01:43:57,932][41694] Fps is (10 sec: 7781.9, 60 sec: 7304.4, 300 sec: 6900.7). Total num frames: 80584704. Throughput: 0: 1768.2. Samples: 15142112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:43:57,934][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 01:43:58,814][42004] Updated weights for policy 0, policy_version 19676 (0.0027) +[2024-11-08 01:44:02,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 80601088. Throughput: 0: 1760.7. Samples: 15146636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:02,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 01:44:06,795][42004] Updated weights for policy 0, policy_version 19686 (0.0026) +[2024-11-08 01:44:07,931][41694] Fps is (10 sec: 5734.9, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 80642048. Throughput: 0: 1691.5. Samples: 15154428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:07,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 01:44:12,260][42004] Updated weights for policy 0, policy_version 19696 (0.0028) +[2024-11-08 01:44:12,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 80678912. Throughput: 0: 1787.7. Samples: 15165738. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:12,933][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 01:44:17,532][42004] Updated weights for policy 0, policy_version 19706 (0.0026) +[2024-11-08 01:44:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 80715776. Throughput: 0: 1802.8. Samples: 15171390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:17,933][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 01:44:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 80752640. Throughput: 0: 1816.2. Samples: 15183024. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:22,933][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 01:44:23,235][42004] Updated weights for policy 0, policy_version 19716 (0.0023) +[2024-11-08 01:44:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 6900.7). Total num frames: 80789504. Throughput: 0: 1784.9. Samples: 15193764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:27,933][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 01:44:28,654][42004] Updated weights for policy 0, policy_version 19726 (0.0021) +[2024-11-08 01:44:32,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7304.5, 300 sec: 6914.6). Total num frames: 80830464. Throughput: 0: 1783.3. Samples: 15199668. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:32,935][41694] Avg episode reward: [(0, '4.632')] +[2024-11-08 01:44:33,877][42004] Updated weights for policy 0, policy_version 19736 (0.0027) +[2024-11-08 01:44:37,932][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 6859.1). Total num frames: 80850944. Throughput: 0: 1713.2. Samples: 15207428. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:37,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 01:44:41,590][42004] Updated weights for policy 0, policy_version 19746 (0.0026) +[2024-11-08 01:44:42,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 80887808. Throughput: 0: 1695.0. Samples: 15218388. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:42,934][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 01:44:46,751][42004] Updated weights for policy 0, policy_version 19756 (0.0031) +[2024-11-08 01:44:47,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6873.0). Total num frames: 80928768. Throughput: 0: 1719.5. Samples: 15224012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:47,933][41694] Avg episode reward: [(0, '4.659')] +[2024-11-08 01:44:51,959][42004] Updated weights for policy 0, policy_version 19766 (0.0033) +[2024-11-08 01:44:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7031.5, 300 sec: 6934.7). Total num frames: 80965632. Throughput: 0: 1812.7. Samples: 15235998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:52,933][41694] Avg episode reward: [(0, '4.629')] +[2024-11-08 01:44:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6895.0, 300 sec: 6928.5). Total num frames: 80998400. Throughput: 0: 1787.5. Samples: 15246176. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:44:57,934][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 01:44:58,152][42004] Updated weights for policy 0, policy_version 19776 (0.0034) +[2024-11-08 01:45:02,932][41694] Fps is (10 sec: 6962.9, 60 sec: 7236.2, 300 sec: 6928.5). Total num frames: 81035264. Throughput: 0: 1781.5. Samples: 15251560. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:45:02,934][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 01:45:03,611][42004] Updated weights for policy 0, policy_version 19786 (0.0035) +[2024-11-08 01:45:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.0, 300 sec: 6928.5). Total num frames: 81072128. Throughput: 0: 1775.7. Samples: 15262930. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:45:07,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 01:45:11,430][42004] Updated weights for policy 0, policy_version 19796 (0.0032) +[2024-11-08 01:45:12,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 81092608. Throughput: 0: 1685.8. Samples: 15269624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:45:12,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 01:45:17,560][42004] Updated weights for policy 0, policy_version 19806 (0.0026) +[2024-11-08 01:45:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.7, 300 sec: 6872.9). Total num frames: 81125376. Throughput: 0: 1665.4. Samples: 15274612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:45:17,933][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 01:45:22,710][42004] Updated weights for policy 0, policy_version 19816 (0.0027) +[2024-11-08 01:45:22,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 81166336. Throughput: 0: 1735.6. Samples: 15285528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:45:22,935][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 01:45:27,934][41694] Fps is (10 sec: 7781.1, 60 sec: 6894.7, 300 sec: 6953.5). Total num frames: 81203200. Throughput: 0: 1757.0. Samples: 15297454. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:45:27,939][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 01:45:28,186][42004] Updated weights for policy 0, policy_version 19826 (0.0026) +[2024-11-08 01:45:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6956.3). Total num frames: 81240064. Throughput: 0: 1743.0. Samples: 15302448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:45:32,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 01:45:33,791][42004] Updated weights for policy 0, policy_version 19836 (0.0034) +[2024-11-08 01:45:37,931][41694] Fps is (10 sec: 7374.2, 60 sec: 7099.7, 300 sec: 6970.1). Total num frames: 81276928. Throughput: 0: 1725.7. Samples: 15313654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:45:37,933][41694] Avg episode reward: [(0, '4.330')] +[2024-11-08 01:45:37,951][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019843_81276928.pth... +[2024-11-08 01:45:38,178][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019435_79605760.pth +[2024-11-08 01:45:39,330][42004] Updated weights for policy 0, policy_version 19846 (0.0032) +[2024-11-08 01:45:42,933][41694] Fps is (10 sec: 7372.0, 60 sec: 7099.6, 300 sec: 6956.2). Total num frames: 81313792. Throughput: 0: 1750.8. Samples: 15324964. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:45:42,936][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 01:45:47,203][42004] Updated weights for policy 0, policy_version 19856 (0.0032) +[2024-11-08 01:45:47,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 81334272. Throughput: 0: 1672.1. Samples: 15326802. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:45:47,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 01:45:52,606][42004] Updated weights for policy 0, policy_version 19866 (0.0025) +[2024-11-08 01:45:52,931][41694] Fps is (10 sec: 5735.0, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 81371136. Throughput: 0: 1651.2. Samples: 15337234. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:45:52,934][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 01:45:57,774][42004] Updated weights for policy 0, policy_version 19876 (0.0030) +[2024-11-08 01:45:57,937][41694] Fps is (10 sec: 7778.0, 60 sec: 6894.3, 300 sec: 6928.4). Total num frames: 81412096. Throughput: 0: 1766.5. Samples: 15349128. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:45:57,946][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 01:46:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6970.9). Total num frames: 81444864. Throughput: 0: 1779.0. Samples: 15354668. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:46:02,934][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 01:46:03,789][42004] Updated weights for policy 0, policy_version 19886 (0.0027) +[2024-11-08 01:46:07,932][41694] Fps is (10 sec: 6966.7, 60 sec: 6826.6, 300 sec: 6984.1). Total num frames: 81481728. Throughput: 0: 1761.9. Samples: 15364816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:46:07,935][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 01:46:09,744][42004] Updated weights for policy 0, policy_version 19896 (0.0038) +[2024-11-08 01:46:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.4, 300 sec: 6984.0). Total num frames: 81514496. Throughput: 0: 1724.6. Samples: 15375058. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:46:12,934][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 01:46:15,477][42004] Updated weights for policy 0, policy_version 19906 (0.0036) +[2024-11-08 01:46:19,954][41694] Fps is (10 sec: 5792.3, 60 sec: 6868.3, 300 sec: 6936.5). Total num frames: 81551360. Throughput: 0: 1658.2. Samples: 15380422. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:46:19,955][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 01:46:22,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6914.6). Total num frames: 81571840. Throughput: 0: 1641.1. Samples: 15387504. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:46:22,934][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 01:46:23,004][42004] Updated weights for policy 0, policy_version 19916 (0.0032) +[2024-11-08 01:46:27,931][41694] Fps is (10 sec: 7701.4, 60 sec: 6826.9, 300 sec: 6914.6). Total num frames: 81612800. Throughput: 0: 1654.7. Samples: 15399422. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:46:27,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 01:46:28,225][42004] Updated weights for policy 0, policy_version 19926 (0.0030) +[2024-11-08 01:46:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6928.5). Total num frames: 81649664. Throughput: 0: 1741.5. Samples: 15405170. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:46:32,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 01:46:33,504][42004] Updated weights for policy 0, policy_version 19936 (0.0037) +[2024-11-08 01:46:37,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6991.2). Total num frames: 81686528. Throughput: 0: 1768.0. Samples: 15416796. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:46:37,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 01:46:39,380][42004] Updated weights for policy 0, policy_version 19946 (0.0041) +[2024-11-08 01:46:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.8, 300 sec: 6997.9). Total num frames: 81723392. Throughput: 0: 1735.5. Samples: 15427216. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:46:42,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 01:46:44,786][42004] Updated weights for policy 0, policy_version 19956 (0.0028) +[2024-11-08 01:46:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7168.0, 300 sec: 7011.8). Total num frames: 81764352. Throughput: 0: 1741.5. Samples: 15433036. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:46:47,933][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 01:46:50,107][42004] Updated weights for policy 0, policy_version 19966 (0.0032) +[2024-11-08 01:46:54,450][41694] Fps is (10 sec: 6045.2, 60 sec: 6857.9, 300 sec: 6948.3). Total num frames: 81793024. Throughput: 0: 1712.4. Samples: 15444472. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:46:54,452][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 01:46:57,788][42004] Updated weights for policy 0, policy_version 19976 (0.0028) +[2024-11-08 01:46:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6827.3, 300 sec: 6942.4). Total num frames: 81821696. Throughput: 0: 1699.7. Samples: 15451546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:46:57,934][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 01:47:02,931][41694] Fps is (10 sec: 7726.9, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 81858560. Throughput: 0: 1784.9. Samples: 15457134. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:47:02,933][41694] Avg episode reward: [(0, '4.726')] +[2024-11-08 01:47:03,325][42004] Updated weights for policy 0, policy_version 19986 (0.0028) +[2024-11-08 01:47:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6895.0, 300 sec: 6956.3). Total num frames: 81895424. Throughput: 0: 1795.8. Samples: 15468316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:47:07,934][41694] Avg episode reward: [(0, '4.683')] +[2024-11-08 01:47:08,710][42004] Updated weights for policy 0, policy_version 19996 (0.0026) +[2024-11-08 01:47:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6994.7). Total num frames: 81928192. Throughput: 0: 1764.1. Samples: 15478806. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:47:12,934][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 01:47:14,967][42004] Updated weights for policy 0, policy_version 20006 (0.0024) +[2024-11-08 01:47:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 7135.4, 300 sec: 6997.9). Total num frames: 81965056. Throughput: 0: 1752.6. Samples: 15484038. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:47:17,935][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 01:47:20,134][42004] Updated weights for policy 0, policy_version 20016 (0.0033) +[2024-11-08 01:47:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7236.2, 300 sec: 7011.8). Total num frames: 82006016. Throughput: 0: 1753.1. Samples: 15495688. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:47:22,938][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 01:47:25,477][42004] Updated weights for policy 0, policy_version 20026 (0.0030) +[2024-11-08 01:47:28,919][41694] Fps is (10 sec: 6337.6, 60 sec: 6917.6, 300 sec: 6960.7). Total num frames: 82034688. Throughput: 0: 1615.4. Samples: 15501504. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:47:28,922][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 01:47:32,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 82063360. Throughput: 0: 1678.3. Samples: 15508560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:47:32,934][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 01:47:33,015][42004] Updated weights for policy 0, policy_version 20036 (0.0026) +[2024-11-08 01:47:37,932][41694] Fps is (10 sec: 7725.9, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 82104320. Throughput: 0: 1743.2. Samples: 15520268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:47:37,934][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 01:47:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020045_82104320.pth... +[2024-11-08 01:47:38,066][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019636_80429056.pth +[2024-11-08 01:47:38,219][42004] Updated weights for policy 0, policy_version 20046 (0.0029) +[2024-11-08 01:47:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6970.1). Total num frames: 82141184. Throughput: 0: 1782.4. Samples: 15531756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:47:42,933][41694] Avg episode reward: [(0, '4.263')] +[2024-11-08 01:47:43,664][42004] Updated weights for policy 0, policy_version 20056 (0.0035) +[2024-11-08 01:47:47,933][41694] Fps is (10 sec: 7781.5, 60 sec: 6963.1, 300 sec: 7025.6). Total num frames: 82182144. Throughput: 0: 1790.0. Samples: 15537688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:47:47,936][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 01:47:48,787][42004] Updated weights for policy 0, policy_version 20066 (0.0030) +[2024-11-08 01:47:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7284.1, 300 sec: 7025.7). 
Total num frames: 82219008. Throughput: 0: 1800.0. Samples: 15549316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:47:52,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 01:47:54,126][42004] Updated weights for policy 0, policy_version 20076 (0.0031) +[2024-11-08 01:47:57,937][41694] Fps is (10 sec: 7373.7, 60 sec: 7236.3, 300 sec: 7025.7). Total num frames: 82255872. Throughput: 0: 1820.0. Samples: 15560706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:47:57,948][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 01:48:00,108][42004] Updated weights for policy 0, policy_version 20086 (0.0025) +[2024-11-08 01:48:03,415][41694] Fps is (10 sec: 5469.8, 60 sec: 6907.5, 300 sec: 6931.0). Total num frames: 82276352. Throughput: 0: 1791.5. Samples: 15565520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:48:03,416][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 01:48:07,889][42004] Updated weights for policy 0, policy_version 20096 (0.0033) +[2024-11-08 01:48:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 82313216. Throughput: 0: 1699.6. Samples: 15572170. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:48:07,933][41694] Avg episode reward: [(0, '4.273')] +[2024-11-08 01:48:12,932][41694] Fps is (10 sec: 7747.5, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 82350080. Throughput: 0: 1862.5. Samples: 15583478. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:48:12,934][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 01:48:13,370][42004] Updated weights for policy 0, policy_version 20106 (0.0032) +[2024-11-08 01:48:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 82386944. Throughput: 0: 1791.1. Samples: 15589158. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:48:17,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 01:48:18,788][42004] Updated weights for policy 0, policy_version 20116 (0.0026) +[2024-11-08 01:48:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 7011.8). Total num frames: 82423808. Throughput: 0: 1780.7. Samples: 15600400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:48:22,933][41694] Avg episode reward: [(0, '4.178')] +[2024-11-08 01:48:24,071][42004] Updated weights for policy 0, policy_version 20126 (0.0024) +[2024-11-08 01:48:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7287.9, 300 sec: 7025.7). Total num frames: 82464768. Throughput: 0: 1785.3. Samples: 15612094. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:48:27,932][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 01:48:29,512][42004] Updated weights for policy 0, policy_version 20136 (0.0030) +[2024-11-08 01:48:32,931][41694] Fps is (10 sec: 7373.0, 60 sec: 7236.3, 300 sec: 7011.8). Total num frames: 82497536. Throughput: 0: 1776.0. Samples: 15617604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:48:32,932][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 01:48:35,559][42004] Updated weights for policy 0, policy_version 20146 (0.0028) +[2024-11-08 01:48:37,933][41694] Fps is (10 sec: 5324.2, 60 sec: 6894.8, 300 sec: 6942.3). Total num frames: 82518016. Throughput: 0: 1730.2. Samples: 15627176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:48:37,934][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 01:48:42,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 82554880. Throughput: 0: 1655.4. Samples: 15635200. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:48:42,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 01:48:43,152][42004] Updated weights for policy 0, policy_version 20156 (0.0034) +[2024-11-08 01:48:47,931][41694] Fps is (10 sec: 7783.2, 60 sec: 6895.1, 300 sec: 6956.3). Total num frames: 82595840. Throughput: 0: 1698.2. Samples: 15641116. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:48:47,933][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 01:48:48,362][42004] Updated weights for policy 0, policy_version 20166 (0.0031) +[2024-11-08 01:48:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 82632704. Throughput: 0: 1786.8. Samples: 15652574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:48:52,934][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 01:48:53,740][42004] Updated weights for policy 0, policy_version 20176 (0.0019) +[2024-11-08 01:48:57,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 7025.7). Total num frames: 82673664. Throughput: 0: 1793.3. Samples: 15664178. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:48:57,933][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 01:48:58,897][42004] Updated weights for policy 0, policy_version 20186 (0.0025) +[2024-11-08 01:49:02,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7226.3, 300 sec: 6997.9). Total num frames: 82706432. Throughput: 0: 1794.4. Samples: 15669906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:02,936][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 01:49:05,098][42004] Updated weights for policy 0, policy_version 20196 (0.0030) +[2024-11-08 01:49:07,932][41694] Fps is (10 sec: 6553.3, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 82739200. Throughput: 0: 1761.4. Samples: 15679664. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:07,937][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 01:49:12,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6928.5). Total num frames: 82759680. Throughput: 0: 1666.4. Samples: 15687084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:12,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 01:49:13,035][42004] Updated weights for policy 0, policy_version 20206 (0.0031) +[2024-11-08 01:49:17,931][41694] Fps is (10 sec: 6144.3, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 82800640. Throughput: 0: 1658.0. Samples: 15692212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:17,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 01:49:18,211][42004] Updated weights for policy 0, policy_version 20216 (0.0023) +[2024-11-08 01:49:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 82837504. Throughput: 0: 1703.5. Samples: 15703832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:22,933][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 01:49:23,665][42004] Updated weights for policy 0, policy_version 20226 (0.0024) +[2024-11-08 01:49:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 82878464. Throughput: 0: 1785.9. Samples: 15715564. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:27,933][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 01:49:28,836][42004] Updated weights for policy 0, policy_version 20236 (0.0031) +[2024-11-08 01:49:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6997.9). Total num frames: 82915328. Throughput: 0: 1777.7. Samples: 15721114. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:49:32,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 01:49:34,321][42004] Updated weights for policy 0, policy_version 20246 (0.0032) +[2024-11-08 01:49:37,937][41694] Fps is (10 sec: 7368.8, 60 sec: 7235.7, 300 sec: 6997.8). Total num frames: 82952192. Throughput: 0: 1777.3. Samples: 15732562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:49:37,949][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 01:49:37,962][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020252_82952192.pth... +[2024-11-08 01:49:38,073][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000019843_81276928.pth +[2024-11-08 01:49:40,274][42004] Updated weights for policy 0, policy_version 20256 (0.0031) +[2024-11-08 01:49:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7168.0, 300 sec: 6970.1). Total num frames: 82984960. Throughput: 0: 1742.4. Samples: 15742586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:42,934][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 01:49:47,931][41694] Fps is (10 sec: 5327.7, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 83005440. Throughput: 0: 1723.7. Samples: 15747472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:47,934][41694] Avg episode reward: [(0, '4.648')] +[2024-11-08 01:49:48,182][42004] Updated weights for policy 0, policy_version 20266 (0.0029) +[2024-11-08 01:49:52,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 83046400. Throughput: 0: 1685.2. Samples: 15755496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:49:52,934][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 01:49:53,358][42004] Updated weights for policy 0, policy_version 20276 (0.0030) +[2024-11-08 01:49:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6942.4). 
Total num frames: 83083264. Throughput: 0: 1781.6. Samples: 15767258. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:49:57,933][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 01:49:58,678][42004] Updated weights for policy 0, policy_version 20286 (0.0027) +[2024-11-08 01:50:02,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 83120128. Throughput: 0: 1796.6. Samples: 15773058. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:50:02,936][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 01:50:04,117][42004] Updated weights for policy 0, policy_version 20296 (0.0030) +[2024-11-08 01:50:07,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 7011.8). Total num frames: 83161088. Throughput: 0: 1786.4. Samples: 15784218. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:50:07,933][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 01:50:09,367][42004] Updated weights for policy 0, policy_version 20306 (0.0031) +[2024-11-08 01:50:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 7236.3, 300 sec: 7011.8). Total num frames: 83193856. Throughput: 0: 1771.3. Samples: 15795272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:50:12,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 01:50:16,235][42004] Updated weights for policy 0, policy_version 20316 (0.0034) +[2024-11-08 01:50:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 7031.4, 300 sec: 6970.1). Total num frames: 83222528. Throughput: 0: 1738.9. Samples: 15799364. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:50:17,934][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 01:50:22,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6914.6). Total num frames: 83243008. Throughput: 0: 1633.7. Samples: 15806070. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:50:22,933][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 01:50:24,161][42004] Updated weights for policy 0, policy_version 20326 (0.0043) +[2024-11-08 01:50:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6914.6). Total num frames: 83279872. Throughput: 0: 1644.3. Samples: 15816578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:50:27,935][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 01:50:29,630][42004] Updated weights for policy 0, policy_version 20336 (0.0023) +[2024-11-08 01:50:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6928.5). Total num frames: 83320832. Throughput: 0: 1657.6. Samples: 15822062. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:50:32,934][41694] Avg episode reward: [(0, '4.242')] +[2024-11-08 01:50:34,996][42004] Updated weights for policy 0, policy_version 20346 (0.0020) +[2024-11-08 01:50:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6759.0, 300 sec: 6928.5). Total num frames: 83357696. Throughput: 0: 1737.6. Samples: 15833688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:50:37,934][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 01:50:40,352][42004] Updated weights for policy 0, policy_version 20356 (0.0033) +[2024-11-08 01:50:42,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6984.0). Total num frames: 83394560. Throughput: 0: 1733.3. Samples: 15845256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:50:42,934][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 01:50:45,922][42004] Updated weights for policy 0, policy_version 20366 (0.0029) +[2024-11-08 01:50:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6984.0). Total num frames: 83431424. Throughput: 0: 1728.8. Samples: 15850852. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:50:47,934][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 01:50:51,704][42004] Updated weights for policy 0, policy_version 20376 (0.0024) +[2024-11-08 01:50:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6970.3). Total num frames: 83468288. Throughput: 0: 1714.0. Samples: 15861350. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:50:52,933][41694] Avg episode reward: [(0, '4.710')] +[2024-11-08 01:50:57,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6758.4, 300 sec: 6928.5). Total num frames: 83488768. Throughput: 0: 1627.7. Samples: 15868518. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:50:57,934][41694] Avg episode reward: [(0, '4.764')] +[2024-11-08 01:50:59,267][42004] Updated weights for policy 0, policy_version 20386 (0.0027) +[2024-11-08 01:51:02,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6690.1, 300 sec: 6914.6). Total num frames: 83521536. Throughput: 0: 1664.7. Samples: 15874276. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:51:02,935][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 01:51:05,155][42004] Updated weights for policy 0, policy_version 20396 (0.0026) +[2024-11-08 01:51:07,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6690.1, 300 sec: 6942.4). Total num frames: 83562496. Throughput: 0: 1748.0. Samples: 15884728. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:51:07,934][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 01:51:10,805][42004] Updated weights for policy 0, policy_version 20406 (0.0025) +[2024-11-08 01:51:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6976.3). Total num frames: 83595264. Throughput: 0: 1750.7. Samples: 15895360. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:51:12,935][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 01:51:16,728][42004] Updated weights for policy 0, policy_version 20416 (0.0025) +[2024-11-08 01:51:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6984.0). Total num frames: 83632128. Throughput: 0: 1737.5. Samples: 15900250. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:51:17,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 01:51:22,781][42004] Updated weights for policy 0, policy_version 20426 (0.0040) +[2024-11-08 01:51:22,932][41694] Fps is (10 sec: 6963.5, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 83664896. Throughput: 0: 1713.3. Samples: 15910786. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:51:22,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 01:51:27,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 83697664. Throughput: 0: 1686.1. Samples: 15921132. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:51:27,933][41694] Avg episode reward: [(0, '4.315')] +[2024-11-08 01:51:30,912][42004] Updated weights for policy 0, policy_version 20436 (0.0027) +[2024-11-08 01:51:32,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6886.8). Total num frames: 83718144. Throughput: 0: 1606.0. Samples: 15923120. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:51:32,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 01:51:36,340][42004] Updated weights for policy 0, policy_version 20446 (0.0022) +[2024-11-08 01:51:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.9, 300 sec: 6886.8). Total num frames: 83755008. Throughput: 0: 1607.3. Samples: 15933678. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:51:37,935][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 01:51:37,992][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020449_83759104.pth... +[2024-11-08 01:51:38,108][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020045_82104320.pth +[2024-11-08 01:51:41,706][42004] Updated weights for policy 0, policy_version 20456 (0.0024) +[2024-11-08 01:51:42,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 83795968. Throughput: 0: 1701.7. Samples: 15945096. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:51:42,934][41694] Avg episode reward: [(0, '4.308')] +[2024-11-08 01:51:47,167][42004] Updated weights for policy 0, policy_version 20466 (0.0035) +[2024-11-08 01:51:47,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6950.4). Total num frames: 83832832. Throughput: 0: 1692.0. Samples: 15950414. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:51:47,934][41694] Avg episode reward: [(0, '4.245')] +[2024-11-08 01:51:52,502][42004] Updated weights for policy 0, policy_version 20476 (0.0034) +[2024-11-08 01:51:52,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6690.1, 300 sec: 6942.4). Total num frames: 83869696. Throughput: 0: 1721.5. Samples: 15962194. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:51:52,934][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 01:51:57,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 83902464. Throughput: 0: 1714.5. Samples: 15972514. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:51:57,935][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 01:51:58,531][42004] Updated weights for policy 0, policy_version 20486 (0.0028) +[2024-11-08 01:52:04,991][41694] Fps is (10 sec: 5774.2, 60 sec: 6732.2, 300 sec: 6880.5). 
Total num frames: 83939328. Throughput: 0: 1649.6. Samples: 15977878. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:52:04,994][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 01:52:06,563][42004] Updated weights for policy 0, policy_version 20496 (0.0026) +[2024-11-08 01:52:07,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.8, 300 sec: 6886.8). Total num frames: 83959808. Throughput: 0: 1643.4. Samples: 15984738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:07,934][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 01:52:11,953][42004] Updated weights for policy 0, policy_version 20506 (0.0038) +[2024-11-08 01:52:12,931][41694] Fps is (10 sec: 7221.7, 60 sec: 6690.2, 300 sec: 6886.8). Total num frames: 83996672. Throughput: 0: 1660.0. Samples: 15995834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:12,935][41694] Avg episode reward: [(0, '4.330')] +[2024-11-08 01:52:17,136][42004] Updated weights for policy 0, policy_version 20516 (0.0030) +[2024-11-08 01:52:17,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 84037632. Throughput: 0: 1747.0. Samples: 16001736. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:17,934][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 01:52:22,271][42004] Updated weights for policy 0, policy_version 20526 (0.0035) +[2024-11-08 01:52:22,933][41694] Fps is (10 sec: 8191.1, 60 sec: 6894.8, 300 sec: 6951.7). Total num frames: 84078592. Throughput: 0: 1778.4. Samples: 16013708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:22,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 01:52:27,775][42004] Updated weights for policy 0, policy_version 20536 (0.0026) +[2024-11-08 01:52:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 84115456. Throughput: 0: 1778.6. Samples: 16025132. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:27,940][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 01:52:32,931][41694] Fps is (10 sec: 6964.0, 60 sec: 7168.0, 300 sec: 6928.5). Total num frames: 84148224. Throughput: 0: 1766.3. Samples: 16029898. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:32,932][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 01:52:33,540][42004] Updated weights for policy 0, policy_version 20546 (0.0033) +[2024-11-08 01:52:39,430][41694] Fps is (10 sec: 5699.8, 60 sec: 6926.8, 300 sec: 6879.7). Total num frames: 84180992. Throughput: 0: 1694.7. Samples: 16040994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:52:39,434][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 01:52:41,500][42004] Updated weights for policy 0, policy_version 20556 (0.0030) +[2024-11-08 01:52:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 84205568. Throughput: 0: 1670.6. Samples: 16047690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:52:42,934][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 01:52:47,039][42004] Updated weights for policy 0, policy_version 20566 (0.0027) +[2024-11-08 01:52:47,931][41694] Fps is (10 sec: 7226.9, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 84242432. Throughput: 0: 1751.7. Samples: 16053098. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:52:47,933][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 01:52:52,223][42004] Updated weights for policy 0, policy_version 20576 (0.0023) +[2024-11-08 01:52:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6895.0, 300 sec: 6872.9). Total num frames: 84283392. Throughput: 0: 1783.3. Samples: 16064988. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:52,935][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 01:52:57,334][42004] Updated weights for policy 0, policy_version 20586 (0.0026) +[2024-11-08 01:52:57,932][41694] Fps is (10 sec: 8191.8, 60 sec: 7031.5, 300 sec: 6953.8). Total num frames: 84324352. Throughput: 0: 1804.9. Samples: 16077054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:52:57,934][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 01:53:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7210.7, 300 sec: 6928.5). Total num frames: 84357120. Throughput: 0: 1798.2. Samples: 16082656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:02,933][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 01:53:03,121][42004] Updated weights for policy 0, policy_version 20596 (0.0022) +[2024-11-08 01:53:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7304.6, 300 sec: 6942.4). Total num frames: 84398080. Throughput: 0: 1766.6. Samples: 16093204. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:53:07,934][41694] Avg episode reward: [(0, '4.222')] +[2024-11-08 01:53:08,369][42004] Updated weights for policy 0, policy_version 20606 (0.0035) +[2024-11-08 01:53:14,322][41694] Fps is (10 sec: 6113.0, 60 sec: 7005.6, 300 sec: 6882.2). Total num frames: 84426752. Throughput: 0: 1600.6. Samples: 16099386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:53:14,324][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 01:53:16,286][42004] Updated weights for policy 0, policy_version 20616 (0.0022) +[2024-11-08 01:53:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 84451328. Throughput: 0: 1689.9. Samples: 16105942. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:17,933][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 01:53:21,855][42004] Updated weights for policy 0, policy_version 20626 (0.0034) +[2024-11-08 01:53:22,932][41694] Fps is (10 sec: 7612.1, 60 sec: 6895.0, 300 sec: 6872.9). Total num frames: 84492288. Throughput: 0: 1744.4. Samples: 16116880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:22,933][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 01:53:27,350][42004] Updated weights for policy 0, policy_version 20636 (0.0042) +[2024-11-08 01:53:27,933][41694] Fps is (10 sec: 7781.7, 60 sec: 6894.8, 300 sec: 6886.8). Total num frames: 84529152. Throughput: 0: 1791.4. Samples: 16128304. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:27,935][41694] Avg episode reward: [(0, '4.619')] +[2024-11-08 01:53:32,555][42004] Updated weights for policy 0, policy_version 20646 (0.0024) +[2024-11-08 01:53:32,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 84566016. Throughput: 0: 1796.6. Samples: 16133946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:32,935][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 01:53:37,931][41694] Fps is (10 sec: 7373.6, 60 sec: 7211.6, 300 sec: 6942.4). Total num frames: 84602880. Throughput: 0: 1774.0. Samples: 16144820. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:37,933][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 01:53:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020655_84602880.pth... +[2024-11-08 01:53:38,076][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020252_82952192.pth +[2024-11-08 01:53:38,460][42004] Updated weights for policy 0, policy_version 20656 (0.0032) +[2024-11-08 01:53:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7236.3, 300 sec: 6928.5). 
Total num frames: 84639744. Throughput: 0: 1755.3. Samples: 16156040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:42,935][41694] Avg episode reward: [(0, '4.652')] +[2024-11-08 01:53:43,729][42004] Updated weights for policy 0, policy_version 20666 (0.0024) +[2024-11-08 01:53:48,813][41694] Fps is (10 sec: 6022.8, 60 sec: 6997.0, 300 sec: 6880.2). Total num frames: 84668416. Throughput: 0: 1724.4. Samples: 16161772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:48,816][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 01:53:51,434][42004] Updated weights for policy 0, policy_version 20676 (0.0027) +[2024-11-08 01:53:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 84697088. Throughput: 0: 1683.9. Samples: 16168978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:52,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 01:53:56,721][42004] Updated weights for policy 0, policy_version 20686 (0.0047) +[2024-11-08 01:53:57,931][41694] Fps is (10 sec: 7636.2, 60 sec: 6895.0, 300 sec: 6886.8). Total num frames: 84738048. Throughput: 0: 1858.4. Samples: 16180430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:53:57,934][41694] Avg episode reward: [(0, '4.210')] +[2024-11-08 01:54:02,365][42004] Updated weights for policy 0, policy_version 20696 (0.0030) +[2024-11-08 01:54:02,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 84774912. Throughput: 0: 1779.9. Samples: 16186036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:02,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 01:54:07,880][42004] Updated weights for policy 0, policy_version 20706 (0.0045) +[2024-11-08 01:54:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6956.3). Total num frames: 84811776. Throughput: 0: 1784.5. Samples: 16197182. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:07,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 01:54:12,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7128.4, 300 sec: 6928.5). Total num frames: 84844544. Throughput: 0: 1755.8. Samples: 16207314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:12,934][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 01:54:13,851][42004] Updated weights for policy 0, policy_version 20716 (0.0021) +[2024-11-08 01:54:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7168.0, 300 sec: 6928.5). Total num frames: 84881408. Throughput: 0: 1753.5. Samples: 16212854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:17,934][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 01:54:19,112][42004] Updated weights for policy 0, policy_version 20726 (0.0029) +[2024-11-08 01:54:22,931][41694] Fps is (10 sec: 6144.3, 60 sec: 6895.0, 300 sec: 6873.0). Total num frames: 84905984. Throughput: 0: 1764.2. Samples: 16224210. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:22,936][41694] Avg episode reward: [(0, '4.633')] +[2024-11-08 01:54:26,595][42004] Updated weights for policy 0, policy_version 20736 (0.0030) +[2024-11-08 01:54:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6895.0, 300 sec: 6872.9). Total num frames: 84942848. Throughput: 0: 1688.7. Samples: 16232030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:27,933][41694] Avg episode reward: [(0, '4.772')] +[2024-11-08 01:54:31,811][42004] Updated weights for policy 0, policy_version 20746 (0.0021) +[2024-11-08 01:54:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6887.0). Total num frames: 84983808. Throughput: 0: 1719.5. Samples: 16237636. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:32,933][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 01:54:36,976][42004] Updated weights for policy 0, policy_version 20756 (0.0026) +[2024-11-08 01:54:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 85020672. Throughput: 0: 1794.9. Samples: 16249748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:37,933][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 01:54:42,909][42004] Updated weights for policy 0, policy_version 20766 (0.0036) +[2024-11-08 01:54:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 85057536. Throughput: 0: 1781.3. Samples: 16260590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:42,935][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 01:54:47,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7136.3, 300 sec: 6928.5). Total num frames: 85090304. Throughput: 0: 1765.7. Samples: 16265492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:47,933][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 01:54:48,555][42004] Updated weights for policy 0, policy_version 20776 (0.0026) +[2024-11-08 01:54:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7236.3, 300 sec: 6942.4). Total num frames: 85131264. Throughput: 0: 1772.1. Samples: 16276928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:52,933][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 01:54:53,922][42004] Updated weights for policy 0, policy_version 20786 (0.0024) +[2024-11-08 01:54:57,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 85151744. Throughput: 0: 1722.5. Samples: 16284826. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:54:57,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 01:55:01,814][42004] Updated weights for policy 0, policy_version 20796 (0.0030) +[2024-11-08 01:55:02,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 85184512. Throughput: 0: 1699.4. Samples: 16289328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:02,934][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 01:55:07,225][42004] Updated weights for policy 0, policy_version 20806 (0.0026) +[2024-11-08 01:55:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 85225472. Throughput: 0: 1693.1. Samples: 16300398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:07,933][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 01:55:12,613][42004] Updated weights for policy 0, policy_version 20816 (0.0029) +[2024-11-08 01:55:12,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6914.6). Total num frames: 85262336. Throughput: 0: 1775.7. Samples: 16311938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:12,935][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 01:55:17,932][41694] Fps is (10 sec: 6553.8, 60 sec: 6826.7, 300 sec: 6942.4). Total num frames: 85291008. Throughput: 0: 1758.1. Samples: 16316750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:17,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 01:55:19,370][42004] Updated weights for policy 0, policy_version 20826 (0.0029) +[2024-11-08 01:55:22,932][41694] Fps is (10 sec: 6553.6, 60 sec: 7031.5, 300 sec: 6942.4). Total num frames: 85327872. Throughput: 0: 1698.4. Samples: 16326174. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:22,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 01:55:24,790][42004] Updated weights for policy 0, policy_version 20836 (0.0020) +[2024-11-08 01:55:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 85364736. Throughput: 0: 1713.4. Samples: 16337692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:27,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 01:55:32,444][42004] Updated weights for policy 0, policy_version 20846 (0.0026) +[2024-11-08 01:55:32,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 85389312. Throughput: 0: 1725.6. Samples: 16343144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:32,934][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 01:55:37,616][42004] Updated weights for policy 0, policy_version 20856 (0.0040) +[2024-11-08 01:55:37,933][41694] Fps is (10 sec: 6143.6, 60 sec: 6758.3, 300 sec: 6886.8). Total num frames: 85426176. Throughput: 0: 1640.9. Samples: 16350770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:55:37,935][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 01:55:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020856_85426176.pth... +[2024-11-08 01:55:38,071][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020449_83759104.pth +[2024-11-08 01:55:42,926][42004] Updated weights for policy 0, policy_version 20866 (0.0039) +[2024-11-08 01:55:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6900.7). Total num frames: 85467136. Throughput: 0: 1727.3. Samples: 16362552. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:55:42,934][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 01:55:47,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6894.9, 300 sec: 6900.7). 
Total num frames: 85504000. Throughput: 0: 1756.7. Samples: 16368378. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:55:47,933][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 01:55:48,389][42004] Updated weights for policy 0, policy_version 20876 (0.0032) +[2024-11-08 01:55:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 85536768. Throughput: 0: 1745.5. Samples: 16378946. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:55:52,934][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 01:55:54,320][42004] Updated weights for policy 0, policy_version 20886 (0.0033) +[2024-11-08 01:55:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 85573632. Throughput: 0: 1733.0. Samples: 16389922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:55:57,934][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 01:55:59,788][42004] Updated weights for policy 0, policy_version 20896 (0.0026) +[2024-11-08 01:56:02,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.7, 300 sec: 6942.4). Total num frames: 85610496. Throughput: 0: 1748.0. Samples: 16395410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:56:02,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 01:56:07,797][42004] Updated weights for policy 0, policy_version 20906 (0.0049) +[2024-11-08 01:56:07,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 85630976. Throughput: 0: 1701.5. Samples: 16402742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:56:07,933][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 01:56:12,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 85663744. Throughput: 0: 1659.3. Samples: 16412362. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:56:12,934][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 01:56:13,845][42004] Updated weights for policy 0, policy_version 20916 (0.0030) +[2024-11-08 01:56:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6900.7). Total num frames: 85700608. Throughput: 0: 1651.0. Samples: 16417438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:56:17,933][41694] Avg episode reward: [(0, '4.286')] +[2024-11-08 01:56:19,457][42004] Updated weights for policy 0, policy_version 20926 (0.0025) +[2024-11-08 01:56:22,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 85733376. Throughput: 0: 1729.2. Samples: 16428584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:56:22,933][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 01:56:25,437][42004] Updated weights for policy 0, policy_version 20936 (0.0032) +[2024-11-08 01:56:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6956.3). Total num frames: 85770240. Throughput: 0: 1694.9. Samples: 16438824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:56:27,933][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 01:56:30,885][42004] Updated weights for policy 0, policy_version 20946 (0.0027) +[2024-11-08 01:56:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6956.3). Total num frames: 85807104. Throughput: 0: 1692.3. Samples: 16444530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:56:32,936][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 01:56:36,207][42004] Updated weights for policy 0, policy_version 20956 (0.0025) +[2024-11-08 01:56:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 85848064. Throughput: 0: 1714.8. Samples: 16456112. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:56:37,934][41694] Avg episode reward: [(0, '4.733')] +[2024-11-08 01:56:42,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.1, 300 sec: 6900.7). Total num frames: 85868544. Throughput: 0: 1622.5. Samples: 16462934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:56:42,933][41694] Avg episode reward: [(0, '4.639')] +[2024-11-08 01:56:43,992][42004] Updated weights for policy 0, policy_version 20966 (0.0032) +[2024-11-08 01:56:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6900.7). Total num frames: 85905408. Throughput: 0: 1628.0. Samples: 16468670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:56:47,933][41694] Avg episode reward: [(0, '4.605')] +[2024-11-08 01:56:49,389][42004] Updated weights for policy 0, policy_version 20976 (0.0026) +[2024-11-08 01:56:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6914.6). Total num frames: 85942272. Throughput: 0: 1720.7. Samples: 16480174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:56:52,933][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 01:56:54,630][42004] Updated weights for policy 0, policy_version 20986 (0.0030) +[2024-11-08 01:56:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6963.2). Total num frames: 85979136. Throughput: 0: 1756.9. Samples: 16491420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:56:57,935][41694] Avg episode reward: [(0, '4.177')] +[2024-11-08 01:57:00,874][42004] Updated weights for policy 0, policy_version 20996 (0.0034) +[2024-11-08 01:57:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 6956.3). Total num frames: 86011904. Throughput: 0: 1750.4. Samples: 16496208. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:57:02,933][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 01:57:06,676][42004] Updated weights for policy 0, policy_version 21006 (0.0038) +[2024-11-08 01:57:07,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6963.1, 300 sec: 6956.2). Total num frames: 86048768. Throughput: 0: 1729.2. Samples: 16506400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:57:07,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 01:57:11,940][42004] Updated weights for policy 0, policy_version 21016 (0.0023) +[2024-11-08 01:57:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6942.4). Total num frames: 86085632. Throughput: 0: 1761.4. Samples: 16518088. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:12,933][41694] Avg episode reward: [(0, '4.244')] +[2024-11-08 01:57:17,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6758.4, 300 sec: 6873.0). Total num frames: 86106112. Throughput: 0: 1685.4. Samples: 16520374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:17,933][41694] Avg episode reward: [(0, '4.193')] +[2024-11-08 01:57:19,622][42004] Updated weights for policy 0, policy_version 21026 (0.0028) +[2024-11-08 01:57:22,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 86147072. Throughput: 0: 1661.9. Samples: 16530898. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:22,937][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 01:57:24,863][42004] Updated weights for policy 0, policy_version 21036 (0.0022) +[2024-11-08 01:57:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 86183936. Throughput: 0: 1767.3. Samples: 16542462. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:27,933][41694] Avg episode reward: [(0, '4.359')] +[2024-11-08 01:57:30,340][42004] Updated weights for policy 0, policy_version 21046 (0.0026) +[2024-11-08 01:57:32,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6894.9, 300 sec: 6949.9). Total num frames: 86220800. Throughput: 0: 1764.8. Samples: 16548086. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:32,934][41694] Avg episode reward: [(0, '4.229')] +[2024-11-08 01:57:36,112][42004] Updated weights for policy 0, policy_version 21056 (0.0025) +[2024-11-08 01:57:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6956.3). Total num frames: 86257664. Throughput: 0: 1744.8. Samples: 16558688. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:37,934][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 01:57:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021059_86257664.pth... +[2024-11-08 01:57:38,090][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020655_84602880.pth +[2024-11-08 01:57:41,849][42004] Updated weights for policy 0, policy_version 21066 (0.0040) +[2024-11-08 01:57:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.7, 300 sec: 6956.3). Total num frames: 86294528. Throughput: 0: 1739.3. Samples: 16569688. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:42,934][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 01:57:46,994][42004] Updated weights for policy 0, policy_version 21076 (0.0024) +[2024-11-08 01:57:50,001][41694] Fps is (10 sec: 6108.5, 60 sec: 6863.0, 300 sec: 6894.0). Total num frames: 86331392. Throughput: 0: 1683.4. Samples: 16575446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:50,004][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 01:57:52,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6894.9, 300 sec: 6886.8). 
Total num frames: 86355968. Throughput: 0: 1701.3. Samples: 16582958. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:57:52,935][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 01:57:54,476][42004] Updated weights for policy 0, policy_version 21086 (0.0029) +[2024-11-08 01:57:57,932][41694] Fps is (10 sec: 7747.6, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 86392832. Throughput: 0: 1692.9. Samples: 16594270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:57:57,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 01:57:59,912][42004] Updated weights for policy 0, policy_version 21096 (0.0035) +[2024-11-08 01:58:02,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 86429696. Throughput: 0: 1771.6. Samples: 16600094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:58:02,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 01:58:05,918][42004] Updated weights for policy 0, policy_version 21106 (0.0031) +[2024-11-08 01:58:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6895.0, 300 sec: 6933.4). Total num frames: 86462464. Throughput: 0: 1763.0. Samples: 16610234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 01:58:07,934][41694] Avg episode reward: [(0, '4.265')] +[2024-11-08 01:58:11,745][42004] Updated weights for policy 0, policy_version 21116 (0.0038) +[2024-11-08 01:58:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6942.4). Total num frames: 86499328. Throughput: 0: 1742.8. Samples: 16620888. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:58:12,934][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 01:58:17,288][42004] Updated weights for policy 0, policy_version 21126 (0.0035) +[2024-11-08 01:58:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 6928.5). Total num frames: 86536192. Throughput: 0: 1742.0. Samples: 16626476. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:58:17,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 01:58:24,480][41694] Fps is (10 sec: 6029.4, 60 sec: 6854.6, 300 sec: 6878.5). Total num frames: 86568960. Throughput: 0: 1701.2. Samples: 16637878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:58:24,482][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 01:58:24,774][42004] Updated weights for policy 0, policy_version 21136 (0.0029) +[2024-11-08 01:58:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.6, 300 sec: 6872.9). Total num frames: 86593536. Throughput: 0: 1677.4. Samples: 16645170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:58:27,933][41694] Avg episode reward: [(0, '4.281')] +[2024-11-08 01:58:30,063][42004] Updated weights for policy 0, policy_version 21146 (0.0025) +[2024-11-08 01:58:32,931][41694] Fps is (10 sec: 7754.8, 60 sec: 6895.0, 300 sec: 6886.8). Total num frames: 86634496. Throughput: 0: 1757.2. Samples: 16650884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:58:32,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 01:58:35,450][42004] Updated weights for policy 0, policy_version 21156 (0.0028) +[2024-11-08 01:58:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 86671360. Throughput: 0: 1767.5. Samples: 16662494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:58:37,933][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 01:58:41,287][42004] Updated weights for policy 0, policy_version 21166 (0.0037) +[2024-11-08 01:58:42,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6921.4). Total num frames: 86704128. Throughput: 0: 1744.0. Samples: 16672752. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:58:42,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 01:58:46,699][42004] Updated weights for policy 0, policy_version 21176 (0.0029) +[2024-11-08 01:58:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7141.3, 300 sec: 6942.4). Total num frames: 86745088. Throughput: 0: 1739.6. Samples: 16678376. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:58:47,934][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 01:58:51,993][42004] Updated weights for policy 0, policy_version 21186 (0.0027) +[2024-11-08 01:58:52,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7099.8, 300 sec: 6928.5). Total num frames: 86781952. Throughput: 0: 1772.3. Samples: 16689988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 01:58:52,934][41694] Avg episode reward: [(0, '4.317')] +[2024-11-08 01:58:58,969][41694] Fps is (10 sec: 5937.8, 60 sec: 6844.9, 300 sec: 6876.5). Total num frames: 86810624. Throughput: 0: 1630.4. Samples: 16695946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:58:58,970][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 01:58:59,496][42004] Updated weights for policy 0, policy_version 21196 (0.0030) +[2024-11-08 01:59:02,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 86843392. Throughput: 0: 1705.5. Samples: 16703224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:59:02,934][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 01:59:05,130][42004] Updated weights for policy 0, policy_version 21206 (0.0034) +[2024-11-08 01:59:07,933][41694] Fps is (10 sec: 7767.6, 60 sec: 6963.0, 300 sec: 6900.7). Total num frames: 86880256. Throughput: 0: 1761.3. Samples: 16714410. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 01:59:07,935][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 01:59:10,427][42004] Updated weights for policy 0, policy_version 21216 (0.0030) +[2024-11-08 01:59:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 86917120. Throughput: 0: 1780.8. Samples: 16725308. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:59:12,934][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 01:59:16,412][42004] Updated weights for policy 0, policy_version 21226 (0.0024) +[2024-11-08 01:59:17,932][41694] Fps is (10 sec: 7373.7, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 86953984. Throughput: 0: 1766.5. Samples: 16730378. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:59:17,938][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 01:59:21,789][42004] Updated weights for policy 0, policy_version 21236 (0.0031) +[2024-11-08 01:59:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7217.8, 300 sec: 6942.4). Total num frames: 86990848. Throughput: 0: 1761.7. Samples: 16741772. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:59:22,934][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 01:59:27,071][42004] Updated weights for policy 0, policy_version 21246 (0.0025) +[2024-11-08 01:59:27,931][41694] Fps is (10 sec: 7373.1, 60 sec: 7236.3, 300 sec: 6928.5). Total num frames: 87027712. Throughput: 0: 1791.5. Samples: 16753370. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:59:27,933][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 01:59:33,463][41694] Fps is (10 sec: 5833.8, 60 sec: 6902.0, 300 sec: 6874.4). Total num frames: 87052288. Throughput: 0: 1772.7. Samples: 16759090. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:59:33,466][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 01:59:34,815][42004] Updated weights for policy 0, policy_version 21256 (0.0047) +[2024-11-08 01:59:37,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6873.0). Total num frames: 87085056. Throughput: 0: 1683.1. Samples: 16765726. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:59:37,933][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 01:59:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021261_87085056.pth... +[2024-11-08 01:59:38,042][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000020856_85426176.pth +[2024-11-08 01:59:40,548][42004] Updated weights for policy 0, policy_version 21266 (0.0029) +[2024-11-08 01:59:42,931][41694] Fps is (10 sec: 7354.3, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 87121920. Throughput: 0: 1832.1. Samples: 16776488. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:59:42,934][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 01:59:46,398][42004] Updated weights for policy 0, policy_version 21276 (0.0040) +[2024-11-08 01:59:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 87154688. Throughput: 0: 1744.3. Samples: 16781720. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:59:47,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 01:59:52,453][42004] Updated weights for policy 0, policy_version 21286 (0.0030) +[2024-11-08 01:59:52,932][41694] Fps is (10 sec: 6553.0, 60 sec: 6758.3, 300 sec: 6900.7). Total num frames: 87187456. Throughput: 0: 1725.9. Samples: 16792072. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 01:59:52,935][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 01:59:57,828][42004] Updated weights for policy 0, policy_version 21296 (0.0032) +[2024-11-08 01:59:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7085.6, 300 sec: 6928.5). Total num frames: 87228416. Throughput: 0: 1728.3. Samples: 16803082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 01:59:57,934][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 02:00:02,932][41694] Fps is (10 sec: 7373.4, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 87261184. Throughput: 0: 1739.3. Samples: 16808646. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:02,935][41694] Avg episode reward: [(0, '4.267')] +[2024-11-08 02:00:03,491][42004] Updated weights for policy 0, policy_version 21306 (0.0031) +[2024-11-08 02:00:07,953][41694] Fps is (10 sec: 5722.6, 60 sec: 6756.2, 300 sec: 6858.6). Total num frames: 87285760. Throughput: 0: 1604.9. Samples: 16814026. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:07,955][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 02:00:11,151][42004] Updated weights for policy 0, policy_version 21316 (0.2265) +[2024-11-08 02:00:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 87322624. Throughput: 0: 1629.2. Samples: 16826686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:12,933][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 02:00:17,448][42004] Updated weights for policy 0, policy_version 21326 (0.0028) +[2024-11-08 02:00:17,932][41694] Fps is (10 sec: 6567.4, 60 sec: 6621.9, 300 sec: 6859.1). Total num frames: 87351296. Throughput: 0: 1627.9. Samples: 16831478. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:17,933][41694] Avg episode reward: [(0, '4.210')] +[2024-11-08 02:00:22,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6553.6, 300 sec: 6845.2). Total num frames: 87384064. 
Throughput: 0: 1674.5. Samples: 16841080. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:22,939][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 02:00:23,556][42004] Updated weights for policy 0, policy_version 21336 (0.0038) +[2024-11-08 02:00:27,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6621.8, 300 sec: 6900.7). Total num frames: 87425024. Throughput: 0: 1685.5. Samples: 16852336. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:27,935][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 02:00:28,842][42004] Updated weights for policy 0, policy_version 21346 (0.0030) +[2024-11-08 02:00:32,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6887.7, 300 sec: 6900.7). Total num frames: 87461888. Throughput: 0: 1694.7. Samples: 16857980. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:32,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 02:00:34,200][42004] Updated weights for policy 0, policy_version 21356 (0.0035) +[2024-11-08 02:00:37,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 87498752. Throughput: 0: 1719.8. Samples: 16869462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:37,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 02:00:39,705][42004] Updated weights for policy 0, policy_version 21366 (0.0029) +[2024-11-08 02:00:42,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 87519232. Throughput: 0: 1647.4. Samples: 16877212. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:42,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 02:00:47,271][42004] Updated weights for policy 0, policy_version 21376 (0.0035) +[2024-11-08 02:00:47,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 87560192. Throughput: 0: 1630.2. Samples: 16882004. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:00:47,935][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 02:00:52,894][42004] Updated weights for policy 0, policy_version 21386 (0.0034) +[2024-11-08 02:00:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.8, 300 sec: 6859.1). Total num frames: 87597056. Throughput: 0: 1765.8. Samples: 16893448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:00:52,933][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 02:00:57,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6690.2, 300 sec: 6845.2). Total num frames: 87629824. Throughput: 0: 1716.9. Samples: 16903948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:00:57,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:00:58,588][42004] Updated weights for policy 0, policy_version 21396 (0.0025) +[2024-11-08 02:01:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 87666688. Throughput: 0: 1740.2. Samples: 16909786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:01:02,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 02:01:04,170][42004] Updated weights for policy 0, policy_version 21406 (0.0035) +[2024-11-08 02:01:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6965.6, 300 sec: 6914.6). Total num frames: 87703552. Throughput: 0: 1764.1. Samples: 16920466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:01:07,935][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 02:01:10,178][42004] Updated weights for policy 0, policy_version 21416 (0.0043) +[2024-11-08 02:01:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 87736320. Throughput: 0: 1734.8. Samples: 16930402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:01:12,933][41694] Avg episode reward: [(0, '4.229')] +[2024-11-08 02:01:17,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 87756800. 
Throughput: 0: 1712.0. Samples: 16935018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:01:17,933][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 02:01:18,351][42004] Updated weights for policy 0, policy_version 21426 (0.0024) +[2024-11-08 02:01:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 87793664. Throughput: 0: 1628.9. Samples: 16942762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:01:22,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 02:01:24,116][42004] Updated weights for policy 0, policy_version 21436 (0.0028) +[2024-11-08 02:01:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.2, 300 sec: 6845.2). Total num frames: 87826432. Throughput: 0: 1681.5. Samples: 16952880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:01:27,933][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 02:01:30,097][42004] Updated weights for policy 0, policy_version 21446 (0.0021) +[2024-11-08 02:01:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 87863296. Throughput: 0: 1691.9. Samples: 16958140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:01:32,936][41694] Avg episode reward: [(0, '4.245')] +[2024-11-08 02:01:35,604][42004] Updated weights for policy 0, policy_version 21456 (0.0026) +[2024-11-08 02:01:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 87900160. Throughput: 0: 1688.7. Samples: 16969442. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:01:37,934][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 02:01:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021460_87900160.pth... 
+[2024-11-08 02:01:38,079][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021059_86257664.pth +[2024-11-08 02:01:41,040][42004] Updated weights for policy 0, policy_version 21466 (0.0034) +[2024-11-08 02:01:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 87937024. Throughput: 0: 1704.9. Samples: 16980670. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:01:42,933][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 02:01:46,423][42004] Updated weights for policy 0, policy_version 21476 (0.0029) +[2024-11-08 02:01:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6886.8). Total num frames: 87973888. Throughput: 0: 1695.4. Samples: 16986078. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:01:47,934][41694] Avg episode reward: [(0, '4.614')] +[2024-11-08 02:01:52,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 87994368. Throughput: 0: 1632.5. Samples: 16993928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:01:52,933][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 02:01:54,267][42004] Updated weights for policy 0, policy_version 21486 (0.0032) +[2024-11-08 02:01:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 88031232. Throughput: 0: 1642.7. Samples: 17004322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:01:57,935][41694] Avg episode reward: [(0, '4.630')] +[2024-11-08 02:01:59,943][42004] Updated weights for policy 0, policy_version 21496 (0.0025) +[2024-11-08 02:02:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 88064000. Throughput: 0: 1658.5. Samples: 17009652. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:02:02,933][41694] Avg episode reward: [(0, '4.624')] +[2024-11-08 02:02:06,023][42004] Updated weights for policy 0, policy_version 21506 (0.0022) +[2024-11-08 02:02:07,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6621.8, 300 sec: 6831.3). Total num frames: 88100864. Throughput: 0: 1709.8. Samples: 17019702. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:07,936][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 02:02:11,641][42004] Updated weights for policy 0, policy_version 21516 (0.0032) +[2024-11-08 02:02:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 88137728. Throughput: 0: 1729.6. Samples: 17030714. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:12,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 02:02:16,932][42004] Updated weights for policy 0, policy_version 21526 (0.0025) +[2024-11-08 02:02:17,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6963.2, 300 sec: 6873.0). Total num frames: 88174592. Throughput: 0: 1739.2. Samples: 17036406. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:17,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 02:02:22,244][42004] Updated weights for policy 0, policy_version 21536 (0.0033) +[2024-11-08 02:02:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 88215552. Throughput: 0: 1746.3. Samples: 17048026. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:22,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 02:02:27,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 88236032. Throughput: 0: 1655.6. Samples: 17055174. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:27,933][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 02:02:29,902][42004] Updated weights for policy 0, policy_version 21546 (0.0032) +[2024-11-08 02:02:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 88272896. Throughput: 0: 1661.6. Samples: 17060848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:32,934][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 02:02:35,664][42004] Updated weights for policy 0, policy_version 21556 (0.0029) +[2024-11-08 02:02:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 88309760. Throughput: 0: 1725.9. Samples: 17071596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:37,935][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 02:02:41,056][42004] Updated weights for policy 0, policy_version 21566 (0.0025) +[2024-11-08 02:02:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6879.6). Total num frames: 88346624. Throughput: 0: 1753.0. Samples: 17083206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:42,933][41694] Avg episode reward: [(0, '4.265')] +[2024-11-08 02:02:46,667][42004] Updated weights for policy 0, policy_version 21576 (0.0026) +[2024-11-08 02:02:47,938][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 88383488. Throughput: 0: 1744.5. Samples: 17088156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:02:47,941][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 02:02:52,073][42004] Updated weights for policy 0, policy_version 21586 (0.0021) +[2024-11-08 02:02:52,931][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6873.0). Total num frames: 88420352. Throughput: 0: 1775.9. Samples: 17099618. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:02:52,934][41694] Avg episode reward: [(0, '4.664')] +[2024-11-08 02:02:57,398][42004] Updated weights for policy 0, policy_version 21596 (0.0022) +[2024-11-08 02:02:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.8, 300 sec: 6873.0). Total num frames: 88457216. Throughput: 0: 1786.8. Samples: 17111122. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:02:57,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 02:03:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 88477696. Throughput: 0: 1708.1. Samples: 17113272. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:03:02,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 02:03:05,508][42004] Updated weights for policy 0, policy_version 21606 (0.0022) +[2024-11-08 02:03:07,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 88510464. Throughput: 0: 1666.0. Samples: 17122996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:07,940][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 02:03:11,711][42004] Updated weights for policy 0, policy_version 21616 (0.0032) +[2024-11-08 02:03:12,936][41694] Fps is (10 sec: 6960.3, 60 sec: 6826.2, 300 sec: 6817.3). Total num frames: 88547328. Throughput: 0: 1724.9. Samples: 17132802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:12,941][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 02:03:17,052][42004] Updated weights for policy 0, policy_version 21626 (0.0027) +[2024-11-08 02:03:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6867.3). Total num frames: 88584192. Throughput: 0: 1720.4. Samples: 17138266. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:17,934][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 02:03:22,298][42004] Updated weights for policy 0, policy_version 21636 (0.0034) +[2024-11-08 02:03:22,932][41694] Fps is (10 sec: 7785.3, 60 sec: 6826.6, 300 sec: 6886.8). Total num frames: 88625152. Throughput: 0: 1746.1. Samples: 17150172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:03:22,935][41694] Avg episode reward: [(0, '4.232')] +[2024-11-08 02:03:27,809][42004] Updated weights for policy 0, policy_version 21646 (0.0038) +[2024-11-08 02:03:27,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7099.7, 300 sec: 6872.9). Total num frames: 88662016. Throughput: 0: 1741.1. Samples: 17161554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:03:27,934][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 02:03:34,801][41694] Fps is (10 sec: 6211.8, 60 sec: 6885.2, 300 sec: 6829.7). Total num frames: 88698880. Throughput: 0: 1685.4. Samples: 17167150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:03:34,804][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 02:03:35,288][42004] Updated weights for policy 0, policy_version 21656 (0.0032) +[2024-11-08 02:03:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 88719360. Throughput: 0: 1666.5. Samples: 17174612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:37,934][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 02:03:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021660_88719360.pth... +[2024-11-08 02:03:38,112][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021261_87085056.pth +[2024-11-08 02:03:41,126][42004] Updated weights for policy 0, policy_version 21666 (0.0045) +[2024-11-08 02:03:42,931][41694] Fps is (10 sec: 6549.2, 60 sec: 6758.4, 300 sec: 6803.5). 
Total num frames: 88752128. Throughput: 0: 1642.6. Samples: 17185040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:42,933][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 02:03:47,001][42004] Updated weights for policy 0, policy_version 21676 (0.0028) +[2024-11-08 02:03:47,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 88788992. Throughput: 0: 1703.6. Samples: 17189934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:47,934][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 02:03:52,311][42004] Updated weights for policy 0, policy_version 21686 (0.0028) +[2024-11-08 02:03:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6869.3). Total num frames: 88829952. Throughput: 0: 1749.8. Samples: 17201738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:52,934][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 02:03:57,856][42004] Updated weights for policy 0, policy_version 21696 (0.0040) +[2024-11-08 02:03:57,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 88866816. Throughput: 0: 1775.2. Samples: 17212680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:03:57,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 02:04:02,933][41694] Fps is (10 sec: 6962.3, 60 sec: 7031.3, 300 sec: 6845.2). Total num frames: 88899584. Throughput: 0: 1776.6. Samples: 17218214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:04:02,935][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 02:04:03,861][42004] Updated weights for policy 0, policy_version 21706 (0.0040) +[2024-11-08 02:04:09,457][41694] Fps is (10 sec: 5330.7, 60 sec: 6790.5, 300 sec: 6782.3). Total num frames: 88928256. Throughput: 0: 1667.1. Samples: 17227736. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:04:09,459][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 02:04:12,392][42004] Updated weights for policy 0, policy_version 21716 (0.0041) +[2024-11-08 02:04:12,932][41694] Fps is (10 sec: 4915.8, 60 sec: 6690.6, 300 sec: 6761.9). Total num frames: 88948736. Throughput: 0: 1613.4. Samples: 17234158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:04:12,934][41694] Avg episode reward: [(0, '4.178')] +[2024-11-08 02:04:17,931][41694] Fps is (10 sec: 6283.4, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 88981504. Throughput: 0: 1668.7. Samples: 17239122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:04:17,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 02:04:18,565][42004] Updated weights for policy 0, policy_version 21726 (0.0024) +[2024-11-08 02:04:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 89018368. Throughput: 0: 1663.5. Samples: 17249470. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:04:22,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 02:04:23,957][42004] Updated weights for policy 0, policy_version 21736 (0.0025) +[2024-11-08 02:04:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6815.8). Total num frames: 89059328. Throughput: 0: 1692.8. Samples: 17261218. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:04:27,934][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 02:04:29,157][42004] Updated weights for policy 0, policy_version 21746 (0.0021) +[2024-11-08 02:04:32,931][41694] Fps is (10 sec: 8192.1, 60 sec: 6905.3, 300 sec: 6831.3). Total num frames: 89100288. Throughput: 0: 1718.3. Samples: 17267256. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:04:32,933][41694] Avg episode reward: [(0, '4.261')] +[2024-11-08 02:04:34,398][42004] Updated weights for policy 0, policy_version 21756 (0.0026) +[2024-11-08 02:04:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 89137152. Throughput: 0: 1713.5. Samples: 17278846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:04:37,934][41694] Avg episode reward: [(0, '4.737')] +[2024-11-08 02:04:39,814][42004] Updated weights for policy 0, policy_version 21766 (0.0042) +[2024-11-08 02:04:44,090][41694] Fps is (10 sec: 5873.4, 60 sec: 6764.4, 300 sec: 6790.8). Total num frames: 89165824. Throughput: 0: 1561.2. Samples: 17284740. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:04:44,094][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 02:04:47,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.2, 300 sec: 6789.7). Total num frames: 89190400. Throughput: 0: 1616.6. Samples: 17290958. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:04:47,933][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 02:04:47,961][42004] Updated weights for policy 0, policy_version 21776 (0.0034) +[2024-11-08 02:04:52,931][41694] Fps is (10 sec: 6948.7, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 89227264. Throughput: 0: 1683.6. Samples: 17300928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:04:52,934][41694] Avg episode reward: [(0, '4.359')] +[2024-11-08 02:04:54,002][42004] Updated weights for policy 0, policy_version 21786 (0.0024) +[2024-11-08 02:04:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 89264128. Throughput: 0: 1731.0. Samples: 17312052. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:04:57,934][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 02:04:59,420][42004] Updated weights for policy 0, policy_version 21796 (0.0025) +[2024-11-08 02:05:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.3, 300 sec: 6831.8). Total num frames: 89300992. Throughput: 0: 1752.2. Samples: 17317970. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:05:02,933][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 02:05:04,724][42004] Updated weights for policy 0, policy_version 21806 (0.0022) +[2024-11-08 02:05:07,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7074.8, 300 sec: 6845.2). Total num frames: 89341952. Throughput: 0: 1780.7. Samples: 17329600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:05:07,934][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 02:05:10,022][42004] Updated weights for policy 0, policy_version 21816 (0.0026) +[2024-11-08 02:05:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6859.1). Total num frames: 89374720. Throughput: 0: 1767.0. Samples: 17340734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:05:12,934][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 02:05:16,105][42004] Updated weights for policy 0, policy_version 21826 (0.0036) +[2024-11-08 02:05:18,548][41694] Fps is (10 sec: 5401.3, 60 sec: 6892.3, 300 sec: 6817.0). Total num frames: 89399296. Throughput: 0: 1713.7. Samples: 17345428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:05:18,553][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 02:05:22,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 89427968. Throughput: 0: 1622.5. Samples: 17351860. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 02:05:22,934][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 02:05:24,607][42004] Updated weights for policy 0, policy_version 21836 (0.0031) +[2024-11-08 02:05:27,931][41694] Fps is (10 sec: 6548.0, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 89460736. Throughput: 0: 1752.7. Samples: 17361580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 02:05:27,934][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 02:05:30,120][42004] Updated weights for policy 0, policy_version 21846 (0.0040) +[2024-11-08 02:05:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 89501696. Throughput: 0: 1700.9. Samples: 17367500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 02:05:32,933][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 02:05:35,562][42004] Updated weights for policy 0, policy_version 21856 (0.0028) +[2024-11-08 02:05:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 89538560. Throughput: 0: 1735.9. Samples: 17379046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:05:37,933][41694] Avg episode reward: [(0, '4.283')] +[2024-11-08 02:05:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021860_89538560.pth... +[2024-11-08 02:05:38,063][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021460_87900160.pth +[2024-11-08 02:05:40,725][42004] Updated weights for policy 0, policy_version 21866 (0.0033) +[2024-11-08 02:05:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6961.0, 300 sec: 6831.3). Total num frames: 89575424. Throughput: 0: 1752.4. Samples: 17390912. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:05:42,934][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 02:05:46,155][42004] Updated weights for policy 0, policy_version 21876 (0.0027) +[2024-11-08 02:05:47,937][41694] Fps is (10 sec: 7777.9, 60 sec: 7099.0, 300 sec: 6845.0). Total num frames: 89616384. Throughput: 0: 1740.2. Samples: 17396288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:05:47,940][41694] Avg episode reward: [(0, '4.582')] +[2024-11-08 02:05:53,065][41694] Fps is (10 sec: 6062.9, 60 sec: 6811.5, 300 sec: 6800.4). Total num frames: 89636864. Throughput: 0: 1602.4. Samples: 17401924. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:05:53,067][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 02:05:54,020][42004] Updated weights for policy 0, policy_version 21886 (0.0025) +[2024-11-08 02:05:57,932][41694] Fps is (10 sec: 5327.8, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 89669632. Throughput: 0: 1622.1. Samples: 17413730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:05:57,934][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 02:06:00,176][42004] Updated weights for policy 0, policy_version 21896 (0.0031) +[2024-11-08 02:06:02,932][41694] Fps is (10 sec: 6642.2, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 89702400. Throughput: 0: 1654.9. Samples: 17418878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:02,943][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 02:06:05,748][42004] Updated weights for policy 0, policy_version 21906 (0.0027) +[2024-11-08 02:06:07,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 89739264. Throughput: 0: 1727.1. Samples: 17429580. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:07,935][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 02:06:11,480][42004] Updated weights for policy 0, policy_version 21916 (0.0043) +[2024-11-08 02:06:12,933][41694] Fps is (10 sec: 7371.8, 60 sec: 6690.0, 300 sec: 6845.1). Total num frames: 89776128. Throughput: 0: 1752.9. Samples: 17440464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:12,935][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 02:06:17,182][42004] Updated weights for policy 0, policy_version 21926 (0.0026) +[2024-11-08 02:06:17,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6966.6, 300 sec: 6845.2). Total num frames: 89812992. Throughput: 0: 1731.4. Samples: 17445412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:17,932][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 02:06:22,577][42004] Updated weights for policy 0, policy_version 21936 (0.0030) +[2024-11-08 02:06:22,933][41694] Fps is (10 sec: 7373.0, 60 sec: 7031.3, 300 sec: 6859.0). Total num frames: 89849856. Throughput: 0: 1730.0. Samples: 17456896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:22,936][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 02:06:27,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6826.6, 300 sec: 6803.5). Total num frames: 89870336. Throughput: 0: 1643.9. Samples: 17464886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:27,934][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 02:06:30,847][42004] Updated weights for policy 0, policy_version 21946 (0.0042) +[2024-11-08 02:06:32,931][41694] Fps is (10 sec: 5325.5, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 89903104. Throughput: 0: 1613.0. Samples: 17468862. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:32,933][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 02:06:36,595][42004] Updated weights for policy 0, policy_version 21956 (0.0032) +[2024-11-08 02:06:37,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 89939968. Throughput: 0: 1726.1. Samples: 17479370. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:06:37,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 02:06:41,767][42004] Updated weights for policy 0, policy_version 21966 (0.0026) +[2024-11-08 02:06:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 89980928. Throughput: 0: 1723.6. Samples: 17491290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:06:42,933][41694] Avg episode reward: [(0, '4.157')] +[2024-11-08 02:06:46,952][42004] Updated weights for policy 0, policy_version 21976 (0.0022) +[2024-11-08 02:06:47,933][41694] Fps is (10 sec: 7780.9, 60 sec: 6690.6, 300 sec: 6859.0). Total num frames: 90017792. Throughput: 0: 1736.1. Samples: 17497004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:06:47,935][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 02:06:52,108][42004] Updated weights for policy 0, policy_version 21986 (0.0019) +[2024-11-08 02:06:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7047.2, 300 sec: 6873.0). Total num frames: 90058752. Throughput: 0: 1766.1. Samples: 17509054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:06:52,933][41694] Avg episode reward: [(0, '4.671')] +[2024-11-08 02:06:57,354][42004] Updated weights for policy 0, policy_version 21996 (0.0024) +[2024-11-08 02:06:57,931][41694] Fps is (10 sec: 8193.7, 60 sec: 7168.0, 300 sec: 6900.7). Total num frames: 90099712. Throughput: 0: 1782.2. Samples: 17520660. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:06:57,933][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 02:07:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 90116096. Throughput: 0: 1782.2. Samples: 17525612. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:07:02,933][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 02:07:06,239][42004] Updated weights for policy 0, policy_version 22006 (0.0039) +[2024-11-08 02:07:07,931][41694] Fps is (10 sec: 4505.5, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 90144768. Throughput: 0: 1654.0. Samples: 17531322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:07:07,933][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 02:07:11,974][42004] Updated weights for policy 0, policy_version 22016 (0.0033) +[2024-11-08 02:07:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6758.5, 300 sec: 6803.5). Total num frames: 90181632. Throughput: 0: 1715.1. Samples: 17542068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:07:12,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 02:07:17,002][42004] Updated weights for policy 0, policy_version 22026 (0.0024) +[2024-11-08 02:07:17,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 90222592. Throughput: 0: 1755.6. Samples: 17547864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:07:17,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 02:07:22,201][42004] Updated weights for policy 0, policy_version 22036 (0.0032) +[2024-11-08 02:07:22,932][41694] Fps is (10 sec: 8192.1, 60 sec: 6895.0, 300 sec: 6872.9). Total num frames: 90263552. Throughput: 0: 1789.2. Samples: 17559886. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:07:22,935][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 02:07:27,562][42004] Updated weights for policy 0, policy_version 22046 (0.0027) +[2024-11-08 02:07:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7168.0, 300 sec: 6872.9). Total num frames: 90300416. Throughput: 0: 1787.1. Samples: 17571710. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 02:07:27,934][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 02:07:32,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7236.3, 300 sec: 6873.0). Total num frames: 90337280. Throughput: 0: 1785.7. Samples: 17577356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 02:07:32,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 02:07:32,941][42004] Updated weights for policy 0, policy_version 22056 (0.0024) +[2024-11-08 02:07:37,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6963.1, 300 sec: 6817.4). Total num frames: 90357760. Throughput: 0: 1699.6. Samples: 17585538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 02:07:37,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 02:07:38,099][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022061_90361856.pth... +[2024-11-08 02:07:38,217][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021660_88719360.pth +[2024-11-08 02:07:41,123][42004] Updated weights for policy 0, policy_version 22066 (0.0035) +[2024-11-08 02:07:42,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 90390528. Throughput: 0: 1645.1. Samples: 17594690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:07:42,934][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 02:07:46,775][42004] Updated weights for policy 0, policy_version 22076 (0.0019) +[2024-11-08 02:07:47,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6895.1, 300 sec: 6817.4). 
Total num frames: 90431488. Throughput: 0: 1646.6. Samples: 17599710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:07:47,934][41694] Avg episode reward: [(0, '4.284')] +[2024-11-08 02:07:52,068][42004] Updated weights for policy 0, policy_version 22086 (0.0035) +[2024-11-08 02:07:52,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6826.6, 300 sec: 6817.4). Total num frames: 90468352. Throughput: 0: 1781.9. Samples: 17611508. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:07:52,934][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 02:07:57,214][42004] Updated weights for policy 0, policy_version 22096 (0.0026) +[2024-11-08 02:07:57,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 90509312. Throughput: 0: 1809.6. Samples: 17623500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:07:57,932][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 02:08:02,868][42004] Updated weights for policy 0, policy_version 22106 (0.0027) +[2024-11-08 02:08:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7168.0, 300 sec: 6900.7). Total num frames: 90546176. Throughput: 0: 1804.3. Samples: 17629058. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:08:02,933][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 02:08:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7304.5, 300 sec: 6900.8). Total num frames: 90583040. Throughput: 0: 1777.7. Samples: 17639882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:08:07,934][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 02:08:08,317][42004] Updated weights for policy 0, policy_version 22116 (0.0022) +[2024-11-08 02:08:12,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 90599424. Throughput: 0: 1659.7. Samples: 17646396. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:08:12,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 02:08:16,961][42004] Updated weights for policy 0, policy_version 22126 (0.0030) +[2024-11-08 02:08:17,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 90632192. Throughput: 0: 1640.0. Samples: 17651154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:08:17,934][41694] Avg episode reward: [(0, '4.240')] +[2024-11-08 02:08:22,514][42004] Updated weights for policy 0, policy_version 22136 (0.0029) +[2024-11-08 02:08:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 90669056. Throughput: 0: 1695.9. Samples: 17661854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:08:22,934][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 02:08:27,862][42004] Updated weights for policy 0, policy_version 22146 (0.0034) +[2024-11-08 02:08:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6826.7, 300 sec: 6860.9). Total num frames: 90710016. Throughput: 0: 1747.9. Samples: 17673346. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:08:27,934][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 02:08:32,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 90746880. Throughput: 0: 1761.4. Samples: 17678974. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:08:32,935][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 02:08:33,196][42004] Updated weights for policy 0, policy_version 22156 (0.0035) +[2024-11-08 02:08:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.8, 300 sec: 6886.8). Total num frames: 90783744. Throughput: 0: 1752.9. Samples: 17690390. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:08:37,935][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 02:08:38,656][42004] Updated weights for policy 0, policy_version 22166 (0.0031) +[2024-11-08 02:08:42,932][41694] Fps is (10 sec: 7372.0, 60 sec: 7167.9, 300 sec: 6886.8). Total num frames: 90820608. Throughput: 0: 1740.8. Samples: 17701840. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:08:42,935][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 02:08:46,334][42004] Updated weights for policy 0, policy_version 22176 (0.0031) +[2024-11-08 02:08:47,933][41694] Fps is (10 sec: 5733.5, 60 sec: 6826.5, 300 sec: 6817.4). Total num frames: 90841088. Throughput: 0: 1689.1. Samples: 17705070. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:08:47,935][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 02:08:52,625][42004] Updated weights for policy 0, policy_version 22186 (0.0055) +[2024-11-08 02:08:52,931][41694] Fps is (10 sec: 5325.4, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 90873856. Throughput: 0: 1626.8. Samples: 17713088. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:08:52,934][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 02:08:57,929][42004] Updated weights for policy 0, policy_version 22196 (0.0030) +[2024-11-08 02:08:57,932][41694] Fps is (10 sec: 7374.0, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 90914816. Throughput: 0: 1735.2. Samples: 17724480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:08:57,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 02:09:02,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6894.7). Total num frames: 90951680. Throughput: 0: 1757.2. Samples: 17730226. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:09:02,935][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 02:09:03,506][42004] Updated weights for policy 0, policy_version 22206 (0.0035) +[2024-11-08 02:09:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6914.6). Total num frames: 90988544. Throughput: 0: 1767.2. Samples: 17741376. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:09:07,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 02:09:08,698][42004] Updated weights for policy 0, policy_version 22216 (0.0020) +[2024-11-08 02:09:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6928.5). Total num frames: 91025408. Throughput: 0: 1767.8. Samples: 17752896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:09:12,935][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 02:09:14,138][42004] Updated weights for policy 0, policy_version 22226 (0.0025) +[2024-11-08 02:09:17,932][41694] Fps is (10 sec: 7372.2, 60 sec: 7167.9, 300 sec: 6928.5). Total num frames: 91062272. Throughput: 0: 1766.3. Samples: 17758460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:09:17,934][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 02:09:22,136][42004] Updated weights for policy 0, policy_version 22236 (0.0037) +[2024-11-08 02:09:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 91082752. Throughput: 0: 1664.0. Samples: 17765272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:09:22,934][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 02:09:27,931][41694] Fps is (10 sec: 5325.2, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 91115520. Throughput: 0: 1637.3. Samples: 17775518. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:09:27,933][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 02:09:27,995][42004] Updated weights for policy 0, policy_version 22246 (0.0024) +[2024-11-08 02:09:32,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 91156480. Throughput: 0: 1693.8. Samples: 17781290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:09:32,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 02:09:33,202][42004] Updated weights for policy 0, policy_version 22256 (0.0028) +[2024-11-08 02:09:37,931][41694] Fps is (10 sec: 8192.0, 60 sec: 6894.9, 300 sec: 6914.0). Total num frames: 91197440. Throughput: 0: 1782.5. Samples: 17793302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:09:37,937][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 02:09:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022265_91197440.pth... +[2024-11-08 02:09:38,040][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000021860_89538560.pth +[2024-11-08 02:09:38,334][42004] Updated weights for policy 0, policy_version 22266 (0.0027) +[2024-11-08 02:09:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6895.1, 300 sec: 6928.5). Total num frames: 91234304. Throughput: 0: 1789.6. Samples: 17805012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:09:42,933][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 02:09:43,663][42004] Updated weights for policy 0, policy_version 22276 (0.0022) +[2024-11-08 02:09:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.2, 300 sec: 6928.5). Total num frames: 91271168. Throughput: 0: 1791.2. Samples: 17810830. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:09:47,935][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 02:09:49,025][42004] Updated weights for policy 0, policy_version 22286 (0.0030) +[2024-11-08 02:09:55,147][41694] Fps is (10 sec: 6035.5, 60 sec: 6978.6, 300 sec: 6876.8). Total num frames: 91308032. Throughput: 0: 1706.8. Samples: 17821962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:09:55,149][41694] Avg episode reward: [(0, '4.570')] +[2024-11-08 02:09:57,508][42004] Updated weights for policy 0, policy_version 22296 (0.0024) +[2024-11-08 02:09:57,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 91324416. Throughput: 0: 1667.5. Samples: 17827934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:09:57,933][41694] Avg episode reward: [(0, '4.649')] +[2024-11-08 02:10:02,932][41694] Fps is (10 sec: 6839.7, 60 sec: 6826.6, 300 sec: 6845.2). Total num frames: 91361280. Throughput: 0: 1659.2. Samples: 17833126. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:10:02,934][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 02:10:03,297][42004] Updated weights for policy 0, policy_version 22306 (0.0041) +[2024-11-08 02:10:07,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 91398144. Throughput: 0: 1749.0. Samples: 17843976. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:10:07,934][41694] Avg episode reward: [(0, '4.561')] +[2024-11-08 02:10:08,674][42004] Updated weights for policy 0, policy_version 22316 (0.0033) +[2024-11-08 02:10:12,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6826.7, 300 sec: 6915.2). Total num frames: 91435008. Throughput: 0: 1774.7. Samples: 17855380. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:10:12,935][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 02:10:14,522][42004] Updated weights for policy 0, policy_version 22326 (0.0035) +[2024-11-08 02:10:17,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6758.5, 300 sec: 6914.6). Total num frames: 91467776. Throughput: 0: 1753.3. Samples: 17860188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:10:17,933][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 02:10:20,451][42004] Updated weights for policy 0, policy_version 22336 (0.0029) +[2024-11-08 02:10:22,932][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 91504640. Throughput: 0: 1718.9. Samples: 17870654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:10:22,933][41694] Avg episode reward: [(0, '4.261')] +[2024-11-08 02:10:26,064][42004] Updated weights for policy 0, policy_version 22346 (0.0029) +[2024-11-08 02:10:29,313][41694] Fps is (10 sec: 6117.8, 60 sec: 6873.2, 300 sec: 6868.5). Total num frames: 91537408. Throughput: 0: 1650.5. Samples: 17881566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:10:29,316][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 02:10:32,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 91561984. Throughput: 0: 1606.5. Samples: 17883124. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:10:32,943][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 02:10:34,064][42004] Updated weights for policy 0, policy_version 22356 (0.0046) +[2024-11-08 02:10:37,932][41694] Fps is (10 sec: 7128.7, 60 sec: 6690.1, 300 sec: 6859.0). Total num frames: 91598848. Throughput: 0: 1672.4. Samples: 17893516. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:10:37,934][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 02:10:39,545][42004] Updated weights for policy 0, policy_version 22366 (0.0024) +[2024-11-08 02:10:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.8, 300 sec: 6831.4). Total num frames: 91631616. Throughput: 0: 1709.4. Samples: 17904856. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:10:42,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 02:10:45,149][42004] Updated weights for policy 0, policy_version 22376 (0.0033) +[2024-11-08 02:10:47,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6690.1, 300 sec: 6903.8). Total num frames: 91672576. Throughput: 0: 1718.5. Samples: 17910458. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:10:47,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 02:10:50,489][42004] Updated weights for policy 0, policy_version 22386 (0.0036) +[2024-11-08 02:10:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6946.6, 300 sec: 6914.6). Total num frames: 91709440. Throughput: 0: 1730.8. Samples: 17921860. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:10:52,934][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 02:10:56,035][42004] Updated weights for policy 0, policy_version 22396 (0.0026) +[2024-11-08 02:10:57,931][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 91746304. Throughput: 0: 1723.5. Samples: 17932936. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:10:57,933][41694] Avg episode reward: [(0, '4.645')] +[2024-11-08 02:11:03,754][41694] Fps is (10 sec: 5298.7, 60 sec: 6667.1, 300 sec: 6853.8). Total num frames: 91766784. Throughput: 0: 1697.8. Samples: 17937984. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:11:03,757][41694] Avg episode reward: [(0, '4.670')] +[2024-11-08 02:11:04,453][42004] Updated weights for policy 0, policy_version 22406 (0.0065) +[2024-11-08 02:11:07,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 91795456. Throughput: 0: 1631.0. Samples: 17944048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:11:07,933][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 02:11:10,299][42004] Updated weights for policy 0, policy_version 22416 (0.0041) +[2024-11-08 02:11:12,932][41694] Fps is (10 sec: 7140.4, 60 sec: 6621.8, 300 sec: 6845.2). Total num frames: 91832320. Throughput: 0: 1667.2. Samples: 17954288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:11:12,935][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 02:11:16,286][42004] Updated weights for policy 0, policy_version 22426 (0.0043) +[2024-11-08 02:11:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 91865088. Throughput: 0: 1695.5. Samples: 17959420. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:11:17,934][41694] Avg episode reward: [(0, '4.219')] +[2024-11-08 02:11:22,071][42004] Updated weights for policy 0, policy_version 22436 (0.0023) +[2024-11-08 02:11:22,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6621.9, 300 sec: 6886.8). Total num frames: 91901952. Throughput: 0: 1700.6. Samples: 17970042. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:11:22,933][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 02:11:27,419][42004] Updated weights for policy 0, policy_version 22446 (0.0045) +[2024-11-08 02:11:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6847.8, 300 sec: 6900.7). Total num frames: 91938816. Throughput: 0: 1701.9. Samples: 17981442. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:11:27,934][41694] Avg episode reward: [(0, '4.628')] +[2024-11-08 02:11:32,714][42004] Updated weights for policy 0, policy_version 22456 (0.0028) +[2024-11-08 02:11:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6914.6). Total num frames: 91979776. Throughput: 0: 1705.0. Samples: 17987184. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:11:32,933][41694] Avg episode reward: [(0, '4.656')] +[2024-11-08 02:11:38,225][41694] Fps is (10 sec: 5968.6, 60 sec: 6657.6, 300 sec: 6838.4). Total num frames: 92000256. Throughput: 0: 1573.3. Samples: 17993122. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:11:38,230][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 02:11:38,244][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022461_92000256.pth... +[2024-11-08 02:11:38,354][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022061_90361856.pth +[2024-11-08 02:11:41,065][42004] Updated weights for policy 0, policy_version 22466 (0.0042) +[2024-11-08 02:11:42,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 92033024. Throughput: 0: 1591.7. Samples: 18004562. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:11:42,933][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 02:11:46,519][42004] Updated weights for policy 0, policy_version 22476 (0.0032) +[2024-11-08 02:11:47,932][41694] Fps is (10 sec: 7174.0, 60 sec: 6621.8, 300 sec: 6817.4). Total num frames: 92069888. Throughput: 0: 1630.1. Samples: 18010000. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:11:47,934][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 02:11:51,807][42004] Updated weights for policy 0, policy_version 22486 (0.0029) +[2024-11-08 02:11:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6803.5). 
Total num frames: 92106752. Throughput: 0: 1724.4. Samples: 18021646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:11:52,933][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 02:11:57,234][42004] Updated weights for policy 0, policy_version 22496 (0.0031) +[2024-11-08 02:11:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 92147712. Throughput: 0: 1753.3. Samples: 18033186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:11:57,933][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 02:12:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6990.8, 300 sec: 6900.7). Total num frames: 92180480. Throughput: 0: 1761.7. Samples: 18038696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:12:02,934][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 02:12:02,958][42004] Updated weights for policy 0, policy_version 22506 (0.0032) +[2024-11-08 02:12:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 92213248. Throughput: 0: 1727.4. Samples: 18047776. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:12:07,935][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 02:12:10,090][42004] Updated weights for policy 0, policy_version 22516 (0.0032) +[2024-11-08 02:12:12,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6621.9, 300 sec: 6803.5). Total num frames: 92229632. Throughput: 0: 1626.8. Samples: 18054650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:12:12,933][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 02:12:17,904][42004] Updated weights for policy 0, policy_version 22526 (0.0046) +[2024-11-08 02:12:17,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 92266496. Throughput: 0: 1585.6. Samples: 18058538. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:12:17,934][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 02:12:22,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 92303360. Throughput: 0: 1720.2. Samples: 18070026. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:12:22,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 02:12:23,287][42004] Updated weights for policy 0, policy_version 22536 (0.0027) +[2024-11-08 02:12:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 92340224. Throughput: 0: 1706.6. Samples: 18081360. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:12:27,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 02:12:28,676][42004] Updated weights for policy 0, policy_version 22546 (0.0023) +[2024-11-08 02:12:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 92381184. Throughput: 0: 1712.8. Samples: 18087076. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:12:32,933][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 02:12:33,929][42004] Updated weights for policy 0, policy_version 22556 (0.0041) +[2024-11-08 02:12:37,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6997.5, 300 sec: 6873.0). Total num frames: 92418048. Throughput: 0: 1703.9. Samples: 18098322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:12:37,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 02:12:39,589][42004] Updated weights for policy 0, policy_version 22566 (0.0031) +[2024-11-08 02:12:42,933][41694] Fps is (10 sec: 6962.3, 60 sec: 6963.1, 300 sec: 6845.2). Total num frames: 92450816. Throughput: 0: 1690.8. Samples: 18109274. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:12:42,936][41694] Avg episode reward: [(0, '4.257')] +[2024-11-08 02:12:47,829][42004] Updated weights for policy 0, policy_version 22576 (0.0027) +[2024-11-08 02:12:47,934][41694] Fps is (10 sec: 5323.6, 60 sec: 6689.9, 300 sec: 6789.6). Total num frames: 92471296. Throughput: 0: 1677.4. Samples: 18114182. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:12:47,936][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 02:12:52,931][41694] Fps is (10 sec: 5735.1, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 92508160. Throughput: 0: 1630.7. Samples: 18121156. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:12:52,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 02:12:53,262][42004] Updated weights for policy 0, policy_version 22586 (0.0021) +[2024-11-08 02:12:57,932][41694] Fps is (10 sec: 7374.2, 60 sec: 6621.8, 300 sec: 6775.8). Total num frames: 92545024. Throughput: 0: 1736.0. Samples: 18132772. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:12:57,934][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 02:12:58,578][42004] Updated weights for policy 0, policy_version 22596 (0.0028) +[2024-11-08 02:13:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 92581888. Throughput: 0: 1780.0. Samples: 18138636. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:13:02,934][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 02:13:04,155][42004] Updated weights for policy 0, policy_version 22606 (0.0034) +[2024-11-08 02:13:07,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 92622848. Throughput: 0: 1766.3. Samples: 18149510. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:13:07,936][41694] Avg episode reward: [(0, '4.790')] +[2024-11-08 02:13:09,605][42004] Updated weights for policy 0, policy_version 22616 (0.0028) +[2024-11-08 02:13:12,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7168.0, 300 sec: 6872.9). Total num frames: 92659712. Throughput: 0: 1766.6. Samples: 18160858. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:13:12,933][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 02:13:15,103][42004] Updated weights for policy 0, policy_version 22626 (0.0032) +[2024-11-08 02:13:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7099.7, 300 sec: 6859.1). Total num frames: 92692480. Throughput: 0: 1764.2. Samples: 18166466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:13:17,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 02:13:22,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 92712960. Throughput: 0: 1670.2. Samples: 18173482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:13:22,934][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 02:13:23,423][42004] Updated weights for policy 0, policy_version 22636 (0.0045) +[2024-11-08 02:13:27,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 92749824. Throughput: 0: 1652.9. Samples: 18183652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:13:27,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:13:28,742][42004] Updated weights for policy 0, policy_version 22646 (0.0029) +[2024-11-08 02:13:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 92786688. Throughput: 0: 1671.8. Samples: 18189408. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:13:32,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 02:13:34,072][42004] Updated weights for policy 0, policy_version 22656 (0.0040) +[2024-11-08 02:13:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6826.6, 300 sec: 6803.5). Total num frames: 92827648. Throughput: 0: 1771.2. Samples: 18200862. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:13:37,933][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 02:13:37,953][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022663_92827648.pth... +[2024-11-08 02:13:38,055][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022265_91197440.pth +[2024-11-08 02:13:39,413][42004] Updated weights for policy 0, policy_version 22666 (0.0026) +[2024-11-08 02:13:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6895.1, 300 sec: 6859.1). Total num frames: 92864512. Throughput: 0: 1767.3. Samples: 18212300. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:13:42,933][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 02:13:44,930][42004] Updated weights for policy 0, policy_version 22676 (0.0043) +[2024-11-08 02:13:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.3, 300 sec: 6872.9). Total num frames: 92901376. Throughput: 0: 1762.6. Samples: 18217954. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:13:47,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 02:13:50,447][42004] Updated weights for policy 0, policy_version 22686 (0.0029) +[2024-11-08 02:13:52,931][41694] Fps is (10 sec: 7373.0, 60 sec: 7168.0, 300 sec: 6859.1). Total num frames: 92938240. Throughput: 0: 1763.4. Samples: 18228864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:13:52,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 02:13:57,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6826.7, 300 sec: 6789.6). 
Total num frames: 92954624. Throughput: 0: 1655.0. Samples: 18235332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:13:57,935][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 02:13:58,591][42004] Updated weights for policy 0, policy_version 22696 (0.0027) +[2024-11-08 02:14:02,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 92991488. Throughput: 0: 1657.7. Samples: 18241062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:14:02,933][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 02:14:04,135][42004] Updated weights for policy 0, policy_version 22706 (0.0028) +[2024-11-08 02:14:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 93028352. Throughput: 0: 1735.0. Samples: 18251558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:14:07,936][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 02:14:09,829][42004] Updated weights for policy 0, policy_version 22716 (0.0029) +[2024-11-08 02:14:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6789.7). Total num frames: 93065216. Throughput: 0: 1757.7. Samples: 18262750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:14:12,934][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 02:14:15,462][42004] Updated weights for policy 0, policy_version 22726 (0.0032) +[2024-11-08 02:14:17,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6826.6, 300 sec: 6845.2). Total num frames: 93102080. Throughput: 0: 1749.6. Samples: 18268140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:14:17,936][41694] Avg episode reward: [(0, '4.267')] +[2024-11-08 02:14:20,849][42004] Updated weights for policy 0, policy_version 22736 (0.0040) +[2024-11-08 02:14:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6859.1). Total num frames: 93138944. Throughput: 0: 1747.1. Samples: 18279482. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:14:22,934][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 02:14:26,786][42004] Updated weights for policy 0, policy_version 22746 (0.0036) +[2024-11-08 02:14:27,934][41694] Fps is (10 sec: 6962.1, 60 sec: 7031.2, 300 sec: 6831.2). Total num frames: 93171712. Throughput: 0: 1722.1. Samples: 18289800. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:14:27,938][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 02:14:32,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 93192192. Throughput: 0: 1642.5. Samples: 18291868. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:14:32,933][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 02:14:34,702][42004] Updated weights for policy 0, policy_version 22756 (0.0029) +[2024-11-08 02:14:37,934][41694] Fps is (10 sec: 5734.2, 60 sec: 6689.8, 300 sec: 6761.8). Total num frames: 93229056. Throughput: 0: 1631.9. Samples: 18302302. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:14:37,938][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 02:14:40,104][42004] Updated weights for policy 0, policy_version 22766 (0.0034) +[2024-11-08 02:14:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 93270016. Throughput: 0: 1735.3. Samples: 18313420. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:14:42,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 02:14:45,728][42004] Updated weights for policy 0, policy_version 22776 (0.0025) +[2024-11-08 02:14:47,932][41694] Fps is (10 sec: 7784.4, 60 sec: 6758.4, 300 sec: 6827.0). Total num frames: 93306880. Throughput: 0: 1732.4. Samples: 18319020. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:14:47,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 02:14:51,025][42004] Updated weights for policy 0, policy_version 22786 (0.0030) +[2024-11-08 02:14:52,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 93339648. Throughput: 0: 1754.7. Samples: 18330518. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:14:52,934][41694] Avg episode reward: [(0, '4.758')] +[2024-11-08 02:14:56,878][42004] Updated weights for policy 0, policy_version 22796 (0.0023) +[2024-11-08 02:14:57,932][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 93376512. Throughput: 0: 1738.0. Samples: 18340960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:14:57,934][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 02:15:05,030][41694] Fps is (10 sec: 5755.5, 60 sec: 6727.9, 300 sec: 6769.3). Total num frames: 93409280. Throughput: 0: 1651.5. Samples: 18345920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:05,032][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 02:15:05,102][42004] Updated weights for policy 0, policy_version 22806 (0.0029) +[2024-11-08 02:15:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 93433856. Throughput: 0: 1629.3. Samples: 18352800. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:07,933][41694] Avg episode reward: [(0, '4.276')] +[2024-11-08 02:15:10,428][42004] Updated weights for policy 0, policy_version 22816 (0.0026) +[2024-11-08 02:15:12,932][41694] Fps is (10 sec: 7775.7, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 93470720. Throughput: 0: 1647.9. Samples: 18363952. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:15:12,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 02:15:16,392][42004] Updated weights for policy 0, policy_version 22826 (0.0026) +[2024-11-08 02:15:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.2, 300 sec: 6775.8). Total num frames: 93503488. Throughput: 0: 1716.3. Samples: 18369100. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:15:17,938][41694] Avg episode reward: [(0, '4.283')] +[2024-11-08 02:15:21,946][42004] Updated weights for policy 0, policy_version 22836 (0.0031) +[2024-11-08 02:15:22,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 6821.6). Total num frames: 93540352. Throughput: 0: 1719.8. Samples: 18379686. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:15:22,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 02:15:27,169][42004] Updated weights for policy 0, policy_version 22846 (0.0026) +[2024-11-08 02:15:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.9, 300 sec: 6845.2). Total num frames: 93581312. Throughput: 0: 1734.8. Samples: 18391486. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:27,934][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 02:15:32,898][42004] Updated weights for policy 0, policy_version 22856 (0.0023) +[2024-11-08 02:15:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7099.7, 300 sec: 6845.2). Total num frames: 93618176. Throughput: 0: 1734.2. Samples: 18397058. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:32,934][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 02:15:39,497][41694] Fps is (10 sec: 5666.5, 60 sec: 6786.4, 300 sec: 6795.2). Total num frames: 93646848. Throughput: 0: 1647.2. Samples: 18407222. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:39,499][41694] Avg episode reward: [(0, '4.671')] +[2024-11-08 02:15:39,515][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022863_93646848.pth... +[2024-11-08 02:15:39,639][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022461_92000256.pth +[2024-11-08 02:15:40,775][42004] Updated weights for policy 0, policy_version 22866 (0.0037) +[2024-11-08 02:15:42,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 93671424. Throughput: 0: 1635.2. Samples: 18414544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:42,933][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 02:15:46,318][42004] Updated weights for policy 0, policy_version 22876 (0.0029) +[2024-11-08 02:15:47,932][41694] Fps is (10 sec: 7283.8, 60 sec: 6690.1, 300 sec: 6775.7). Total num frames: 93708288. Throughput: 0: 1728.7. Samples: 18420086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:47,934][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 02:15:51,644][42004] Updated weights for policy 0, policy_version 22886 (0.0028) +[2024-11-08 02:15:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 93749248. Throughput: 0: 1753.5. Samples: 18431706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:52,934][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 02:15:56,884][42004] Updated weights for policy 0, policy_version 22896 (0.0029) +[2024-11-08 02:15:57,932][41694] Fps is (10 sec: 8192.1, 60 sec: 6894.9, 300 sec: 6878.2). Total num frames: 93790208. Throughput: 0: 1767.2. Samples: 18443478. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:15:57,935][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 02:16:02,341][42004] Updated weights for policy 0, policy_version 22906 (0.0030) +[2024-11-08 02:16:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7215.6, 300 sec: 6886.8). Total num frames: 93827072. Throughput: 0: 1779.2. Samples: 18449166. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:16:02,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 02:16:07,932][41694] Fps is (10 sec: 6963.5, 60 sec: 7099.7, 300 sec: 6873.0). Total num frames: 93859840. Throughput: 0: 1778.3. Samples: 18459708. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:16:07,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 02:16:08,190][42004] Updated weights for policy 0, policy_version 22916 (0.0035) +[2024-11-08 02:16:13,973][41694] Fps is (10 sec: 5193.5, 60 sec: 6777.3, 300 sec: 6821.1). Total num frames: 93884416. Throughput: 0: 1591.1. Samples: 18464742. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:16:13,974][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 02:16:16,551][42004] Updated weights for policy 0, policy_version 22926 (0.0044) +[2024-11-08 02:16:17,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 93913088. Throughput: 0: 1645.7. Samples: 18471114. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:16:17,933][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 02:16:21,814][42004] Updated weights for policy 0, policy_version 22936 (0.0033) +[2024-11-08 02:16:22,931][41694] Fps is (10 sec: 7772.8, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 93954048. Throughput: 0: 1733.2. Samples: 18482504. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:16:22,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 02:16:27,015][42004] Updated weights for policy 0, policy_version 22946 (0.0026) +[2024-11-08 02:16:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 93990912. Throughput: 0: 1766.7. Samples: 18494046. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:16:27,933][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 02:16:32,272][42004] Updated weights for policy 0, policy_version 22956 (0.0027) +[2024-11-08 02:16:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6895.0, 300 sec: 6893.7). Total num frames: 94031872. Throughput: 0: 1773.1. Samples: 18499872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:16:32,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 02:16:37,577][42004] Updated weights for policy 0, policy_version 22966 (0.0032) +[2024-11-08 02:16:37,932][41694] Fps is (10 sec: 7781.8, 60 sec: 7219.8, 300 sec: 6900.7). Total num frames: 94068736. Throughput: 0: 1777.5. Samples: 18511696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:16:37,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 02:16:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7168.0, 300 sec: 6886.8). Total num frames: 94101504. Throughput: 0: 1742.3. Samples: 18521882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:16:42,933][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 02:16:43,700][42004] Updated weights for policy 0, policy_version 22976 (0.0038) +[2024-11-08 02:16:48,477][41694] Fps is (10 sec: 5438.4, 60 sec: 6900.6, 300 sec: 6832.6). Total num frames: 94126080. Throughput: 0: 1714.9. Samples: 18527270. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:16:48,479][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 02:16:51,541][42004] Updated weights for policy 0, policy_version 22986 (0.0026) +[2024-11-08 02:16:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 94158848. Throughput: 0: 1654.6. Samples: 18534164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:16:52,933][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 02:16:56,778][42004] Updated weights for policy 0, policy_version 22996 (0.0029) +[2024-11-08 02:16:57,932][41694] Fps is (10 sec: 7797.5, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 94199808. Throughput: 0: 1844.0. Samples: 18545802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:16:57,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 02:17:02,568][42004] Updated weights for policy 0, policy_version 23006 (0.0027) +[2024-11-08 02:17:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 94232576. Throughput: 0: 1787.0. Samples: 18551530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:17:02,933][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 02:17:07,666][42004] Updated weights for policy 0, policy_version 23016 (0.0028) +[2024-11-08 02:17:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 94273536. Throughput: 0: 1782.8. Samples: 18562728. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:17:07,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 02:17:12,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7225.1, 300 sec: 6928.5). Total num frames: 94310400. Throughput: 0: 1771.6. Samples: 18573768. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:17:12,935][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 02:17:13,512][42004] Updated weights for policy 0, policy_version 23026 (0.0034) +[2024-11-08 02:17:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7168.0, 300 sec: 6914.6). Total num frames: 94343168. Throughput: 0: 1746.9. Samples: 18578482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:17,933][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 02:17:19,495][42004] Updated weights for policy 0, policy_version 23036 (0.0035) +[2024-11-08 02:17:23,101][41694] Fps is (10 sec: 5235.9, 60 sec: 6807.4, 300 sec: 6855.1). Total num frames: 94363648. Throughput: 0: 1597.9. Samples: 18583870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:23,103][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 02:17:27,242][42004] Updated weights for policy 0, policy_version 23046 (0.0029) +[2024-11-08 02:17:27,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 94400512. Throughput: 0: 1652.8. Samples: 18596258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:27,936][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 02:17:32,551][42004] Updated weights for policy 0, policy_version 23056 (0.0022) +[2024-11-08 02:17:32,932][41694] Fps is (10 sec: 7500.1, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 94437376. Throughput: 0: 1677.1. Samples: 18601828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:32,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 02:17:37,761][42004] Updated weights for policy 0, policy_version 23066 (0.0043) +[2024-11-08 02:17:37,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 94478336. Throughput: 0: 1763.9. Samples: 18613542. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:37,934][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 02:17:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023066_94478336.pth... +[2024-11-08 02:17:38,055][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022663_92827648.pth +[2024-11-08 02:17:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6895.0, 300 sec: 6928.5). Total num frames: 94515200. Throughput: 0: 1756.5. Samples: 18624844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:42,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 02:17:43,391][42004] Updated weights for policy 0, policy_version 23076 (0.0026) +[2024-11-08 02:17:47,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7164.8, 300 sec: 6928.5). Total num frames: 94552064. Throughput: 0: 1755.5. Samples: 18630528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:47,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 02:17:48,585][42004] Updated weights for policy 0, policy_version 23086 (0.0025) +[2024-11-08 02:17:52,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7236.3, 300 sec: 6942.4). Total num frames: 94593024. Throughput: 0: 1769.2. Samples: 18642344. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:52,933][41694] Avg episode reward: [(0, '4.207')] +[2024-11-08 02:17:53,843][42004] Updated weights for policy 0, policy_version 23096 (0.0026) +[2024-11-08 02:17:57,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6895.0, 300 sec: 6886.8). Total num frames: 94613504. Throughput: 0: 1706.8. Samples: 18650572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:17:57,934][41694] Avg episode reward: [(0, '4.263')] +[2024-11-08 02:18:02,054][42004] Updated weights for policy 0, policy_version 23106 (0.0027) +[2024-11-08 02:18:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6894.9, 300 sec: 6859.1). 
Total num frames: 94646272. Throughput: 0: 1698.5. Samples: 18654914. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:18:02,934][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 02:18:07,687][42004] Updated weights for policy 0, policy_version 23116 (0.0029) +[2024-11-08 02:18:07,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 94683136. Throughput: 0: 1813.0. Samples: 18665150. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:18:07,935][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 02:18:12,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 94720000. Throughput: 0: 1789.1. Samples: 18676766. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:18:12,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 02:18:12,982][42004] Updated weights for policy 0, policy_version 23126 (0.0024) +[2024-11-08 02:18:17,932][41694] Fps is (10 sec: 7782.7, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 94760960. Throughput: 0: 1793.7. Samples: 18682544. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:18:17,933][41694] Avg episode reward: [(0, '4.628')] +[2024-11-08 02:18:18,121][42004] Updated weights for policy 0, policy_version 23136 (0.0029) +[2024-11-08 02:18:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7256.8, 300 sec: 6942.4). Total num frames: 94797824. Throughput: 0: 1792.0. Samples: 18694182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:22,933][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 02:18:23,558][42004] Updated weights for policy 0, policy_version 23146 (0.0024) +[2024-11-08 02:18:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 6956.3). Total num frames: 94838784. Throughput: 0: 1800.8. Samples: 18705882. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:27,933][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 02:18:28,823][42004] Updated weights for policy 0, policy_version 23156 (0.0028) +[2024-11-08 02:18:32,931][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 94859264. Throughput: 0: 1789.6. Samples: 18711058. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:32,933][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 02:18:37,137][42004] Updated weights for policy 0, policy_version 23166 (0.0034) +[2024-11-08 02:18:37,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6895.0, 300 sec: 6873.0). Total num frames: 94892032. Throughput: 0: 1673.0. Samples: 18717630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:37,933][41694] Avg episode reward: [(0, '4.239')] +[2024-11-08 02:18:42,493][42004] Updated weights for policy 0, policy_version 23176 (0.0035) +[2024-11-08 02:18:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 94928896. Throughput: 0: 1741.9. Samples: 18728956. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:42,934][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 02:18:47,766][42004] Updated weights for policy 0, policy_version 23186 (0.0030) +[2024-11-08 02:18:47,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 94969856. Throughput: 0: 1772.3. Samples: 18734666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:47,933][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 02:18:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6956.3). Total num frames: 95006720. Throughput: 0: 1805.4. Samples: 18746394. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:52,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 02:18:53,053][42004] Updated weights for policy 0, policy_version 23196 (0.0028) +[2024-11-08 02:18:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 6956.3). Total num frames: 95043584. Throughput: 0: 1800.7. Samples: 18757798. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:18:57,934][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 02:18:58,526][42004] Updated weights for policy 0, policy_version 23206 (0.0043) +[2024-11-08 02:19:02,933][41694] Fps is (10 sec: 7371.9, 60 sec: 7236.2, 300 sec: 6956.2). Total num frames: 95080448. Throughput: 0: 1795.1. Samples: 18763324. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:02,936][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 02:19:04,134][42004] Updated weights for policy 0, policy_version 23216 (0.0029) +[2024-11-08 02:19:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.3, 300 sec: 6900.7). Total num frames: 95100928. Throughput: 0: 1705.1. Samples: 18770910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:07,934][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 02:19:12,742][42004] Updated weights for policy 0, policy_version 23226 (0.0029) +[2024-11-08 02:19:12,931][41694] Fps is (10 sec: 5325.5, 60 sec: 6894.9, 300 sec: 6886.9). Total num frames: 95133696. Throughput: 0: 1641.4. Samples: 18779744. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:12,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 02:19:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.6, 300 sec: 6886.8). Total num frames: 95170560. Throughput: 0: 1647.0. Samples: 18785172. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:17,935][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 02:19:18,071][42004] Updated weights for policy 0, policy_version 23236 (0.0040) +[2024-11-08 02:19:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6914.7). Total num frames: 95211520. Throughput: 0: 1763.6. Samples: 18796994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:22,933][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 02:19:23,338][42004] Updated weights for policy 0, policy_version 23246 (0.0023) +[2024-11-08 02:19:27,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6970.1). Total num frames: 95248384. Throughput: 0: 1767.6. Samples: 18808500. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:27,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 02:19:28,628][42004] Updated weights for policy 0, policy_version 23256 (0.0030) +[2024-11-08 02:19:32,931][41694] Fps is (10 sec: 7782.3, 60 sec: 7168.0, 300 sec: 6984.1). Total num frames: 95289344. Throughput: 0: 1769.3. Samples: 18814284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:32,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 02:19:34,045][42004] Updated weights for policy 0, policy_version 23266 (0.0027) +[2024-11-08 02:19:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7236.3, 300 sec: 6970.1). Total num frames: 95326208. Throughput: 0: 1756.5. Samples: 18825438. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:19:37,934][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 02:19:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023273_95326208.pth... 
+[2024-11-08 02:19:38,057][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000022863_93646848.pth +[2024-11-08 02:19:42,208][42004] Updated weights for policy 0, policy_version 23276 (0.0028) +[2024-11-08 02:19:42,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 95342592. Throughput: 0: 1645.7. Samples: 18831856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:19:42,933][41694] Avg episode reward: [(0, '4.359')] +[2024-11-08 02:19:47,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 95375360. Throughput: 0: 1631.5. Samples: 18836740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:19:47,934][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 02:19:48,006][42004] Updated weights for policy 0, policy_version 23286 (0.0023) +[2024-11-08 02:19:52,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 95416320. Throughput: 0: 1719.1. Samples: 18848268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:19:52,934][41694] Avg episode reward: [(0, '4.804')] +[2024-11-08 02:19:53,220][42004] Updated weights for policy 0, policy_version 23296 (0.0033) +[2024-11-08 02:19:57,931][41694] Fps is (10 sec: 8192.2, 60 sec: 6894.9, 300 sec: 6992.1). Total num frames: 95457280. Throughput: 0: 1787.2. Samples: 18860166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:19:57,933][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 02:19:58,385][42004] Updated weights for policy 0, policy_version 23306 (0.0036) +[2024-11-08 02:20:02,932][41694] Fps is (10 sec: 7781.9, 60 sec: 6895.0, 300 sec: 6984.0). Total num frames: 95494144. Throughput: 0: 1795.4. Samples: 18865964. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:20:02,934][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 02:20:03,887][42004] Updated weights for policy 0, policy_version 23316 (0.0041) +[2024-11-08 02:20:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7168.0, 300 sec: 6984.0). Total num frames: 95531008. Throughput: 0: 1776.2. Samples: 18876924. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:20:07,934][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 02:20:09,413][42004] Updated weights for policy 0, policy_version 23326 (0.0041) +[2024-11-08 02:20:12,934][41694] Fps is (10 sec: 6962.1, 60 sec: 7167.7, 300 sec: 6984.0). Total num frames: 95563776. Throughput: 0: 1764.3. Samples: 18887898. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:20:12,935][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 02:20:17,933][41694] Fps is (10 sec: 4914.7, 60 sec: 6826.6, 300 sec: 6914.6). Total num frames: 95580160. Throughput: 0: 1670.3. Samples: 18889450. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:20:17,935][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 02:20:18,510][42004] Updated weights for policy 0, policy_version 23336 (0.0035) +[2024-11-08 02:20:22,931][41694] Fps is (10 sec: 4916.3, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 95612928. Throughput: 0: 1603.7. Samples: 18897604. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:20:22,934][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 02:20:24,261][42004] Updated weights for policy 0, policy_version 23346 (0.0033) +[2024-11-08 02:20:27,932][41694] Fps is (10 sec: 6964.0, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 95649792. Throughput: 0: 1717.7. Samples: 18909154. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:20:27,936][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 02:20:29,681][42004] Updated weights for policy 0, policy_version 23356 (0.0026) +[2024-11-08 02:20:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6690.1, 300 sec: 6965.5). Total num frames: 95690752. Throughput: 0: 1734.9. Samples: 18914810. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:20:32,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 02:20:34,999][42004] Updated weights for policy 0, policy_version 23366 (0.0031) +[2024-11-08 02:20:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6970.1). Total num frames: 95727616. Throughput: 0: 1733.6. Samples: 18926280. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:20:37,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 02:20:40,304][42004] Updated weights for policy 0, policy_version 23376 (0.0021) +[2024-11-08 02:20:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6970.2). Total num frames: 95764480. Throughput: 0: 1714.1. Samples: 18937302. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:20:42,936][41694] Avg episode reward: [(0, '4.789')] +[2024-11-08 02:20:46,119][42004] Updated weights for policy 0, policy_version 23386 (0.0024) +[2024-11-08 02:20:49,870][41694] Fps is (10 sec: 5832.3, 60 sec: 6811.4, 300 sec: 6897.0). Total num frames: 95797248. Throughput: 0: 1633.6. Samples: 18942642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:20:49,872][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 02:20:52,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6873.0). Total num frames: 95817728. Throughput: 0: 1607.2. Samples: 18949248. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:20:52,935][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 02:20:54,556][42004] Updated weights for policy 0, policy_version 23396 (0.0032) +[2024-11-08 02:20:57,931][41694] Fps is (10 sec: 7113.7, 60 sec: 6621.9, 300 sec: 6873.0). Total num frames: 95854592. Throughput: 0: 1599.1. Samples: 18959854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:20:57,933][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 02:20:59,846][42004] Updated weights for policy 0, policy_version 23406 (0.0036) +[2024-11-08 02:21:02,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6621.9, 300 sec: 6886.8). Total num frames: 95891456. Throughput: 0: 1694.4. Samples: 18965698. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:21:02,934][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 02:21:05,305][42004] Updated weights for policy 0, policy_version 23416 (0.0023) +[2024-11-08 02:21:07,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.2, 300 sec: 6967.0). Total num frames: 95932416. Throughput: 0: 1762.9. Samples: 18976934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:07,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 02:21:10,905][42004] Updated weights for policy 0, policy_version 23426 (0.0030) +[2024-11-08 02:21:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.4, 300 sec: 6956.3). Total num frames: 95965184. Throughput: 0: 1744.2. Samples: 18987644. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:12,934][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 02:21:16,626][42004] Updated weights for policy 0, policy_version 23436 (0.0031) +[2024-11-08 02:21:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.6, 300 sec: 6942.4). Total num frames: 96002048. Throughput: 0: 1733.8. Samples: 18992830. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:17,940][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 02:21:24,346][41694] Fps is (10 sec: 5741.7, 60 sec: 6802.9, 300 sec: 6881.6). Total num frames: 96030720. Throughput: 0: 1671.6. Samples: 19003866. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:21:24,354][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 02:21:24,667][42004] Updated weights for policy 0, policy_version 23446 (0.0035) +[2024-11-08 02:21:27,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 96055296. Throughput: 0: 1622.1. Samples: 19010298. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:21:27,934][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 02:21:30,514][42004] Updated weights for policy 0, policy_version 23456 (0.0040) +[2024-11-08 02:21:32,932][41694] Fps is (10 sec: 7155.2, 60 sec: 6690.0, 300 sec: 6859.1). Total num frames: 96092160. Throughput: 0: 1692.7. Samples: 19015534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:21:32,934][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 02:21:35,991][42004] Updated weights for policy 0, policy_version 23466 (0.0025) +[2024-11-08 02:21:37,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6873.0). Total num frames: 96129024. Throughput: 0: 1726.4. Samples: 19026934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:37,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:21:38,016][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023470_96133120.pth... +[2024-11-08 02:21:38,114][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023066_94478336.pth +[2024-11-08 02:21:41,400][42004] Updated weights for policy 0, policy_version 23476 (0.0026) +[2024-11-08 02:21:42,931][41694] Fps is (10 sec: 7373.5, 60 sec: 6690.1, 300 sec: 6927.4). 
Total num frames: 96165888. Throughput: 0: 1743.1. Samples: 19038294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:42,934][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 02:21:46,701][42004] Updated weights for policy 0, policy_version 23486 (0.0026) +[2024-11-08 02:21:47,932][41694] Fps is (10 sec: 7781.8, 60 sec: 7054.6, 300 sec: 6942.4). Total num frames: 96206848. Throughput: 0: 1738.6. Samples: 19043936. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:47,934][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 02:21:52,289][42004] Updated weights for policy 0, policy_version 23496 (0.0028) +[2024-11-08 02:21:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 6928.5). Total num frames: 96243712. Throughput: 0: 1733.6. Samples: 19054944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:52,934][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 02:21:58,789][41694] Fps is (10 sec: 5659.0, 60 sec: 6797.7, 300 sec: 6880.7). Total num frames: 96268288. Throughput: 0: 1588.5. Samples: 19060488. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:21:58,793][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 02:22:00,619][42004] Updated weights for policy 0, policy_version 23506 (0.0028) +[2024-11-08 02:22:02,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.2, 300 sec: 6845.2). Total num frames: 96292864. Throughput: 0: 1640.3. Samples: 19066642. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:22:02,934][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 02:22:06,591][42004] Updated weights for policy 0, policy_version 23516 (0.0037) +[2024-11-08 02:22:07,931][41694] Fps is (10 sec: 6720.5, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 96329728. Throughput: 0: 1672.1. Samples: 19076748. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:22:07,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 02:22:11,987][42004] Updated weights for policy 0, policy_version 23526 (0.0029) +[2024-11-08 02:22:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 96366592. Throughput: 0: 1732.7. Samples: 19088268. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:22:12,933][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 02:22:17,038][42004] Updated weights for policy 0, policy_version 23536 (0.0026) +[2024-11-08 02:22:17,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6932.5). Total num frames: 96407552. Throughput: 0: 1748.3. Samples: 19094204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:22:17,934][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 02:22:22,297][42004] Updated weights for policy 0, policy_version 23546 (0.0023) +[2024-11-08 02:22:22,931][41694] Fps is (10 sec: 8192.1, 60 sec: 7131.3, 300 sec: 6942.4). Total num frames: 96448512. Throughput: 0: 1756.1. Samples: 19105958. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:22:22,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 02:22:27,651][42004] Updated weights for policy 0, policy_version 23556 (0.0029) +[2024-11-08 02:22:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6942.4). Total num frames: 96485376. Throughput: 0: 1763.8. Samples: 19117664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:22:27,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 02:22:33,257][41694] Fps is (10 sec: 5553.5, 60 sec: 6857.8, 300 sec: 6865.4). Total num frames: 96505856. Throughput: 0: 1733.4. Samples: 19122504. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:22:33,260][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 02:22:35,975][42004] Updated weights for policy 0, policy_version 23566 (0.0033) +[2024-11-08 02:22:37,932][41694] Fps is (10 sec: 5324.3, 60 sec: 6826.6, 300 sec: 6859.0). Total num frames: 96538624. Throughput: 0: 1649.4. Samples: 19129170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:22:37,940][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 02:22:41,272][42004] Updated weights for policy 0, policy_version 23576 (0.0023) +[2024-11-08 02:22:42,932][41694] Fps is (10 sec: 7197.1, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 96575488. Throughput: 0: 1818.2. Samples: 19140750. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:22:42,937][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 02:22:46,718][42004] Updated weights for policy 0, policy_version 23586 (0.0038) +[2024-11-08 02:22:47,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 96616448. Throughput: 0: 1771.9. Samples: 19146378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:22:47,934][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 02:22:52,037][42004] Updated weights for policy 0, policy_version 23596 (0.0026) +[2024-11-08 02:22:52,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 96653312. Throughput: 0: 1805.7. Samples: 19158004. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:22:52,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 02:22:57,153][42004] Updated weights for policy 0, policy_version 23606 (0.0032) +[2024-11-08 02:22:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7202.7, 300 sec: 6942.4). Total num frames: 96694272. Throughput: 0: 1813.7. Samples: 19169884. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:22:57,935][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 02:23:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 6928.5). Total num frames: 96727040. Throughput: 0: 1806.4. Samples: 19175490. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:02,933][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 02:23:03,027][42004] Updated weights for policy 0, policy_version 23616 (0.0029) +[2024-11-08 02:23:07,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 96747520. Throughput: 0: 1745.4. Samples: 19184500. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:07,937][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 02:23:11,029][42004] Updated weights for policy 0, policy_version 23626 (0.0033) +[2024-11-08 02:23:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 96784384. Throughput: 0: 1665.7. Samples: 19192622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:12,934][41694] Avg episode reward: [(0, '4.651')] +[2024-11-08 02:23:16,300][42004] Updated weights for policy 0, policy_version 23636 (0.0036) +[2024-11-08 02:23:17,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6963.1, 300 sec: 6872.9). Total num frames: 96825344. Throughput: 0: 1696.0. Samples: 19198274. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:17,934][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 02:23:21,751][42004] Updated weights for policy 0, policy_version 23646 (0.0036) +[2024-11-08 02:23:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 96862208. Throughput: 0: 1790.5. Samples: 19209742. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:22,934][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 02:23:26,870][42004] Updated weights for policy 0, policy_version 23656 (0.0022) +[2024-11-08 02:23:27,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6963.2, 300 sec: 6928.5). Total num frames: 96903168. Throughput: 0: 1797.2. Samples: 19221624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:23:27,934][41694] Avg episode reward: [(0, '4.602')] +[2024-11-08 02:23:32,038][42004] Updated weights for policy 0, policy_version 23666 (0.0033) +[2024-11-08 02:23:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7275.8, 300 sec: 6942.4). Total num frames: 96940032. Throughput: 0: 1802.9. Samples: 19227510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:23:32,943][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 02:23:37,858][42004] Updated weights for policy 0, policy_version 23676 (0.0049) +[2024-11-08 02:23:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7304.6, 300 sec: 6942.4). Total num frames: 96976896. Throughput: 0: 1791.8. Samples: 19238634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:23:37,933][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 02:23:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023676_96976896.pth... +[2024-11-08 02:23:38,059][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023273_95326208.pth +[2024-11-08 02:23:42,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6963.3, 300 sec: 6859.1). Total num frames: 96993280. Throughput: 0: 1674.7. Samples: 19245244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:23:42,935][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 02:23:45,700][42004] Updated weights for policy 0, policy_version 23686 (0.0044) +[2024-11-08 02:23:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 6872.9). 
Total num frames: 97034240. Throughput: 0: 1671.6. Samples: 19250714. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:47,934][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 02:23:50,906][42004] Updated weights for policy 0, policy_version 23696 (0.0020) +[2024-11-08 02:23:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 97071104. Throughput: 0: 1731.2. Samples: 19262406. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:52,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 02:23:56,131][42004] Updated weights for policy 0, policy_version 23706 (0.0042) +[2024-11-08 02:23:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6886.9). Total num frames: 97112064. Throughput: 0: 1815.2. Samples: 19274306. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:23:57,934][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 02:24:01,805][42004] Updated weights for policy 0, policy_version 23716 (0.0025) +[2024-11-08 02:24:02,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6942.4). Total num frames: 97148928. Throughput: 0: 1809.0. Samples: 19279680. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:24:02,935][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 02:24:06,999][42004] Updated weights for policy 0, policy_version 23726 (0.0031) +[2024-11-08 02:24:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7304.5, 300 sec: 6956.3). Total num frames: 97185792. Throughput: 0: 1804.4. Samples: 19290938. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:24:07,939][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 02:24:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7236.3, 300 sec: 6942.4). Total num frames: 97218560. Throughput: 0: 1774.6. Samples: 19301480. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:24:12,933][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 02:24:13,026][42004] Updated weights for policy 0, policy_version 23736 (0.0032) +[2024-11-08 02:24:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6895.0, 300 sec: 6872.9). Total num frames: 97239040. Throughput: 0: 1735.8. Samples: 19305620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:24:17,933][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 02:24:20,799][42004] Updated weights for policy 0, policy_version 23746 (0.0034) +[2024-11-08 02:24:22,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 97280000. Throughput: 0: 1674.9. Samples: 19314004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:24:22,933][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 02:24:26,031][42004] Updated weights for policy 0, policy_version 23756 (0.0035) +[2024-11-08 02:24:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6895.0, 300 sec: 6872.9). Total num frames: 97316864. Throughput: 0: 1786.3. Samples: 19325628. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:24:27,933][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 02:24:31,229][42004] Updated weights for policy 0, policy_version 23766 (0.0022) +[2024-11-08 02:24:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 97357824. Throughput: 0: 1795.7. Samples: 19331520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:24:32,933][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 02:24:36,495][42004] Updated weights for policy 0, policy_version 23776 (0.0026) +[2024-11-08 02:24:37,933][41694] Fps is (10 sec: 7780.9, 60 sec: 6963.0, 300 sec: 6956.2). Total num frames: 97394688. Throughput: 0: 1796.1. Samples: 19343232. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:24:37,935][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 02:24:42,030][42004] Updated weights for policy 0, policy_version 23786 (0.0037) +[2024-11-08 02:24:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7304.5, 300 sec: 6970.1). Total num frames: 97431552. Throughput: 0: 1779.8. Samples: 19354398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:24:42,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 02:24:47,932][41694] Fps is (10 sec: 6964.5, 60 sec: 7168.0, 300 sec: 6942.4). Total num frames: 97464320. Throughput: 0: 1760.7. Samples: 19358912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:24:47,933][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 02:24:48,142][42004] Updated weights for policy 0, policy_version 23796 (0.0028) +[2024-11-08 02:24:52,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 97484800. Throughput: 0: 1672.3. Samples: 19366192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:24:52,933][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 02:24:55,884][42004] Updated weights for policy 0, policy_version 23806 (0.0028) +[2024-11-08 02:24:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.6, 300 sec: 6873.0). Total num frames: 97521664. Throughput: 0: 1679.9. Samples: 19377074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:24:57,934][41694] Avg episode reward: [(0, '4.224')] +[2024-11-08 02:25:01,432][42004] Updated weights for policy 0, policy_version 23816 (0.0029) +[2024-11-08 02:25:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 97558528. Throughput: 0: 1704.0. Samples: 19382302. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:02,935][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 02:25:07,298][42004] Updated weights for policy 0, policy_version 23826 (0.0027) +[2024-11-08 02:25:07,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6886.9). Total num frames: 97595392. Throughput: 0: 1754.1. Samples: 19392940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:07,933][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 02:25:12,915][42004] Updated weights for policy 0, policy_version 23836 (0.0038) +[2024-11-08 02:25:12,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6894.9, 300 sec: 6956.3). Total num frames: 97632256. Throughput: 0: 1740.1. Samples: 19403934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:12,934][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 02:25:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 7031.5, 300 sec: 6942.4). Total num frames: 97660928. Throughput: 0: 1713.3. Samples: 19408620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:17,938][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 02:25:19,597][42004] Updated weights for policy 0, policy_version 23846 (0.0031) +[2024-11-08 02:25:22,932][41694] Fps is (10 sec: 6144.4, 60 sec: 6894.9, 300 sec: 6928.5). Total num frames: 97693696. Throughput: 0: 1662.9. Samples: 19418058. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:22,934][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 02:25:27,482][42004] Updated weights for policy 0, policy_version 23856 (0.0028) +[2024-11-08 02:25:27,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6859.1). Total num frames: 97714176. Throughput: 0: 1571.4. Samples: 19425110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:27,933][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 02:25:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6859.1). Total num frames: 97751040. 
Throughput: 0: 1589.2. Samples: 19430426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:32,934][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 02:25:33,186][42004] Updated weights for policy 0, policy_version 23866 (0.0038) +[2024-11-08 02:25:37,932][41694] Fps is (10 sec: 7781.6, 60 sec: 6622.0, 300 sec: 6872.9). Total num frames: 97792000. Throughput: 0: 1681.0. Samples: 19441840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:37,935][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 02:25:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023875_97792000.pth... +[2024-11-08 02:25:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023470_96133120.pth +[2024-11-08 02:25:38,358][42004] Updated weights for policy 0, policy_version 23876 (0.0023) +[2024-11-08 02:25:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6932.4). Total num frames: 97828864. Throughput: 0: 1706.9. Samples: 19453884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:42,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 02:25:43,528][42004] Updated weights for policy 0, policy_version 23886 (0.0028) +[2024-11-08 02:25:47,932][41694] Fps is (10 sec: 7783.1, 60 sec: 6758.4, 300 sec: 6956.3). Total num frames: 97869824. Throughput: 0: 1720.6. Samples: 19459730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:47,934][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 02:25:48,690][42004] Updated weights for policy 0, policy_version 23896 (0.0032) +[2024-11-08 02:25:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 97902592. Throughput: 0: 1729.4. Samples: 19470764. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:25:52,934][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 02:25:54,820][42004] Updated weights for policy 0, policy_version 23906 (0.0041) +[2024-11-08 02:26:00,222][41694] Fps is (10 sec: 5665.7, 60 sec: 6707.2, 300 sec: 6888.9). Total num frames: 97939456. Throughput: 0: 1639.1. Samples: 19481446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:26:00,225][41694] Avg episode reward: [(0, '4.282')] +[2024-11-08 02:26:02,778][42004] Updated weights for policy 0, policy_version 23916 (0.0038) +[2024-11-08 02:26:02,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6872.9). Total num frames: 97959936. Throughput: 0: 1658.0. Samples: 19483230. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:26:02,934][41694] Avg episode reward: [(0, '4.177')] +[2024-11-08 02:26:07,931][41694] Fps is (10 sec: 7437.9, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 97996800. Throughput: 0: 1676.0. Samples: 19493476. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:26:07,933][41694] Avg episode reward: [(0, '4.208')] +[2024-11-08 02:26:08,334][42004] Updated weights for policy 0, policy_version 23926 (0.0029) +[2024-11-08 02:26:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6622.0, 300 sec: 6872.9). Total num frames: 98029568. Throughput: 0: 1753.9. Samples: 19504034. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:26:12,935][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 02:26:14,347][42004] Updated weights for policy 0, policy_version 23936 (0.0029) +[2024-11-08 02:26:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6934.0). Total num frames: 98066432. Throughput: 0: 1749.0. Samples: 19509130. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:26:17,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 02:26:19,798][42004] Updated weights for policy 0, policy_version 23946 (0.0030) +[2024-11-08 02:26:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6942.4). Total num frames: 98103296. Throughput: 0: 1749.9. Samples: 19520582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:26:22,933][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 02:26:25,585][42004] Updated weights for policy 0, policy_version 23956 (0.0026) +[2024-11-08 02:26:27,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7031.4, 300 sec: 6928.5). Total num frames: 98136064. Throughput: 0: 1712.3. Samples: 19530936. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:26:27,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 02:26:31,349][42004] Updated weights for policy 0, policy_version 23966 (0.0027) +[2024-11-08 02:26:34,567][41694] Fps is (10 sec: 5632.3, 60 sec: 6778.4, 300 sec: 6876.5). Total num frames: 98168832. Throughput: 0: 1639.6. Samples: 19536192. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:26:34,572][41694] Avg episode reward: [(0, '4.627')] +[2024-11-08 02:26:37,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.2, 300 sec: 6872.9). Total num frames: 98193408. Throughput: 0: 1601.6. Samples: 19542838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:26:37,935][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 02:26:39,258][42004] Updated weights for policy 0, policy_version 23976 (0.0028) +[2024-11-08 02:26:42,931][41694] Fps is (10 sec: 7345.6, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 98230272. Throughput: 0: 1691.7. Samples: 19553698. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:26:42,933][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 02:26:44,798][42004] Updated weights for policy 0, policy_version 23986 (0.0030) +[2024-11-08 02:26:47,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6873.0). Total num frames: 98271232. Throughput: 0: 1696.8. Samples: 19559586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:26:47,932][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 02:26:49,956][42004] Updated weights for policy 0, policy_version 23996 (0.0024) +[2024-11-08 02:26:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6934.8). Total num frames: 98308096. Throughput: 0: 1737.5. Samples: 19571662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:26:52,935][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 02:26:55,119][42004] Updated weights for policy 0, policy_version 24006 (0.0026) +[2024-11-08 02:26:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7026.6, 300 sec: 6956.3). Total num frames: 98344960. Throughput: 0: 1761.3. Samples: 19583292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:26:57,934][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 02:27:01,198][42004] Updated weights for policy 0, policy_version 24016 (0.0030) +[2024-11-08 02:27:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6942.4). Total num frames: 98377728. Throughput: 0: 1753.6. Samples: 19588042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:27:02,934][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 02:27:09,234][41694] Fps is (10 sec: 5436.0, 60 sec: 6681.6, 300 sec: 6884.2). Total num frames: 98406400. Throughput: 0: 1683.7. Samples: 19598542. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:27:09,236][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 02:27:09,285][42004] Updated weights for policy 0, policy_version 24026 (0.0029) +[2024-11-08 02:27:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 98435072. Throughput: 0: 1648.1. Samples: 19605100. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:12,934][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 02:27:15,301][42004] Updated weights for policy 0, policy_version 24036 (0.0042) +[2024-11-08 02:27:17,931][41694] Fps is (10 sec: 7064.4, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 98467840. Throughput: 0: 1701.5. Samples: 19609974. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:17,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 02:27:20,892][42004] Updated weights for policy 0, policy_version 24046 (0.0042) +[2024-11-08 02:27:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 98504704. Throughput: 0: 1739.1. Samples: 19621096. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:22,934][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 02:27:25,947][42004] Updated weights for policy 0, policy_version 24056 (0.0023) +[2024-11-08 02:27:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6922.2). Total num frames: 98545664. Throughput: 0: 1762.4. Samples: 19633004. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:27,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 02:27:31,294][42004] Updated weights for policy 0, policy_version 24066 (0.0045) +[2024-11-08 02:27:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7088.2, 300 sec: 6928.5). Total num frames: 98582528. Throughput: 0: 1761.4. Samples: 19638850. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:32,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 02:27:37,219][42004] Updated weights for policy 0, policy_version 24076 (0.0033) +[2024-11-08 02:27:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.8, 300 sec: 6928.5). Total num frames: 98619392. Throughput: 0: 1717.8. Samples: 19648964. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:37,935][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 02:27:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024077_98619392.pth... +[2024-11-08 02:27:38,055][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023676_96976896.pth +[2024-11-08 02:27:43,891][41694] Fps is (10 sec: 5605.8, 60 sec: 6786.3, 300 sec: 6850.7). Total num frames: 98643968. Throughput: 0: 1554.4. Samples: 19654732. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:43,893][41694] Avg episode reward: [(0, '4.246')] +[2024-11-08 02:27:45,087][42004] Updated weights for policy 0, policy_version 24086 (0.0023) +[2024-11-08 02:27:47,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 98672640. Throughput: 0: 1634.2. Samples: 19661582. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:47,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 02:27:50,708][42004] Updated weights for policy 0, policy_version 24096 (0.0025) +[2024-11-08 02:27:52,932][41694] Fps is (10 sec: 7249.3, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 98709504. Throughput: 0: 1689.7. Samples: 19672376. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:52,934][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 02:27:55,943][42004] Updated weights for policy 0, policy_version 24106 (0.0024) +[2024-11-08 02:27:57,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6859.1). 
Total num frames: 98750464. Throughput: 0: 1759.3. Samples: 19684270. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:27:57,934][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 02:28:01,539][42004] Updated weights for policy 0, policy_version 24116 (0.0033) +[2024-11-08 02:28:02,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 98787328. Throughput: 0: 1773.6. Samples: 19689788. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:02,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 02:28:07,058][42004] Updated weights for policy 0, policy_version 24126 (0.0035) +[2024-11-08 02:28:07,931][41694] Fps is (10 sec: 7372.7, 60 sec: 7117.8, 300 sec: 6914.6). Total num frames: 98824192. Throughput: 0: 1764.7. Samples: 19700508. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:07,932][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 02:28:12,668][42004] Updated weights for policy 0, policy_version 24136 (0.0028) +[2024-11-08 02:28:12,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.7, 300 sec: 6900.7). Total num frames: 98861056. Throughput: 0: 1746.0. Samples: 19711572. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:12,936][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 02:28:18,475][41694] Fps is (10 sec: 5827.1, 60 sec: 6900.6, 300 sec: 6846.4). Total num frames: 98885632. Throughput: 0: 1722.2. Samples: 19717284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:18,477][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 02:28:20,514][42004] Updated weights for policy 0, policy_version 24146 (0.0049) +[2024-11-08 02:28:22,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 98918400. Throughput: 0: 1660.0. Samples: 19723666. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:22,937][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 02:28:26,134][42004] Updated weights for policy 0, policy_version 24156 (0.0026) +[2024-11-08 02:28:27,931][41694] Fps is (10 sec: 7363.7, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 98955264. Throughput: 0: 1822.5. Samples: 19734994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:27,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 02:28:31,480][42004] Updated weights for policy 0, policy_version 24166 (0.0027) +[2024-11-08 02:28:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 98992128. Throughput: 0: 1757.6. Samples: 19740674. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:32,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 02:28:36,852][42004] Updated weights for policy 0, policy_version 24176 (0.0028) +[2024-11-08 02:28:37,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6914.6). Total num frames: 99033088. Throughput: 0: 1773.9. Samples: 19752200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:28:37,933][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 02:28:42,599][42004] Updated weights for policy 0, policy_version 24186 (0.0031) +[2024-11-08 02:28:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7145.8, 300 sec: 6886.8). Total num frames: 99065856. Throughput: 0: 1751.8. Samples: 19763102. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:28:42,934][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 02:28:47,893][42004] Updated weights for policy 0, policy_version 24196 (0.0029) +[2024-11-08 02:28:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 6900.7). Total num frames: 99106816. Throughput: 0: 1747.8. Samples: 19768438. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:28:47,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 02:28:52,932][41694] Fps is (10 sec: 6553.5, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 99131392. Throughput: 0: 1759.6. Samples: 19779690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:52,934][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 02:28:55,032][42004] Updated weights for policy 0, policy_version 24206 (0.0029) +[2024-11-08 02:28:57,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 99168256. Throughput: 0: 1706.9. Samples: 19788380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:28:57,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 02:29:00,239][42004] Updated weights for policy 0, policy_version 24216 (0.0028) +[2024-11-08 02:29:02,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 99205120. Throughput: 0: 1731.9. Samples: 19794276. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:29:02,933][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 02:29:05,589][42004] Updated weights for policy 0, policy_version 24226 (0.0039) +[2024-11-08 02:29:07,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6873.0). Total num frames: 99246080. Throughput: 0: 1828.5. Samples: 19805950. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:29:07,934][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 02:29:10,786][42004] Updated weights for policy 0, policy_version 24236 (0.0029) +[2024-11-08 02:29:12,934][41694] Fps is (10 sec: 7780.8, 60 sec: 7031.2, 300 sec: 6928.4). Total num frames: 99282944. Throughput: 0: 1830.8. Samples: 19817382. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:29:12,935][41694] Avg episode reward: [(0, '4.234')] +[2024-11-08 02:29:16,850][42004] Updated weights for policy 0, policy_version 24246 (0.0040) +[2024-11-08 02:29:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7302.5, 300 sec: 6914.6). Total num frames: 99319808. Throughput: 0: 1813.8. Samples: 19822294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:29:17,934][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 02:29:22,070][42004] Updated weights for policy 0, policy_version 24256 (0.0028) +[2024-11-08 02:29:22,931][41694] Fps is (10 sec: 7374.4, 60 sec: 7304.5, 300 sec: 6914.6). Total num frames: 99356672. Throughput: 0: 1811.3. Samples: 19833710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:29:22,933][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 02:29:27,931][41694] Fps is (10 sec: 5734.5, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 99377152. Throughput: 0: 1723.5. Samples: 19840660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:29:27,933][41694] Avg episode reward: [(0, '4.262')] +[2024-11-08 02:29:30,043][42004] Updated weights for policy 0, policy_version 24266 (0.0037) +[2024-11-08 02:29:32,932][41694] Fps is (10 sec: 5733.9, 60 sec: 7031.4, 300 sec: 6845.2). Total num frames: 99414016. Throughput: 0: 1721.5. Samples: 19845906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:29:32,935][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 02:29:35,472][42004] Updated weights for policy 0, policy_version 24276 (0.0027) +[2024-11-08 02:29:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 99450880. Throughput: 0: 1726.3. Samples: 19857372. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:29:37,933][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 02:29:38,039][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024281_99454976.pth... +[2024-11-08 02:29:38,176][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000023875_97792000.pth +[2024-11-08 02:29:40,782][42004] Updated weights for policy 0, policy_version 24286 (0.0027) +[2024-11-08 02:29:42,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7031.4, 300 sec: 6859.1). Total num frames: 99487744. Throughput: 0: 1788.3. Samples: 19868856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:29:42,942][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 02:29:46,389][42004] Updated weights for policy 0, policy_version 24296 (0.0029) +[2024-11-08 02:29:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6914.6). Total num frames: 99524608. Throughput: 0: 1776.8. Samples: 19874230. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:29:47,934][41694] Avg episode reward: [(0, '4.650')] +[2024-11-08 02:29:52,470][42004] Updated weights for policy 0, policy_version 24306 (0.0037) +[2024-11-08 02:29:52,937][41694] Fps is (10 sec: 6960.0, 60 sec: 7099.2, 300 sec: 6900.6). Total num frames: 99557376. Throughput: 0: 1749.9. Samples: 19884706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:29:52,938][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 02:29:57,626][42004] Updated weights for policy 0, policy_version 24316 (0.0026) +[2024-11-08 02:29:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 6914.6). Total num frames: 99598336. Throughput: 0: 1744.9. Samples: 19895898. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:29:57,933][41694] Avg episode reward: [(0, '4.599')] +[2024-11-08 02:30:02,932][41694] Fps is (10 sec: 6147.0, 60 sec: 6894.9, 300 sec: 6859.1). 
Total num frames: 99618816. Throughput: 0: 1734.8. Samples: 19900362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:30:02,934][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 02:30:05,635][42004] Updated weights for policy 0, policy_version 24326 (0.0033) +[2024-11-08 02:30:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 99655680. Throughput: 0: 1649.3. Samples: 19907930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:30:07,934][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 02:30:11,080][42004] Updated weights for policy 0, policy_version 24336 (0.0037) +[2024-11-08 02:30:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.9, 300 sec: 6886.8). Total num frames: 99692544. Throughput: 0: 1749.1. Samples: 19919368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:30:12,933][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 02:30:17,194][42004] Updated weights for policy 0, policy_version 24346 (0.0032) +[2024-11-08 02:30:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 99725312. Throughput: 0: 1740.7. Samples: 19924238. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:30:17,933][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 02:30:22,912][42004] Updated weights for policy 0, policy_version 24356 (0.0033) +[2024-11-08 02:30:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 99762176. Throughput: 0: 1714.9. Samples: 19934544. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:30:22,933][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 02:30:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6942.4). Total num frames: 99799040. Throughput: 0: 1705.7. Samples: 19945610. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:30:27,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 02:30:28,411][42004] Updated weights for policy 0, policy_version 24366 (0.0030) +[2024-11-08 02:30:32,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 99835904. Throughput: 0: 1712.9. Samples: 19951310. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:30:32,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 02:30:33,710][42004] Updated weights for policy 0, policy_version 24376 (0.0027) +[2024-11-08 02:30:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 99856384. Throughput: 0: 1651.8. Samples: 19959030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:30:37,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 02:30:41,469][42004] Updated weights for policy 0, policy_version 24386 (0.0022) +[2024-11-08 02:30:42,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6758.4, 300 sec: 6859.0). Total num frames: 99893248. Throughput: 0: 1643.7. Samples: 19969868. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:30:42,935][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 02:30:46,858][42004] Updated weights for policy 0, policy_version 24396 (0.0058) +[2024-11-08 02:30:47,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 99934208. Throughput: 0: 1666.6. Samples: 19975358. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:30:47,933][41694] Avg episode reward: [(0, '4.223')] +[2024-11-08 02:30:51,895][42004] Updated weights for policy 0, policy_version 24406 (0.0028) +[2024-11-08 02:30:52,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6895.5, 300 sec: 6940.7). Total num frames: 99971072. Throughput: 0: 1767.7. Samples: 19987478. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:30:52,934][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 02:30:57,761][42004] Updated weights for policy 0, policy_version 24416 (0.0040) +[2024-11-08 02:30:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6942.4). Total num frames: 100007936. Throughput: 0: 1753.1. Samples: 19998256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:30:57,933][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 02:31:02,932][41694] Fps is (10 sec: 7372.4, 60 sec: 7099.7, 300 sec: 6942.4). Total num frames: 100044800. Throughput: 0: 1768.3. Samples: 20003814. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:02,934][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 02:31:03,368][42004] Updated weights for policy 0, policy_version 24426 (0.0021) +[2024-11-08 02:31:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.7, 300 sec: 6956.3). Total num frames: 100081664. Throughput: 0: 1784.1. Samples: 20014828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:07,933][41694] Avg episode reward: [(0, '4.635')] +[2024-11-08 02:31:11,273][42004] Updated weights for policy 0, policy_version 24436 (0.0035) +[2024-11-08 02:31:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.3, 300 sec: 6886.8). Total num frames: 100098048. Throughput: 0: 1672.3. Samples: 20020864. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:31:12,935][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 02:31:17,359][42004] Updated weights for policy 0, policy_version 24446 (0.0029) +[2024-11-08 02:31:17,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6826.6, 300 sec: 6886.8). Total num frames: 100134912. Throughput: 0: 1653.1. Samples: 20025702. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:31:17,935][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 02:31:22,739][42004] Updated weights for policy 0, policy_version 24456 (0.0022) +[2024-11-08 02:31:22,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6826.6, 300 sec: 6900.7). Total num frames: 100171776. Throughput: 0: 1739.8. Samples: 20037320. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:31:22,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 02:31:27,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6826.7, 300 sec: 6953.2). Total num frames: 100208640. Throughput: 0: 1756.2. Samples: 20048896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:31:27,935][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 02:31:28,097][42004] Updated weights for policy 0, policy_version 24466 (0.0027) +[2024-11-08 02:31:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6956.3). Total num frames: 100245504. Throughput: 0: 1748.4. Samples: 20054038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:32,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 02:31:33,787][42004] Updated weights for policy 0, policy_version 24476 (0.0030) +[2024-11-08 02:31:37,932][41694] Fps is (10 sec: 6962.9, 60 sec: 7031.4, 300 sec: 6942.4). Total num frames: 100278272. Throughput: 0: 1716.2. Samples: 20064708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:37,937][41694] Avg episode reward: [(0, '4.249')] +[2024-11-08 02:31:38,104][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024483_100282368.pth... +[2024-11-08 02:31:38,216][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024077_98619392.pth +[2024-11-08 02:31:39,831][42004] Updated weights for policy 0, policy_version 24486 (0.0033) +[2024-11-08 02:31:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.6, 300 sec: 6928.5). 
Total num frames: 100315136. Throughput: 0: 1710.3. Samples: 20075220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:42,933][41694] Avg episode reward: [(0, '4.202')] +[2024-11-08 02:31:47,804][42004] Updated weights for policy 0, policy_version 24496 (0.0024) +[2024-11-08 02:31:47,931][41694] Fps is (10 sec: 5734.7, 60 sec: 6690.1, 300 sec: 6872.9). Total num frames: 100335616. Throughput: 0: 1627.3. Samples: 20077042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:47,933][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 02:31:52,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6873.0). Total num frames: 100372480. Throughput: 0: 1617.1. Samples: 20087598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:52,933][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 02:31:53,104][42004] Updated weights for policy 0, policy_version 24506 (0.0029) +[2024-11-08 02:31:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6900.7). Total num frames: 100413440. Throughput: 0: 1740.9. Samples: 20099204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:31:57,933][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 02:31:58,424][42004] Updated weights for policy 0, policy_version 24516 (0.0029) +[2024-11-08 02:32:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.2, 300 sec: 6945.3). Total num frames: 100446208. Throughput: 0: 1760.6. Samples: 20104930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:02,934][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 02:32:04,635][42004] Updated weights for policy 0, policy_version 24526 (0.0029) +[2024-11-08 02:32:07,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6621.9, 300 sec: 6928.5). Total num frames: 100478976. Throughput: 0: 1716.4. Samples: 20114560. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:07,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 02:32:10,153][42004] Updated weights for policy 0, policy_version 24536 (0.0028) +[2024-11-08 02:32:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6956.3). Total num frames: 100519936. Throughput: 0: 1714.3. Samples: 20126038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:12,935][41694] Avg episode reward: [(0, '4.317')] +[2024-11-08 02:32:15,477][42004] Updated weights for policy 0, policy_version 24546 (0.0023) +[2024-11-08 02:32:19,978][41694] Fps is (10 sec: 6460.4, 60 sec: 6799.6, 300 sec: 6908.3). Total num frames: 100556800. Throughput: 0: 1651.9. Samples: 20131754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:19,980][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 02:32:22,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 100577280. Throughput: 0: 1648.1. Samples: 20138872. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:32:22,934][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 02:32:23,164][42004] Updated weights for policy 0, policy_version 24556 (0.0028) +[2024-11-08 02:32:27,932][41694] Fps is (10 sec: 7724.7, 60 sec: 6826.6, 300 sec: 6900.7). Total num frames: 100618240. Throughput: 0: 1673.7. Samples: 20150536. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:32:27,933][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 02:32:28,404][42004] Updated weights for policy 0, policy_version 24566 (0.0027) +[2024-11-08 02:32:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6900.7). Total num frames: 100655104. Throughput: 0: 1764.9. Samples: 20156464. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:32:32,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:32:33,599][42004] Updated weights for policy 0, policy_version 24576 (0.0028) +[2024-11-08 02:32:37,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6895.0, 300 sec: 6965.0). Total num frames: 100691968. Throughput: 0: 1772.4. Samples: 20167354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:37,933][41694] Avg episode reward: [(0, '4.602')] +[2024-11-08 02:32:39,595][42004] Updated weights for policy 0, policy_version 24586 (0.0024) +[2024-11-08 02:32:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6970.1). Total num frames: 100728832. Throughput: 0: 1755.9. Samples: 20178220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:42,933][41694] Avg episode reward: [(0, '4.853')] +[2024-11-08 02:32:45,028][42004] Updated weights for policy 0, policy_version 24596 (0.0031) +[2024-11-08 02:32:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.0, 300 sec: 6970.1). Total num frames: 100765696. Throughput: 0: 1753.7. Samples: 20183848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:47,934][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 02:32:50,514][42004] Updated weights for policy 0, policy_version 24606 (0.0030) +[2024-11-08 02:32:54,584][41694] Fps is (10 sec: 5975.6, 60 sec: 6909.4, 300 sec: 6903.7). Total num frames: 100798464. Throughput: 0: 1726.8. Samples: 20195118. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:32:54,588][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 02:32:57,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6900.7). Total num frames: 100823040. Throughput: 0: 1687.7. Samples: 20201986. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:32:57,934][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 02:32:58,151][42004] Updated weights for policy 0, policy_version 24616 (0.0026) +[2024-11-08 02:33:02,931][41694] Fps is (10 sec: 7360.5, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 100859904. Throughput: 0: 1771.3. Samples: 20207836. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:33:02,934][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 02:33:03,699][42004] Updated weights for policy 0, policy_version 24626 (0.0029) +[2024-11-08 02:33:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 100896768. Throughput: 0: 1776.5. Samples: 20218814. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:33:07,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 02:33:09,446][42004] Updated weights for policy 0, policy_version 24636 (0.0043) +[2024-11-08 02:33:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6941.3). Total num frames: 100929536. Throughput: 0: 1746.9. Samples: 20229146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:33:12,934][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 02:33:15,201][42004] Updated weights for policy 0, policy_version 24646 (0.0037) +[2024-11-08 02:33:17,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7138.4, 300 sec: 6956.3). Total num frames: 100970496. Throughput: 0: 1739.3. Samples: 20234734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:33:17,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 02:33:20,994][42004] Updated weights for policy 0, policy_version 24656 (0.0021) +[2024-11-08 02:33:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6942.4). Total num frames: 101003264. Throughput: 0: 1732.0. Samples: 20245292. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:33:22,934][41694] Avg episode reward: [(0, '4.530')] +[2024-11-08 02:33:26,506][42004] Updated weights for policy 0, policy_version 24666 (0.0033) +[2024-11-08 02:33:28,835][41694] Fps is (10 sec: 5634.7, 60 sec: 6792.6, 300 sec: 6893.5). Total num frames: 101031936. Throughput: 0: 1580.5. Samples: 20250772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:33:28,837][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 02:33:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 101064704. Throughput: 0: 1657.4. Samples: 20258432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:33:32,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 02:33:33,762][42004] Updated weights for policy 0, policy_version 24676 (0.0034) +[2024-11-08 02:33:37,939][41694] Fps is (10 sec: 7648.5, 60 sec: 6825.8, 300 sec: 6900.5). Total num frames: 101101568. Throughput: 0: 1726.6. Samples: 20269976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:33:37,944][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 02:33:37,964][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024683_101101568.pth... +[2024-11-08 02:33:38,070][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024281_99454976.pth +[2024-11-08 02:33:39,160][42004] Updated weights for policy 0, policy_version 24686 (0.0023) +[2024-11-08 02:33:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 101138432. Throughput: 0: 1756.8. Samples: 20281042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:33:42,935][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 02:33:45,220][42004] Updated weights for policy 0, policy_version 24696 (0.0032) +[2024-11-08 02:33:47,931][41694] Fps is (10 sec: 6968.8, 60 sec: 6758.4, 300 sec: 6914.6). 
Total num frames: 101171200. Throughput: 0: 1737.4. Samples: 20286020. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:33:47,933][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 02:33:50,796][42004] Updated weights for policy 0, policy_version 24706 (0.0033) +[2024-11-08 02:33:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7020.0, 300 sec: 6914.6). Total num frames: 101208064. Throughput: 0: 1737.3. Samples: 20296994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:33:52,933][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 02:33:56,146][42004] Updated weights for policy 0, policy_version 24716 (0.0034) +[2024-11-08 02:33:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 6928.5). Total num frames: 101249024. Throughput: 0: 1759.8. Samples: 20308338. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:33:57,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 02:34:03,285][41694] Fps is (10 sec: 5934.1, 60 sec: 6786.7, 300 sec: 6850.8). Total num frames: 101269504. Throughput: 0: 1738.1. Samples: 20313562. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:34:03,287][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 02:34:04,025][42004] Updated weights for policy 0, policy_version 24726 (0.0035) +[2024-11-08 02:34:07,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 101302272. Throughput: 0: 1665.0. Samples: 20320218. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:34:07,934][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 02:34:09,718][42004] Updated weights for policy 0, policy_version 24736 (0.0028) +[2024-11-08 02:34:12,932][41694] Fps is (10 sec: 7218.4, 60 sec: 6826.6, 300 sec: 6845.2). Total num frames: 101339136. Throughput: 0: 1825.3. Samples: 20331260. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:34:12,933][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 02:34:15,388][42004] Updated weights for policy 0, policy_version 24746 (0.0036) +[2024-11-08 02:34:17,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 101376000. Throughput: 0: 1738.9. Samples: 20336682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:34:17,934][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 02:34:21,423][42004] Updated weights for policy 0, policy_version 24756 (0.0028) +[2024-11-08 02:34:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 101408768. Throughput: 0: 1709.0. Samples: 20346870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:34:22,934][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 02:34:26,864][42004] Updated weights for policy 0, policy_version 24766 (0.0030) +[2024-11-08 02:34:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7069.7, 300 sec: 6900.7). Total num frames: 101449728. Throughput: 0: 1716.1. Samples: 20358266. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:34:27,934][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 02:34:32,049][42004] Updated weights for policy 0, policy_version 24776 (0.0025) +[2024-11-08 02:34:32,931][41694] Fps is (10 sec: 7782.7, 60 sec: 7031.5, 300 sec: 6900.7). Total num frames: 101486592. Throughput: 0: 1731.7. Samples: 20363948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:34:32,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 02:34:37,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6759.3, 300 sec: 6845.2). Total num frames: 101507072. Throughput: 0: 1720.4. Samples: 20374414. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:34:37,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 02:34:39,831][42004] Updated weights for policy 0, policy_version 24786 (0.0026) +[2024-11-08 02:34:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 101543936. Throughput: 0: 1644.2. Samples: 20382328. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:34:42,934][41694] Avg episode reward: [(0, '4.154')] +[2024-11-08 02:34:45,409][42004] Updated weights for policy 0, policy_version 24796 (0.0029) +[2024-11-08 02:34:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.6, 300 sec: 6859.2). Total num frames: 101580800. Throughput: 0: 1663.7. Samples: 20387840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:34:47,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 02:34:51,590][42004] Updated weights for policy 0, policy_version 24806 (0.0030) +[2024-11-08 02:34:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 101613568. Throughput: 0: 1727.2. Samples: 20397942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:34:52,934][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 02:34:57,190][42004] Updated weights for policy 0, policy_version 24816 (0.0040) +[2024-11-08 02:34:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 101650432. Throughput: 0: 1725.5. Samples: 20408908. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:34:57,933][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 02:35:02,877][42004] Updated weights for policy 0, policy_version 24826 (0.0038) +[2024-11-08 02:35:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7004.5, 300 sec: 6886.8). Total num frames: 101687296. Throughput: 0: 1729.8. Samples: 20414524. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:02,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 02:35:07,932][41694] Fps is (10 sec: 7372.1, 60 sec: 7031.3, 300 sec: 6886.8). Total num frames: 101724160. Throughput: 0: 1744.7. Samples: 20425384. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:07,936][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 02:35:08,154][42004] Updated weights for policy 0, policy_version 24836 (0.0035) +[2024-11-08 02:35:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 101744640. Throughput: 0: 1654.9. Samples: 20432734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:12,932][41694] Avg episode reward: [(0, '4.701')] +[2024-11-08 02:35:16,410][42004] Updated weights for policy 0, policy_version 24846 (0.0037) +[2024-11-08 02:35:17,931][41694] Fps is (10 sec: 5325.3, 60 sec: 6690.2, 300 sec: 6831.3). Total num frames: 101777408. Throughput: 0: 1636.2. Samples: 20437578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:17,933][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 02:35:22,235][42004] Updated weights for policy 0, policy_version 24856 (0.0030) +[2024-11-08 02:35:22,933][41694] Fps is (10 sec: 6552.9, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 101810176. Throughput: 0: 1629.4. Samples: 20447740. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:22,936][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 02:35:27,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6621.8, 300 sec: 6817.4). Total num frames: 101847040. Throughput: 0: 1682.3. Samples: 20458032. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:27,935][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 02:35:28,214][42004] Updated weights for policy 0, policy_version 24866 (0.0025) +[2024-11-08 02:35:32,931][41694] Fps is (10 sec: 7373.6, 60 sec: 6621.9, 300 sec: 6873.0). 
Total num frames: 101883904. Throughput: 0: 1687.7. Samples: 20463788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:35:32,934][41694] Avg episode reward: [(0, '4.657')] +[2024-11-08 02:35:33,442][42004] Updated weights for policy 0, policy_version 24876 (0.0034) +[2024-11-08 02:35:37,932][41694] Fps is (10 sec: 7782.8, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 101924864. Throughput: 0: 1723.0. Samples: 20475478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:35:37,934][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 02:35:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024884_101924864.pth... +[2024-11-08 02:35:38,038][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024483_100282368.pth +[2024-11-08 02:35:38,849][42004] Updated weights for policy 0, policy_version 24886 (0.0026) +[2024-11-08 02:35:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 101961728. Throughput: 0: 1733.6. Samples: 20486920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:35:42,933][41694] Avg episode reward: [(0, '4.218')] +[2024-11-08 02:35:44,261][42004] Updated weights for policy 0, policy_version 24896 (0.0031) +[2024-11-08 02:35:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 101982208. Throughput: 0: 1708.8. Samples: 20491422. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:47,934][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 02:35:51,797][42004] Updated weights for policy 0, policy_version 24906 (0.0028) +[2024-11-08 02:35:52,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 102023168. Throughput: 0: 1652.4. Samples: 20499740. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:52,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 02:35:57,376][42004] Updated weights for policy 0, policy_version 24916 (0.0032) +[2024-11-08 02:35:57,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 102055936. Throughput: 0: 1739.4. Samples: 20511006. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:35:57,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 02:36:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 102092800. Throughput: 0: 1744.3. Samples: 20516072. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:36:02,933][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 02:36:03,288][42004] Updated weights for policy 0, policy_version 24926 (0.0033) +[2024-11-08 02:36:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.5, 300 sec: 6886.8). Total num frames: 102129664. Throughput: 0: 1756.0. Samples: 20526760. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:36:07,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 02:36:08,841][42004] Updated weights for policy 0, policy_version 24936 (0.0042) +[2024-11-08 02:36:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6873.0). Total num frames: 102162432. Throughput: 0: 1757.9. Samples: 20537138. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:36:12,934][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 02:36:14,887][42004] Updated weights for policy 0, policy_version 24946 (0.0027) +[2024-11-08 02:36:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.4, 300 sec: 6872.9). Total num frames: 102199296. Throughput: 0: 1749.9. Samples: 20542532. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:36:17,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 02:36:22,441][42004] Updated weights for policy 0, policy_version 24956 (0.0037) +[2024-11-08 02:36:22,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6895.0, 300 sec: 6831.3). Total num frames: 102223872. Throughput: 0: 1654.4. Samples: 20549926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:36:22,934][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 02:36:27,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6826.8, 300 sec: 6817.4). Total num frames: 102256640. Throughput: 0: 1643.8. Samples: 20560890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:36:27,933][41694] Avg episode reward: [(0, '4.319')] +[2024-11-08 02:36:27,962][42004] Updated weights for policy 0, policy_version 24966 (0.0045) +[2024-11-08 02:36:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 102293504. Throughput: 0: 1660.1. Samples: 20566126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:36:32,933][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 02:36:33,935][42004] Updated weights for policy 0, policy_version 24976 (0.0036) +[2024-11-08 02:36:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 102330368. Throughput: 0: 1710.2. Samples: 20576700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:36:37,933][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 02:36:39,376][42004] Updated weights for policy 0, policy_version 24986 (0.0029) +[2024-11-08 02:36:42,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6872.9). Total num frames: 102363136. Throughput: 0: 1703.8. Samples: 20587678. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:36:42,935][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 02:36:45,212][42004] Updated weights for policy 0, policy_version 24996 (0.0028) +[2024-11-08 02:36:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 102404096. Throughput: 0: 1714.2. Samples: 20593212. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:36:47,934][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 02:36:50,577][42004] Updated weights for policy 0, policy_version 25006 (0.0030) +[2024-11-08 02:36:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6873.0). Total num frames: 102440960. Throughput: 0: 1731.5. Samples: 20604676. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:36:52,933][41694] Avg episode reward: [(0, '4.733')] +[2024-11-08 02:36:57,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 102461440. Throughput: 0: 1660.0. Samples: 20611838. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:36:57,933][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 02:36:58,129][42004] Updated weights for policy 0, policy_version 25016 (0.0028) +[2024-11-08 02:37:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 102498304. Throughput: 0: 1666.7. Samples: 20617532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:37:02,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 02:37:03,868][42004] Updated weights for policy 0, policy_version 25026 (0.0031) +[2024-11-08 02:37:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 102531072. Throughput: 0: 1723.1. Samples: 20627464. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:37:07,934][41694] Avg episode reward: [(0, '4.614')] +[2024-11-08 02:37:09,802][42004] Updated weights for policy 0, policy_version 25036 (0.0030) +[2024-11-08 02:37:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6865.0). Total num frames: 102567936. Throughput: 0: 1724.3. Samples: 20638484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:37:12,933][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 02:37:15,166][42004] Updated weights for policy 0, policy_version 25046 (0.0033) +[2024-11-08 02:37:17,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6873.0). Total num frames: 102604800. Throughput: 0: 1736.1. Samples: 20644252. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:37:17,933][41694] Avg episode reward: [(0, '4.217')] +[2024-11-08 02:37:20,651][42004] Updated weights for policy 0, policy_version 25056 (0.0032) +[2024-11-08 02:37:22,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7031.4, 300 sec: 6872.9). Total num frames: 102645760. Throughput: 0: 1753.0. Samples: 20655586. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:37:22,935][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 02:37:25,945][42004] Updated weights for policy 0, policy_version 25066 (0.0024) +[2024-11-08 02:37:30,126][41694] Fps is (10 sec: 6381.9, 60 sec: 6849.2, 300 sec: 6822.2). Total num frames: 102682624. Throughput: 0: 1682.5. Samples: 20667084. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:37:30,127][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 02:37:32,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6826.6, 300 sec: 6817.4). Total num frames: 102703104. Throughput: 0: 1672.5. Samples: 20668476. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:37:32,935][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 02:37:33,846][42004] Updated weights for policy 0, policy_version 25076 (0.0028) +[2024-11-08 02:37:37,932][41694] Fps is (10 sec: 6821.5, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 102735872. Throughput: 0: 1657.2. Samples: 20679252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:37:37,934][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 02:37:37,997][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025083_102739968.pth... +[2024-11-08 02:37:38,147][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024683_101101568.pth +[2024-11-08 02:37:40,038][42004] Updated weights for policy 0, policy_version 25086 (0.0035) +[2024-11-08 02:37:42,932][41694] Fps is (10 sec: 6554.1, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 102768640. Throughput: 0: 1709.5. Samples: 20688766. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:37:42,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 02:37:45,752][42004] Updated weights for policy 0, policy_version 25096 (0.0039) +[2024-11-08 02:37:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6855.8). Total num frames: 102809600. Throughput: 0: 1709.2. Samples: 20694448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:37:47,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 02:37:51,072][42004] Updated weights for policy 0, policy_version 25106 (0.0023) +[2024-11-08 02:37:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 102846464. Throughput: 0: 1745.0. Samples: 20705990. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:37:52,934][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 02:37:56,442][42004] Updated weights for policy 0, policy_version 25116 (0.0028) +[2024-11-08 02:37:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6859.1). Total num frames: 102883328. Throughput: 0: 1755.1. Samples: 20717464. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:37:57,940][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 02:38:02,188][42004] Updated weights for policy 0, policy_version 25126 (0.0028) +[2024-11-08 02:38:04,608][41694] Fps is (10 sec: 5963.5, 60 sec: 6773.9, 300 sec: 6806.5). Total num frames: 102916096. Throughput: 0: 1679.6. Samples: 20722652. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:04,611][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 02:38:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 102940672. Throughput: 0: 1645.2. Samples: 20729620. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:07,934][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 02:38:09,822][42004] Updated weights for policy 0, policy_version 25136 (0.0026) +[2024-11-08 02:38:12,931][41694] Fps is (10 sec: 6889.3, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 102973440. Throughput: 0: 1713.8. Samples: 20740446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:12,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 02:38:15,934][42004] Updated weights for policy 0, policy_version 25146 (0.0027) +[2024-11-08 02:38:17,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 103010304. Throughput: 0: 1711.1. Samples: 20745472. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:17,933][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 02:38:21,517][42004] Updated weights for policy 0, policy_version 25156 (0.0026) +[2024-11-08 02:38:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6852.3). Total num frames: 103047168. Throughput: 0: 1714.5. Samples: 20756404. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:22,933][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 02:38:27,008][42004] Updated weights for policy 0, policy_version 25166 (0.0025) +[2024-11-08 02:38:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6944.1, 300 sec: 6845.2). Total num frames: 103084032. Throughput: 0: 1752.6. Samples: 20767632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:27,934][41694] Avg episode reward: [(0, '4.694')] +[2024-11-08 02:38:32,156][42004] Updated weights for policy 0, policy_version 25176 (0.0027) +[2024-11-08 02:38:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6859.2). Total num frames: 103124992. Throughput: 0: 1757.7. Samples: 20773546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:32,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 02:38:39,122][41694] Fps is (10 sec: 6222.4, 60 sec: 6827.7, 300 sec: 6803.8). Total num frames: 103153664. Throughput: 0: 1714.7. Samples: 20785194. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:39,124][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 02:38:39,784][42004] Updated weights for policy 0, policy_version 25186 (0.0024) +[2024-11-08 02:38:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 103182336. Throughput: 0: 1661.1. Samples: 20792212. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:42,939][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 02:38:45,328][42004] Updated weights for policy 0, policy_version 25196 (0.0032) +[2024-11-08 02:38:47,931][41694] Fps is (10 sec: 7439.4, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 103219200. Throughput: 0: 1733.1. Samples: 20797736. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:47,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 02:38:51,225][42004] Updated weights for policy 0, policy_version 25206 (0.0030) +[2024-11-08 02:38:52,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 103256064. Throughput: 0: 1743.4. Samples: 20808074. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:38:52,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 02:38:56,641][42004] Updated weights for policy 0, policy_version 25216 (0.0027) +[2024-11-08 02:38:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6867.3). Total num frames: 103292928. Throughput: 0: 1760.1. Samples: 20819652. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:38:57,935][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 02:39:02,189][42004] Updated weights for policy 0, policy_version 25226 (0.0025) +[2024-11-08 02:39:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7093.1, 300 sec: 6872.9). Total num frames: 103329792. Throughput: 0: 1768.4. Samples: 20825048. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:39:02,933][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 02:39:07,544][42004] Updated weights for policy 0, policy_version 25236 (0.0032) +[2024-11-08 02:39:07,931][41694] Fps is (10 sec: 7373.0, 60 sec: 7099.8, 300 sec: 6873.0). Total num frames: 103366656. Throughput: 0: 1773.4. Samples: 20836206. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:39:07,935][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 02:39:13,623][41694] Fps is (10 sec: 6129.7, 60 sec: 6951.4, 300 sec: 6829.2). Total num frames: 103395328. Throughput: 0: 1629.2. Samples: 20842072. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:39:13,625][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 02:39:15,327][42004] Updated weights for policy 0, policy_version 25246 (0.0031) +[2024-11-08 02:39:17,933][41694] Fps is (10 sec: 5733.8, 60 sec: 6894.8, 300 sec: 6831.3). Total num frames: 103424000. Throughput: 0: 1675.3. Samples: 20848936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:39:17,935][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 02:39:21,248][42004] Updated weights for policy 0, policy_version 25256 (0.0029) +[2024-11-08 02:39:22,932][41694] Fps is (10 sec: 6600.3, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 103456768. Throughput: 0: 1691.7. Samples: 20859308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:39:22,934][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 02:39:26,913][42004] Updated weights for policy 0, policy_version 25266 (0.0028) +[2024-11-08 02:39:27,931][41694] Fps is (10 sec: 6964.0, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 103493632. Throughput: 0: 1733.5. Samples: 20870218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:39:27,933][41694] Avg episode reward: [(0, '4.285')] +[2024-11-08 02:39:32,037][42004] Updated weights for policy 0, policy_version 25276 (0.0022) +[2024-11-08 02:39:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 103534592. Throughput: 0: 1741.0. Samples: 20876082. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:39:32,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 02:39:37,336][42004] Updated weights for policy 0, policy_version 25286 (0.0023) +[2024-11-08 02:39:37,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7173.8, 300 sec: 6886.8). Total num frames: 103575552. Throughput: 0: 1772.3. Samples: 20887828. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:39:37,933][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 02:39:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025287_103575552.pth... +[2024-11-08 02:39:38,046][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000024884_101924864.pth +[2024-11-08 02:39:42,928][42004] Updated weights for policy 0, policy_version 25296 (0.0043) +[2024-11-08 02:39:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6886.8). Total num frames: 103612416. Throughput: 0: 1763.3. Samples: 20899002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:39:42,933][41694] Avg episode reward: [(0, '4.661')] +[2024-11-08 02:39:48,134][41694] Fps is (10 sec: 5620.4, 60 sec: 6871.7, 300 sec: 6840.5). Total num frames: 103632896. Throughput: 0: 1758.7. Samples: 20904546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:39:48,136][41694] Avg episode reward: [(0, '4.706')] +[2024-11-08 02:39:50,737][42004] Updated weights for policy 0, policy_version 25306 (0.0029) +[2024-11-08 02:39:52,932][41694] Fps is (10 sec: 5324.4, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 103665664. Throughput: 0: 1672.9. Samples: 20911486. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:39:52,934][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 02:39:56,751][42004] Updated weights for policy 0, policy_version 25316 (0.0027) +[2024-11-08 02:39:57,932][41694] Fps is (10 sec: 7107.2, 60 sec: 6826.7, 300 sec: 6831.3). 
Total num frames: 103702528. Throughput: 0: 1793.3. Samples: 20921532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:39:57,937][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 02:40:02,289][42004] Updated weights for policy 0, policy_version 25326 (0.0027) +[2024-11-08 02:40:02,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 103739392. Throughput: 0: 1734.3. Samples: 20926976. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:40:02,934][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 02:40:07,644][42004] Updated weights for policy 0, policy_version 25336 (0.0027) +[2024-11-08 02:40:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.6, 300 sec: 6886.8). Total num frames: 103776256. Throughput: 0: 1753.9. Samples: 20938232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:40:07,934][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 02:40:12,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7044.3, 300 sec: 6900.7). Total num frames: 103813120. Throughput: 0: 1763.8. Samples: 20949592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:40:12,938][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 02:40:13,378][42004] Updated weights for policy 0, policy_version 25346 (0.0029) +[2024-11-08 02:40:17,931][41694] Fps is (10 sec: 6963.4, 60 sec: 7031.6, 300 sec: 6900.7). Total num frames: 103845888. Throughput: 0: 1737.9. Samples: 20954288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:40:17,933][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 02:40:19,513][42004] Updated weights for policy 0, policy_version 25356 (0.0029) +[2024-11-08 02:40:22,931][41694] Fps is (10 sec: 4915.6, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 103862272. Throughput: 0: 1683.1. Samples: 20963568. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:40:22,933][41694] Avg episode reward: [(0, '4.359')] +[2024-11-08 02:40:27,634][42004] Updated weights for policy 0, policy_version 25366 (0.0024) +[2024-11-08 02:40:27,932][41694] Fps is (10 sec: 5324.3, 60 sec: 6758.3, 300 sec: 6831.3). Total num frames: 103899136. Throughput: 0: 1606.1. Samples: 20971276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:40:27,934][41694] Avg episode reward: [(0, '4.656')] +[2024-11-08 02:40:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 103936000. Throughput: 0: 1598.5. Samples: 20976152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:40:32,933][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 02:40:33,377][42004] Updated weights for policy 0, policy_version 25376 (0.0038) +[2024-11-08 02:40:37,932][41694] Fps is (10 sec: 7373.4, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 103972864. Throughput: 0: 1684.6. Samples: 20987290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:40:37,935][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 02:40:38,875][42004] Updated weights for policy 0, policy_version 25386 (0.0031) +[2024-11-08 02:40:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6873.0). Total num frames: 104009728. Throughput: 0: 1713.4. Samples: 20998636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:40:42,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:40:44,405][42004] Updated weights for policy 0, policy_version 25396 (0.0022) +[2024-11-08 02:40:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6918.3, 300 sec: 6859.1). Total num frames: 104046592. Throughput: 0: 1716.7. Samples: 21004228. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:40:47,935][41694] Avg episode reward: [(0, '4.159')] +[2024-11-08 02:40:49,768][42004] Updated weights for policy 0, policy_version 25406 (0.0026) +[2024-11-08 02:40:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.3, 300 sec: 6872.9). Total num frames: 104083456. Throughput: 0: 1719.6. Samples: 21015616. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:40:52,933][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 02:40:57,431][42004] Updated weights for policy 0, policy_version 25416 (0.0025) +[2024-11-08 02:40:57,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.2, 300 sec: 6817.4). Total num frames: 104103936. Throughput: 0: 1627.5. Samples: 21022830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:40:57,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 02:41:02,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6803.5). Total num frames: 104136704. Throughput: 0: 1639.1. Samples: 21028046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:02,933][41694] Avg episode reward: [(0, '4.285')] +[2024-11-08 02:41:03,516][42004] Updated weights for policy 0, policy_version 25426 (0.0027) +[2024-11-08 02:41:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 104173568. Throughput: 0: 1655.6. Samples: 21038072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:07,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 02:41:09,058][42004] Updated weights for policy 0, policy_version 25436 (0.0031) +[2024-11-08 02:41:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 104210432. Throughput: 0: 1720.1. Samples: 21048680. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:12,933][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 02:41:15,141][42004] Updated weights for policy 0, policy_version 25446 (0.0031) +[2024-11-08 02:41:17,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 104247296. Throughput: 0: 1729.7. Samples: 21053990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:17,933][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 02:41:20,672][42004] Updated weights for policy 0, policy_version 25456 (0.0031) +[2024-11-08 02:41:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6872.9). Total num frames: 104284160. Throughput: 0: 1732.4. Samples: 21065248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:22,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 02:41:25,969][42004] Updated weights for policy 0, policy_version 25466 (0.0025) +[2024-11-08 02:41:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.6, 300 sec: 6872.9). Total num frames: 104321024. Throughput: 0: 1735.8. Samples: 21076748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:27,933][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 02:41:32,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 104341504. Throughput: 0: 1704.2. Samples: 21080916. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:41:32,934][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 02:41:33,809][42004] Updated weights for policy 0, policy_version 25476 (0.0031) +[2024-11-08 02:41:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 104374272. Throughput: 0: 1613.6. Samples: 21088226. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:41:37,934][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 02:41:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025482_104374272.pth... +[2024-11-08 02:41:38,075][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025083_102739968.pth +[2024-11-08 02:41:39,964][42004] Updated weights for policy 0, policy_version 25486 (0.0026) +[2024-11-08 02:41:42,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 104411136. Throughput: 0: 1691.5. Samples: 21098948. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:41:42,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 02:41:45,574][42004] Updated weights for policy 0, policy_version 25496 (0.0030) +[2024-11-08 02:41:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 104448000. Throughput: 0: 1697.9. Samples: 21104450. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:41:47,933][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 02:41:50,997][42004] Updated weights for policy 0, policy_version 25506 (0.0028) +[2024-11-08 02:41:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 104484864. Throughput: 0: 1724.7. Samples: 21115682. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:52,934][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 02:41:56,383][42004] Updated weights for policy 0, policy_version 25516 (0.0028) +[2024-11-08 02:41:57,941][41694] Fps is (10 sec: 7365.6, 60 sec: 6962.0, 300 sec: 6858.8). Total num frames: 104521728. Throughput: 0: 1743.3. Samples: 21127146. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:41:57,944][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 02:42:02,086][42004] Updated weights for policy 0, policy_version 25526 (0.0045) +[2024-11-08 02:42:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6873.0). Total num frames: 104558592. Throughput: 0: 1745.1. Samples: 21132518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:02,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 02:42:07,932][41694] Fps is (10 sec: 5740.0, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 104579072. Throughput: 0: 1649.2. Samples: 21139462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:07,934][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 02:42:10,214][42004] Updated weights for policy 0, policy_version 25536 (0.0034) +[2024-11-08 02:42:12,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 104611840. Throughput: 0: 1612.3. Samples: 21149300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:12,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 02:42:15,809][42004] Updated weights for policy 0, policy_version 25546 (0.0030) +[2024-11-08 02:42:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6789.7). Total num frames: 104648704. Throughput: 0: 1647.2. Samples: 21155042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:17,934][41694] Avg episode reward: [(0, '4.647')] +[2024-11-08 02:42:21,517][42004] Updated weights for policy 0, policy_version 25556 (0.0035) +[2024-11-08 02:42:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6840.5). Total num frames: 104685568. Throughput: 0: 1727.2. Samples: 21165950. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:22,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 02:42:26,857][42004] Updated weights for policy 0, policy_version 25566 (0.0029) +[2024-11-08 02:42:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 104722432. Throughput: 0: 1744.0. Samples: 21177428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:27,933][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 02:42:32,126][42004] Updated weights for policy 0, policy_version 25576 (0.0028) +[2024-11-08 02:42:32,932][41694] Fps is (10 sec: 7782.0, 60 sec: 7031.4, 300 sec: 6872.9). Total num frames: 104763392. Throughput: 0: 1746.7. Samples: 21183050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:32,934][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 02:42:37,509][42004] Updated weights for policy 0, policy_version 25586 (0.0033) +[2024-11-08 02:42:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7099.7, 300 sec: 6886.8). Total num frames: 104800256. Throughput: 0: 1753.8. Samples: 21194602. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:42:37,934][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 02:42:42,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 104816640. Throughput: 0: 1643.4. Samples: 21201084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:42:42,934][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 02:42:45,962][42004] Updated weights for policy 0, policy_version 25596 (0.0027) +[2024-11-08 02:42:47,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 104853504. Throughput: 0: 1635.2. Samples: 21206102. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:42:47,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 02:42:51,364][42004] Updated weights for policy 0, policy_version 25606 (0.0032) +[2024-11-08 02:42:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 104890368. Throughput: 0: 1732.0. Samples: 21217402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:42:52,934][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 02:42:56,732][42004] Updated weights for policy 0, policy_version 25616 (0.0030) +[2024-11-08 02:42:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6827.8, 300 sec: 6870.3). Total num frames: 104931328. Throughput: 0: 1769.7. Samples: 21228936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:42:57,933][41694] Avg episode reward: [(0, '4.279')] +[2024-11-08 02:43:02,259][42004] Updated weights for policy 0, policy_version 25626 (0.0030) +[2024-11-08 02:43:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 104968192. Throughput: 0: 1764.5. Samples: 21234444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:02,933][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 02:43:07,604][42004] Updated weights for policy 0, policy_version 25636 (0.0025) +[2024-11-08 02:43:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6886.8). Total num frames: 105005056. Throughput: 0: 1772.5. Samples: 21245714. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:07,933][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 02:43:14,987][41694] Fps is (10 sec: 6115.7, 60 sec: 6930.6, 300 sec: 6839.2). Total num frames: 105041920. Throughput: 0: 1689.5. Samples: 21256930. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:14,989][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 02:43:15,530][42004] Updated weights for policy 0, policy_version 25646 (0.0039) +[2024-11-08 02:43:17,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 105058304. Throughput: 0: 1669.6. Samples: 21258180. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:43:17,934][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 02:43:21,300][42004] Updated weights for policy 0, policy_version 25656 (0.0029) +[2024-11-08 02:43:22,932][41694] Fps is (10 sec: 6702.4, 60 sec: 6826.6, 300 sec: 6817.4). Total num frames: 105095168. Throughput: 0: 1647.8. Samples: 21268754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:43:22,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 02:43:26,596][42004] Updated weights for policy 0, policy_version 25666 (0.0040) +[2024-11-08 02:43:27,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 105136128. Throughput: 0: 1762.2. Samples: 21280382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:43:27,936][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 02:43:31,807][42004] Updated weights for policy 0, policy_version 25676 (0.0026) +[2024-11-08 02:43:32,931][41694] Fps is (10 sec: 8192.2, 60 sec: 6895.0, 300 sec: 6886.9). Total num frames: 105177088. Throughput: 0: 1779.3. Samples: 21286170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:32,933][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 02:43:36,979][42004] Updated weights for policy 0, policy_version 25686 (0.0027) +[2024-11-08 02:43:37,938][41694] Fps is (10 sec: 7777.8, 60 sec: 6894.2, 300 sec: 6886.7). Total num frames: 105213952. Throughput: 0: 1794.2. Samples: 21298152. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:37,940][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 02:43:38,008][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025688_105218048.pth... +[2024-11-08 02:43:38,112][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025287_103575552.pth +[2024-11-08 02:43:42,477][42004] Updated weights for policy 0, policy_version 25696 (0.0026) +[2024-11-08 02:43:42,934][41694] Fps is (10 sec: 7370.8, 60 sec: 7236.0, 300 sec: 6886.8). Total num frames: 105250816. Throughput: 0: 1788.5. Samples: 21309422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:42,937][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 02:43:49,489][41694] Fps is (10 sec: 6028.3, 60 sec: 6986.7, 300 sec: 6836.9). Total num frames: 105283584. Throughput: 0: 1730.5. Samples: 21315012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:49,490][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 02:43:50,518][42004] Updated weights for policy 0, policy_version 25706 (0.0037) +[2024-11-08 02:43:52,932][41694] Fps is (10 sec: 5326.1, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 105304064. Throughput: 0: 1678.7. Samples: 21321254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:52,933][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 02:43:56,126][42004] Updated weights for policy 0, policy_version 25716 (0.0036) +[2024-11-08 02:43:57,932][41694] Fps is (10 sec: 7277.0, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 105345024. Throughput: 0: 1762.4. Samples: 21332614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:43:57,934][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 02:44:01,433][42004] Updated weights for policy 0, policy_version 25726 (0.0023) +[2024-11-08 02:44:02,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6831.3). 
Total num frames: 105381888. Throughput: 0: 1778.9. Samples: 21338232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:02,934][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 02:44:06,834][42004] Updated weights for policy 0, policy_version 25736 (0.0034) +[2024-11-08 02:44:07,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6889.1). Total num frames: 105422848. Throughput: 0: 1794.2. Samples: 21349494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:07,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:44:12,137][42004] Updated weights for policy 0, policy_version 25746 (0.0030) +[2024-11-08 02:44:12,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7210.2, 300 sec: 6900.7). Total num frames: 105459712. Throughput: 0: 1797.6. Samples: 21361274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:12,934][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 02:44:17,413][42004] Updated weights for policy 0, policy_version 25756 (0.0027) +[2024-11-08 02:44:17,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7372.8, 300 sec: 6928.5). Total num frames: 105500672. Throughput: 0: 1794.7. Samples: 21366930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:17,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 02:44:23,955][41694] Fps is (10 sec: 5573.4, 60 sec: 6980.6, 300 sec: 6849.2). Total num frames: 105521152. Throughput: 0: 1724.7. Samples: 21377520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:23,964][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 02:44:25,865][42004] Updated weights for policy 0, policy_version 25766 (0.0021) +[2024-11-08 02:44:27,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6895.0, 300 sec: 6831.3). Total num frames: 105549824. Throughput: 0: 1655.8. Samples: 21383930. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:27,933][41694] Avg episode reward: [(0, '4.281')] +[2024-11-08 02:44:31,467][42004] Updated weights for policy 0, policy_version 25776 (0.0028) +[2024-11-08 02:44:32,934][41694] Fps is (10 sec: 7299.3, 60 sec: 6826.4, 300 sec: 6817.4). Total num frames: 105586688. Throughput: 0: 1709.8. Samples: 21389296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:32,937][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 02:44:36,924][42004] Updated weights for policy 0, policy_version 25786 (0.0039) +[2024-11-08 02:44:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6827.4, 300 sec: 6817.4). Total num frames: 105623552. Throughput: 0: 1763.3. Samples: 21400604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:44:37,935][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 02:44:42,250][42004] Updated weights for policy 0, policy_version 25796 (0.0033) +[2024-11-08 02:44:42,931][41694] Fps is (10 sec: 7784.3, 60 sec: 6895.2, 300 sec: 6891.6). Total num frames: 105664512. Throughput: 0: 1770.1. Samples: 21412268. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:44:42,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 02:44:47,532][42004] Updated weights for policy 0, policy_version 25806 (0.0028) +[2024-11-08 02:44:47,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7148.7, 300 sec: 6900.7). Total num frames: 105701376. Throughput: 0: 1769.4. Samples: 21417854. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:44:47,934][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 02:44:52,931][41694] Fps is (10 sec: 7372.7, 60 sec: 7236.3, 300 sec: 6900.7). Total num frames: 105738240. Throughput: 0: 1770.8. Samples: 21429180. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:44:52,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 02:44:53,537][42004] Updated weights for policy 0, policy_version 25816 (0.0035) +[2024-11-08 02:44:58,438][41694] Fps is (10 sec: 5458.0, 60 sec: 6837.3, 300 sec: 6833.5). Total num frames: 105758720. Throughput: 0: 1594.6. Samples: 21433840. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:44:58,439][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 02:45:01,785][42004] Updated weights for policy 0, policy_version 25826 (0.0026) +[2024-11-08 02:45:02,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 105787392. Throughput: 0: 1623.6. Samples: 21439994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:45:02,935][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 02:45:07,291][42004] Updated weights for policy 0, policy_version 25836 (0.0033) +[2024-11-08 02:45:07,932][41694] Fps is (10 sec: 7334.4, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 105828352. Throughput: 0: 1671.7. Samples: 21451034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:45:07,933][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 02:45:12,688][42004] Updated weights for policy 0, policy_version 25846 (0.0028) +[2024-11-08 02:45:12,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 105865216. Throughput: 0: 1746.6. Samples: 21462528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:45:12,933][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 02:45:17,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.8, 300 sec: 6900.7). Total num frames: 105897984. Throughput: 0: 1736.1. Samples: 21467418. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:45:17,934][41694] Avg episode reward: [(0, '4.669')] +[2024-11-08 02:45:18,744][42004] Updated weights for policy 0, policy_version 25856 (0.0035) +[2024-11-08 02:45:22,931][41694] Fps is (10 sec: 6963.4, 60 sec: 7014.6, 300 sec: 6900.7). Total num frames: 105934848. Throughput: 0: 1720.6. Samples: 21478030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:45:22,934][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 02:45:24,389][42004] Updated weights for policy 0, policy_version 25866 (0.0028) +[2024-11-08 02:45:27,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 105967616. Throughput: 0: 1703.6. Samples: 21488930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:45:27,933][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 02:45:30,614][42004] Updated weights for policy 0, policy_version 25876 (0.0026) +[2024-11-08 02:45:32,935][41694] Fps is (10 sec: 5323.1, 60 sec: 6690.0, 300 sec: 6831.2). Total num frames: 105988096. Throughput: 0: 1685.5. Samples: 21493708. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:45:32,936][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 02:45:37,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.2, 300 sec: 6831.3). Total num frames: 106024960. Throughput: 0: 1576.4. Samples: 21500116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:45:37,933][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 02:45:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025885_106024960.pth... +[2024-11-08 02:45:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025482_104374272.pth +[2024-11-08 02:45:38,488][42004] Updated weights for policy 0, policy_version 25886 (0.0028) +[2024-11-08 02:45:42,932][41694] Fps is (10 sec: 7375.1, 60 sec: 6621.9, 300 sec: 6831.3). 
Total num frames: 106061824. Throughput: 0: 1746.9. Samples: 21511568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:45:42,934][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 02:45:43,829][42004] Updated weights for policy 0, policy_version 25896 (0.0024) +[2024-11-08 02:45:47,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 106098688. Throughput: 0: 1712.3. Samples: 21517048. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:45:47,934][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 02:45:49,325][42004] Updated weights for policy 0, policy_version 25906 (0.0030) +[2024-11-08 02:45:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6886.8). Total num frames: 106135552. Throughput: 0: 1717.3. Samples: 21528312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:45:52,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 02:45:54,839][42004] Updated weights for policy 0, policy_version 25916 (0.0031) +[2024-11-08 02:45:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6953.6, 300 sec: 6900.7). Total num frames: 106172416. Throughput: 0: 1712.3. Samples: 21539580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:45:57,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 02:46:00,637][42004] Updated weights for policy 0, policy_version 25926 (0.0025) +[2024-11-08 02:46:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 106205184. Throughput: 0: 1723.4. Samples: 21544970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:46:02,933][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 02:46:07,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6553.6, 300 sec: 6817.4). Total num frames: 106221568. Throughput: 0: 1659.8. Samples: 21552722. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:46:07,933][41694] Avg episode reward: [(0, '4.651')] +[2024-11-08 02:46:09,136][42004] Updated weights for policy 0, policy_version 25936 (0.0044) +[2024-11-08 02:46:12,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6485.4, 300 sec: 6803.5). Total num frames: 106254336. Throughput: 0: 1587.4. Samples: 21560364. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 02:46:12,935][41694] Avg episode reward: [(0, '4.669')] +[2024-11-08 02:46:15,491][42004] Updated weights for policy 0, policy_version 25946 (0.0036) +[2024-11-08 02:46:17,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6553.6, 300 sec: 6803.5). Total num frames: 106291200. Throughput: 0: 1593.8. Samples: 21565426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 02:46:17,935][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 02:46:20,875][42004] Updated weights for policy 0, policy_version 25956 (0.0041) +[2024-11-08 02:46:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6803.5). Total num frames: 106328064. Throughput: 0: 1702.7. Samples: 21576738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 02:46:22,935][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 02:46:26,269][42004] Updated weights for policy 0, policy_version 25966 (0.0027) +[2024-11-08 02:46:27,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6690.1, 300 sec: 6872.9). Total num frames: 106369024. Throughput: 0: 1702.2. Samples: 21588168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:46:27,934][41694] Avg episode reward: [(0, '4.702')] +[2024-11-08 02:46:31,770][42004] Updated weights for policy 0, policy_version 25976 (0.0029) +[2024-11-08 02:46:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.6, 300 sec: 6886.8). Total num frames: 106405888. Throughput: 0: 1702.4. Samples: 21593658. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:46:32,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 02:46:37,697][42004] Updated weights for policy 0, policy_version 25986 (0.0034) +[2024-11-08 02:46:37,933][41694] Fps is (10 sec: 6962.5, 60 sec: 6894.8, 300 sec: 6872.9). Total num frames: 106438656. Throughput: 0: 1687.8. Samples: 21604266. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:46:37,941][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 02:46:42,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6553.6, 300 sec: 6803.5). Total num frames: 106455040. Throughput: 0: 1572.3. Samples: 21610334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:46:42,934][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 02:46:45,717][42004] Updated weights for policy 0, policy_version 25996 (0.0042) +[2024-11-08 02:46:47,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.8, 300 sec: 6817.4). Total num frames: 106496000. Throughput: 0: 1580.6. Samples: 21616098. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:46:47,934][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 02:46:51,255][42004] Updated weights for policy 0, policy_version 26006 (0.0027) +[2024-11-08 02:46:52,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6803.8). Total num frames: 106528768. Throughput: 0: 1656.1. Samples: 21627246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:46:52,933][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 02:46:56,777][42004] Updated weights for policy 0, policy_version 26016 (0.0026) +[2024-11-08 02:46:57,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.8, 300 sec: 6817.4). Total num frames: 106569728. Throughput: 0: 1736.3. Samples: 21638496. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:46:57,941][41694] Avg episode reward: [(0, '4.271')] +[2024-11-08 02:47:02,425][42004] Updated weights for policy 0, policy_version 26026 (0.0034) +[2024-11-08 02:47:02,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6621.8, 300 sec: 6859.1). Total num frames: 106602496. Throughput: 0: 1744.5. Samples: 21643930. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:02,933][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 02:47:07,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6963.2, 300 sec: 6873.0). Total num frames: 106639360. Throughput: 0: 1734.3. Samples: 21654782. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:07,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 02:47:07,960][42004] Updated weights for policy 0, policy_version 26036 (0.0031) +[2024-11-08 02:47:12,932][41694] Fps is (10 sec: 7373.1, 60 sec: 7031.5, 300 sec: 6872.9). Total num frames: 106676224. Throughput: 0: 1711.9. Samples: 21665202. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:12,934][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 02:47:13,884][42004] Updated weights for policy 0, policy_version 26046 (0.0033) +[2024-11-08 02:47:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 106692608. Throughput: 0: 1673.6. Samples: 21668970. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:17,933][41694] Avg episode reward: [(0, '4.253')] +[2024-11-08 02:47:21,763][42004] Updated weights for policy 0, policy_version 26056 (0.0033) +[2024-11-08 02:47:22,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 106733568. Throughput: 0: 1627.6. Samples: 21677506. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:47:22,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 02:47:27,056][42004] Updated weights for policy 0, policy_version 26066 (0.0031) +[2024-11-08 02:47:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 106770432. Throughput: 0: 1750.4. Samples: 21689100. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:47:27,933][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 02:47:32,028][42004] Updated weights for policy 0, policy_version 26076 (0.0020) +[2024-11-08 02:47:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 106811392. Throughput: 0: 1758.3. Samples: 21695220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:47:32,935][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 02:47:37,226][42004] Updated weights for policy 0, policy_version 26086 (0.0022) +[2024-11-08 02:47:37,932][41694] Fps is (10 sec: 8191.8, 60 sec: 6895.0, 300 sec: 6900.7). Total num frames: 106852352. Throughput: 0: 1778.1. Samples: 21707262. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:37,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 02:47:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026087_106852352.pth... +[2024-11-08 02:47:38,060][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025688_105218048.pth +[2024-11-08 02:47:42,906][42004] Updated weights for policy 0, policy_version 26096 (0.0027) +[2024-11-08 02:47:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7236.3, 300 sec: 6900.7). Total num frames: 106889216. Throughput: 0: 1768.4. Samples: 21718074. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:42,934][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 02:47:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.1, 300 sec: 6900.7). 
Total num frames: 106926080. Throughput: 0: 1772.5. Samples: 21723690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:47,934][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 02:47:48,203][42004] Updated weights for policy 0, policy_version 26106 (0.0027) +[2024-11-08 02:47:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 106946560. Throughput: 0: 1699.1. Samples: 21731240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:47:52,935][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 02:47:55,846][42004] Updated weights for policy 0, policy_version 26116 (0.0033) +[2024-11-08 02:47:57,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 106983424. Throughput: 0: 1721.8. Samples: 21742682. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:47:57,935][41694] Avg episode reward: [(0, '4.582')] +[2024-11-08 02:48:01,500][42004] Updated weights for policy 0, policy_version 26126 (0.0030) +[2024-11-08 02:48:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 107020288. Throughput: 0: 1753.0. Samples: 21747854. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:48:02,943][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 02:48:07,376][42004] Updated weights for policy 0, policy_version 26136 (0.0032) +[2024-11-08 02:48:07,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6963.2, 300 sec: 6879.2). Total num frames: 107057152. Throughput: 0: 1790.7. Samples: 21758086. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:48:07,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 02:48:12,771][42004] Updated weights for policy 0, policy_version 26146 (0.0030) +[2024-11-08 02:48:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 107094016. Throughput: 0: 1787.5. Samples: 21769538. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:48:12,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 02:48:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7304.6, 300 sec: 6900.7). Total num frames: 107130880. Throughput: 0: 1777.6. Samples: 21775214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:48:17,933][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 02:48:18,067][42004] Updated weights for policy 0, policy_version 26156 (0.0028) +[2024-11-08 02:48:22,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 6900.7). Total num frames: 107171840. Throughput: 0: 1763.1. Samples: 21786600. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:48:22,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:48:25,685][42004] Updated weights for policy 0, policy_version 26166 (0.0028) +[2024-11-08 02:48:27,931][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 107192320. Throughput: 0: 1689.9. Samples: 21794120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 02:48:27,934][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 02:48:31,079][42004] Updated weights for policy 0, policy_version 26176 (0.0034) +[2024-11-08 02:48:32,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6963.2, 300 sec: 6831.4). Total num frames: 107229184. Throughput: 0: 1687.2. Samples: 21799616. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:48:32,937][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 02:48:36,971][42004] Updated weights for policy 0, policy_version 26186 (0.0032) +[2024-11-08 02:48:37,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6817.5). Total num frames: 107261952. Throughput: 0: 1749.9. Samples: 21809986. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:48:37,934][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 02:48:42,251][42004] Updated weights for policy 0, policy_version 26196 (0.0038) +[2024-11-08 02:48:42,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 6881.5). Total num frames: 107302912. Throughput: 0: 1760.6. Samples: 21821906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:48:42,934][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 02:48:47,412][42004] Updated weights for policy 0, policy_version 26206 (0.0035) +[2024-11-08 02:48:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 107339776. Throughput: 0: 1773.3. Samples: 21827652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:48:47,933][41694] Avg episode reward: [(0, '4.239')] +[2024-11-08 02:48:52,832][42004] Updated weights for policy 0, policy_version 26216 (0.0022) +[2024-11-08 02:48:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7236.3, 300 sec: 6900.7). Total num frames: 107380736. Throughput: 0: 1801.8. Samples: 21839168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:48:52,933][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 02:48:59,931][41694] Fps is (10 sec: 6144.5, 60 sec: 6936.9, 300 sec: 6840.5). Total num frames: 107413504. Throughput: 0: 1727.5. Samples: 21850730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:48:59,932][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 02:49:00,490][42004] Updated weights for policy 0, policy_version 26226 (0.0026) +[2024-11-08 02:49:02,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 107438080. Throughput: 0: 1706.7. Samples: 21852016. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:49:02,933][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 02:49:06,368][42004] Updated weights for policy 0, policy_version 26236 (0.0028) +[2024-11-08 02:49:07,931][41694] Fps is (10 sec: 7167.2, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 107470848. Throughput: 0: 1687.8. Samples: 21862550. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:49:07,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 02:49:12,637][42004] Updated weights for policy 0, policy_version 26246 (0.0034) +[2024-11-08 02:49:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 107503616. Throughput: 0: 1738.3. Samples: 21872342. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:49:12,937][41694] Avg episode reward: [(0, '4.278')] +[2024-11-08 02:49:17,856][42004] Updated weights for policy 0, policy_version 26256 (0.0035) +[2024-11-08 02:49:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6882.9). Total num frames: 107544576. Throughput: 0: 1741.2. Samples: 21877972. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:49:17,934][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 02:49:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 107581440. Throughput: 0: 1775.4. Samples: 21889880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:49:22,933][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 02:49:22,992][42004] Updated weights for policy 0, policy_version 26266 (0.0034) +[2024-11-08 02:49:27,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7168.0, 300 sec: 6900.8). Total num frames: 107622400. Throughput: 0: 1770.1. Samples: 21901562. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:49:27,935][41694] Avg episode reward: [(0, '4.301')] +[2024-11-08 02:49:28,386][42004] Updated weights for policy 0, policy_version 26276 (0.0027) +[2024-11-08 02:49:34,288][41694] Fps is (10 sec: 6492.4, 60 sec: 6942.8, 300 sec: 6855.3). Total num frames: 107655168. Throughput: 0: 1717.4. Samples: 21907264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:49:34,289][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 02:49:35,938][42004] Updated weights for policy 0, policy_version 26286 (0.0028) +[2024-11-08 02:49:37,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 107679744. Throughput: 0: 1676.2. Samples: 21914598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:49:37,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 02:49:37,951][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026289_107679744.pth... +[2024-11-08 02:49:38,068][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000025885_106024960.pth +[2024-11-08 02:49:41,904][42004] Updated weights for policy 0, policy_version 26296 (0.0037) +[2024-11-08 02:49:42,931][41694] Fps is (10 sec: 6634.1, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 107712512. Throughput: 0: 1723.6. Samples: 21924848. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:49:42,935][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 02:49:47,571][42004] Updated weights for policy 0, policy_version 26306 (0.0028) +[2024-11-08 02:49:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 107749376. Throughput: 0: 1726.4. Samples: 21929702. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:49:47,933][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 02:49:52,667][42004] Updated weights for policy 0, policy_version 26316 (0.0022) +[2024-11-08 02:49:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6898.7). Total num frames: 107790336. Throughput: 0: 1761.4. Samples: 21941814. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:49:52,933][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 02:49:57,697][42004] Updated weights for policy 0, policy_version 26326 (0.0038) +[2024-11-08 02:49:57,931][41694] Fps is (10 sec: 8192.1, 60 sec: 7203.2, 300 sec: 6928.5). Total num frames: 107831296. Throughput: 0: 1813.9. Samples: 21953968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:49:57,934][41694] Avg episode reward: [(0, '4.616')] +[2024-11-08 02:50:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6900.7). Total num frames: 107864064. Throughput: 0: 1805.7. Samples: 21959228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:02,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 02:50:03,492][42004] Updated weights for policy 0, policy_version 26336 (0.0022) +[2024-11-08 02:50:08,981][41694] Fps is (10 sec: 5560.6, 60 sec: 6910.6, 300 sec: 6848.6). Total num frames: 107892736. Throughput: 0: 1748.1. Samples: 21970380. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:08,983][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 02:50:11,603][42004] Updated weights for policy 0, policy_version 26346 (0.0026) +[2024-11-08 02:50:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 107921408. Throughput: 0: 1664.8. Samples: 21976480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:12,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 02:50:17,931][41694] Fps is (10 sec: 6406.5, 60 sec: 6758.4, 300 sec: 6831.3). 
Total num frames: 107950080. Throughput: 0: 1690.5. Samples: 21981046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:50:17,933][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 02:50:18,342][42004] Updated weights for policy 0, policy_version 26356 (0.0038) +[2024-11-08 02:50:22,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 107986944. Throughput: 0: 1684.8. Samples: 21990416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:50:22,933][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 02:50:23,898][42004] Updated weights for policy 0, policy_version 26366 (0.0018) +[2024-11-08 02:50:27,933][41694] Fps is (10 sec: 7371.7, 60 sec: 6690.0, 300 sec: 6900.8). Total num frames: 108023808. Throughput: 0: 1721.3. Samples: 22002308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:50:27,935][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 02:50:29,287][42004] Updated weights for policy 0, policy_version 26376 (0.0025) +[2024-11-08 02:50:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6984.5, 300 sec: 6914.6). Total num frames: 108064768. Throughput: 0: 1744.0. Samples: 22008184. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 02:50:32,934][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 02:50:34,362][42004] Updated weights for policy 0, policy_version 26386 (0.0034) +[2024-11-08 02:50:37,932][41694] Fps is (10 sec: 7783.3, 60 sec: 7031.4, 300 sec: 6914.6). Total num frames: 108101632. Throughput: 0: 1732.3. Samples: 22019770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:37,935][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 02:50:39,934][42004] Updated weights for policy 0, policy_version 26396 (0.0032) +[2024-11-08 02:50:43,601][41694] Fps is (10 sec: 5758.8, 60 sec: 6818.9, 300 sec: 6857.4). Total num frames: 108126208. Throughput: 0: 1562.4. Samples: 22025322. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:43,602][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 02:50:47,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 108154880. Throughput: 0: 1611.1. Samples: 22031726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:47,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 02:50:48,029][42004] Updated weights for policy 0, policy_version 26406 (0.0041) +[2024-11-08 02:50:52,935][41694] Fps is (10 sec: 6582.0, 60 sec: 6621.5, 300 sec: 6831.2). Total num frames: 108187648. Throughput: 0: 1625.6. Samples: 22041834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:52,937][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 02:50:54,119][42004] Updated weights for policy 0, policy_version 26416 (0.0041) +[2024-11-08 02:50:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6859.1). Total num frames: 108228608. Throughput: 0: 1704.0. Samples: 22053158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:50:57,933][41694] Avg episode reward: [(0, '4.308')] +[2024-11-08 02:50:59,299][42004] Updated weights for policy 0, policy_version 26426 (0.0030) +[2024-11-08 02:51:02,931][41694] Fps is (10 sec: 7785.2, 60 sec: 6690.1, 300 sec: 6928.5). Total num frames: 108265472. Throughput: 0: 1735.4. Samples: 22059140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:02,933][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 02:51:04,763][42004] Updated weights for policy 0, policy_version 26436 (0.0021) +[2024-11-08 02:51:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6948.1, 300 sec: 6942.4). Total num frames: 108302336. Throughput: 0: 1773.2. Samples: 22070212. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:07,934][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 02:51:11,112][42004] Updated weights for policy 0, policy_version 26446 (0.0027) +[2024-11-08 02:51:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6895.0, 300 sec: 6928.5). Total num frames: 108335104. Throughput: 0: 1730.7. Samples: 22080186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:12,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 02:51:18,248][41694] Fps is (10 sec: 5161.8, 60 sec: 6723.0, 300 sec: 6865.6). Total num frames: 108355584. Throughput: 0: 1698.5. Samples: 22085152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:18,251][41694] Avg episode reward: [(0, '4.165')] +[2024-11-08 02:51:19,245][42004] Updated weights for policy 0, policy_version 26456 (0.0039) +[2024-11-08 02:51:22,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 108384256. Throughput: 0: 1589.3. Samples: 22091290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:22,935][41694] Avg episode reward: [(0, '4.145')] +[2024-11-08 02:51:25,511][42004] Updated weights for policy 0, policy_version 26466 (0.0041) +[2024-11-08 02:51:27,931][41694] Fps is (10 sec: 6344.4, 60 sec: 6553.7, 300 sec: 6817.4). Total num frames: 108417024. Throughput: 0: 1705.4. Samples: 22100924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:51:27,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 02:51:31,181][42004] Updated weights for policy 0, policy_version 26476 (0.0037) +[2024-11-08 02:51:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6845.2). Total num frames: 108457984. Throughput: 0: 1662.8. Samples: 22106550. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:51:32,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 02:51:36,651][42004] Updated weights for policy 0, policy_version 26486 (0.0025) +[2024-11-08 02:51:37,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6553.6, 300 sec: 6914.6). Total num frames: 108494848. Throughput: 0: 1693.3. Samples: 22118026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:51:37,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 02:51:37,942][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026488_108494848.pth... +[2024-11-08 02:51:38,257][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026087_106852352.pth +[2024-11-08 02:51:41,916][42004] Updated weights for policy 0, policy_version 26496 (0.0049) +[2024-11-08 02:51:42,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6834.6, 300 sec: 6900.7). Total num frames: 108531712. Throughput: 0: 1697.8. Samples: 22129558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 02:51:42,933][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 02:51:47,317][42004] Updated weights for policy 0, policy_version 26506 (0.0031) +[2024-11-08 02:51:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.1, 300 sec: 6928.5). Total num frames: 108572672. Throughput: 0: 1686.9. Samples: 22135052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:47,937][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 02:51:52,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.8, 300 sec: 6859.1). Total num frames: 108593152. Throughput: 0: 1673.8. Samples: 22145534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:52,933][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 02:51:55,245][42004] Updated weights for policy 0, policy_version 26516 (0.0025) +[2024-11-08 02:51:57,931][41694] Fps is (10 sec: 4915.4, 60 sec: 6553.6, 300 sec: 6845.2). 
Total num frames: 108621824. Throughput: 0: 1608.4. Samples: 22152562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:51:57,933][41694] Avg episode reward: [(0, '4.212')] +[2024-11-08 02:52:01,535][42004] Updated weights for policy 0, policy_version 26526 (0.0031) +[2024-11-08 02:52:02,933][41694] Fps is (10 sec: 6552.4, 60 sec: 6553.4, 300 sec: 6845.1). Total num frames: 108658688. Throughput: 0: 1614.9. Samples: 22157314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:02,935][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 02:52:07,027][42004] Updated weights for policy 0, policy_version 26536 (0.0030) +[2024-11-08 02:52:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6553.6, 300 sec: 6845.2). Total num frames: 108695552. Throughput: 0: 1711.9. Samples: 22168324. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:07,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 02:52:12,238][42004] Updated weights for policy 0, policy_version 26546 (0.0052) +[2024-11-08 02:52:12,931][41694] Fps is (10 sec: 7783.9, 60 sec: 6690.1, 300 sec: 6928.5). Total num frames: 108736512. Throughput: 0: 1765.2. Samples: 22180356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:12,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 02:52:17,334][42004] Updated weights for policy 0, policy_version 26556 (0.0027) +[2024-11-08 02:52:17,933][41694] Fps is (10 sec: 7781.8, 60 sec: 6999.9, 300 sec: 6914.6). Total num frames: 108773376. Throughput: 0: 1769.3. Samples: 22186172. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:17,936][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 02:52:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6914.6). Total num frames: 108810240. Throughput: 0: 1755.7. Samples: 22197032. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:22,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 02:52:23,111][42004] Updated weights for policy 0, policy_version 26566 (0.0036) +[2024-11-08 02:52:27,932][41694] Fps is (10 sec: 5325.2, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 108826624. Throughput: 0: 1654.0. Samples: 22203990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:27,934][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 02:52:31,959][42004] Updated weights for policy 0, policy_version 26576 (0.0034) +[2024-11-08 02:52:32,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 108859392. Throughput: 0: 1628.4. Samples: 22208328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:32,934][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 02:52:37,467][42004] Updated weights for policy 0, policy_version 26586 (0.0024) +[2024-11-08 02:52:37,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6690.2, 300 sec: 6803.5). Total num frames: 108896256. Throughput: 0: 1623.6. Samples: 22218596. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:52:37,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 02:52:42,679][42004] Updated weights for policy 0, policy_version 26596 (0.0036) +[2024-11-08 02:52:42,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 108937216. Throughput: 0: 1734.9. Samples: 22230632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:52:42,933][41694] Avg episode reward: [(0, '4.230')] +[2024-11-08 02:52:47,787][42004] Updated weights for policy 0, policy_version 26606 (0.0025) +[2024-11-08 02:52:47,931][41694] Fps is (10 sec: 8192.0, 60 sec: 6758.5, 300 sec: 6886.8). Total num frames: 108978176. Throughput: 0: 1762.1. Samples: 22236606. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:52:47,933][41694] Avg episode reward: [(0, '4.665')] +[2024-11-08 02:52:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 109015040. Throughput: 0: 1780.4. Samples: 22248440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:52:52,933][41694] Avg episode reward: [(0, '4.657')] +[2024-11-08 02:52:52,994][42004] Updated weights for policy 0, policy_version 26616 (0.0028) +[2024-11-08 02:52:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7236.3, 300 sec: 6900.7). Total num frames: 109056000. Throughput: 0: 1766.6. Samples: 22259852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:52:57,933][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 02:52:58,456][42004] Updated weights for policy 0, policy_version 26626 (0.0034) +[2024-11-08 02:53:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6895.1, 300 sec: 6831.3). Total num frames: 109072384. Throughput: 0: 1741.4. Samples: 22264534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:02,933][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 02:53:07,392][42004] Updated weights for policy 0, policy_version 26636 (0.0038) +[2024-11-08 02:53:07,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 109101056. Throughput: 0: 1629.2. Samples: 22270346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:07,933][41694] Avg episode reward: [(0, '4.696')] +[2024-11-08 02:53:12,641][42004] Updated weights for policy 0, policy_version 26646 (0.0050) +[2024-11-08 02:53:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 109142016. Throughput: 0: 1728.7. Samples: 22281780. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:12,933][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 02:53:17,674][42004] Updated weights for policy 0, policy_version 26656 (0.0031) +[2024-11-08 02:53:17,932][41694] Fps is (10 sec: 8191.9, 60 sec: 6826.8, 300 sec: 6817.4). Total num frames: 109182976. Throughput: 0: 1762.9. Samples: 22287660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:17,934][41694] Avg episode reward: [(0, '4.274')] +[2024-11-08 02:53:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6872.9). Total num frames: 109219840. Throughput: 0: 1802.4. Samples: 22299704. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:22,933][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 02:53:22,946][42004] Updated weights for policy 0, policy_version 26666 (0.0024) +[2024-11-08 02:53:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7236.3, 300 sec: 6886.8). Total num frames: 109260800. Throughput: 0: 1793.1. Samples: 22311322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:27,933][41694] Avg episode reward: [(0, '4.661')] +[2024-11-08 02:53:28,374][42004] Updated weights for policy 0, policy_version 26676 (0.0023) +[2024-11-08 02:53:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7304.5, 300 sec: 6900.7). Total num frames: 109297664. Throughput: 0: 1783.8. Samples: 22316878. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:53:32,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 02:53:33,600][42004] Updated weights for policy 0, policy_version 26686 (0.0030) +[2024-11-08 02:53:37,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 109314048. Throughput: 0: 1696.7. Samples: 22324790. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:53:37,934][41694] Avg episode reward: [(0, '4.296')] +[2024-11-08 02:53:38,067][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026689_109318144.pth... +[2024-11-08 02:53:38,194][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026289_107679744.pth +[2024-11-08 02:53:42,349][42004] Updated weights for policy 0, policy_version 26696 (0.0036) +[2024-11-08 02:53:42,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 109350912. Throughput: 0: 1638.8. Samples: 22333600. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:53:42,934][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 02:53:47,572][42004] Updated weights for policy 0, policy_version 26706 (0.0030) +[2024-11-08 02:53:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.6, 300 sec: 6803.5). Total num frames: 109387776. Throughput: 0: 1662.0. Samples: 22339326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:53:47,935][41694] Avg episode reward: [(0, '4.176')] +[2024-11-08 02:53:52,715][42004] Updated weights for policy 0, policy_version 26716 (0.0023) +[2024-11-08 02:53:52,934][41694] Fps is (10 sec: 7780.3, 60 sec: 6894.6, 300 sec: 6877.8). Total num frames: 109428736. Throughput: 0: 1796.5. Samples: 22351192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:52,937][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 02:53:57,926][42004] Updated weights for policy 0, policy_version 26726 (0.0030) +[2024-11-08 02:53:57,931][41694] Fps is (10 sec: 8192.1, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 109469696. Throughput: 0: 1809.2. Samples: 22363194. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:53:57,932][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 02:54:02,932][41694] Fps is (10 sec: 7784.5, 60 sec: 7236.3, 300 sec: 6900.7). 
Total num frames: 109506560. Throughput: 0: 1807.2. Samples: 22368986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:02,934][41694] Avg episode reward: [(0, '4.685')] +[2024-11-08 02:54:03,410][42004] Updated weights for policy 0, policy_version 26736 (0.0021) +[2024-11-08 02:54:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7372.8, 300 sec: 6914.6). Total num frames: 109543424. Throughput: 0: 1786.7. Samples: 22380106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:07,934][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 02:54:11,374][42004] Updated weights for policy 0, policy_version 26746 (0.0041) +[2024-11-08 02:54:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 109559808. Throughput: 0: 1669.0. Samples: 22386426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:12,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 02:54:17,293][42004] Updated weights for policy 0, policy_version 26756 (0.0030) +[2024-11-08 02:54:17,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 109596672. Throughput: 0: 1654.8. Samples: 22391342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:17,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 02:54:22,448][42004] Updated weights for policy 0, policy_version 26766 (0.0027) +[2024-11-08 02:54:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 109633536. Throughput: 0: 1738.4. Samples: 22403016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:22,934][41694] Avg episode reward: [(0, '4.759')] +[2024-11-08 02:54:27,452][42004] Updated weights for policy 0, policy_version 26776 (0.0038) +[2024-11-08 02:54:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6876.8). Total num frames: 109674496. Throughput: 0: 1815.2. Samples: 22415282. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:54:27,933][41694] Avg episode reward: [(0, '4.588')] +[2024-11-08 02:54:32,509][42004] Updated weights for policy 0, policy_version 26786 (0.0036) +[2024-11-08 02:54:32,931][41694] Fps is (10 sec: 8192.1, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 109715456. Throughput: 0: 1822.4. Samples: 22421334. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:54:32,933][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 02:54:37,677][42004] Updated weights for policy 0, policy_version 26796 (0.0028) +[2024-11-08 02:54:37,931][41694] Fps is (10 sec: 8192.2, 60 sec: 7372.8, 300 sec: 6928.5). Total num frames: 109756416. Throughput: 0: 1826.3. Samples: 22433372. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:54:37,935][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 02:54:42,932][41694] Fps is (10 sec: 7782.0, 60 sec: 7372.8, 300 sec: 6928.5). Total num frames: 109793280. Throughput: 0: 1809.4. Samples: 22444618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:42,936][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 02:54:43,493][42004] Updated weights for policy 0, policy_version 26806 (0.0025) +[2024-11-08 02:54:47,932][41694] Fps is (10 sec: 5324.4, 60 sec: 7031.4, 300 sec: 6845.2). Total num frames: 109809664. Throughput: 0: 1731.5. Samples: 22446904. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:47,934][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 02:54:51,830][42004] Updated weights for policy 0, policy_version 26816 (0.0032) +[2024-11-08 02:54:52,931][41694] Fps is (10 sec: 5325.1, 60 sec: 6963.5, 300 sec: 6831.3). Total num frames: 109846528. Throughput: 0: 1678.6. Samples: 22455644. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:52,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 02:54:56,831][42004] Updated weights for policy 0, policy_version 26826 (0.0023) +[2024-11-08 02:54:57,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 109887488. Throughput: 0: 1805.3. Samples: 22467664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:54:57,934][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 02:55:02,422][42004] Updated weights for policy 0, policy_version 26836 (0.0034) +[2024-11-08 02:55:02,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6911.4). Total num frames: 109924352. Throughput: 0: 1821.6. Samples: 22473312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:02,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 02:55:07,645][42004] Updated weights for policy 0, policy_version 26846 (0.0026) +[2024-11-08 02:55:07,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6914.6). Total num frames: 109961216. Throughput: 0: 1814.0. Samples: 22484646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:07,933][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 02:55:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7236.3, 300 sec: 6928.5). Total num frames: 109993984. Throughput: 0: 1785.9. Samples: 22495648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:12,943][41694] Avg episode reward: [(0, '4.234')] +[2024-11-08 02:55:13,607][42004] Updated weights for policy 0, policy_version 26856 (0.0026) +[2024-11-08 02:55:17,932][41694] Fps is (10 sec: 6553.4, 60 sec: 7168.0, 300 sec: 6914.6). Total num frames: 110026752. Throughput: 0: 1756.7. Samples: 22500384. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:17,942][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 02:55:22,651][42004] Updated weights for policy 0, policy_version 26866 (0.0032) +[2024-11-08 02:55:22,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 110043136. Throughput: 0: 1605.9. Samples: 22505636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:22,934][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 02:55:27,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 110080000. Throughput: 0: 1595.6. Samples: 22516420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:27,933][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 02:55:28,114][42004] Updated weights for policy 0, policy_version 26876 (0.0023) +[2024-11-08 02:55:32,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 110120960. Throughput: 0: 1677.6. Samples: 22522396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:32,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 02:55:33,131][42004] Updated weights for policy 0, policy_version 26886 (0.0023) +[2024-11-08 02:55:37,932][41694] Fps is (10 sec: 8191.8, 60 sec: 6758.4, 300 sec: 6916.4). Total num frames: 110161920. Throughput: 0: 1752.4. Samples: 22534504. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:37,934][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 02:55:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026895_110161920.pth... +[2024-11-08 02:55:38,056][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026488_108494848.pth +[2024-11-08 02:55:38,385][42004] Updated weights for policy 0, policy_version 26896 (0.0021) +[2024-11-08 02:55:42,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6928.5). 
Total num frames: 110198784. Throughput: 0: 1745.2. Samples: 22546200. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:55:42,934][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 02:55:43,632][42004] Updated weights for policy 0, policy_version 26906 (0.0023) +[2024-11-08 02:55:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.1, 300 sec: 6956.3). Total num frames: 110239744. Throughput: 0: 1748.6. Samples: 22552000. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:55:47,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 02:55:48,832][42004] Updated weights for policy 0, policy_version 26916 (0.0028) +[2024-11-08 02:55:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.7, 300 sec: 6928.5). Total num frames: 110272512. Throughput: 0: 1739.5. Samples: 22562924. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:55:52,933][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 02:55:54,992][42004] Updated weights for policy 0, policy_version 26926 (0.0031) +[2024-11-08 02:55:57,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7031.4, 300 sec: 6928.5). Total num frames: 110309376. Throughput: 0: 1731.4. Samples: 22573560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:55:57,934][41694] Avg episode reward: [(0, '4.657')] +[2024-11-08 02:56:00,631][42004] Updated weights for policy 0, policy_version 26936 (0.0033) +[2024-11-08 02:56:02,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7031.5, 300 sec: 6928.5). Total num frames: 110346240. Throughput: 0: 1743.3. Samples: 22578832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:02,934][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 02:56:06,051][42004] Updated weights for policy 0, policy_version 26946 (0.0024) +[2024-11-08 02:56:07,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6963.2, 300 sec: 6928.5). Total num frames: 110379008. Throughput: 0: 1877.9. Samples: 22590140. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:07,933][41694] Avg episode reward: [(0, '4.254')] +[2024-11-08 02:56:12,939][41694] Fps is (10 sec: 4092.8, 60 sec: 6552.7, 300 sec: 6894.0). Total num frames: 110387200. Throughput: 0: 1722.9. Samples: 22593964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:12,954][41694] Avg episode reward: [(0, '4.315')] +[2024-11-08 02:56:15,938][42004] Updated weights for policy 0, policy_version 26956 (0.0035) +[2024-11-08 02:56:17,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6621.9, 300 sec: 6914.6). Total num frames: 110424064. Throughput: 0: 1698.1. Samples: 22598808. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:17,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 02:56:21,516][42004] Updated weights for policy 0, policy_version 26966 (0.0030) +[2024-11-08 02:56:22,932][41694] Fps is (10 sec: 7378.6, 60 sec: 6963.2, 300 sec: 6928.5). Total num frames: 110460928. Throughput: 0: 1673.9. Samples: 22609828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:22,942][41694] Avg episode reward: [(0, '4.692')] +[2024-11-08 02:56:27,498][42004] Updated weights for policy 0, policy_version 26976 (0.0031) +[2024-11-08 02:56:27,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 110493696. Throughput: 0: 1643.7. Samples: 22620166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:27,933][41694] Avg episode reward: [(0, '4.645')] +[2024-11-08 02:56:32,934][41694] Fps is (10 sec: 6961.4, 60 sec: 6826.4, 300 sec: 6900.7). Total num frames: 110530560. Throughput: 0: 1621.0. Samples: 22624948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:32,937][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 02:56:33,184][42004] Updated weights for policy 0, policy_version 26986 (0.0026) +[2024-11-08 02:56:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6900.7). 
Total num frames: 110567424. Throughput: 0: 1636.0. Samples: 22636546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:37,935][41694] Avg episode reward: [(0, '4.631')] +[2024-11-08 02:56:38,660][42004] Updated weights for policy 0, policy_version 26996 (0.0028) +[2024-11-08 02:56:42,931][41694] Fps is (10 sec: 7784.5, 60 sec: 6826.7, 300 sec: 6900.7). Total num frames: 110608384. Throughput: 0: 1654.6. Samples: 22648018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:42,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 02:56:46,311][42004] Updated weights for policy 0, policy_version 27006 (0.0032) +[2024-11-08 02:56:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6886.8). Total num frames: 110624768. Throughput: 0: 1617.6. Samples: 22651624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:56:47,939][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 02:56:51,804][42004] Updated weights for policy 0, policy_version 27016 (0.0031) +[2024-11-08 02:56:52,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6928.5). Total num frames: 110665728. Throughput: 0: 1560.8. Samples: 22660374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:56:52,934][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 02:56:57,117][42004] Updated weights for policy 0, policy_version 27026 (0.0022) +[2024-11-08 02:56:57,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6553.6, 300 sec: 6928.5). Total num frames: 110702592. Throughput: 0: 1733.3. Samples: 22671950. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:56:57,933][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 02:57:02,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6900.7). Total num frames: 110731264. Throughput: 0: 1732.2. Samples: 22676758. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:57:02,935][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 02:57:03,738][42004] Updated weights for policy 0, policy_version 27036 (0.0056) +[2024-11-08 02:57:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6485.3, 300 sec: 6886.8). Total num frames: 110768128. Throughput: 0: 1699.0. Samples: 22686282. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:57:07,935][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 02:57:09,386][42004] Updated weights for policy 0, policy_version 27046 (0.0030) +[2024-11-08 02:57:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6964.1, 300 sec: 6886.9). Total num frames: 110804992. Throughput: 0: 1722.7. Samples: 22697686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:57:12,934][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 02:57:14,693][42004] Updated weights for policy 0, policy_version 27056 (0.0023) +[2024-11-08 02:57:17,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.4, 300 sec: 6900.7). Total num frames: 110845952. Throughput: 0: 1743.7. Samples: 22703412. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:57:17,934][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 02:57:22,541][42004] Updated weights for policy 0, policy_version 27066 (0.0027) +[2024-11-08 02:57:22,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6900.7). Total num frames: 110862336. Throughput: 0: 1636.2. Samples: 22710176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:57:22,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 02:57:27,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6758.3, 300 sec: 6914.6). Total num frames: 110899200. Throughput: 0: 1619.1. Samples: 22720880. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:57:27,934][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 02:57:28,301][42004] Updated weights for policy 0, policy_version 27076 (0.0026) +[2024-11-08 02:57:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.7, 300 sec: 6914.6). Total num frames: 110936064. Throughput: 0: 1664.1. Samples: 22726510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:57:32,937][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 02:57:33,918][42004] Updated weights for policy 0, policy_version 27086 (0.0033) +[2024-11-08 02:57:37,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 110968832. Throughput: 0: 1705.5. Samples: 22737124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:57:37,935][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 02:57:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027092_110968832.pth... +[2024-11-08 02:57:38,075][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026689_109318144.pth +[2024-11-08 02:57:39,932][42004] Updated weights for policy 0, policy_version 27096 (0.0034) +[2024-11-08 02:57:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6872.9). Total num frames: 111005696. Throughput: 0: 1689.0. Samples: 22747956. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:57:42,933][41694] Avg episode reward: [(0, '4.210')] +[2024-11-08 02:57:45,297][42004] Updated weights for policy 0, policy_version 27106 (0.0029) +[2024-11-08 02:57:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 111046656. Throughput: 0: 1711.6. Samples: 22753782. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:57:47,933][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 02:57:50,693][42004] Updated weights for policy 0, policy_version 27116 (0.0024) +[2024-11-08 02:57:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 111083520. Throughput: 0: 1748.3. Samples: 22764954. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:57:52,933][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 02:57:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6886.8). Total num frames: 111104000. Throughput: 0: 1646.2. Samples: 22771764. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:57:57,934][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 02:57:58,375][42004] Updated weights for policy 0, policy_version 27126 (0.0030) +[2024-11-08 02:58:02,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6914.6). Total num frames: 111140864. Throughput: 0: 1647.6. Samples: 22777554. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 02:58:02,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 02:58:03,934][42004] Updated weights for policy 0, policy_version 27136 (0.0039) +[2024-11-08 02:58:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6900.7). Total num frames: 111177728. Throughput: 0: 1745.5. Samples: 22788724. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:58:07,934][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 02:58:09,572][42004] Updated weights for policy 0, policy_version 27146 (0.0026) +[2024-11-08 02:58:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6873.0). Total num frames: 111210496. Throughput: 0: 1741.0. Samples: 22799226. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:58:12,934][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 02:58:15,135][42004] Updated weights for policy 0, policy_version 27156 (0.0032) +[2024-11-08 02:58:17,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 111251456. Throughput: 0: 1744.5. Samples: 22805014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:58:17,935][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 02:58:20,651][42004] Updated weights for policy 0, policy_version 27166 (0.0029) +[2024-11-08 02:58:22,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7099.7, 300 sec: 6872.9). Total num frames: 111288320. Throughput: 0: 1756.6. Samples: 22816172. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:58:22,934][41694] Avg episode reward: [(0, '4.641')] +[2024-11-08 02:58:26,140][42004] Updated weights for policy 0, policy_version 27176 (0.0031) +[2024-11-08 02:58:30,192][41694] Fps is (10 sec: 6013.7, 60 sec: 6842.1, 300 sec: 6820.7). Total num frames: 111325184. Throughput: 0: 1682.1. Samples: 22827452. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:58:30,193][41694] Avg episode reward: [(0, '4.231')] +[2024-11-08 02:58:32,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6758.4, 300 sec: 6873.0). Total num frames: 111341568. Throughput: 0: 1665.6. Samples: 22828734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:58:32,935][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 02:58:34,151][42004] Updated weights for policy 0, policy_version 27186 (0.0030) +[2024-11-08 02:58:37,931][41694] Fps is (10 sec: 6879.6, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 111378432. Throughput: 0: 1653.5. Samples: 22839360. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:58:37,934][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 02:58:39,742][42004] Updated weights for policy 0, policy_version 27196 (0.0038) +[2024-11-08 02:58:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 111415296. Throughput: 0: 1732.1. Samples: 22849710. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:58:42,933][41694] Avg episode reward: [(0, '4.234')] +[2024-11-08 02:58:45,825][42004] Updated weights for policy 0, policy_version 27206 (0.0031) +[2024-11-08 02:58:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 6845.2). Total num frames: 111448064. Throughput: 0: 1716.6. Samples: 22854802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:58:47,933][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 02:58:51,351][42004] Updated weights for policy 0, policy_version 27216 (0.0027) +[2024-11-08 02:58:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 111484928. Throughput: 0: 1717.6. Samples: 22866016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:58:52,933][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 02:58:56,788][42004] Updated weights for policy 0, policy_version 27226 (0.0035) +[2024-11-08 02:58:57,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 111525888. Throughput: 0: 1737.6. Samples: 22877416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:58:57,934][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 02:59:02,308][42004] Updated weights for policy 0, policy_version 27236 (0.0030) +[2024-11-08 02:59:04,581][41694] Fps is (10 sec: 6329.1, 60 sec: 6776.9, 300 sec: 6793.3). Total num frames: 111558656. Throughput: 0: 1670.1. Samples: 22882924. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:04,582][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 02:59:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 111583232. Throughput: 0: 1641.0. Samples: 22890018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:07,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 02:59:09,903][42004] Updated weights for policy 0, policy_version 27246 (0.0029) +[2024-11-08 02:59:12,931][41694] Fps is (10 sec: 7357.2, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 111620096. Throughput: 0: 1728.8. Samples: 22901340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:12,933][41694] Avg episode reward: [(0, '4.335')] +[2024-11-08 02:59:15,873][42004] Updated weights for policy 0, policy_version 27256 (0.0034) +[2024-11-08 02:59:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 111652864. Throughput: 0: 1724.5. Samples: 22906338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:17,933][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 02:59:21,494][42004] Updated weights for policy 0, policy_version 27266 (0.0028) +[2024-11-08 02:59:22,940][41694] Fps is (10 sec: 6957.4, 60 sec: 6689.2, 300 sec: 6831.1). Total num frames: 111689728. Throughput: 0: 1724.7. Samples: 22916984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:22,942][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 02:59:26,888][42004] Updated weights for policy 0, policy_version 27276 (0.0030) +[2024-11-08 02:59:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6952.0, 300 sec: 6817.4). Total num frames: 111726592. Throughput: 0: 1751.2. Samples: 22928516. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:27,934][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 02:59:32,229][42004] Updated weights for policy 0, policy_version 27286 (0.0034) +[2024-11-08 02:59:32,931][41694] Fps is (10 sec: 7788.9, 60 sec: 7099.7, 300 sec: 6817.4). Total num frames: 111767552. Throughput: 0: 1763.7. Samples: 22934168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:32,933][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 02:59:39,141][41694] Fps is (10 sec: 6212.0, 60 sec: 6825.6, 300 sec: 6761.9). Total num frames: 111796224. Throughput: 0: 1724.6. Samples: 22945708. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:59:39,146][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 02:59:39,155][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027294_111796224.pth... +[2024-11-08 02:59:39,324][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000026895_110161920.pth +[2024-11-08 02:59:39,915][42004] Updated weights for policy 0, policy_version 27296 (0.0025) +[2024-11-08 02:59:42,931][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 111824896. Throughput: 0: 1667.6. Samples: 22952460. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:59:42,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 02:59:45,554][42004] Updated weights for policy 0, policy_version 27306 (0.0023) +[2024-11-08 02:59:47,932][41694] Fps is (10 sec: 6989.2, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 111857664. Throughput: 0: 1727.7. Samples: 22957820. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 02:59:47,937][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 02:59:51,573][42004] Updated weights for policy 0, policy_version 27316 (0.0038) +[2024-11-08 02:59:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.6, 300 sec: 6803.5). 
Total num frames: 111894528. Throughput: 0: 1736.5. Samples: 22968162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:52,936][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 02:59:57,276][42004] Updated weights for policy 0, policy_version 27326 (0.0049) +[2024-11-08 02:59:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 111931392. Throughput: 0: 1720.7. Samples: 22978770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 02:59:57,935][41694] Avg episode reward: [(0, '4.555')] +[2024-11-08 03:00:02,825][42004] Updated weights for policy 0, policy_version 27336 (0.0025) +[2024-11-08 03:00:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7019.6, 300 sec: 6803.5). Total num frames: 111968256. Throughput: 0: 1736.6. Samples: 22984486. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:02,933][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 03:00:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6817.4). Total num frames: 112005120. Throughput: 0: 1748.3. Samples: 22995642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:07,933][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 03:00:08,181][42004] Updated weights for policy 0, policy_version 27346 (0.0026) +[2024-11-08 03:00:13,713][41694] Fps is (10 sec: 5698.6, 60 sec: 6738.9, 300 sec: 6771.7). Total num frames: 112029696. Throughput: 0: 1596.5. Samples: 23001608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:13,715][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 03:00:16,484][42004] Updated weights for policy 0, policy_version 27356 (0.0037) +[2024-11-08 03:00:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 112058368. Throughput: 0: 1634.5. Samples: 23007722. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:17,933][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 03:00:22,607][42004] Updated weights for policy 0, policy_version 27366 (0.0027) +[2024-11-08 03:00:22,931][41694] Fps is (10 sec: 6665.0, 60 sec: 6691.1, 300 sec: 6817.4). Total num frames: 112091136. Throughput: 0: 1643.2. Samples: 23017666. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:22,933][41694] Avg episode reward: [(0, '4.258')] +[2024-11-08 03:00:27,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 112128000. Throughput: 0: 1682.6. Samples: 23028178. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:27,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 03:00:28,269][42004] Updated weights for policy 0, policy_version 27376 (0.0024) +[2024-11-08 03:00:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 112164864. Throughput: 0: 1690.8. Samples: 23033906. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:00:32,939][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 03:00:33,555][42004] Updated weights for policy 0, policy_version 27386 (0.0028) +[2024-11-08 03:00:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6967.1, 300 sec: 6803.5). Total num frames: 112205824. Throughput: 0: 1715.9. Samples: 23045378. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:00:37,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 03:00:38,934][42004] Updated weights for policy 0, policy_version 27396 (0.0024) +[2024-11-08 03:00:42,932][41694] Fps is (10 sec: 7781.8, 60 sec: 6963.1, 300 sec: 6789.6). Total num frames: 112242688. Throughput: 0: 1736.2. Samples: 23056898. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:00:42,936][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 03:00:44,330][42004] Updated weights for policy 0, policy_version 27406 (0.0022) +[2024-11-08 03:00:48,284][41694] Fps is (10 sec: 5539.1, 60 sec: 6718.9, 300 sec: 6739.9). Total num frames: 112263168. Throughput: 0: 1719.7. Samples: 23062480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:00:48,287][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 03:00:52,314][42004] Updated weights for policy 0, policy_version 27416 (0.0033) +[2024-11-08 03:00:52,931][41694] Fps is (10 sec: 5734.8, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 112300032. Throughput: 0: 1629.6. Samples: 23068974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:52,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 03:00:57,932][41694] Fps is (10 sec: 7217.6, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 112332800. Throughput: 0: 1767.7. Samples: 23079774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:00:57,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 03:00:58,059][42004] Updated weights for policy 0, policy_version 27426 (0.0030) +[2024-11-08 03:01:02,933][41694] Fps is (10 sec: 6962.4, 60 sec: 6690.0, 300 sec: 6748.0). Total num frames: 112369664. Throughput: 0: 1720.5. Samples: 23085148. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:02,936][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 03:01:03,671][42004] Updated weights for policy 0, policy_version 27436 (0.0024) +[2024-11-08 03:01:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6845.4). Total num frames: 112406528. Throughput: 0: 1747.9. Samples: 23096322. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:01:07,936][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 03:01:09,064][42004] Updated weights for policy 0, policy_version 27446 (0.0028) +[2024-11-08 03:01:12,931][41694] Fps is (10 sec: 7373.6, 60 sec: 6985.9, 300 sec: 6845.2). Total num frames: 112443392. Throughput: 0: 1752.0. Samples: 23107020. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:01:12,934][41694] Avg episode reward: [(0, '4.319')] +[2024-11-08 03:01:15,056][42004] Updated weights for policy 0, policy_version 27456 (0.0030) +[2024-11-08 03:01:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 112480256. Throughput: 0: 1744.1. Samples: 23112390. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:01:17,933][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 03:01:22,874][42004] Updated weights for policy 0, policy_version 27466 (0.0032) +[2024-11-08 03:01:22,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 112500736. Throughput: 0: 1717.3. Samples: 23122654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:01:22,932][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 03:01:27,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6803.6). Total num frames: 112537600. Throughput: 0: 1634.7. Samples: 23130458. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:27,936][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 03:01:28,490][42004] Updated weights for policy 0, policy_version 27476 (0.0037) +[2024-11-08 03:01:32,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 112570368. Throughput: 0: 1633.4. Samples: 23135408. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:32,933][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 03:01:34,448][42004] Updated weights for policy 0, policy_version 27486 (0.0025) +[2024-11-08 03:01:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 112607232. Throughput: 0: 1715.5. Samples: 23146170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:37,934][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 03:01:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027492_112607232.pth... +[2024-11-08 03:01:38,052][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027092_110968832.pth +[2024-11-08 03:01:39,862][42004] Updated weights for policy 0, policy_version 27496 (0.0031) +[2024-11-08 03:01:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.2, 300 sec: 6845.2). Total num frames: 112644096. Throughput: 0: 1725.8. Samples: 23157436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:42,933][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 03:01:45,264][42004] Updated weights for policy 0, policy_version 27506 (0.0027) +[2024-11-08 03:01:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7073.0, 300 sec: 6845.2). Total num frames: 112685056. Throughput: 0: 1734.8. Samples: 23163214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:47,934][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 03:01:50,660][42004] Updated weights for policy 0, policy_version 27516 (0.0032) +[2024-11-08 03:01:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.4, 300 sec: 6845.2). Total num frames: 112721920. Throughput: 0: 1743.5. Samples: 23174780. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:52,934][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 03:01:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6817.4). 
Total num frames: 112742400. Throughput: 0: 1679.2. Samples: 23182582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:01:57,934][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 03:01:58,280][42004] Updated weights for policy 0, policy_version 27526 (0.0031) +[2024-11-08 03:02:02,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.5, 300 sec: 6803.5). Total num frames: 112775168. Throughput: 0: 1671.3. Samples: 23187600. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:02:02,934][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 03:02:04,706][42004] Updated weights for policy 0, policy_version 27536 (0.0033) +[2024-11-08 03:02:07,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 112807936. Throughput: 0: 1641.2. Samples: 23196506. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:02:07,934][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 03:02:10,590][42004] Updated weights for policy 0, policy_version 27546 (0.0024) +[2024-11-08 03:02:12,933][41694] Fps is (10 sec: 6962.5, 60 sec: 6690.0, 300 sec: 6775.7). Total num frames: 112844800. Throughput: 0: 1708.2. Samples: 23207328. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:02:12,934][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 03:02:15,954][42004] Updated weights for policy 0, policy_version 27556 (0.0027) +[2024-11-08 03:02:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 112881664. Throughput: 0: 1722.3. Samples: 23212912. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:02:17,933][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 03:02:21,512][42004] Updated weights for policy 0, policy_version 27566 (0.0020) +[2024-11-08 03:02:22,932][41694] Fps is (10 sec: 7373.5, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 112918528. Throughput: 0: 1731.6. Samples: 23224094. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:02:22,935][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 03:02:26,908][42004] Updated weights for policy 0, policy_version 27576 (0.0021) +[2024-11-08 03:02:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6859.1). Total num frames: 112959488. Throughput: 0: 1738.3. Samples: 23235658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:02:27,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 03:02:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 112979968. Throughput: 0: 1719.9. Samples: 23240610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:02:32,934][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 03:02:34,424][42004] Updated weights for policy 0, policy_version 27586 (0.0026) +[2024-11-08 03:02:37,931][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 113012736. Throughput: 0: 1636.7. Samples: 23248430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:02:37,933][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 03:02:40,551][42004] Updated weights for policy 0, policy_version 27596 (0.0033) +[2024-11-08 03:02:42,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 113045504. Throughput: 0: 1682.2. Samples: 23258280. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:02:42,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 03:02:46,378][42004] Updated weights for policy 0, policy_version 27606 (0.0030) +[2024-11-08 03:02:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 113082368. Throughput: 0: 1688.2. Samples: 23263568. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:02:47,933][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 03:02:51,962][42004] Updated weights for policy 0, policy_version 27616 (0.0024) +[2024-11-08 03:02:52,933][41694] Fps is (10 sec: 7372.4, 60 sec: 6621.8, 300 sec: 6831.3). Total num frames: 113119232. Throughput: 0: 1731.6. Samples: 23274428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:02:52,936][41694] Avg episode reward: [(0, '4.602')] +[2024-11-08 03:02:57,150][42004] Updated weights for policy 0, policy_version 27626 (0.0029) +[2024-11-08 03:02:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 113160192. Throughput: 0: 1756.5. Samples: 23286370. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:02:57,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 03:03:02,618][42004] Updated weights for policy 0, policy_version 27636 (0.0030) +[2024-11-08 03:03:02,931][41694] Fps is (10 sec: 7783.1, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 113197056. Throughput: 0: 1754.0. Samples: 23291844. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:03:02,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 03:03:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 113217536. Throughput: 0: 1677.7. Samples: 23299590. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:03:07,934][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 03:03:10,537][42004] Updated weights for policy 0, policy_version 27646 (0.0037) +[2024-11-08 03:03:12,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.5, 300 sec: 6775.8). Total num frames: 113250304. Throughput: 0: 1635.2. Samples: 23309244. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:03:12,933][41694] Avg episode reward: [(0, '4.272')] +[2024-11-08 03:03:16,519][42004] Updated weights for policy 0, policy_version 27656 (0.0027) +[2024-11-08 03:03:17,936][41694] Fps is (10 sec: 6959.9, 60 sec: 6757.9, 300 sec: 6775.7). Total num frames: 113287168. Throughput: 0: 1639.5. Samples: 23314394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:03:17,938][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 03:03:22,007][42004] Updated weights for policy 0, policy_version 27666 (0.0030) +[2024-11-08 03:03:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6828.1). Total num frames: 113324032. Throughput: 0: 1712.8. Samples: 23325506. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:03:22,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 03:03:27,393][42004] Updated weights for policy 0, policy_version 27676 (0.0032) +[2024-11-08 03:03:27,932][41694] Fps is (10 sec: 7376.3, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 113360896. Throughput: 0: 1749.6. Samples: 23337014. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:03:27,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 03:03:32,756][42004] Updated weights for policy 0, policy_version 27686 (0.0031) +[2024-11-08 03:03:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6859.1). Total num frames: 113401856. Throughput: 0: 1753.6. Samples: 23342480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:03:32,934][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 03:03:37,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7099.7, 300 sec: 6859.0). Total num frames: 113438720. Throughput: 0: 1773.9. Samples: 23354252. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:03:37,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 03:03:37,989][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027696_113442816.pth... +[2024-11-08 03:03:37,996][42004] Updated weights for policy 0, policy_version 27696 (0.0026) +[2024-11-08 03:03:38,100][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027294_111796224.pth +[2024-11-08 03:03:42,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6895.0, 300 sec: 6817.4). Total num frames: 113459200. Throughput: 0: 1666.1. Samples: 23361346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:03:42,933][41694] Avg episode reward: [(0, '4.565')] +[2024-11-08 03:03:46,550][42004] Updated weights for policy 0, policy_version 27706 (0.0025) +[2024-11-08 03:03:47,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 113491968. Throughput: 0: 1647.7. Samples: 23365990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:03:47,934][41694] Avg episode reward: [(0, '4.717')] +[2024-11-08 03:03:52,577][42004] Updated weights for policy 0, policy_version 27716 (0.0029) +[2024-11-08 03:03:52,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.5, 300 sec: 6775.8). Total num frames: 113524736. Throughput: 0: 1707.9. Samples: 23376444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:03:52,933][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 03:03:57,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 6827.8). Total num frames: 113561600. Throughput: 0: 1709.8. Samples: 23386186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:03:57,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 03:03:58,449][42004] Updated weights for policy 0, policy_version 27726 (0.0024) +[2024-11-08 03:04:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6817.4). 
Total num frames: 113594368. Throughput: 0: 1721.5. Samples: 23391852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:04:02,934][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 03:04:04,391][42004] Updated weights for policy 0, policy_version 27736 (0.0034) +[2024-11-08 03:04:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 113631232. Throughput: 0: 1698.0. Samples: 23401918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:04:07,934][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 03:04:10,010][42004] Updated weights for policy 0, policy_version 27746 (0.0025) +[2024-11-08 03:04:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 113668096. Throughput: 0: 1692.0. Samples: 23413152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:04:12,933][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 03:04:17,918][42004] Updated weights for policy 0, policy_version 27756 (0.0034) +[2024-11-08 03:04:17,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.7, 300 sec: 6775.9). Total num frames: 113688576. Throughput: 0: 1623.1. Samples: 23415518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:04:17,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 03:04:22,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 113721344. Throughput: 0: 1569.4. Samples: 23424874. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:04:22,934][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 03:04:23,723][42004] Updated weights for policy 0, policy_version 27766 (0.0029) +[2024-11-08 03:04:27,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 113758208. Throughput: 0: 1655.3. Samples: 23435834. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:04:27,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 03:04:29,330][42004] Updated weights for policy 0, policy_version 27776 (0.0043) +[2024-11-08 03:04:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6803.6). Total num frames: 113795072. Throughput: 0: 1677.3. Samples: 23441470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:04:32,933][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 03:04:34,688][42004] Updated weights for policy 0, policy_version 27786 (0.0029) +[2024-11-08 03:04:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.7, 300 sec: 6803.5). Total num frames: 113831936. Throughput: 0: 1698.0. Samples: 23452856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:04:37,934][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 03:04:40,196][42004] Updated weights for policy 0, policy_version 27796 (0.0025) +[2024-11-08 03:04:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.6, 300 sec: 6817.4). Total num frames: 113868800. Throughput: 0: 1726.0. Samples: 23463858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:04:42,934][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 03:04:45,653][42004] Updated weights for policy 0, policy_version 27806 (0.0036) +[2024-11-08 03:04:49,954][41694] Fps is (10 sec: 6132.7, 60 sec: 6670.1, 300 sec: 6771.0). Total num frames: 113905664. Throughput: 0: 1653.8. Samples: 23469618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:04:49,956][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 03:04:52,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 113926144. Throughput: 0: 1663.6. Samples: 23476782. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:04:52,934][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 03:04:53,663][42004] Updated weights for policy 0, policy_version 27816 (0.0035) +[2024-11-08 03:04:57,933][41694] Fps is (10 sec: 7187.0, 60 sec: 6690.0, 300 sec: 6761.9). Total num frames: 113963008. Throughput: 0: 1640.4. Samples: 23486974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:04:57,935][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 03:04:59,302][42004] Updated weights for policy 0, policy_version 27826 (0.0041) +[2024-11-08 03:05:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 113999872. Throughput: 0: 1716.9. Samples: 23492780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:05:02,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 03:05:04,925][42004] Updated weights for policy 0, policy_version 27836 (0.0036) +[2024-11-08 03:05:07,932][41694] Fps is (10 sec: 7373.4, 60 sec: 6758.4, 300 sec: 6821.6). Total num frames: 114036736. Throughput: 0: 1752.1. Samples: 23503718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:05:07,934][41694] Avg episode reward: [(0, '4.251')] +[2024-11-08 03:05:10,405][42004] Updated weights for policy 0, policy_version 27846 (0.0025) +[2024-11-08 03:05:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 114073600. Throughput: 0: 1750.2. Samples: 23514594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:12,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 03:05:16,562][42004] Updated weights for policy 0, policy_version 27856 (0.0027) +[2024-11-08 03:05:17,939][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 114106368. Throughput: 0: 1730.1. Samples: 23519326. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:17,944][41694] Avg episode reward: [(0, '4.219')] +[2024-11-08 03:05:22,090][42004] Updated weights for policy 0, policy_version 27866 (0.0022) +[2024-11-08 03:05:24,515][41694] Fps is (10 sec: 5657.5, 60 sec: 6784.1, 300 sec: 6781.0). Total num frames: 114139136. Throughput: 0: 1659.4. Samples: 23530156. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:24,517][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 03:05:27,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 114159616. Throughput: 0: 1613.2. Samples: 23536454. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:27,934][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 03:05:30,539][42004] Updated weights for policy 0, policy_version 27876 (0.0026) +[2024-11-08 03:05:32,932][41694] Fps is (10 sec: 6813.3, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 114196480. Throughput: 0: 1674.0. Samples: 23541562. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:32,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 03:05:35,923][42004] Updated weights for policy 0, policy_version 27886 (0.0033) +[2024-11-08 03:05:37,934][41694] Fps is (10 sec: 7371.3, 60 sec: 6689.9, 300 sec: 6748.0). Total num frames: 114233344. Throughput: 0: 1695.8. Samples: 23553098. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:37,936][41694] Avg episode reward: [(0, '4.621')] +[2024-11-08 03:05:38,057][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027890_114237440.pth... +[2024-11-08 03:05:38,161][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027492_112607232.pth +[2024-11-08 03:05:41,263][42004] Updated weights for policy 0, policy_version 27896 (0.0023) +[2024-11-08 03:05:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6811.7). 
Total num frames: 114270208. Throughput: 0: 1726.0. Samples: 23564642. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:42,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 03:05:46,627][42004] Updated weights for policy 0, policy_version 27906 (0.0024) +[2024-11-08 03:05:47,931][41694] Fps is (10 sec: 7784.1, 60 sec: 6994.1, 300 sec: 6817.4). Total num frames: 114311168. Throughput: 0: 1717.0. Samples: 23570044. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:47,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 03:05:52,068][42004] Updated weights for policy 0, policy_version 27916 (0.0029) +[2024-11-08 03:05:52,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 114348032. Throughput: 0: 1725.2. Samples: 23581352. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:52,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 03:05:59,050][41694] Fps is (10 sec: 5894.5, 60 sec: 6768.9, 300 sec: 6777.9). Total num frames: 114376704. Throughput: 0: 1571.8. Samples: 23587082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:05:59,051][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 03:05:59,991][42004] Updated weights for policy 0, policy_version 27926 (0.0034) +[2024-11-08 03:06:02,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 114401280. Throughput: 0: 1655.6. Samples: 23593826. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:02,933][41694] Avg episode reward: [(0, '4.629')] +[2024-11-08 03:06:06,260][42004] Updated weights for policy 0, policy_version 27936 (0.0036) +[2024-11-08 03:06:07,938][41694] Fps is (10 sec: 6912.4, 60 sec: 6689.4, 300 sec: 6761.7). Total num frames: 114438144. Throughput: 0: 1691.9. Samples: 23603624. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:07,940][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 03:06:12,310][42004] Updated weights for policy 0, policy_version 27946 (0.0036) +[2024-11-08 03:06:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.8, 300 sec: 6748.0). Total num frames: 114470912. Throughput: 0: 1720.0. Samples: 23613854. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:12,935][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 03:06:17,561][42004] Updated weights for policy 0, policy_version 27956 (0.0032) +[2024-11-08 03:06:17,931][41694] Fps is (10 sec: 6967.8, 60 sec: 6690.2, 300 sec: 6803.5). Total num frames: 114507776. Throughput: 0: 1724.0. Samples: 23619140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:17,933][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 03:06:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6941.6, 300 sec: 6803.5). Total num frames: 114544640. Throughput: 0: 1731.2. Samples: 23631000. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:22,937][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 03:06:22,944][42004] Updated weights for policy 0, policy_version 27966 (0.0034) +[2024-11-08 03:06:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.8, 300 sec: 6831.3). Total num frames: 114585600. Throughput: 0: 1734.4. Samples: 23642690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:27,933][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 03:06:28,301][42004] Updated weights for policy 0, policy_version 27976 (0.0023) +[2024-11-08 03:06:33,595][41694] Fps is (10 sec: 6145.7, 60 sec: 6819.5, 300 sec: 6774.4). Total num frames: 114610176. Throughput: 0: 1711.5. Samples: 23648198. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:33,597][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 03:06:36,410][42004] Updated weights for policy 0, policy_version 27986 (0.0032) +[2024-11-08 03:06:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.6, 300 sec: 6761.9). Total num frames: 114638848. Throughput: 0: 1623.3. Samples: 23654400. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:06:37,941][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 03:06:42,010][42004] Updated weights for policy 0, policy_version 27996 (0.0033) +[2024-11-08 03:06:42,932][41694] Fps is (10 sec: 7019.5, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 114675712. Throughput: 0: 1786.0. Samples: 23665456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:06:42,934][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 03:06:47,219][42004] Updated weights for policy 0, policy_version 28006 (0.0023) +[2024-11-08 03:06:47,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 114716672. Throughput: 0: 1718.4. Samples: 23671156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:06:47,933][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 03:06:52,488][42004] Updated weights for policy 0, policy_version 28016 (0.0033) +[2024-11-08 03:06:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 114753536. Throughput: 0: 1763.1. Samples: 23682954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:06:52,934][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 03:06:57,773][42004] Updated weights for policy 0, policy_version 28026 (0.0027) +[2024-11-08 03:06:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7095.4, 300 sec: 6845.2). Total num frames: 114794496. Throughput: 0: 1794.8. Samples: 23694622. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:06:57,933][41694] Avg episode reward: [(0, '4.289')] +[2024-11-08 03:07:02,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7168.0, 300 sec: 6859.1). Total num frames: 114831360. Throughput: 0: 1804.7. Samples: 23700350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:07:02,934][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 03:07:03,383][42004] Updated weights for policy 0, policy_version 28036 (0.0022) +[2024-11-08 03:07:08,114][41694] Fps is (10 sec: 5229.4, 60 sec: 6806.7, 300 sec: 6785.5). Total num frames: 114847744. Throughput: 0: 1655.2. Samples: 23705786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:07:08,116][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 03:07:12,038][42004] Updated weights for policy 0, policy_version 28046 (0.0043) +[2024-11-08 03:07:12,931][41694] Fps is (10 sec: 4915.4, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 114880512. Throughput: 0: 1642.8. Samples: 23716614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:07:12,934][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 03:07:17,372][42004] Updated weights for policy 0, policy_version 28056 (0.0027) +[2024-11-08 03:07:17,931][41694] Fps is (10 sec: 7092.7, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 114917376. Throughput: 0: 1666.3. Samples: 23722076. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:07:17,933][41694] Avg episode reward: [(0, '4.265')] +[2024-11-08 03:07:22,778][42004] Updated weights for policy 0, policy_version 28066 (0.0026) +[2024-11-08 03:07:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 114958336. Throughput: 0: 1759.2. Samples: 23733564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:07:22,933][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 03:07:27,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6826.6, 300 sec: 6831.3). 
Total num frames: 114995200. Throughput: 0: 1774.4. Samples: 23745306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:07:27,934][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 03:07:28,098][42004] Updated weights for policy 0, policy_version 28076 (0.0028) +[2024-11-08 03:07:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7179.2, 300 sec: 6859.1). Total num frames: 115036160. Throughput: 0: 1774.8. Samples: 23751022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:07:32,933][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 03:07:33,376][42004] Updated weights for policy 0, policy_version 28086 (0.0032) +[2024-11-08 03:07:37,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7236.3, 300 sec: 6872.9). Total num frames: 115073024. Throughput: 0: 1768.4. Samples: 23762530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:07:37,938][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 03:07:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028094_115073024.pth... +[2024-11-08 03:07:38,096][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027696_113442816.pth +[2024-11-08 03:07:39,310][42004] Updated weights for policy 0, policy_version 28096 (0.0031) +[2024-11-08 03:07:42,935][41694] Fps is (10 sec: 4913.4, 60 sec: 6826.3, 300 sec: 6789.6). Total num frames: 115085312. Throughput: 0: 1664.0. Samples: 23769506. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:07:42,937][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 03:07:47,777][42004] Updated weights for policy 0, policy_version 28106 (0.0029) +[2024-11-08 03:07:47,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6758.4, 300 sec: 6789.7). Total num frames: 115122176. Throughput: 0: 1615.1. Samples: 23773030. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:07:47,933][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 03:07:52,932][41694] Fps is (10 sec: 7375.4, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 115159040. Throughput: 0: 1757.1. Samples: 23784536. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:07:52,934][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 03:07:52,963][42004] Updated weights for policy 0, policy_version 28116 (0.0030) +[2024-11-08 03:07:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 115200000. Throughput: 0: 1774.1. Samples: 23796450. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:07:57,933][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 03:07:58,119][42004] Updated weights for policy 0, policy_version 28126 (0.0024) +[2024-11-08 03:08:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 115236864. Throughput: 0: 1783.1. Samples: 23802314. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:02,934][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 03:08:03,820][42004] Updated weights for policy 0, policy_version 28136 (0.0026) +[2024-11-08 03:08:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7121.4, 300 sec: 6859.1). Total num frames: 115273728. Throughput: 0: 1752.1. Samples: 23812410. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:07,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 03:08:09,753][42004] Updated weights for policy 0, policy_version 28146 (0.0034) +[2024-11-08 03:08:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7099.7, 300 sec: 6845.3). Total num frames: 115306496. Throughput: 0: 1731.7. Samples: 23823232. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:12,934][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 03:08:17,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6775.8). 
Total num frames: 115322880. Throughput: 0: 1707.6. Samples: 23827866. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:08:17,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 03:08:18,012][42004] Updated weights for policy 0, policy_version 28156 (0.0033) +[2024-11-08 03:08:22,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 115363840. Throughput: 0: 1605.9. Samples: 23834794. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:08:22,935][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 03:08:23,395][42004] Updated weights for policy 0, policy_version 28166 (0.0032) +[2024-11-08 03:08:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.5, 300 sec: 6775.8). Total num frames: 115400704. Throughput: 0: 1708.8. Samples: 23846394. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:08:27,933][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 03:08:28,742][42004] Updated weights for policy 0, policy_version 28176 (0.0029) +[2024-11-08 03:08:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6789.7). Total num frames: 115441664. Throughput: 0: 1758.8. Samples: 23852178. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:08:32,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 03:08:33,908][42004] Updated weights for policy 0, policy_version 28186 (0.0040) +[2024-11-08 03:08:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 115478528. Throughput: 0: 1764.8. Samples: 23863952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:37,934][41694] Avg episode reward: [(0, '4.688')] +[2024-11-08 03:08:39,215][42004] Updated weights for policy 0, policy_version 28196 (0.0034) +[2024-11-08 03:08:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.4, 300 sec: 6859.1). Total num frames: 115515392. Throughput: 0: 1755.2. Samples: 23875436. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:42,933][41694] Avg episode reward: [(0, '4.674')] +[2024-11-08 03:08:44,630][42004] Updated weights for policy 0, policy_version 28206 (0.0026) +[2024-11-08 03:08:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.0, 300 sec: 6872.9). Total num frames: 115552256. Throughput: 0: 1754.0. Samples: 23881246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:47,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 03:08:52,932][42004] Updated weights for policy 0, policy_version 28216 (0.0032) +[2024-11-08 03:08:52,933][41694] Fps is (10 sec: 5734.4, 60 sec: 6895.0, 300 sec: 6817.4). Total num frames: 115572736. Throughput: 0: 1693.2. Samples: 23888606. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:52,935][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 03:08:57,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 115609600. Throughput: 0: 1678.1. Samples: 23898748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:08:57,935][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 03:08:58,174][42004] Updated weights for policy 0, policy_version 28226 (0.0023) +[2024-11-08 03:09:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 115646464. Throughput: 0: 1704.3. Samples: 23904558. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:02,936][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 03:09:03,604][42004] Updated weights for policy 0, policy_version 28236 (0.0030) +[2024-11-08 03:09:07,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 115687424. Throughput: 0: 1803.9. Samples: 23915968. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:07,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 03:09:08,762][42004] Updated weights for policy 0, policy_version 28246 (0.0032) +[2024-11-08 03:09:12,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 115724288. Throughput: 0: 1804.6. Samples: 23927600. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:12,935][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 03:09:14,075][42004] Updated weights for policy 0, policy_version 28256 (0.0026) +[2024-11-08 03:09:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7372.8, 300 sec: 6928.5). Total num frames: 115765248. Throughput: 0: 1805.1. Samples: 23933406. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:17,933][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 03:09:19,459][42004] Updated weights for policy 0, policy_version 28266 (0.0024) +[2024-11-08 03:09:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7236.2, 300 sec: 6914.6). Total num frames: 115798016. Throughput: 0: 1786.3. Samples: 23944334. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:22,934][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 03:09:27,875][42004] Updated weights for policy 0, policy_version 28276 (0.0021) +[2024-11-08 03:09:27,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 115818496. Throughput: 0: 1667.1. Samples: 23950454. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:27,933][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 03:09:32,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 115855360. Throughput: 0: 1665.0. Samples: 23956172. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:09:32,934][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 03:09:33,084][42004] Updated weights for policy 0, policy_version 28286 (0.0047) +[2024-11-08 03:09:37,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 115896320. Throughput: 0: 1767.0. Samples: 23968122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:09:37,936][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 03:09:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028295_115896320.pth... +[2024-11-08 03:09:38,075][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000027890_114237440.pth +[2024-11-08 03:09:38,278][42004] Updated weights for policy 0, policy_version 28296 (0.0025) +[2024-11-08 03:09:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6920.4). Total num frames: 115933184. Throughput: 0: 1803.6. Samples: 23979910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:09:42,933][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 03:09:43,625][42004] Updated weights for policy 0, policy_version 28306 (0.0031) +[2024-11-08 03:09:47,932][41694] Fps is (10 sec: 7782.8, 60 sec: 7031.5, 300 sec: 6942.4). Total num frames: 115974144. Throughput: 0: 1800.2. Samples: 23985566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:09:47,935][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 03:09:48,913][42004] Updated weights for policy 0, policy_version 28316 (0.0026) +[2024-11-08 03:09:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7304.5, 300 sec: 6942.4). Total num frames: 116011008. Throughput: 0: 1800.7. Samples: 23996998. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:52,934][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 03:09:54,532][42004] Updated weights for policy 0, policy_version 28326 (0.0032) +[2024-11-08 03:09:57,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7236.3, 300 sec: 6928.5). Total num frames: 116043776. Throughput: 0: 1777.6. Samples: 24007592. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:09:57,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 03:10:02,924][42004] Updated weights for policy 0, policy_version 28336 (0.0021) +[2024-11-08 03:10:02,931][41694] Fps is (10 sec: 5324.7, 60 sec: 6963.2, 300 sec: 6873.0). Total num frames: 116064256. Throughput: 0: 1704.9. Samples: 24010128. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:10:02,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 03:10:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 116101120. Throughput: 0: 1675.0. Samples: 24019708. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:10:07,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 03:10:08,165][42004] Updated weights for policy 0, policy_version 28346 (0.0024) +[2024-11-08 03:10:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 116137984. Throughput: 0: 1787.7. Samples: 24030900. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:10:12,935][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 03:10:13,862][42004] Updated weights for policy 0, policy_version 28356 (0.0028) +[2024-11-08 03:10:17,937][41694] Fps is (10 sec: 6959.5, 60 sec: 6757.8, 300 sec: 6923.9). Total num frames: 116170752. Throughput: 0: 1769.8. Samples: 24035824. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:10:17,939][41694] Avg episode reward: [(0, '4.260')] +[2024-11-08 03:10:19,818][42004] Updated weights for policy 0, policy_version 28366 (0.0049) +[2024-11-08 03:10:22,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6942.4). Total num frames: 116207616. Throughput: 0: 1743.8. Samples: 24046594. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:10:22,933][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 03:10:25,199][42004] Updated weights for policy 0, policy_version 28376 (0.0026) +[2024-11-08 03:10:27,931][41694] Fps is (10 sec: 7376.7, 60 sec: 7099.8, 300 sec: 6942.4). Total num frames: 116244480. Throughput: 0: 1730.8. Samples: 24057796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:10:27,933][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 03:10:31,404][42004] Updated weights for policy 0, policy_version 28386 (0.0029) +[2024-11-08 03:10:32,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.4, 300 sec: 6928.5). Total num frames: 116277248. Throughput: 0: 1705.0. Samples: 24062292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:10:32,935][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 03:10:37,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.2, 300 sec: 6873.0). Total num frames: 116297728. Throughput: 0: 1599.9. Samples: 24068994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:10:37,933][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 03:10:39,194][42004] Updated weights for policy 0, policy_version 28396 (0.0041) +[2024-11-08 03:10:42,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 116334592. Throughput: 0: 1609.7. Samples: 24080030. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:10:42,936][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 03:10:44,806][42004] Updated weights for policy 0, policy_version 28406 (0.0026) +[2024-11-08 03:10:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6872.9). Total num frames: 116375552. Throughput: 0: 1680.4. Samples: 24085748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:10:47,936][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 03:10:50,015][42004] Updated weights for policy 0, policy_version 28416 (0.0031) +[2024-11-08 03:10:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6927.0). Total num frames: 116412416. Throughput: 0: 1728.8. Samples: 24097502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:10:52,934][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 03:10:55,261][42004] Updated weights for policy 0, policy_version 28426 (0.0024) +[2024-11-08 03:10:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6942.4). Total num frames: 116449280. Throughput: 0: 1736.2. Samples: 24109028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:10:57,933][41694] Avg episode reward: [(0, '4.633')] +[2024-11-08 03:11:00,943][42004] Updated weights for policy 0, policy_version 28436 (0.0029) +[2024-11-08 03:11:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6942.5). Total num frames: 116486144. Throughput: 0: 1750.6. Samples: 24114594. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:11:02,933][41694] Avg episode reward: [(0, '4.671')] +[2024-11-08 03:11:07,099][42004] Updated weights for policy 0, policy_version 28446 (0.0028) +[2024-11-08 03:11:09,829][41694] Fps is (10 sec: 5508.2, 60 sec: 6683.5, 300 sec: 6884.2). Total num frames: 116514816. Throughput: 0: 1651.3. Samples: 24124038. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:09,832][41694] Avg episode reward: [(0, '4.638')] +[2024-11-08 03:11:12,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6621.9, 300 sec: 6873.0). Total num frames: 116535296. Throughput: 0: 1614.0. Samples: 24130426. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:12,933][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 03:11:15,560][42004] Updated weights for policy 0, policy_version 28456 (0.0036) +[2024-11-08 03:11:17,931][41694] Fps is (10 sec: 7077.6, 60 sec: 6690.7, 300 sec: 6873.0). Total num frames: 116572160. Throughput: 0: 1623.8. Samples: 24135364. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:17,933][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 03:11:21,060][42004] Updated weights for policy 0, policy_version 28466 (0.0029) +[2024-11-08 03:11:22,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6859.1). Total num frames: 116609024. Throughput: 0: 1727.6. Samples: 24146738. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:22,933][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 03:11:26,548][42004] Updated weights for policy 0, policy_version 28476 (0.0032) +[2024-11-08 03:11:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6916.3). Total num frames: 116645888. Throughput: 0: 1731.1. Samples: 24157928. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:27,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 03:11:31,902][42004] Updated weights for policy 0, policy_version 28486 (0.0051) +[2024-11-08 03:11:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6928.5). Total num frames: 116682752. Throughput: 0: 1726.7. Samples: 24163450. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:32,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 03:11:37,892][42004] Updated weights for policy 0, policy_version 28496 (0.0034) +[2024-11-08 03:11:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.4, 300 sec: 6928.5). Total num frames: 116719616. Throughput: 0: 1705.1. Samples: 24174230. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:37,933][41694] Avg episode reward: [(0, '4.794')] +[2024-11-08 03:11:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028496_116719616.pth... +[2024-11-08 03:11:38,071][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028094_115073024.pth +[2024-11-08 03:11:44,357][41694] Fps is (10 sec: 5736.1, 60 sec: 6735.0, 300 sec: 6853.7). Total num frames: 116748288. Throughput: 0: 1629.2. Samples: 24184664. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:11:44,359][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 03:11:45,837][42004] Updated weights for policy 0, policy_version 28506 (0.0025) +[2024-11-08 03:11:47,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 116772864. Throughput: 0: 1588.4. Samples: 24186072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:11:47,934][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 03:11:51,386][42004] Updated weights for policy 0, policy_version 28516 (0.0032) +[2024-11-08 03:11:52,933][41694] Fps is (10 sec: 7163.7, 60 sec: 6621.7, 300 sec: 6831.3). Total num frames: 116809728. Throughput: 0: 1697.4. Samples: 24197204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:11:52,936][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 03:11:56,822][42004] Updated weights for policy 0, policy_version 28526 (0.0033) +[2024-11-08 03:11:57,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6690.1, 300 sec: 6845.2). 
Total num frames: 116850688. Throughput: 0: 1736.5. Samples: 24208570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:11:57,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 03:12:02,426][42004] Updated weights for policy 0, policy_version 28536 (0.0023) +[2024-11-08 03:12:02,933][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.7, 300 sec: 6905.0). Total num frames: 116883456. Throughput: 0: 1755.5. Samples: 24214366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:12:02,936][41694] Avg episode reward: [(0, '4.599')] +[2024-11-08 03:12:07,862][42004] Updated weights for policy 0, policy_version 28546 (0.0037) +[2024-11-08 03:12:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7049.6, 300 sec: 6928.5). Total num frames: 116924416. Throughput: 0: 1737.9. Samples: 24224942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:12:07,933][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 03:12:12,935][41694] Fps is (10 sec: 6961.6, 60 sec: 6962.8, 300 sec: 6900.6). Total num frames: 116953088. Throughput: 0: 1715.5. Samples: 24235130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:12:12,937][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 03:12:14,430][42004] Updated weights for policy 0, policy_version 28556 (0.0035) +[2024-11-08 03:12:18,900][41694] Fps is (10 sec: 4854.8, 60 sec: 6651.1, 300 sec: 6822.8). Total num frames: 116977664. Throughput: 0: 1659.2. Samples: 24239720. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:12:18,902][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 03:12:22,526][42004] Updated weights for policy 0, policy_version 28566 (0.0033) +[2024-11-08 03:12:22,932][41694] Fps is (10 sec: 5326.6, 60 sec: 6621.8, 300 sec: 6817.4). Total num frames: 117006336. Throughput: 0: 1601.1. Samples: 24246280. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:22,934][41694] Avg episode reward: [(0, '4.779')] +[2024-11-08 03:12:27,869][42004] Updated weights for policy 0, policy_version 28576 (0.0027) +[2024-11-08 03:12:27,932][41694] Fps is (10 sec: 7709.1, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 117047296. Throughput: 0: 1677.4. Samples: 24257758. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:27,934][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 03:12:32,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.8, 300 sec: 6803.5). Total num frames: 117080064. Throughput: 0: 1706.6. Samples: 24262870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:32,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 03:12:33,610][42004] Updated weights for policy 0, policy_version 28586 (0.0032) +[2024-11-08 03:12:37,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.8, 300 sec: 6886.9). Total num frames: 117116928. Throughput: 0: 1707.8. Samples: 24274054. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:37,936][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 03:12:39,325][42004] Updated weights for policy 0, policy_version 28596 (0.0031) +[2024-11-08 03:12:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6922.8, 300 sec: 6886.8). Total num frames: 117153792. Throughput: 0: 1692.2. Samples: 24284718. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:42,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 03:12:45,473][42004] Updated weights for policy 0, policy_version 28606 (0.0024) +[2024-11-08 03:12:47,932][41694] Fps is (10 sec: 6554.0, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 117182464. Throughput: 0: 1665.5. Samples: 24289312. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:47,935][41694] Avg episode reward: [(0, '4.530')] +[2024-11-08 03:12:51,144][42004] Updated weights for policy 0, policy_version 28616 (0.0034) +[2024-11-08 03:12:53,427][41694] Fps is (10 sec: 5463.9, 60 sec: 6635.6, 300 sec: 6806.0). Total num frames: 117211136. Throughput: 0: 1652.8. Samples: 24300138. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:53,428][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 03:12:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6789.6). Total num frames: 117239808. Throughput: 0: 1584.6. Samples: 24306434. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:12:57,936][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 03:12:59,416][42004] Updated weights for policy 0, policy_version 28626 (0.0041) +[2024-11-08 03:13:02,931][41694] Fps is (10 sec: 6464.0, 60 sec: 6485.5, 300 sec: 6775.8). Total num frames: 117272576. Throughput: 0: 1634.3. Samples: 24311682. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:13:02,933][41694] Avg episode reward: [(0, '4.724')] +[2024-11-08 03:13:05,120][42004] Updated weights for policy 0, policy_version 28636 (0.0030) +[2024-11-08 03:13:07,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6485.3, 300 sec: 6803.5). Total num frames: 117313536. Throughput: 0: 1695.5. Samples: 24322578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:13:07,933][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 03:13:10,626][42004] Updated weights for policy 0, policy_version 28646 (0.0027) +[2024-11-08 03:13:12,932][41694] Fps is (10 sec: 7781.6, 60 sec: 6622.2, 300 sec: 6872.9). Total num frames: 117350400. Throughput: 0: 1687.3. Samples: 24333688. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:13:12,934][41694] Avg episode reward: [(0, '4.100')] +[2024-11-08 03:13:16,119][42004] Updated weights for policy 0, policy_version 28656 (0.0032) +[2024-11-08 03:13:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6869.2, 300 sec: 6845.2). Total num frames: 117383168. Throughput: 0: 1697.7. Samples: 24339268. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:13:17,933][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 03:13:22,751][42004] Updated weights for policy 0, policy_version 28666 (0.0050) +[2024-11-08 03:13:22,932][41694] Fps is (10 sec: 6554.1, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 117415936. Throughput: 0: 1656.7. Samples: 24348606. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:13:22,933][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 03:13:27,940][41694] Fps is (10 sec: 5320.1, 60 sec: 6484.4, 300 sec: 6761.7). Total num frames: 117436416. Throughput: 0: 1537.5. Samples: 24353920. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:13:27,943][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 03:13:30,458][42004] Updated weights for policy 0, policy_version 28676 (0.0027) +[2024-11-08 03:13:32,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 117473280. Throughput: 0: 1587.6. Samples: 24360754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:13:32,932][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 03:13:36,118][42004] Updated weights for policy 0, policy_version 28686 (0.0035) +[2024-11-08 03:13:37,932][41694] Fps is (10 sec: 7379.0, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 117510144. Throughput: 0: 1611.0. Samples: 24371834. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:13:37,934][41694] Avg episode reward: [(0, '4.680')] +[2024-11-08 03:13:37,941][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028689_117510144.pth... +[2024-11-08 03:13:38,057][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028295_115896320.pth +[2024-11-08 03:13:41,587][42004] Updated weights for policy 0, policy_version 28696 (0.0027) +[2024-11-08 03:13:42,933][41694] Fps is (10 sec: 7371.7, 60 sec: 6553.5, 300 sec: 6761.8). Total num frames: 117547008. Throughput: 0: 1708.4. Samples: 24383314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 03:13:42,935][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 03:13:46,927][42004] Updated weights for policy 0, policy_version 28706 (0.0029) +[2024-11-08 03:13:47,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.2, 300 sec: 6817.4). Total num frames: 117583872. Throughput: 0: 1717.9. Samples: 24388986. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 03:13:47,933][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 03:13:52,810][42004] Updated weights for policy 0, policy_version 28716 (0.0047) +[2024-11-08 03:13:52,931][41694] Fps is (10 sec: 7373.9, 60 sec: 6883.5, 300 sec: 6817.4). Total num frames: 117620736. Throughput: 0: 1715.3. Samples: 24399766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 03:13:52,933][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 03:13:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 117657600. Throughput: 0: 1704.8. Samples: 24410404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:13:57,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 03:13:58,391][42004] Updated weights for policy 0, policy_version 28726 (0.0033) +[2024-11-08 03:14:02,931][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6734.1). 
Total num frames: 117673984. Throughput: 0: 1700.7. Samples: 24415798. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:02,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 03:14:06,387][42004] Updated weights for policy 0, policy_version 28736 (0.0028) +[2024-11-08 03:14:07,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6621.8, 300 sec: 6734.1). Total num frames: 117710848. Throughput: 0: 1640.7. Samples: 24422438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:07,935][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 03:14:12,048][42004] Updated weights for policy 0, policy_version 28746 (0.0031) +[2024-11-08 03:14:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6622.0, 300 sec: 6720.2). Total num frames: 117747712. Throughput: 0: 1769.1. Samples: 24433516. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:12,934][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 03:14:17,138][42004] Updated weights for policy 0, policy_version 28756 (0.0027) +[2024-11-08 03:14:17,932][41694] Fps is (10 sec: 7782.8, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 117788672. Throughput: 0: 1745.8. Samples: 24439316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:17,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 03:14:22,554][42004] Updated weights for policy 0, policy_version 28766 (0.0031) +[2024-11-08 03:14:22,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6826.6, 300 sec: 6803.5). Total num frames: 117825536. Throughput: 0: 1756.2. Samples: 24450862. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:22,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 03:14:27,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7032.5, 300 sec: 6789.6). Total num frames: 117858304. Throughput: 0: 1735.7. Samples: 24461416. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:27,934][41694] Avg episode reward: [(0, '4.279')] +[2024-11-08 03:14:28,648][42004] Updated weights for policy 0, policy_version 28776 (0.0046) +[2024-11-08 03:14:32,931][41694] Fps is (10 sec: 6963.5, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 117895168. Throughput: 0: 1729.0. Samples: 24466792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:32,933][41694] Avg episode reward: [(0, '4.219')] +[2024-11-08 03:14:34,041][42004] Updated weights for policy 0, policy_version 28786 (0.0029) +[2024-11-08 03:14:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 117915648. Throughput: 0: 1681.0. Samples: 24475412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:37,934][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 03:14:42,230][42004] Updated weights for policy 0, policy_version 28796 (0.0027) +[2024-11-08 03:14:42,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.6, 300 sec: 6706.3). Total num frames: 117952512. Throughput: 0: 1643.2. Samples: 24484350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:42,934][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 03:14:47,548][42004] Updated weights for policy 0, policy_version 28806 (0.0026) +[2024-11-08 03:14:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 117989376. Throughput: 0: 1640.6. Samples: 24489624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:47,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 03:14:52,897][42004] Updated weights for policy 0, policy_version 28816 (0.0029) +[2024-11-08 03:14:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 118030336. Throughput: 0: 1760.3. Samples: 24501650. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:14:52,933][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 03:14:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 118067200. Throughput: 0: 1760.1. Samples: 24512720. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:14:57,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 03:14:58,559][42004] Updated weights for policy 0, policy_version 28826 (0.0027) +[2024-11-08 03:15:02,932][41694] Fps is (10 sec: 6553.5, 60 sec: 7031.4, 300 sec: 6761.9). Total num frames: 118095872. Throughput: 0: 1737.0. Samples: 24517482. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:15:02,933][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 03:15:04,743][42004] Updated weights for policy 0, policy_version 28836 (0.0025) +[2024-11-08 03:15:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 118132736. Throughput: 0: 1716.6. Samples: 24528108. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:15:07,933][41694] Avg episode reward: [(0, '4.610')] +[2024-11-08 03:15:12,676][42004] Updated weights for policy 0, policy_version 28846 (0.0034) +[2024-11-08 03:15:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6720.3). Total num frames: 118153216. Throughput: 0: 1626.8. Samples: 24534622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:15:12,937][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 03:15:17,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 118185984. Throughput: 0: 1617.3. Samples: 24539572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:17,933][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 03:15:19,216][42004] Updated weights for policy 0, policy_version 28856 (0.0029) +[2024-11-08 03:15:22,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6692.4). 
Total num frames: 118218752. Throughput: 0: 1638.0. Samples: 24549120. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:22,934][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 03:15:24,719][42004] Updated weights for policy 0, policy_version 28866 (0.0030) +[2024-11-08 03:15:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 118259712. Throughput: 0: 1698.2. Samples: 24560770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:27,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 03:15:29,992][42004] Updated weights for policy 0, policy_version 28876 (0.0027) +[2024-11-08 03:15:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 118292480. Throughput: 0: 1710.2. Samples: 24566582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:32,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 03:15:35,897][42004] Updated weights for policy 0, policy_version 28886 (0.0042) +[2024-11-08 03:15:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 118329344. Throughput: 0: 1675.5. Samples: 24577046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:37,934][41694] Avg episode reward: [(0, '4.279')] +[2024-11-08 03:15:38,066][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028890_118333440.pth... +[2024-11-08 03:15:38,156][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028496_116719616.pth +[2024-11-08 03:15:41,394][42004] Updated weights for policy 0, policy_version 28896 (0.0049) +[2024-11-08 03:15:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 118366208. Throughput: 0: 1676.7. Samples: 24588170. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:42,933][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 03:15:47,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6621.9, 300 sec: 6692.5). Total num frames: 118386688. Throughput: 0: 1647.0. Samples: 24591598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:47,933][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 03:15:49,646][42004] Updated weights for policy 0, policy_version 28906 (0.0049) +[2024-11-08 03:15:52,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 118419456. Throughput: 0: 1588.9. Samples: 24599610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:15:52,934][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 03:15:55,387][42004] Updated weights for policy 0, policy_version 28916 (0.0030) +[2024-11-08 03:15:57,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 118456320. Throughput: 0: 1689.7. Samples: 24610658. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:15:57,934][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 03:16:00,623][42004] Updated weights for policy 0, policy_version 28926 (0.0028) +[2024-11-08 03:16:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6749.8). Total num frames: 118493184. Throughput: 0: 1710.8. Samples: 24616556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:16:02,933][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 03:16:06,640][42004] Updated weights for policy 0, policy_version 28936 (0.0035) +[2024-11-08 03:16:07,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 118530048. Throughput: 0: 1730.3. Samples: 24626982. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:16:07,936][41694] Avg episode reward: [(0, '4.735')] +[2024-11-08 03:16:12,814][42004] Updated weights for policy 0, policy_version 28946 (0.0026) +[2024-11-08 03:16:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 118562816. Throughput: 0: 1692.1. Samples: 24636916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:12,933][41694] Avg episode reward: [(0, '4.619')] +[2024-11-08 03:16:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 118599680. Throughput: 0: 1679.2. Samples: 24642144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:17,936][41694] Avg episode reward: [(0, '4.789')] +[2024-11-08 03:16:18,233][42004] Updated weights for policy 0, policy_version 28956 (0.0030) +[2024-11-08 03:16:22,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 118620160. Throughput: 0: 1604.5. Samples: 24649248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:22,933][41694] Avg episode reward: [(0, '4.665')] +[2024-11-08 03:16:26,566][42004] Updated weights for policy 0, policy_version 28966 (0.0024) +[2024-11-08 03:16:27,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 118652928. Throughput: 0: 1587.1. Samples: 24659590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:27,934][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 03:16:32,204][42004] Updated weights for policy 0, policy_version 28976 (0.0024) +[2024-11-08 03:16:32,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 118689792. Throughput: 0: 1622.8. Samples: 24664624. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:32,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 03:16:37,633][42004] Updated weights for policy 0, policy_version 28986 (0.0033) +[2024-11-08 03:16:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6621.8, 300 sec: 6738.9). Total num frames: 118726656. Throughput: 0: 1696.2. Samples: 24675938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:37,934][41694] Avg episode reward: [(0, '4.692')] +[2024-11-08 03:16:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 118763520. Throughput: 0: 1691.0. Samples: 24686754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:42,933][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 03:16:43,485][42004] Updated weights for policy 0, policy_version 28996 (0.0026) +[2024-11-08 03:16:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 118800384. Throughput: 0: 1683.5. Samples: 24692312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:16:47,934][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 03:16:48,885][42004] Updated weights for policy 0, policy_version 29006 (0.0026) +[2024-11-08 03:16:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 118837248. Throughput: 0: 1702.5. Samples: 24703596. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:16:52,934][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 03:16:56,458][42004] Updated weights for policy 0, policy_version 29016 (0.0026) +[2024-11-08 03:16:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 118857728. Throughput: 0: 1646.0. Samples: 24710988. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:16:57,935][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 03:17:02,222][42004] Updated weights for policy 0, policy_version 29026 (0.0026) +[2024-11-08 03:17:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 118890496. Throughput: 0: 1648.0. Samples: 24716304. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:17:02,933][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 03:17:07,903][42004] Updated weights for policy 0, policy_version 29036 (0.0025) +[2024-11-08 03:17:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6706.4). Total num frames: 118931456. Throughput: 0: 1718.9. Samples: 24726598. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:17:07,933][41694] Avg episode reward: [(0, '4.232')] +[2024-11-08 03:17:12,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6770.2). Total num frames: 118968320. Throughput: 0: 1750.6. Samples: 24738368. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:17:12,934][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 03:17:13,343][42004] Updated weights for policy 0, policy_version 29046 (0.0038) +[2024-11-08 03:17:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 119001088. Throughput: 0: 1754.2. Samples: 24743562. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:17:17,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 03:17:19,144][42004] Updated weights for policy 0, policy_version 29056 (0.0059) +[2024-11-08 03:17:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 119037952. Throughput: 0: 1738.1. Samples: 24754150. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:17:22,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 03:17:24,890][42004] Updated weights for policy 0, policy_version 29066 (0.0030) +[2024-11-08 03:17:30,118][41694] Fps is (10 sec: 6050.2, 60 sec: 6784.3, 300 sec: 6712.1). Total num frames: 119074816. Throughput: 0: 1663.6. Samples: 24765252. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:17:30,119][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 03:17:32,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 119091200. Throughput: 0: 1646.2. Samples: 24766390. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:17:32,933][41694] Avg episode reward: [(0, '4.194')] +[2024-11-08 03:17:32,983][42004] Updated weights for policy 0, policy_version 29076 (0.0025) +[2024-11-08 03:17:37,932][41694] Fps is (10 sec: 7338.6, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 119132160. Throughput: 0: 1634.3. Samples: 24777138. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:17:37,935][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 03:17:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029085_119132160.pth... +[2024-11-08 03:17:38,052][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028689_117510144.pth +[2024-11-08 03:17:38,375][42004] Updated weights for policy 0, policy_version 29086 (0.0026) +[2024-11-08 03:17:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 119169024. Throughput: 0: 1726.6. Samples: 24788684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:17:42,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 03:17:43,604][42004] Updated weights for policy 0, policy_version 29096 (0.0021) +[2024-11-08 03:17:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6787.1). 
Total num frames: 119209984. Throughput: 0: 1742.2. Samples: 24794704. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:17:47,933][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 03:17:48,671][42004] Updated weights for policy 0, policy_version 29106 (0.0026) +[2024-11-08 03:17:52,931][41694] Fps is (10 sec: 8192.2, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 119250944. Throughput: 0: 1777.8. Samples: 24806598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:17:52,933][41694] Avg episode reward: [(0, '4.736')] +[2024-11-08 03:17:53,934][42004] Updated weights for policy 0, policy_version 29116 (0.0038) +[2024-11-08 03:17:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7168.0, 300 sec: 6831.3). Total num frames: 119287808. Throughput: 0: 1773.6. Samples: 24818182. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:17:57,934][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 03:17:59,703][42004] Updated weights for policy 0, policy_version 29126 (0.0028) +[2024-11-08 03:18:04,837][41694] Fps is (10 sec: 5504.9, 60 sec: 6881.3, 300 sec: 6746.1). Total num frames: 119316480. Throughput: 0: 1693.3. Samples: 24822986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:18:04,839][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 03:18:07,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 119336960. Throughput: 0: 1651.2. Samples: 24828454. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:18:07,933][41694] Avg episode reward: [(0, '4.278')] +[2024-11-08 03:18:08,469][42004] Updated weights for policy 0, policy_version 29136 (0.0030) +[2024-11-08 03:18:12,932][41694] Fps is (10 sec: 7083.8, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 119373824. Throughput: 0: 1732.1. Samples: 24839412. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:18:12,934][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 03:18:13,852][42004] Updated weights for policy 0, policy_version 29146 (0.0021) +[2024-11-08 03:18:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 119410688. Throughput: 0: 1751.0. Samples: 24845186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:18:17,933][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 03:18:19,184][42004] Updated weights for policy 0, policy_version 29156 (0.0026) +[2024-11-08 03:18:22,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6894.9, 300 sec: 6831.5). Total num frames: 119451648. Throughput: 0: 1765.5. Samples: 24856584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:18:22,933][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 03:18:24,751][42004] Updated weights for policy 0, policy_version 29166 (0.0029) +[2024-11-08 03:18:27,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7155.6, 300 sec: 6831.3). Total num frames: 119488512. Throughput: 0: 1766.3. Samples: 24868166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:18:27,934][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 03:18:30,045][42004] Updated weights for policy 0, policy_version 29176 (0.0028) +[2024-11-08 03:18:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7236.3, 300 sec: 6831.3). Total num frames: 119525376. Throughput: 0: 1757.7. Samples: 24873800. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:18:32,933][41694] Avg episode reward: [(0, '4.260')] +[2024-11-08 03:18:35,352][42004] Updated weights for policy 0, policy_version 29186 (0.0031) +[2024-11-08 03:18:39,438][41694] Fps is (10 sec: 5695.5, 60 sec: 6859.2, 300 sec: 6769.0). Total num frames: 119554048. Throughput: 0: 1683.9. Samples: 24884910. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:18:39,440][41694] Avg episode reward: [(0, '4.330')] +[2024-11-08 03:18:42,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6826.6, 300 sec: 6761.9). Total num frames: 119578624. Throughput: 0: 1616.4. Samples: 24890922. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:18:42,935][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 03:18:43,756][42004] Updated weights for policy 0, policy_version 29196 (0.0037) +[2024-11-08 03:18:47,931][41694] Fps is (10 sec: 7234.0, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 119615488. Throughput: 0: 1699.4. Samples: 24896220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:18:47,936][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 03:18:49,234][42004] Updated weights for policy 0, policy_version 29206 (0.0029) +[2024-11-08 03:18:52,932][41694] Fps is (10 sec: 7782.7, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 119656448. Throughput: 0: 1762.8. Samples: 24907780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:18:52,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 03:18:54,575][42004] Updated weights for policy 0, policy_version 29216 (0.0025) +[2024-11-08 03:18:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 119693312. Throughput: 0: 1780.4. Samples: 24919532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:18:57,933][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 03:19:00,019][42004] Updated weights for policy 0, policy_version 29226 (0.0031) +[2024-11-08 03:19:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7121.0, 300 sec: 6845.2). Total num frames: 119730176. Throughput: 0: 1771.7. Samples: 24924912. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:19:02,934][41694] Avg episode reward: [(0, '4.251')] +[2024-11-08 03:19:05,717][42004] Updated weights for policy 0, policy_version 29236 (0.0030) +[2024-11-08 03:19:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7099.7, 300 sec: 6831.3). Total num frames: 119762944. Throughput: 0: 1757.6. Samples: 24935678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:19:07,934][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 03:19:14,121][41694] Fps is (10 sec: 5125.0, 60 sec: 6760.9, 300 sec: 6748.6). Total num frames: 119787520. Throughput: 0: 1580.1. Samples: 24941150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:19:14,123][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 03:19:14,196][42004] Updated weights for policy 0, policy_version 29246 (0.0041) +[2024-11-08 03:19:17,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 119816192. Throughput: 0: 1624.4. Samples: 24946900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:19:17,934][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 03:19:20,099][42004] Updated weights for policy 0, policy_version 29256 (0.0029) +[2024-11-08 03:19:22,932][41694] Fps is (10 sec: 7438.0, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 119853056. Throughput: 0: 1671.8. Samples: 24957622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:19:22,933][41694] Avg episode reward: [(0, '4.252')] +[2024-11-08 03:19:25,538][42004] Updated weights for policy 0, policy_version 29266 (0.0025) +[2024-11-08 03:19:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 119889920. Throughput: 0: 1727.4. Samples: 24968656. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:19:27,933][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 03:19:30,943][42004] Updated weights for policy 0, policy_version 29276 (0.0029) +[2024-11-08 03:19:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 119926784. Throughput: 0: 1733.9. Samples: 24974244. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:19:32,933][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 03:19:36,609][42004] Updated weights for policy 0, policy_version 29286 (0.0027) +[2024-11-08 03:19:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7002.5, 300 sec: 6817.4). Total num frames: 119963648. Throughput: 0: 1722.7. Samples: 24985304. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:19:37,933][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 03:19:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029288_119963648.pth... +[2024-11-08 03:19:38,077][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000028890_118333440.pth +[2024-11-08 03:19:42,070][42004] Updated weights for policy 0, policy_version 29296 (0.0034) +[2024-11-08 03:19:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6817.4). Total num frames: 120000512. Throughput: 0: 1709.7. Samples: 24996468. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:19:42,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 03:19:48,356][41694] Fps is (10 sec: 5500.9, 60 sec: 6710.9, 300 sec: 6738.3). Total num frames: 120020992. Throughput: 0: 1679.4. Samples: 25001200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:19:48,362][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 03:19:50,480][42004] Updated weights for policy 0, policy_version 29306 (0.0030) +[2024-11-08 03:19:52,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6734.1). 
Total num frames: 120053760. Throughput: 0: 1599.5. Samples: 25007656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:19:52,934][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 03:19:56,140][42004] Updated weights for policy 0, policy_version 29316 (0.0043) +[2024-11-08 03:19:57,932][41694] Fps is (10 sec: 7272.0, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 120090624. Throughput: 0: 1776.0. Samples: 25018960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:19:57,935][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 03:20:01,692][42004] Updated weights for policy 0, policy_version 29326 (0.0033) +[2024-11-08 03:20:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 120127488. Throughput: 0: 1722.0. Samples: 25024388. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:20:02,937][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 03:20:07,234][42004] Updated weights for policy 0, policy_version 29336 (0.0032) +[2024-11-08 03:20:07,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.2, 300 sec: 6817.4). Total num frames: 120164352. Throughput: 0: 1724.0. Samples: 25035202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:20:07,933][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 03:20:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6964.7, 300 sec: 6817.4). Total num frames: 120197120. Throughput: 0: 1726.3. Samples: 25046340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:20:12,933][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 03:20:12,995][42004] Updated weights for policy 0, policy_version 29346 (0.0034) +[2024-11-08 03:20:17,934][41694] Fps is (10 sec: 6551.9, 60 sec: 6894.6, 300 sec: 6817.4). Total num frames: 120229888. Throughput: 0: 1702.5. Samples: 25050860. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:20:17,937][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 03:20:19,499][42004] Updated weights for policy 0, policy_version 29356 (0.0025) +[2024-11-08 03:20:22,963][41694] Fps is (10 sec: 4900.0, 60 sec: 6550.2, 300 sec: 6733.4). Total num frames: 120246272. Throughput: 0: 1564.0. Samples: 25055732. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:20:22,965][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 03:20:27,630][42004] Updated weights for policy 0, policy_version 29366 (0.0028) +[2024-11-08 03:20:27,932][41694] Fps is (10 sec: 5326.1, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 120283136. Throughput: 0: 1570.0. Samples: 25067118. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:20:27,933][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 03:20:32,932][41694] Fps is (10 sec: 7395.8, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 120320000. Throughput: 0: 1603.8. Samples: 25072688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:20:32,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 03:20:32,940][42004] Updated weights for policy 0, policy_version 29376 (0.0044) +[2024-11-08 03:20:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 120360960. Throughput: 0: 1702.4. Samples: 25084266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:20:37,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 03:20:38,316][42004] Updated weights for policy 0, policy_version 29386 (0.0033) +[2024-11-08 03:20:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 120397824. Throughput: 0: 1702.3. Samples: 25095564. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:20:42,933][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 03:20:43,962][42004] Updated weights for policy 0, policy_version 29396 (0.0033) +[2024-11-08 03:20:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6944.1, 300 sec: 6831.3). Total num frames: 120434688. Throughput: 0: 1701.2. Samples: 25100942. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:20:47,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 03:20:49,581][42004] Updated weights for policy 0, policy_version 29406 (0.0027) +[2024-11-08 03:20:52,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 120467456. Throughput: 0: 1698.7. Samples: 25111644. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:20:52,934][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 03:20:57,932][41694] Fps is (10 sec: 4914.9, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 120483840. Throughput: 0: 1601.0. Samples: 25118386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:20:57,934][41694] Avg episode reward: [(0, '4.627')] +[2024-11-08 03:20:58,343][42004] Updated weights for policy 0, policy_version 29416 (0.0032) +[2024-11-08 03:21:02,933][41694] Fps is (10 sec: 5324.5, 60 sec: 6553.5, 300 sec: 6748.0). Total num frames: 120520704. Throughput: 0: 1593.5. Samples: 25122566. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:02,940][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 03:21:04,000][42004] Updated weights for policy 0, policy_version 29426 (0.0023) +[2024-11-08 03:21:07,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6485.3, 300 sec: 6748.0). Total num frames: 120553472. Throughput: 0: 1719.3. Samples: 25133048. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:07,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 03:21:09,645][42004] Updated weights for policy 0, policy_version 29436 (0.0024) +[2024-11-08 03:21:12,931][41694] Fps is (10 sec: 6963.8, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 120590336. Throughput: 0: 1701.8. Samples: 25143698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:12,933][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 03:21:15,582][42004] Updated weights for policy 0, policy_version 29446 (0.0031) +[2024-11-08 03:21:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6622.1, 300 sec: 6803.5). Total num frames: 120627200. Throughput: 0: 1698.0. Samples: 25149098. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:17,933][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 03:21:21,015][42004] Updated weights for policy 0, policy_version 29456 (0.0048) +[2024-11-08 03:21:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6898.5, 300 sec: 6803.5). Total num frames: 120659968. Throughput: 0: 1691.5. Samples: 25160386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:22,936][41694] Avg episode reward: [(0, '4.677')] +[2024-11-08 03:21:27,141][42004] Updated weights for policy 0, policy_version 29466 (0.0038) +[2024-11-08 03:21:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 120696832. Throughput: 0: 1663.4. Samples: 25170416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:21:27,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 03:21:32,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 120713216. Throughput: 0: 1647.6. Samples: 25175084. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:21:32,934][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 03:21:35,586][42004] Updated weights for policy 0, policy_version 29476 (0.0032) +[2024-11-08 03:21:37,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6485.3, 300 sec: 6734.1). Total num frames: 120750080. Throughput: 0: 1558.1. Samples: 25181758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:21:37,934][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 03:21:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029480_120750080.pth... +[2024-11-08 03:21:38,073][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029085_119132160.pth +[2024-11-08 03:21:41,328][42004] Updated weights for policy 0, policy_version 29486 (0.0025) +[2024-11-08 03:21:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6417.1, 300 sec: 6720.2). Total num frames: 120782848. Throughput: 0: 1649.7. Samples: 25192622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:42,937][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 03:21:46,592][42004] Updated weights for policy 0, policy_version 29496 (0.0031) +[2024-11-08 03:21:47,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6485.3, 300 sec: 6734.1). Total num frames: 120823808. Throughput: 0: 1680.9. Samples: 25198204. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:47,937][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 03:21:51,920][42004] Updated weights for policy 0, policy_version 29506 (0.0039) +[2024-11-08 03:21:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6789.6). Total num frames: 120860672. Throughput: 0: 1704.8. Samples: 25209762. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:52,933][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 03:21:57,526][42004] Updated weights for policy 0, policy_version 29516 (0.0035) +[2024-11-08 03:21:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6803.5). Total num frames: 120897536. Throughput: 0: 1720.3. Samples: 25221110. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:21:57,933][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 03:22:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 120930304. Throughput: 0: 1704.2. Samples: 25225786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:22:02,936][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 03:22:04,007][42004] Updated weights for policy 0, policy_version 29526 (0.0040) +[2024-11-08 03:22:07,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 120946688. Throughput: 0: 1613.9. Samples: 25233010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:22:07,935][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 03:22:12,071][42004] Updated weights for policy 0, policy_version 29536 (0.0041) +[2024-11-08 03:22:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 120983552. Throughput: 0: 1597.6. Samples: 25242310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:22:12,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 03:22:17,334][42004] Updated weights for policy 0, policy_version 29546 (0.0026) +[2024-11-08 03:22:17,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 121024512. Throughput: 0: 1621.8. Samples: 25248064. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:22:17,933][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 03:22:22,892][42004] Updated weights for policy 0, policy_version 29556 (0.0023) +[2024-11-08 03:22:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6784.4). Total num frames: 121061376. Throughput: 0: 1731.0. Samples: 25259654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:22:22,933][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 03:22:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 121098240. Throughput: 0: 1732.0. Samples: 25270562. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:22:27,933][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 03:22:28,402][42004] Updated weights for policy 0, policy_version 29566 (0.0028) +[2024-11-08 03:22:32,933][41694] Fps is (10 sec: 7372.0, 60 sec: 7031.3, 300 sec: 6789.6). Total num frames: 121135104. Throughput: 0: 1729.5. Samples: 25276034. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:22:32,935][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 03:22:34,221][42004] Updated weights for policy 0, policy_version 29576 (0.0031) +[2024-11-08 03:22:37,932][41694] Fps is (10 sec: 6553.2, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 121163776. Throughput: 0: 1693.7. Samples: 25285980. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:22:37,934][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 03:22:42,555][42004] Updated weights for policy 0, policy_version 29586 (0.2268) +[2024-11-08 03:22:42,932][41694] Fps is (10 sec: 4915.8, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 121184256. Throughput: 0: 1586.8. Samples: 25292518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:22:42,934][41694] Avg episode reward: [(0, '4.624')] +[2024-11-08 03:22:47,931][41694] Fps is (10 sec: 5734.7, 60 sec: 6621.9, 300 sec: 6678.6). 
Total num frames: 121221120. Throughput: 0: 1603.9. Samples: 25297962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:22:47,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 03:22:48,098][42004] Updated weights for policy 0, policy_version 29596 (0.0030) +[2024-11-08 03:22:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 121262080. Throughput: 0: 1699.4. Samples: 25309484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:22:52,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 03:22:53,415][42004] Updated weights for policy 0, policy_version 29606 (0.0025) +[2024-11-08 03:22:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6763.9). Total num frames: 121298944. Throughput: 0: 1751.6. Samples: 25321132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:22:57,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 03:22:58,710][42004] Updated weights for policy 0, policy_version 29616 (0.0032) +[2024-11-08 03:23:02,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 121335808. Throughput: 0: 1746.3. Samples: 25326648. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:02,934][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 03:23:04,600][42004] Updated weights for policy 0, policy_version 29626 (0.0034) +[2024-11-08 03:23:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 121368576. Throughput: 0: 1721.2. Samples: 25337110. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:07,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 03:23:10,842][42004] Updated weights for policy 0, policy_version 29636 (0.0032) +[2024-11-08 03:23:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 121401344. Throughput: 0: 1699.0. Samples: 25347016. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:12,934][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 03:23:17,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6621.8, 300 sec: 6678.5). Total num frames: 121421824. Throughput: 0: 1636.2. Samples: 25349660. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:17,935][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 03:23:18,910][42004] Updated weights for policy 0, policy_version 29646 (0.0029) +[2024-11-08 03:23:22,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 121458688. Throughput: 0: 1617.3. Samples: 25358756. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:22,933][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 03:23:24,555][42004] Updated weights for policy 0, policy_version 29656 (0.0037) +[2024-11-08 03:23:27,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 121495552. Throughput: 0: 1722.8. Samples: 25370046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:27,934][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 03:23:30,061][42004] Updated weights for policy 0, policy_version 29666 (0.0028) +[2024-11-08 03:23:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6622.0, 300 sec: 6740.8). Total num frames: 121532416. Throughput: 0: 1724.0. Samples: 25375540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:32,934][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 03:23:35,549][42004] Updated weights for policy 0, policy_version 29676 (0.0029) +[2024-11-08 03:23:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.5, 300 sec: 6748.0). Total num frames: 121569280. Throughput: 0: 1716.1. Samples: 25386710. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:23:37,933][41694] Avg episode reward: [(0, '4.695')] +[2024-11-08 03:23:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029680_121569280.pth... +[2024-11-08 03:23:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029288_119963648.pth +[2024-11-08 03:23:41,495][42004] Updated weights for policy 0, policy_version 29686 (0.0034) +[2024-11-08 03:23:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 121602048. Throughput: 0: 1680.2. Samples: 25396742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:23:42,934][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 03:23:47,430][42004] Updated weights for policy 0, policy_version 29696 (0.0026) +[2024-11-08 03:23:47,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6894.9, 300 sec: 6706.3). Total num frames: 121634816. Throughput: 0: 1671.3. Samples: 25401856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:23:47,933][41694] Avg episode reward: [(0, '4.279')] +[2024-11-08 03:23:52,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 121655296. Throughput: 0: 1586.0. Samples: 25408478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:23:52,936][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 03:23:55,317][42004] Updated weights for policy 0, policy_version 29706 (0.0027) +[2024-11-08 03:23:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 121692160. Throughput: 0: 1608.9. Samples: 25419416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:23:57,933][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 03:24:00,695][42004] Updated weights for policy 0, policy_version 29716 (0.0025) +[2024-11-08 03:24:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6664.7). 
Total num frames: 121729024. Throughput: 0: 1683.9. Samples: 25425436. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:24:02,933][41694] Avg episode reward: [(0, '4.239')] +[2024-11-08 03:24:06,218][42004] Updated weights for policy 0, policy_version 29726 (0.0022) +[2024-11-08 03:24:07,933][41694] Fps is (10 sec: 7781.9, 60 sec: 6690.0, 300 sec: 6747.4). Total num frames: 121769984. Throughput: 0: 1729.7. Samples: 25436594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:24:07,935][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 03:24:11,530][42004] Updated weights for policy 0, policy_version 29736 (0.0031) +[2024-11-08 03:24:12,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 121806848. Throughput: 0: 1735.5. Samples: 25448142. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:24:12,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 03:24:17,800][42004] Updated weights for policy 0, policy_version 29746 (0.0036) +[2024-11-08 03:24:17,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 121839616. Throughput: 0: 1711.9. Samples: 25452578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:24:17,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 03:24:25,082][41694] Fps is (10 sec: 5393.8, 60 sec: 6656.4, 300 sec: 6671.6). Total num frames: 121872384. Throughput: 0: 1620.1. Samples: 25463096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:24:25,087][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 03:24:25,833][42004] Updated weights for policy 0, policy_version 29756 (0.0026) +[2024-11-08 03:24:27,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 121892864. Throughput: 0: 1627.3. Samples: 25469970. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:24:27,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 03:24:31,237][42004] Updated weights for policy 0, policy_version 29766 (0.0023) +[2024-11-08 03:24:32,931][41694] Fps is (10 sec: 7827.2, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 121933824. Throughput: 0: 1637.1. Samples: 25475524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:24:32,933][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 03:24:36,482][42004] Updated weights for policy 0, policy_version 29776 (0.0032) +[2024-11-08 03:24:37,933][41694] Fps is (10 sec: 7781.0, 60 sec: 6689.9, 300 sec: 6678.5). Total num frames: 121970688. Throughput: 0: 1749.0. Samples: 25487184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:24:37,935][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 03:24:41,906][42004] Updated weights for policy 0, policy_version 29786 (0.0028) +[2024-11-08 03:24:42,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6743.8). Total num frames: 122007552. Throughput: 0: 1758.8. Samples: 25498562. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 03:24:42,933][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 03:24:47,461][42004] Updated weights for policy 0, policy_version 29796 (0.0028) +[2024-11-08 03:24:47,932][41694] Fps is (10 sec: 7374.0, 60 sec: 6826.6, 300 sec: 6748.0). Total num frames: 122044416. Throughput: 0: 1752.7. Samples: 25504308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 03:24:47,934][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 03:24:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 122081280. Throughput: 0: 1725.5. Samples: 25514240. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 03:24:52,933][41694] Avg episode reward: [(0, '4.315')] +[2024-11-08 03:24:53,473][42004] Updated weights for policy 0, policy_version 29806 (0.0035) +[2024-11-08 03:24:59,585][41694] Fps is (10 sec: 5623.9, 60 sec: 6776.5, 300 sec: 6682.8). Total num frames: 122109952. Throughput: 0: 1651.9. Samples: 25525208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 03:24:59,586][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 03:25:01,609][42004] Updated weights for policy 0, policy_version 29816 (0.0028) +[2024-11-08 03:25:02,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 122134528. Throughput: 0: 1643.7. Samples: 25526546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:25:02,933][41694] Avg episode reward: [(0, '4.627')] +[2024-11-08 03:25:07,204][42004] Updated weights for policy 0, policy_version 29826 (0.0042) +[2024-11-08 03:25:07,931][41694] Fps is (10 sec: 7361.1, 60 sec: 6690.2, 300 sec: 6692.5). Total num frames: 122171392. Throughput: 0: 1724.2. Samples: 25536976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:25:07,934][41694] Avg episode reward: [(0, '4.262')] +[2024-11-08 03:25:12,374][42004] Updated weights for policy 0, policy_version 29836 (0.0026) +[2024-11-08 03:25:12,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6720.3). Total num frames: 122212352. Throughput: 0: 1753.0. Samples: 25548856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:25:12,937][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 03:25:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6776.5). Total num frames: 122245120. Throughput: 0: 1740.0. Samples: 25553824. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:25:17,933][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 03:25:18,402][42004] Updated weights for policy 0, policy_version 29846 (0.0035) +[2024-11-08 03:25:22,931][41694] Fps is (10 sec: 6553.7, 60 sec: 7009.6, 300 sec: 6761.9). Total num frames: 122277888. Throughput: 0: 1714.3. Samples: 25564326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:25:22,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 03:25:24,533][42004] Updated weights for policy 0, policy_version 29856 (0.0026) +[2024-11-08 03:25:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 122314752. Throughput: 0: 1690.1. Samples: 25574616. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:25:27,933][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 03:25:30,258][42004] Updated weights for policy 0, policy_version 29866 (0.0026) +[2024-11-08 03:25:34,083][41694] Fps is (10 sec: 5510.0, 60 sec: 6631.2, 300 sec: 6680.3). Total num frames: 122339328. Throughput: 0: 1639.7. Samples: 25579982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:25:34,096][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 03:25:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6622.0, 300 sec: 6678.6). Total num frames: 122368000. Throughput: 0: 1591.0. Samples: 25585834. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:25:37,938][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 03:25:37,951][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029875_122368000.pth... +[2024-11-08 03:25:38,065][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029480_120750080.pth +[2024-11-08 03:25:38,474][42004] Updated weights for policy 0, policy_version 29876 (0.0027) +[2024-11-08 03:25:42,931][41694] Fps is (10 sec: 7405.6, 60 sec: 6621.9, 300 sec: 6678.6). 
Total num frames: 122404864. Throughput: 0: 1663.1. Samples: 25597296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:25:42,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 03:25:43,774][42004] Updated weights for policy 0, policy_version 29886 (0.0024) +[2024-11-08 03:25:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 122441728. Throughput: 0: 1697.5. Samples: 25602932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:25:47,934][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 03:25:49,167][42004] Updated weights for policy 0, policy_version 29896 (0.0025) +[2024-11-08 03:25:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 122482688. Throughput: 0: 1718.3. Samples: 25614302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:25:52,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 03:25:54,884][42004] Updated weights for policy 0, policy_version 29906 (0.0041) +[2024-11-08 03:25:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6879.7, 300 sec: 6748.0). Total num frames: 122511360. Throughput: 0: 1683.6. Samples: 25624616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:25:57,935][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 03:26:01,146][42004] Updated weights for policy 0, policy_version 29916 (0.0039) +[2024-11-08 03:26:02,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 122548224. Throughput: 0: 1685.1. Samples: 25629652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:26:02,934][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 03:26:08,583][41694] Fps is (10 sec: 5383.9, 60 sec: 6550.8, 300 sec: 6691.6). Total num frames: 122568704. Throughput: 0: 1652.7. Samples: 25639772. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:26:08,586][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 03:26:09,380][42004] Updated weights for policy 0, policy_version 29926 (0.0028) +[2024-11-08 03:26:12,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6417.1, 300 sec: 6678.6). Total num frames: 122597376. Throughput: 0: 1577.6. Samples: 25645606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:26:12,933][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 03:26:15,686][42004] Updated weights for policy 0, policy_version 29936 (0.0042) +[2024-11-08 03:26:17,932][41694] Fps is (10 sec: 7010.0, 60 sec: 6485.3, 300 sec: 6692.4). Total num frames: 122634240. Throughput: 0: 1612.1. Samples: 25650674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:26:17,933][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 03:26:20,870][42004] Updated weights for policy 0, policy_version 29946 (0.0022) +[2024-11-08 03:26:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6692.5). Total num frames: 122671104. Throughput: 0: 1704.3. Samples: 25662526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:22,933][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 03:26:26,207][42004] Updated weights for policy 0, policy_version 29956 (0.0033) +[2024-11-08 03:26:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 122707968. Throughput: 0: 1702.4. Samples: 25673904. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:27,933][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 03:26:32,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6751.3, 300 sec: 6734.1). Total num frames: 122736640. Throughput: 0: 1669.3. Samples: 25678052. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:32,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 03:26:33,015][42004] Updated weights for policy 0, policy_version 29966 (0.0043) +[2024-11-08 03:26:37,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 122773504. Throughput: 0: 1646.7. Samples: 25688402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:37,934][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 03:26:38,537][42004] Updated weights for policy 0, policy_version 29976 (0.0028) +[2024-11-08 03:26:43,005][41694] Fps is (10 sec: 6099.6, 60 sec: 6545.6, 300 sec: 6690.8). Total num frames: 122798080. Throughput: 0: 1534.8. Samples: 25693796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:43,008][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 03:26:46,584][42004] Updated weights for policy 0, policy_version 29986 (0.0030) +[2024-11-08 03:26:47,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 122830848. Throughput: 0: 1569.4. Samples: 25700274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:47,939][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 03:26:51,867][42004] Updated weights for policy 0, policy_version 29996 (0.0026) +[2024-11-08 03:26:52,931][41694] Fps is (10 sec: 7427.3, 60 sec: 6485.4, 300 sec: 6692.4). Total num frames: 122871808. Throughput: 0: 1622.0. Samples: 25711704. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:52,933][41694] Avg episode reward: [(0, '4.665')] +[2024-11-08 03:26:56,954][42004] Updated weights for policy 0, policy_version 30006 (0.0026) +[2024-11-08 03:26:57,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6621.8, 300 sec: 6706.3). Total num frames: 122908672. Throughput: 0: 1737.1. Samples: 25723778. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:26:57,934][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 03:27:02,529][42004] Updated weights for policy 0, policy_version 30016 (0.0028) +[2024-11-08 03:27:02,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 122945536. Throughput: 0: 1752.5. Samples: 25729538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:27:02,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 03:27:07,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6901.6, 300 sec: 6761.9). Total num frames: 122978304. Throughput: 0: 1713.4. Samples: 25739628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:27:07,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 03:27:08,483][42004] Updated weights for policy 0, policy_version 30026 (0.0041) +[2024-11-08 03:27:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 123015168. Throughput: 0: 1699.5. Samples: 25750380. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:27:12,933][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 03:27:14,283][42004] Updated weights for policy 0, policy_version 30036 (0.0032) +[2024-11-08 03:27:17,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 123035648. Throughput: 0: 1721.9. Samples: 25755538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:27:17,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 03:27:22,573][42004] Updated weights for policy 0, policy_version 30046 (0.0036) +[2024-11-08 03:27:22,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 123068416. Throughput: 0: 1630.0. Samples: 25761752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:27:22,934][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 03:27:27,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6678.6). 
Total num frames: 123105280. Throughput: 0: 1753.1. Samples: 25772556. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:27:27,933][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 03:27:28,264][42004] Updated weights for policy 0, policy_version 30056 (0.0028) +[2024-11-08 03:27:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 123142144. Throughput: 0: 1723.2. Samples: 25777818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:27:32,935][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 03:27:34,010][42004] Updated weights for policy 0, policy_version 30066 (0.0023) +[2024-11-08 03:27:37,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 123170816. Throughput: 0: 1697.9. Samples: 25788110. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:27:37,933][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 03:27:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030071_123170816.pth... +[2024-11-08 03:27:38,117][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029680_121569280.pth +[2024-11-08 03:27:40,722][42004] Updated weights for policy 0, policy_version 30076 (0.0035) +[2024-11-08 03:27:42,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6766.7, 300 sec: 6720.2). Total num frames: 123203584. Throughput: 0: 1642.9. Samples: 25797706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:27:42,933][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 03:27:46,315][42004] Updated weights for policy 0, policy_version 30086 (0.0021) +[2024-11-08 03:27:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 123240448. Throughput: 0: 1635.3. Samples: 25803126. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:27:47,935][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 03:27:52,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 123265024. Throughput: 0: 1599.2. Samples: 25811590. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:27:52,933][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 03:27:53,486][42004] Updated weights for policy 0, policy_version 30096 (0.0032) +[2024-11-08 03:27:57,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 123305984. Throughput: 0: 1606.2. Samples: 25822658. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:27:57,933][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 03:27:58,806][42004] Updated weights for policy 0, policy_version 30106 (0.0020) +[2024-11-08 03:28:02,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6621.8, 300 sec: 6692.4). Total num frames: 123342848. Throughput: 0: 1620.9. Samples: 25828478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:28:02,934][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 03:28:04,570][42004] Updated weights for policy 0, policy_version 30116 (0.0030) +[2024-11-08 03:28:07,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.8, 300 sec: 6692.4). Total num frames: 123375616. Throughput: 0: 1715.5. Samples: 25838952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:28:07,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 03:28:10,513][42004] Updated weights for policy 0, policy_version 30126 (0.0050) +[2024-11-08 03:28:12,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 123408384. Throughput: 0: 1695.6. Samples: 25848860. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:28:12,934][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 03:28:16,417][42004] Updated weights for policy 0, policy_version 30136 (0.0039) +[2024-11-08 03:28:17,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 123445248. Throughput: 0: 1691.2. Samples: 25853920. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:28:17,933][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 03:28:22,230][42004] Updated weights for policy 0, policy_version 30146 (0.0032) +[2024-11-08 03:28:22,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 123482112. Throughput: 0: 1706.2. Samples: 25864890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:28:22,933][41694] Avg episode reward: [(0, '4.253')] +[2024-11-08 03:28:27,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6621.8, 300 sec: 6678.5). Total num frames: 123502592. Throughput: 0: 1640.5. Samples: 25871528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:28:27,935][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 03:28:30,053][42004] Updated weights for policy 0, policy_version 30156 (0.0036) +[2024-11-08 03:28:32,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 123539456. Throughput: 0: 1646.9. Samples: 25877236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:28:32,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 03:28:35,445][42004] Updated weights for policy 0, policy_version 30166 (0.0027) +[2024-11-08 03:28:37,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 123576320. Throughput: 0: 1708.7. Samples: 25888480. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:28:37,933][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 03:28:40,879][42004] Updated weights for policy 0, policy_version 30176 (0.0029) +[2024-11-08 03:28:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 123613184. Throughput: 0: 1718.3. Samples: 25899980. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:28:42,933][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 03:28:46,664][42004] Updated weights for policy 0, policy_version 30186 (0.0030) +[2024-11-08 03:28:47,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.6, 300 sec: 6761.9). Total num frames: 123650048. Throughput: 0: 1698.8. Samples: 25904924. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:28:47,933][41694] Avg episode reward: [(0, '4.833')] +[2024-11-08 03:28:52,268][42004] Updated weights for policy 0, policy_version 30196 (0.0025) +[2024-11-08 03:28:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 123686912. Throughput: 0: 1705.2. Samples: 25915684. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:28:52,933][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 03:28:57,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 123719680. Throughput: 0: 1729.2. Samples: 25926674. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:28:57,933][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 03:28:57,992][42004] Updated weights for policy 0, policy_version 30206 (0.0027) +[2024-11-08 03:29:02,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 123740160. Throughput: 0: 1657.1. Samples: 25928490. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:29:02,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 03:29:05,953][42004] Updated weights for policy 0, policy_version 30216 (0.0023) +[2024-11-08 03:29:07,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6690.1, 300 sec: 6678.5). Total num frames: 123777024. Throughput: 0: 1645.0. Samples: 25938918. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:29:07,934][41694] Avg episode reward: [(0, '4.662')] +[2024-11-08 03:29:11,277][42004] Updated weights for policy 0, policy_version 30226 (0.0030) +[2024-11-08 03:29:12,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 123817984. Throughput: 0: 1755.0. Samples: 25950500. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:29:12,934][41694] Avg episode reward: [(0, '4.730')] +[2024-11-08 03:29:16,801][42004] Updated weights for policy 0, policy_version 30236 (0.0029) +[2024-11-08 03:29:17,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.4, 300 sec: 6755.6). Total num frames: 123850752. Throughput: 0: 1747.2. Samples: 25955862. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:29:17,933][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 03:29:22,599][42004] Updated weights for policy 0, policy_version 30246 (0.0026) +[2024-11-08 03:29:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 123887616. Throughput: 0: 1733.0. Samples: 25966466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:29:22,934][41694] Avg episode reward: [(0, '4.637')] +[2024-11-08 03:29:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6748.0). Total num frames: 123924480. Throughput: 0: 1731.3. Samples: 25977888. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:29:27,933][41694] Avg episode reward: [(0, '4.733')] +[2024-11-08 03:29:27,949][42004] Updated weights for policy 0, policy_version 30256 (0.0033) +[2024-11-08 03:29:34,805][41694] Fps is (10 sec: 6209.6, 60 sec: 6818.6, 300 sec: 6705.4). Total num frames: 123961344. Throughput: 0: 1670.4. Samples: 25983220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:29:34,808][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 03:29:35,875][42004] Updated weights for policy 0, policy_version 30266 (0.0024) +[2024-11-08 03:29:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 123981824. Throughput: 0: 1655.7. Samples: 25990192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:29:37,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 03:29:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030269_123981824.pth... +[2024-11-08 03:29:38,078][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000029875_122368000.pth +[2024-11-08 03:29:41,416][42004] Updated weights for policy 0, policy_version 30276 (0.0022) +[2024-11-08 03:29:42,932][41694] Fps is (10 sec: 7056.1, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 124018688. Throughput: 0: 1658.0. Samples: 26001286. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:29:42,934][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 03:29:46,879][42004] Updated weights for policy 0, policy_version 30286 (0.0029) +[2024-11-08 03:29:47,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 124059648. Throughput: 0: 1737.2. Samples: 26006664. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:29:47,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 03:29:52,581][42004] Updated weights for policy 0, policy_version 30296 (0.0026) +[2024-11-08 03:29:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6758.1). Total num frames: 124092416. Throughput: 0: 1765.4. Samples: 26018360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:29:52,934][41694] Avg episode reward: [(0, '4.740')] +[2024-11-08 03:29:57,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 124129280. Throughput: 0: 1729.0. Samples: 26028304. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:29:57,933][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 03:29:58,415][42004] Updated weights for policy 0, policy_version 30306 (0.0029) +[2024-11-08 03:30:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.5, 300 sec: 6748.0). Total num frames: 124162048. Throughput: 0: 1729.7. Samples: 26033696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:02,933][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 03:30:04,126][42004] Updated weights for policy 0, policy_version 30316 (0.0022) +[2024-11-08 03:30:09,542][41694] Fps is (10 sec: 5644.4, 60 sec: 6781.3, 300 sec: 6683.7). Total num frames: 124194816. Throughput: 0: 1668.3. Samples: 26044228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:09,544][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 03:30:12,206][42004] Updated weights for policy 0, policy_version 30326 (0.0031) +[2024-11-08 03:30:12,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 124219392. Throughput: 0: 1628.1. Samples: 26051152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:12,933][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 03:30:17,931][41694] Fps is (10 sec: 6835.4, 60 sec: 6690.1, 300 sec: 6692.4). 
Total num frames: 124252160. Throughput: 0: 1692.1. Samples: 26056194. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:17,934][41694] Avg episode reward: [(0, '4.639')] +[2024-11-08 03:30:18,092][42004] Updated weights for policy 0, policy_version 30336 (0.0026) +[2024-11-08 03:30:22,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 124293120. Throughput: 0: 1707.6. Samples: 26067032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:22,932][41694] Avg episode reward: [(0, '4.578')] +[2024-11-08 03:30:23,364][42004] Updated weights for policy 0, policy_version 30346 (0.0027) +[2024-11-08 03:30:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6774.4). Total num frames: 124329984. Throughput: 0: 1712.6. Samples: 26078354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:27,934][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 03:30:28,950][42004] Updated weights for policy 0, policy_version 30356 (0.0030) +[2024-11-08 03:30:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6976.2, 300 sec: 6775.8). Total num frames: 124366848. Throughput: 0: 1715.0. Samples: 26083838. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:32,933][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 03:30:34,492][42004] Updated weights for policy 0, policy_version 30366 (0.0026) +[2024-11-08 03:30:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 124403712. Throughput: 0: 1705.7. Samples: 26095116. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:30:37,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 03:30:40,136][42004] Updated weights for policy 0, policy_version 30376 (0.0031) +[2024-11-08 03:30:44,344][41694] Fps is (10 sec: 5383.3, 60 sec: 6669.6, 300 sec: 6702.0). Total num frames: 124428288. Throughput: 0: 1558.6. Samples: 26100642. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:30:44,346][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 03:30:47,931][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 124456960. Throughput: 0: 1625.1. Samples: 26106824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:30:47,933][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 03:30:48,089][42004] Updated weights for policy 0, policy_version 30386 (0.0026) +[2024-11-08 03:30:52,932][41694] Fps is (10 sec: 8108.4, 60 sec: 6758.3, 300 sec: 6734.1). Total num frames: 124497920. Throughput: 0: 1710.9. Samples: 26118462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:30:52,934][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 03:30:53,309][42004] Updated weights for policy 0, policy_version 30396 (0.0026) +[2024-11-08 03:30:57,933][41694] Fps is (10 sec: 7781.8, 60 sec: 6758.3, 300 sec: 6734.1). Total num frames: 124534784. Throughput: 0: 1760.7. Samples: 26130386. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:30:57,935][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 03:30:58,562][42004] Updated weights for policy 0, policy_version 30406 (0.0036) +[2024-11-08 03:31:02,933][41694] Fps is (10 sec: 6962.6, 60 sec: 6758.2, 300 sec: 6790.7). Total num frames: 124567552. Throughput: 0: 1760.8. Samples: 26135432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:02,935][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 03:31:04,830][42004] Updated weights for policy 0, policy_version 30416 (0.0031) +[2024-11-08 03:31:07,932][41694] Fps is (10 sec: 6963.6, 60 sec: 7014.9, 300 sec: 6803.5). Total num frames: 124604416. Throughput: 0: 1750.7. Samples: 26145814. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:07,934][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 03:31:11,088][42004] Updated weights for policy 0, policy_version 30426 (0.0028) +[2024-11-08 03:31:12,933][41694] Fps is (10 sec: 6553.4, 60 sec: 6894.8, 300 sec: 6775.7). Total num frames: 124633088. Throughput: 0: 1716.0. Samples: 26155576. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:12,937][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 03:31:18,926][41694] Fps is (10 sec: 5215.6, 60 sec: 6715.3, 300 sec: 6725.3). Total num frames: 124661760. Throughput: 0: 1663.7. Samples: 26160360. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:31:18,929][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 03:31:19,339][42004] Updated weights for policy 0, policy_version 30436 (0.0031) +[2024-11-08 03:31:22,931][41694] Fps is (10 sec: 5325.8, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 124686336. Throughput: 0: 1594.0. Samples: 26166844. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:31:22,934][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 03:31:25,311][42004] Updated weights for policy 0, policy_version 30446 (0.0032) +[2024-11-08 03:31:27,931][41694] Fps is (10 sec: 6822.8, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 124723200. Throughput: 0: 1759.8. Samples: 26177348. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:31:27,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 03:31:30,750][42004] Updated weights for policy 0, policy_version 30456 (0.0025) +[2024-11-08 03:31:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 124760064. Throughput: 0: 1694.5. Samples: 26183078. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:31:32,933][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 03:31:36,702][42004] Updated weights for policy 0, policy_version 30466 (0.0046) +[2024-11-08 03:31:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6777.4). Total num frames: 124796928. Throughput: 0: 1669.3. Samples: 26193578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:37,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 03:31:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030468_124796928.pth... +[2024-11-08 03:31:38,275][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030071_123170816.pth +[2024-11-08 03:31:42,204][42004] Updated weights for policy 0, policy_version 30476 (0.0039) +[2024-11-08 03:31:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6921.4, 300 sec: 6789.6). Total num frames: 124833792. Throughput: 0: 1648.6. Samples: 26204572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:42,933][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 03:31:47,677][42004] Updated weights for policy 0, policy_version 30486 (0.0030) +[2024-11-08 03:31:47,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6894.9, 300 sec: 6775.7). Total num frames: 124870656. Throughput: 0: 1662.4. Samples: 26210238. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:47,935][41694] Avg episode reward: [(0, '4.629')] +[2024-11-08 03:31:53,548][41694] Fps is (10 sec: 5787.3, 60 sec: 6554.6, 300 sec: 6720.1). Total num frames: 124895232. Throughput: 0: 1656.0. Samples: 26221356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:53,550][41694] Avg episode reward: [(0, '4.631')] +[2024-11-08 03:31:55,570][42004] Updated weights for policy 0, policy_version 30496 (0.0025) +[2024-11-08 03:31:57,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6553.7, 300 sec: 6720.2). 
Total num frames: 124928000. Throughput: 0: 1612.2. Samples: 26228122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:31:57,933][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 03:32:00,923][42004] Updated weights for policy 0, policy_version 30506 (0.0039) +[2024-11-08 03:32:02,932][41694] Fps is (10 sec: 7420.6, 60 sec: 6622.0, 300 sec: 6734.1). Total num frames: 124964864. Throughput: 0: 1669.3. Samples: 26233820. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:32:02,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 03:32:07,127][42004] Updated weights for policy 0, policy_version 30516 (0.0036) +[2024-11-08 03:32:07,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 124997632. Throughput: 0: 1718.2. Samples: 26244164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:32:07,934][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 03:32:12,627][42004] Updated weights for policy 0, policy_version 30526 (0.0022) +[2024-11-08 03:32:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.3, 300 sec: 6775.8). Total num frames: 125034496. Throughput: 0: 1724.3. Samples: 26254942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:32:12,932][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 03:32:17,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6941.7, 300 sec: 6789.6). Total num frames: 125071360. Throughput: 0: 1719.8. Samples: 26260470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:32:17,933][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 03:32:18,055][42004] Updated weights for policy 0, policy_version 30536 (0.0024) +[2024-11-08 03:32:22,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7031.4, 300 sec: 6789.6). Total num frames: 125108224. Throughput: 0: 1735.5. Samples: 26271676. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:32:22,934][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 03:32:23,673][42004] Updated weights for policy 0, policy_version 30546 (0.0023) +[2024-11-08 03:32:28,185][41694] Fps is (10 sec: 5592.8, 60 sec: 6730.0, 300 sec: 6728.3). Total num frames: 125128704. Throughput: 0: 1605.4. Samples: 26277220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:32:28,190][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 03:32:31,807][42004] Updated weights for policy 0, policy_version 30556 (0.0027) +[2024-11-08 03:32:32,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 125161472. Throughput: 0: 1631.1. Samples: 26283636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:32:32,933][41694] Avg episode reward: [(0, '4.643')] +[2024-11-08 03:32:37,169][42004] Updated weights for policy 0, policy_version 30566 (0.0020) +[2024-11-08 03:32:37,931][41694] Fps is (10 sec: 7564.5, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 125202432. Throughput: 0: 1653.9. Samples: 26294764. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:32:37,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 03:32:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 125235200. Throughput: 0: 1720.1. Samples: 26305524. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:32:42,933][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 03:32:42,980][42004] Updated weights for policy 0, policy_version 30576 (0.0031) +[2024-11-08 03:32:47,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 125276160. Throughput: 0: 1715.9. Samples: 26311036. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:32:47,935][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 03:32:48,347][42004] Updated weights for policy 0, policy_version 30586 (0.0023) +[2024-11-08 03:32:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7035.5, 300 sec: 6803.5). Total num frames: 125313024. Throughput: 0: 1745.2. Samples: 26322698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:32:52,933][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 03:32:53,587][42004] Updated weights for policy 0, policy_version 30596 (0.0034) +[2024-11-08 03:32:57,931][41694] Fps is (10 sec: 7373.3, 60 sec: 7031.5, 300 sec: 6803.5). Total num frames: 125349888. Throughput: 0: 1762.3. Samples: 26334246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:32:57,933][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 03:32:59,072][42004] Updated weights for policy 0, policy_version 30606 (0.0025) +[2024-11-08 03:33:02,943][41694] Fps is (10 sec: 5727.8, 60 sec: 6757.1, 300 sec: 6761.6). Total num frames: 125370368. Throughput: 0: 1755.0. Samples: 26339466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:02,945][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 03:33:07,711][42004] Updated weights for policy 0, policy_version 30616 (0.0039) +[2024-11-08 03:33:07,932][41694] Fps is (10 sec: 5324.4, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 125403136. Throughput: 0: 1639.2. Samples: 26345442. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:07,935][41694] Avg episode reward: [(0, '4.683')] +[2024-11-08 03:33:12,933][41694] Fps is (10 sec: 6970.0, 60 sec: 6758.2, 300 sec: 6761.8). Total num frames: 125440000. Throughput: 0: 1762.0. Samples: 26356068. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:12,938][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 03:33:13,352][42004] Updated weights for policy 0, policy_version 30626 (0.0035) +[2024-11-08 03:33:17,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 125472768. Throughput: 0: 1727.7. Samples: 26361382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:17,933][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 03:33:19,067][42004] Updated weights for policy 0, policy_version 30636 (0.0042) +[2024-11-08 03:33:22,931][41694] Fps is (10 sec: 7374.1, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 125513728. Throughput: 0: 1729.2. Samples: 26372578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:22,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 03:33:24,259][42004] Updated weights for policy 0, policy_version 30646 (0.0019) +[2024-11-08 03:33:27,932][41694] Fps is (10 sec: 8192.0, 60 sec: 7129.8, 300 sec: 6831.3). Total num frames: 125554688. Throughput: 0: 1748.6. Samples: 26384212. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:27,933][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 03:33:29,497][42004] Updated weights for policy 0, policy_version 30656 (0.0037) +[2024-11-08 03:33:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6831.3). Total num frames: 125591552. Throughput: 0: 1755.3. Samples: 26390022. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:32,933][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 03:33:34,961][42004] Updated weights for policy 0, policy_version 30666 (0.0025) +[2024-11-08 03:33:37,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.6, 300 sec: 6775.8). Total num frames: 125612032. Throughput: 0: 1713.1. Samples: 26399788. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:37,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 03:33:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030667_125612032.pth... +[2024-11-08 03:33:38,081][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030269_123981824.pth +[2024-11-08 03:33:42,589][42004] Updated weights for policy 0, policy_version 30676 (0.0030) +[2024-11-08 03:33:42,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 125648896. Throughput: 0: 1654.5. Samples: 26408698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:42,935][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 03:33:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 125685760. Throughput: 0: 1657.2. Samples: 26414022. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:47,934][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 03:33:48,353][42004] Updated weights for policy 0, policy_version 30686 (0.0035) +[2024-11-08 03:33:52,933][41694] Fps is (10 sec: 6962.3, 60 sec: 6758.2, 300 sec: 6775.7). Total num frames: 125718528. Throughput: 0: 1751.3. Samples: 26424254. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:33:52,935][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 03:33:54,084][42004] Updated weights for policy 0, policy_version 30696 (0.0032) +[2024-11-08 03:33:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 125759488. Throughput: 0: 1778.9. Samples: 26436116. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:33:57,933][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 03:33:59,178][42004] Updated weights for policy 0, policy_version 30706 (0.0026) +[2024-11-08 03:34:02,931][41694] Fps is (10 sec: 7783.7, 60 sec: 7101.1, 300 sec: 6845.2). 
Total num frames: 125796352. Throughput: 0: 1792.4. Samples: 26442038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:34:02,934][41694] Avg episode reward: [(0, '4.296')] +[2024-11-08 03:34:04,852][42004] Updated weights for policy 0, policy_version 30716 (0.0026) +[2024-11-08 03:34:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7168.0, 300 sec: 6831.3). Total num frames: 125833216. Throughput: 0: 1781.7. Samples: 26452754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:34:07,934][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 03:34:12,914][42004] Updated weights for policy 0, policy_version 30726 (0.0024) +[2024-11-08 03:34:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6895.1, 300 sec: 6789.6). Total num frames: 125853696. Throughput: 0: 1669.5. Samples: 26459338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:34:12,934][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 03:34:17,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6894.9, 300 sec: 6775.7). Total num frames: 125886464. Throughput: 0: 1664.1. Samples: 26464906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:34:17,935][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 03:34:18,478][42004] Updated weights for policy 0, policy_version 30736 (0.0023) +[2024-11-08 03:34:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 125923328. Throughput: 0: 1684.3. Samples: 26475582. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:34:22,933][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 03:34:24,481][42004] Updated weights for policy 0, policy_version 30746 (0.0032) +[2024-11-08 03:34:27,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6758.4, 300 sec: 6819.1). Total num frames: 125960192. Throughput: 0: 1720.6. Samples: 26486126. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:34:27,934][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 03:34:29,999][42004] Updated weights for policy 0, policy_version 30756 (0.0026) +[2024-11-08 03:34:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 125997056. Throughput: 0: 1727.4. Samples: 26491754. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:34:32,933][41694] Avg episode reward: [(0, '4.121')] +[2024-11-08 03:34:35,263][42004] Updated weights for policy 0, policy_version 30766 (0.0024) +[2024-11-08 03:34:37,941][41694] Fps is (10 sec: 7775.4, 60 sec: 7098.7, 300 sec: 6845.0). Total num frames: 126038016. Throughput: 0: 1761.0. Samples: 26503512. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:34:37,944][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 03:34:40,857][42004] Updated weights for policy 0, policy_version 30776 (0.0045) +[2024-11-08 03:34:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6817.4). Total num frames: 126070784. Throughput: 0: 1735.9. Samples: 26514232. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:34:42,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 03:34:47,931][41694] Fps is (10 sec: 5329.7, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 126091264. Throughput: 0: 1688.3. Samples: 26518010. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:34:47,933][41694] Avg episode reward: [(0, '4.634')] +[2024-11-08 03:34:49,039][42004] Updated weights for policy 0, policy_version 30786 (0.0033) +[2024-11-08 03:34:52,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.6, 300 sec: 6761.9). Total num frames: 126124032. Throughput: 0: 1618.6. Samples: 26525592. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:34:52,934][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 03:34:55,202][42004] Updated weights for policy 0, policy_version 30796 (0.0028) +[2024-11-08 03:34:57,932][41694] Fps is (10 sec: 6553.0, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 126156800. Throughput: 0: 1694.0. Samples: 26535568. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:34:57,934][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 03:35:01,011][42004] Updated weights for policy 0, policy_version 30806 (0.0033) +[2024-11-08 03:35:02,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6621.9, 300 sec: 6813.0). Total num frames: 126193664. Throughput: 0: 1691.1. Samples: 26541006. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:02,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 03:35:06,467][42004] Updated weights for policy 0, policy_version 30816 (0.0027) +[2024-11-08 03:35:07,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 126230528. Throughput: 0: 1705.3. Samples: 26552320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:07,933][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 03:35:11,920][42004] Updated weights for policy 0, policy_version 30826 (0.0026) +[2024-11-08 03:35:12,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6831.3). Total num frames: 126267392. Throughput: 0: 1720.6. Samples: 26563552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:12,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 03:35:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6895.0, 300 sec: 6803.5). Total num frames: 126300160. Throughput: 0: 1697.8. Samples: 26568154. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:17,933][41694] Avg episode reward: [(0, '4.679')] +[2024-11-08 03:35:18,166][42004] Updated weights for policy 0, policy_version 30836 (0.0034) +[2024-11-08 03:35:22,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 126320640. Throughput: 0: 1581.7. Samples: 26574674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:22,933][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 03:35:26,391][42004] Updated weights for policy 0, policy_version 30846 (0.0026) +[2024-11-08 03:35:27,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 126353408. Throughput: 0: 1567.6. Samples: 26584774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:27,933][41694] Avg episode reward: [(0, '4.319')] +[2024-11-08 03:35:32,695][42004] Updated weights for policy 0, policy_version 30856 (0.0039) +[2024-11-08 03:35:32,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6720.2). Total num frames: 126386176. Throughput: 0: 1582.2. Samples: 26589208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:32,937][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 03:35:37,780][42004] Updated weights for policy 0, policy_version 30866 (0.0027) +[2024-11-08 03:35:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6486.3, 300 sec: 6808.4). Total num frames: 126427136. Throughput: 0: 1669.7. Samples: 26600728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:37,934][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 03:35:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030866_126427136.pth... +[2024-11-08 03:35:38,052][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030468_124796928.pth +[2024-11-08 03:35:42,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6553.6, 300 sec: 6803.5). 
Total num frames: 126464000. Throughput: 0: 1708.3. Samples: 26612442. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:42,934][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 03:35:42,996][42004] Updated weights for policy 0, policy_version 30876 (0.0025) +[2024-11-08 03:35:47,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6789.7). Total num frames: 126500864. Throughput: 0: 1716.7. Samples: 26618256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:47,934][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 03:35:48,729][42004] Updated weights for policy 0, policy_version 30886 (0.0035) +[2024-11-08 03:35:52,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6894.9, 300 sec: 6789.7). Total num frames: 126537728. Throughput: 0: 1695.3. Samples: 26628610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:52,934][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 03:35:56,712][42004] Updated weights for policy 0, policy_version 30896 (0.0026) +[2024-11-08 03:35:57,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.2, 300 sec: 6748.0). Total num frames: 126558208. Throughput: 0: 1598.6. Samples: 26635488. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:35:57,933][41694] Avg episode reward: [(0, '4.269')] +[2024-11-08 03:36:02,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 126586880. Throughput: 0: 1609.7. Samples: 26640592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:02,934][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 03:36:03,199][42004] Updated weights for policy 0, policy_version 30906 (0.0027) +[2024-11-08 03:36:07,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.3, 300 sec: 6734.1). Total num frames: 126619648. Throughput: 0: 1666.0. Samples: 26649646. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:07,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 03:36:09,358][42004] Updated weights for policy 0, policy_version 30916 (0.0034) +[2024-11-08 03:36:12,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6770.8). Total num frames: 126652416. Throughput: 0: 1668.0. Samples: 26659832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:12,934][41694] Avg episode reward: [(0, '4.273')] +[2024-11-08 03:36:15,287][42004] Updated weights for policy 0, policy_version 30926 (0.0039) +[2024-11-08 03:36:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6803.5). Total num frames: 126693376. Throughput: 0: 1689.8. Samples: 26665250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:17,936][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 03:36:20,564][42004] Updated weights for policy 0, policy_version 30936 (0.0024) +[2024-11-08 03:36:22,933][41694] Fps is (10 sec: 7781.6, 60 sec: 6826.5, 300 sec: 6803.5). Total num frames: 126730240. Throughput: 0: 1691.4. Samples: 26676840. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:36:22,934][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 03:36:26,130][42004] Updated weights for policy 0, policy_version 30946 (0.0036) +[2024-11-08 03:36:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 126767104. Throughput: 0: 1679.0. Samples: 26687998. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:36:27,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 03:36:32,932][41694] Fps is (10 sec: 5325.3, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 126783488. Throughput: 0: 1587.2. Samples: 26689682. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:36:32,933][41694] Avg episode reward: [(0, '4.629')] +[2024-11-08 03:36:34,523][42004] Updated weights for policy 0, policy_version 30956 (0.0034) +[2024-11-08 03:36:37,932][41694] Fps is (10 sec: 4505.6, 60 sec: 6417.1, 300 sec: 6706.3). Total num frames: 126812160. Throughput: 0: 1558.2. Samples: 26698728. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:36:37,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 03:36:40,990][42004] Updated weights for policy 0, policy_version 30966 (0.0021) +[2024-11-08 03:36:42,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6706.3). Total num frames: 126849024. Throughput: 0: 1632.1. Samples: 26708934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:42,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 03:36:46,205][42004] Updated weights for policy 0, policy_version 30976 (0.0032) +[2024-11-08 03:36:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6485.3, 300 sec: 6776.0). Total num frames: 126889984. Throughput: 0: 1644.6. Samples: 26714598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:47,934][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 03:36:51,398][42004] Updated weights for policy 0, policy_version 30986 (0.0028) +[2024-11-08 03:36:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6485.3, 300 sec: 6775.8). Total num frames: 126926848. Throughput: 0: 1706.8. Samples: 26726454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:52,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 03:36:56,787][42004] Updated weights for policy 0, policy_version 30996 (0.0031) +[2024-11-08 03:36:57,936][41694] Fps is (10 sec: 7779.2, 60 sec: 6826.2, 300 sec: 6789.5). Total num frames: 126967808. Throughput: 0: 1736.2. Samples: 26737966. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:36:57,939][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 03:37:02,509][42004] Updated weights for policy 0, policy_version 31006 (0.0029) +[2024-11-08 03:37:05,084][41694] Fps is (10 sec: 6066.7, 60 sec: 6656.1, 300 sec: 6740.5). Total num frames: 127000576. Throughput: 0: 1650.2. Samples: 26743060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:37:05,086][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 03:37:07,932][41694] Fps is (10 sec: 5327.0, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 127021056. Throughput: 0: 1618.1. Samples: 26749652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:37:07,934][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 03:37:11,206][42004] Updated weights for policy 0, policy_version 31016 (0.0038) +[2024-11-08 03:37:12,936][41694] Fps is (10 sec: 6259.9, 60 sec: 6621.3, 300 sec: 6706.2). Total num frames: 127049728. Throughput: 0: 1580.4. Samples: 26759122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:37:12,938][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 03:37:17,059][42004] Updated weights for policy 0, policy_version 31026 (0.0034) +[2024-11-08 03:37:17,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 127086592. Throughput: 0: 1654.3. Samples: 26764126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:37:17,933][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 03:37:22,235][42004] Updated weights for policy 0, policy_version 31036 (0.0054) +[2024-11-08 03:37:22,932][41694] Fps is (10 sec: 7786.1, 60 sec: 6622.0, 300 sec: 6781.6). Total num frames: 127127552. Throughput: 0: 1714.8. Samples: 26775894. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:37:22,933][41694] Avg episode reward: [(0, '4.602')] +[2024-11-08 03:37:27,434][42004] Updated weights for policy 0, policy_version 31046 (0.0026) +[2024-11-08 03:37:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 127164416. Throughput: 0: 1752.5. Samples: 26787796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:37:27,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 03:37:32,841][42004] Updated weights for policy 0, policy_version 31056 (0.0026) +[2024-11-08 03:37:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7031.5, 300 sec: 6789.6). Total num frames: 127205376. Throughput: 0: 1755.1. Samples: 26793578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:37:32,934][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 03:37:39,693][41694] Fps is (10 sec: 5920.3, 60 sec: 6830.9, 300 sec: 6735.5). Total num frames: 127234048. Throughput: 0: 1672.4. Samples: 26804658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:37:39,695][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 03:37:39,726][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031064_127238144.pth... +[2024-11-08 03:37:39,832][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030667_125612032.pth +[2024-11-08 03:37:40,964][42004] Updated weights for policy 0, policy_version 31066 (0.0063) +[2024-11-08 03:37:42,933][41694] Fps is (10 sec: 4914.5, 60 sec: 6758.2, 300 sec: 6706.3). Total num frames: 127254528. Throughput: 0: 1622.4. Samples: 26810970. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:37:42,936][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 03:37:47,543][42004] Updated weights for policy 0, policy_version 31076 (0.0040) +[2024-11-08 03:37:47,932][41694] Fps is (10 sec: 6463.4, 60 sec: 6621.9, 300 sec: 6692.4). 
Total num frames: 127287296. Throughput: 0: 1681.3. Samples: 26815098. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:37:47,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 03:37:52,732][42004] Updated weights for policy 0, policy_version 31086 (0.0034) +[2024-11-08 03:37:52,932][41694] Fps is (10 sec: 7373.9, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 127328256. Throughput: 0: 1699.0. Samples: 26826106. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:37:52,933][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 03:37:57,863][42004] Updated weights for policy 0, policy_version 31096 (0.0023) +[2024-11-08 03:37:57,931][41694] Fps is (10 sec: 8192.1, 60 sec: 6690.6, 300 sec: 6776.0). Total num frames: 127369216. Throughput: 0: 1755.8. Samples: 26838124. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 03:37:57,933][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 03:38:02,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7009.9, 300 sec: 6789.7). Total num frames: 127406080. Throughput: 0: 1772.7. Samples: 26843896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:02,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 03:38:03,309][42004] Updated weights for policy 0, policy_version 31106 (0.0028) +[2024-11-08 03:38:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7031.5, 300 sec: 6789.7). Total num frames: 127442944. Throughput: 0: 1766.7. Samples: 26855394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:07,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 03:38:08,509][42004] Updated weights for policy 0, policy_version 31116 (0.0025) +[2024-11-08 03:38:14,222][41694] Fps is (10 sec: 6167.0, 60 sec: 6950.7, 300 sec: 6760.1). Total num frames: 127475712. Throughput: 0: 1587.0. Samples: 26861262. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:14,224][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 03:38:16,556][42004] Updated weights for policy 0, policy_version 31126 (0.0035) +[2024-11-08 03:38:17,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 127500288. Throughput: 0: 1658.4. Samples: 26868206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:17,934][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 03:38:22,893][42004] Updated weights for policy 0, policy_version 31136 (0.0036) +[2024-11-08 03:38:22,932][41694] Fps is (10 sec: 6584.4, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 127533056. Throughput: 0: 1674.3. Samples: 26877052. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:38:22,934][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 03:38:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 127569920. Throughput: 0: 1726.1. Samples: 26888642. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:38:27,934][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 03:38:28,076][42004] Updated weights for policy 0, policy_version 31146 (0.0031) +[2024-11-08 03:38:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 127610880. Throughput: 0: 1771.6. Samples: 26894818. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:38:32,932][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 03:38:33,221][42004] Updated weights for policy 0, policy_version 31156 (0.0030) +[2024-11-08 03:38:37,931][41694] Fps is (10 sec: 8192.2, 60 sec: 7173.8, 300 sec: 6789.6). Total num frames: 127651840. Throughput: 0: 1792.7. Samples: 26906776. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:38:37,934][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 03:38:38,352][42004] Updated weights for policy 0, policy_version 31166 (0.0028) +[2024-11-08 03:38:42,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7236.4, 300 sec: 6789.6). Total num frames: 127688704. Throughput: 0: 1784.2. Samples: 26918412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:42,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 03:38:43,696][42004] Updated weights for policy 0, policy_version 31176 (0.0031) +[2024-11-08 03:38:48,731][41694] Fps is (10 sec: 6068.1, 60 sec: 7073.7, 300 sec: 6757.5). Total num frames: 127717376. Throughput: 0: 1749.5. Samples: 26924022. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:48,733][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 03:38:52,000][42004] Updated weights for policy 0, policy_version 31186 (0.0025) +[2024-11-08 03:38:52,933][41694] Fps is (10 sec: 5324.2, 60 sec: 6894.8, 300 sec: 6720.2). Total num frames: 127741952. Throughput: 0: 1666.5. Samples: 26930390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:52,935][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 03:38:57,860][42004] Updated weights for policy 0, policy_version 31196 (0.0040) +[2024-11-08 03:38:57,932][41694] Fps is (10 sec: 6678.2, 60 sec: 6826.6, 300 sec: 6720.2). Total num frames: 127778816. Throughput: 0: 1812.1. Samples: 26940468. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:38:57,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 03:39:02,931][41694] Fps is (10 sec: 6964.1, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 127811584. Throughput: 0: 1728.1. Samples: 26945970. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:02,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 03:39:03,648][42004] Updated weights for policy 0, policy_version 31206 (0.0027) +[2024-11-08 03:39:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 127852544. Throughput: 0: 1773.3. Samples: 26956848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:07,933][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 03:39:08,932][42004] Updated weights for policy 0, policy_version 31216 (0.0029) +[2024-11-08 03:39:12,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7046.5, 300 sec: 6789.7). Total num frames: 127889408. Throughput: 0: 1768.1. Samples: 26968208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:12,934][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 03:39:14,549][42004] Updated weights for policy 0, policy_version 31226 (0.0030) +[2024-11-08 03:39:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.8, 300 sec: 6789.6). Total num frames: 127926272. Throughput: 0: 1756.2. Samples: 26973848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:17,933][41694] Avg episode reward: [(0, '4.666')] +[2024-11-08 03:39:19,973][42004] Updated weights for policy 0, policy_version 31236 (0.0028) +[2024-11-08 03:39:22,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 127950848. Throughput: 0: 1742.4. Samples: 26985184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:22,934][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 03:39:27,570][42004] Updated weights for policy 0, policy_version 31246 (0.0043) +[2024-11-08 03:39:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 127983616. Throughput: 0: 1648.2. Samples: 26992580. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:39:27,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 03:39:32,824][42004] Updated weights for policy 0, policy_version 31256 (0.0024) +[2024-11-08 03:39:32,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6894.9, 300 sec: 6734.3). Total num frames: 128024576. Throughput: 0: 1666.7. Samples: 26997692. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:39:32,932][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 03:39:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6826.6, 300 sec: 6748.0). Total num frames: 128061440. Throughput: 0: 1766.5. Samples: 27009880. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:39:37,935][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 03:39:38,032][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031266_128065536.pth... +[2024-11-08 03:39:38,038][42004] Updated weights for policy 0, policy_version 31266 (0.0024) +[2024-11-08 03:39:38,121][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000030866_126427136.pth +[2024-11-08 03:39:42,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 128102400. Throughput: 0: 1806.9. Samples: 27021780. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:39:42,933][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 03:39:43,205][42004] Updated weights for policy 0, policy_version 31276 (0.0031) +[2024-11-08 03:39:47,932][41694] Fps is (10 sec: 8192.0, 60 sec: 7195.6, 300 sec: 6845.2). Total num frames: 128143360. Throughput: 0: 1815.0. Samples: 27027646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:47,933][41694] Avg episode reward: [(0, '4.634')] +[2024-11-08 03:39:48,448][42004] Updated weights for policy 0, policy_version 31286 (0.0025) +[2024-11-08 03:39:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7304.7, 300 sec: 6859.1). 
Total num frames: 128180224. Throughput: 0: 1831.5. Samples: 27039266. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:52,935][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 03:39:53,822][42004] Updated weights for policy 0, policy_version 31296 (0.0032) +[2024-11-08 03:39:57,933][41694] Fps is (10 sec: 5324.1, 60 sec: 6963.0, 300 sec: 6789.6). Total num frames: 128196608. Throughput: 0: 1736.0. Samples: 27046332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:39:57,936][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 03:40:02,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 128225280. Throughput: 0: 1689.0. Samples: 27049854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:02,933][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 03:40:03,151][42004] Updated weights for policy 0, policy_version 31306 (0.0029) +[2024-11-08 03:40:07,931][41694] Fps is (10 sec: 6964.3, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 128266240. Throughput: 0: 1668.2. Samples: 27060254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:07,933][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 03:40:08,405][42004] Updated weights for policy 0, policy_version 31316 (0.0023) +[2024-11-08 03:40:12,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6894.9, 300 sec: 6789.6). Total num frames: 128303104. Throughput: 0: 1766.8. Samples: 27072088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:12,936][41694] Avg episode reward: [(0, '4.236')] +[2024-11-08 03:40:13,801][42004] Updated weights for policy 0, policy_version 31326 (0.0027) +[2024-11-08 03:40:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 128335872. Throughput: 0: 1767.9. Samples: 27077248. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:17,935][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 03:40:19,832][42004] Updated weights for policy 0, policy_version 31336 (0.0041) +[2024-11-08 03:40:22,932][41694] Fps is (10 sec: 6962.7, 60 sec: 7031.4, 300 sec: 6845.2). Total num frames: 128372736. Throughput: 0: 1735.9. Samples: 27087998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:22,935][41694] Avg episode reward: [(0, '4.603')] +[2024-11-08 03:40:25,308][42004] Updated weights for policy 0, policy_version 31346 (0.0034) +[2024-11-08 03:40:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6859.1). Total num frames: 128409600. Throughput: 0: 1713.5. Samples: 27098890. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:40:27,933][41694] Avg episode reward: [(0, '4.205')] +[2024-11-08 03:40:32,932][41694] Fps is (10 sec: 5325.2, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 128425984. Throughput: 0: 1686.7. Samples: 27103546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:40:32,936][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 03:40:33,680][42004] Updated weights for policy 0, policy_version 31356 (0.0033) +[2024-11-08 03:40:37,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.2, 300 sec: 6775.8). Total num frames: 128462848. Throughput: 0: 1569.6. Samples: 27109900. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:40:37,935][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 03:40:39,429][42004] Updated weights for policy 0, policy_version 31366 (0.0025) +[2024-11-08 03:40:42,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 128499712. Throughput: 0: 1667.4. Samples: 27121360. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:40:42,933][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 03:40:44,744][42004] Updated weights for policy 0, policy_version 31376 (0.0037) +[2024-11-08 03:40:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 128540672. Throughput: 0: 1716.4. Samples: 27127092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:47,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 03:40:50,000][42004] Updated weights for policy 0, policy_version 31386 (0.0037) +[2024-11-08 03:40:52,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6621.8, 300 sec: 6845.2). Total num frames: 128577536. Throughput: 0: 1745.7. Samples: 27138812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:52,934][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 03:40:55,381][42004] Updated weights for policy 0, policy_version 31396 (0.0024) +[2024-11-08 03:40:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.3, 300 sec: 6872.9). Total num frames: 128614400. Throughput: 0: 1734.6. Samples: 27150144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:40:57,935][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 03:41:01,034][42004] Updated weights for policy 0, policy_version 31406 (0.0027) +[2024-11-08 03:41:02,933][41694] Fps is (10 sec: 7371.7, 60 sec: 7099.5, 300 sec: 6886.8). Total num frames: 128651264. Throughput: 0: 1739.1. Samples: 27155512. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:41:02,936][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 03:41:07,934][41694] Fps is (10 sec: 4914.2, 60 sec: 6621.6, 300 sec: 6817.4). Total num frames: 128663552. Throughput: 0: 1648.8. Samples: 27162198. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:41:07,936][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 03:41:10,058][42004] Updated weights for policy 0, policy_version 31416 (0.0033) +[2024-11-08 03:41:12,932][41694] Fps is (10 sec: 4506.3, 60 sec: 6553.6, 300 sec: 6789.6). Total num frames: 128696320. Throughput: 0: 1583.2. Samples: 27170132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:41:12,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 03:41:16,392][42004] Updated weights for policy 0, policy_version 31426 (0.0037) +[2024-11-08 03:41:17,932][41694] Fps is (10 sec: 6555.0, 60 sec: 6553.6, 300 sec: 6775.8). Total num frames: 128729088. Throughput: 0: 1591.7. Samples: 27175172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:41:17,933][41694] Avg episode reward: [(0, '4.677')] +[2024-11-08 03:41:22,183][42004] Updated weights for policy 0, policy_version 31436 (0.0031) +[2024-11-08 03:41:22,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6553.7, 300 sec: 6775.8). Total num frames: 128765952. Throughput: 0: 1686.3. Samples: 27185784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:41:22,932][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 03:41:27,392][42004] Updated weights for policy 0, policy_version 31446 (0.0031) +[2024-11-08 03:41:27,934][41694] Fps is (10 sec: 7780.7, 60 sec: 6621.6, 300 sec: 6859.0). Total num frames: 128806912. Throughput: 0: 1694.3. Samples: 27197606. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:41:27,936][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 03:41:32,717][42004] Updated weights for policy 0, policy_version 31456 (0.0027) +[2024-11-08 03:41:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 128843776. Throughput: 0: 1690.0. Samples: 27203140. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:41:32,936][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 03:41:37,932][41694] Fps is (10 sec: 6964.6, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 128876544. Throughput: 0: 1682.4. Samples: 27214522. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:41:37,934][41694] Avg episode reward: [(0, '4.631')] +[2024-11-08 03:41:37,971][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031465_128880640.pth... +[2024-11-08 03:41:38,121][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031064_127238144.pth +[2024-11-08 03:41:38,717][42004] Updated weights for policy 0, policy_version 31466 (0.0032) +[2024-11-08 03:41:42,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6553.6, 300 sec: 6789.6). Total num frames: 128892928. Throughput: 0: 1560.7. Samples: 27220376. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:41:42,934][41694] Avg episode reward: [(0, '4.570')] +[2024-11-08 03:41:47,088][42004] Updated weights for policy 0, policy_version 31476 (0.0030) +[2024-11-08 03:41:47,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6485.3, 300 sec: 6789.6). Total num frames: 128929792. Throughput: 0: 1544.7. Samples: 27225020. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:41:47,933][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 03:41:52,506][42004] Updated weights for policy 0, policy_version 31486 (0.0024) +[2024-11-08 03:41:52,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6775.9). Total num frames: 128966656. Throughput: 0: 1647.4. Samples: 27236326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:41:52,934][41694] Avg episode reward: [(0, '4.637')] +[2024-11-08 03:41:57,851][42004] Updated weights for policy 0, policy_version 31496 (0.0036) +[2024-11-08 03:41:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6853.5). 
Total num frames: 129007616. Throughput: 0: 1727.2. Samples: 27247858. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:41:57,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 03:42:02,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.8, 300 sec: 6859.1). Total num frames: 129044480. Throughput: 0: 1743.8. Samples: 27253642. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:42:02,934][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 03:42:03,359][42004] Updated weights for policy 0, policy_version 31506 (0.0026) +[2024-11-08 03:42:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.5, 300 sec: 6886.9). Total num frames: 129081344. Throughput: 0: 1748.7. Samples: 27264476. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:42:07,933][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 03:42:08,837][42004] Updated weights for policy 0, policy_version 31516 (0.0032) +[2024-11-08 03:42:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 129114112. Throughput: 0: 1718.1. Samples: 27274916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:12,934][41694] Avg episode reward: [(0, '4.638')] +[2024-11-08 03:42:17,719][42004] Updated weights for policy 0, policy_version 31526 (0.0041) +[2024-11-08 03:42:17,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.2, 300 sec: 6789.6). Total num frames: 129130496. Throughput: 0: 1642.4. Samples: 27277046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:17,933][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 03:42:22,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 129167360. Throughput: 0: 1593.2. Samples: 27286214. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:22,934][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 03:42:23,169][42004] Updated weights for policy 0, policy_version 31536 (0.0034) +[2024-11-08 03:42:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.4, 300 sec: 6789.6). Total num frames: 129208320. Throughput: 0: 1722.2. Samples: 27297876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:27,933][41694] Avg episode reward: [(0, '4.248')] +[2024-11-08 03:42:28,420][42004] Updated weights for policy 0, policy_version 31546 (0.0032) +[2024-11-08 03:42:32,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6690.1, 300 sec: 6858.4). Total num frames: 129245184. Throughput: 0: 1747.2. Samples: 27303642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:32,933][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 03:42:33,493][42004] Updated weights for policy 0, policy_version 31556 (0.0022) +[2024-11-08 03:42:37,932][41694] Fps is (10 sec: 7781.8, 60 sec: 6826.6, 300 sec: 6886.9). Total num frames: 129286144. Throughput: 0: 1758.8. Samples: 27315472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:37,935][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 03:42:39,113][42004] Updated weights for policy 0, policy_version 31566 (0.0030) +[2024-11-08 03:42:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6873.0). Total num frames: 129314816. Throughput: 0: 1731.8. Samples: 27325788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:42,933][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 03:42:45,406][42004] Updated weights for policy 0, policy_version 31576 (0.0034) +[2024-11-08 03:42:50,145][41694] Fps is (10 sec: 5030.7, 60 sec: 6715.4, 300 sec: 6794.2). Total num frames: 129347584. Throughput: 0: 1633.3. Samples: 27330754. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:42:50,148][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 03:42:52,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6775.7). Total num frames: 129368064. Throughput: 0: 1602.2. Samples: 27336576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:42:52,933][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 03:42:53,828][42004] Updated weights for policy 0, policy_version 31586 (0.0033) +[2024-11-08 03:42:57,932][41694] Fps is (10 sec: 7364.7, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 129404928. Throughput: 0: 1619.8. Samples: 27347806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:42:57,935][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 03:42:59,026][42004] Updated weights for policy 0, policy_version 31596 (0.0027) +[2024-11-08 03:43:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 129445888. Throughput: 0: 1704.8. Samples: 27353764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:43:02,933][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 03:43:04,536][42004] Updated weights for policy 0, policy_version 31606 (0.0033) +[2024-11-08 03:43:07,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6833.4). Total num frames: 129482752. Throughput: 0: 1751.5. Samples: 27365030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:43:07,933][41694] Avg episode reward: [(0, '4.236')] +[2024-11-08 03:43:09,670][42004] Updated weights for policy 0, policy_version 31616 (0.0034) +[2024-11-08 03:43:12,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6826.6, 300 sec: 6859.1). Total num frames: 129523712. Throughput: 0: 1754.7. Samples: 27376838. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:43:12,935][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 03:43:15,097][42004] Updated weights for policy 0, policy_version 31626 (0.0028) +[2024-11-08 03:43:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6859.1). Total num frames: 129556480. Throughput: 0: 1751.6. Samples: 27382466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:43:17,935][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 03:43:21,113][42004] Updated weights for policy 0, policy_version 31636 (0.0042) +[2024-11-08 03:43:24,814][41694] Fps is (10 sec: 5171.0, 60 sec: 6751.5, 300 sec: 6788.0). Total num frames: 129585152. Throughput: 0: 1644.0. Samples: 27392546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:43:24,827][41694] Avg episode reward: [(0, '4.261')] +[2024-11-08 03:43:27,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 129605632. Throughput: 0: 1607.9. Samples: 27398144. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:43:27,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 03:43:29,698][42004] Updated weights for policy 0, policy_version 31646 (0.0042) +[2024-11-08 03:43:32,932][41694] Fps is (10 sec: 7568.5, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 129646592. Throughput: 0: 1701.9. Samples: 27403570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:43:32,933][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 03:43:35,065][42004] Updated weights for policy 0, policy_version 31656 (0.0028) +[2024-11-08 03:43:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 129683456. Throughput: 0: 1748.9. Samples: 27415278. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:43:37,934][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 03:43:38,063][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031662_129687552.pth... +[2024-11-08 03:43:38,146][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031266_128065536.pth +[2024-11-08 03:43:40,369][42004] Updated weights for policy 0, policy_version 31666 (0.0027) +[2024-11-08 03:43:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6808.1). Total num frames: 129720320. Throughput: 0: 1754.5. Samples: 27426760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:43:42,934][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 03:43:45,487][42004] Updated weights for policy 0, policy_version 31676 (0.0020) +[2024-11-08 03:43:47,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7159.1, 300 sec: 6845.2). Total num frames: 129761280. Throughput: 0: 1756.0. Samples: 27432782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:43:47,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 03:43:51,083][42004] Updated weights for policy 0, policy_version 31686 (0.0023) +[2024-11-08 03:43:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.8, 300 sec: 6831.3). Total num frames: 129794048. Throughput: 0: 1752.6. Samples: 27443896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:43:52,935][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 03:43:59,321][41694] Fps is (10 sec: 5394.5, 60 sec: 6805.6, 300 sec: 6785.5). Total num frames: 129822720. Throughput: 0: 1653.6. Samples: 27453546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:43:59,322][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 03:43:59,656][42004] Updated weights for policy 0, policy_version 31696 (0.0027) +[2024-11-08 03:44:02,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6690.1, 300 sec: 6761.9). 
Total num frames: 129847296. Throughput: 0: 1609.9. Samples: 27454910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:44:02,934][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 03:44:05,269][42004] Updated weights for policy 0, policy_version 31706 (0.0034) +[2024-11-08 03:44:07,931][41694] Fps is (10 sec: 7611.0, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 129888256. Throughput: 0: 1702.5. Samples: 27465952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:44:07,934][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 03:44:10,547][42004] Updated weights for policy 0, policy_version 31716 (0.0022) +[2024-11-08 03:44:12,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6690.2, 300 sec: 6775.8). Total num frames: 129925120. Throughput: 0: 1763.5. Samples: 27477500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:44:12,933][41694] Avg episode reward: [(0, '4.150')] +[2024-11-08 03:44:15,830][42004] Updated weights for policy 0, policy_version 31726 (0.0024) +[2024-11-08 03:44:17,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 129966080. Throughput: 0: 1772.7. Samples: 27483342. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:44:17,933][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 03:44:21,104][42004] Updated weights for policy 0, policy_version 31736 (0.0026) +[2024-11-08 03:44:22,932][41694] Fps is (10 sec: 7372.3, 60 sec: 7118.2, 300 sec: 6831.3). Total num frames: 129998848. Throughput: 0: 1769.1. Samples: 27494888. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:44:22,934][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 03:44:27,340][42004] Updated weights for policy 0, policy_version 31746 (0.0032) +[2024-11-08 03:44:27,932][41694] Fps is (10 sec: 6963.3, 60 sec: 7168.0, 300 sec: 6817.4). Total num frames: 130035712. Throughput: 0: 1738.4. Samples: 27504988. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:44:27,935][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 03:44:33,917][41694] Fps is (10 sec: 5593.2, 60 sec: 6783.5, 300 sec: 6753.2). Total num frames: 130060288. Throughput: 0: 1682.4. Samples: 27510148. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:44:33,919][41694] Avg episode reward: [(0, '4.645')] +[2024-11-08 03:44:35,441][42004] Updated weights for policy 0, policy_version 31756 (0.0025) +[2024-11-08 03:44:37,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 130088960. Throughput: 0: 1623.1. Samples: 27516936. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:44:37,934][41694] Avg episode reward: [(0, '4.892')] +[2024-11-08 03:44:40,571][42004] Updated weights for policy 0, policy_version 31766 (0.0031) +[2024-11-08 03:44:42,932][41694] Fps is (10 sec: 7724.2, 60 sec: 6826.6, 300 sec: 6734.1). Total num frames: 130129920. Throughput: 0: 1721.8. Samples: 27528634. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:44:42,934][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 03:44:45,959][42004] Updated weights for policy 0, policy_version 31776 (0.0028) +[2024-11-08 03:44:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 130166784. Throughput: 0: 1763.2. Samples: 27534252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:44:47,933][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 03:44:51,026][42004] Updated weights for policy 0, policy_version 31786 (0.0022) +[2024-11-08 03:44:52,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6803.6). Total num frames: 130203648. Throughput: 0: 1785.6. Samples: 27546306. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:44:52,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 03:44:56,754][42004] Updated weights for policy 0, policy_version 31796 (0.0022) +[2024-11-08 03:44:57,932][41694] Fps is (10 sec: 7782.0, 60 sec: 7198.1, 300 sec: 6845.2). Total num frames: 130244608. Throughput: 0: 1770.5. Samples: 27557176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:44:57,935][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 03:45:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7099.8, 300 sec: 6803.5). Total num frames: 130273280. Throughput: 0: 1745.5. Samples: 27561888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:45:02,933][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 03:45:03,195][42004] Updated weights for policy 0, policy_version 31806 (0.0045) +[2024-11-08 03:45:08,406][41694] Fps is (10 sec: 5084.0, 60 sec: 6773.1, 300 sec: 6751.0). Total num frames: 130297856. Throughput: 0: 1693.8. Samples: 27571912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:45:08,407][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 03:45:11,125][42004] Updated weights for policy 0, policy_version 31816 (0.0026) +[2024-11-08 03:45:12,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 130330624. Throughput: 0: 1639.5. Samples: 27578766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:45:12,934][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 03:45:16,952][42004] Updated weights for policy 0, policy_version 31826 (0.0042) +[2024-11-08 03:45:17,932][41694] Fps is (10 sec: 6879.6, 60 sec: 6621.8, 300 sec: 6748.0). Total num frames: 130363392. Throughput: 0: 1674.7. Samples: 27583858. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:45:17,933][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 03:45:22,329][42004] Updated weights for policy 0, policy_version 31836 (0.0023) +[2024-11-08 03:45:22,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 130404352. Throughput: 0: 1735.0. Samples: 27595012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:45:22,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 03:45:27,473][42004] Updated weights for policy 0, policy_version 31846 (0.0030) +[2024-11-08 03:45:27,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 130441216. Throughput: 0: 1740.3. Samples: 27606948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:45:27,934][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 03:45:32,780][42004] Updated weights for policy 0, policy_version 31856 (0.0026) +[2024-11-08 03:45:32,931][41694] Fps is (10 sec: 7782.8, 60 sec: 7148.9, 300 sec: 6845.2). Total num frames: 130482176. Throughput: 0: 1744.8. Samples: 27612770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:45:32,933][41694] Avg episode reward: [(0, '4.162')] +[2024-11-08 03:45:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6831.3). Total num frames: 130514944. Throughput: 0: 1711.5. Samples: 27623322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:45:37,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 03:45:37,963][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031864_130514944.pth... +[2024-11-08 03:45:38,070][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031465_128880640.pth +[2024-11-08 03:45:38,939][42004] Updated weights for policy 0, policy_version 31866 (0.0033) +[2024-11-08 03:45:42,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6761.9). 
Total num frames: 130535424. Throughput: 0: 1585.7. Samples: 27628532. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:45:42,934][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 03:45:47,049][42004] Updated weights for policy 0, policy_version 31876 (0.0029) +[2024-11-08 03:45:47,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 130568192. Throughput: 0: 1621.8. Samples: 27634870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:45:47,933][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 03:45:52,384][42004] Updated weights for policy 0, policy_version 31886 (0.0031) +[2024-11-08 03:45:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 130609152. Throughput: 0: 1670.3. Samples: 27646282. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:45:52,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 03:45:57,691][42004] Updated weights for policy 0, policy_version 31896 (0.0036) +[2024-11-08 03:45:57,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 130646016. Throughput: 0: 1755.2. Samples: 27657748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:45:57,934][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 03:46:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.6, 300 sec: 6845.2). Total num frames: 130682880. Throughput: 0: 1768.4. Samples: 27663436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:02,934][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 03:46:03,383][42004] Updated weights for policy 0, policy_version 31906 (0.0031) +[2024-11-08 03:46:07,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7087.4, 300 sec: 6859.0). Total num frames: 130719744. Throughput: 0: 1761.1. Samples: 27674260. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:07,935][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 03:46:09,217][42004] Updated weights for policy 0, policy_version 31916 (0.0031) +[2024-11-08 03:46:12,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 130748416. Throughput: 0: 1699.1. Samples: 27683406. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:12,937][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 03:46:17,223][42004] Updated weights for policy 0, policy_version 31926 (0.0036) +[2024-11-08 03:46:17,931][41694] Fps is (10 sec: 5325.1, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 130772992. Throughput: 0: 1681.6. Samples: 27688440. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:17,934][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 03:46:22,396][42004] Updated weights for policy 0, policy_version 31936 (0.0034) +[2024-11-08 03:46:22,933][41694] Fps is (10 sec: 6552.6, 60 sec: 6826.5, 300 sec: 6803.5). Total num frames: 130813952. Throughput: 0: 1638.4. Samples: 27697054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:22,937][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 03:46:27,569][42004] Updated weights for policy 0, policy_version 31946 (0.0033) +[2024-11-08 03:46:27,932][41694] Fps is (10 sec: 7781.8, 60 sec: 6826.6, 300 sec: 6803.5). Total num frames: 130850816. Throughput: 0: 1790.6. Samples: 27709112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:27,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 03:46:32,932][41694] Fps is (10 sec: 7373.7, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 130887680. Throughput: 0: 1780.3. Samples: 27714984. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:32,935][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 03:46:33,229][42004] Updated weights for policy 0, policy_version 31956 (0.0029) +[2024-11-08 03:46:37,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6826.6, 300 sec: 6886.8). Total num frames: 130924544. Throughput: 0: 1756.2. Samples: 27725310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:46:37,935][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 03:46:39,031][42004] Updated weights for policy 0, policy_version 31966 (0.0030) +[2024-11-08 03:46:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6873.0). Total num frames: 130957312. Throughput: 0: 1724.9. Samples: 27735368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:46:42,936][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 03:46:45,628][42004] Updated weights for policy 0, policy_version 31976 (0.0029) +[2024-11-08 03:46:47,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 130985984. Throughput: 0: 1700.1. Samples: 27739942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:46:47,933][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 03:46:52,361][42004] Updated weights for policy 0, policy_version 31986 (0.0039) +[2024-11-08 03:46:52,933][41694] Fps is (10 sec: 5733.5, 60 sec: 6758.2, 300 sec: 6803.5). Total num frames: 131014656. Throughput: 0: 1655.4. Samples: 27748756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:46:52,942][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 03:46:57,744][42004] Updated weights for policy 0, policy_version 31996 (0.0030) +[2024-11-08 03:46:57,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 131055616. Throughput: 0: 1698.4. Samples: 27759836. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:46:57,933][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 03:47:02,931][41694] Fps is (10 sec: 7783.6, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 131092480. Throughput: 0: 1718.6. Samples: 27765776. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:02,934][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 03:47:03,176][42004] Updated weights for policy 0, policy_version 32006 (0.0031) +[2024-11-08 03:47:07,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6895.0, 300 sec: 6845.2). Total num frames: 131133440. Throughput: 0: 1783.8. Samples: 27777322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:07,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 03:47:08,268][42004] Updated weights for policy 0, policy_version 32016 (0.0027) +[2024-11-08 03:47:12,934][41694] Fps is (10 sec: 7780.9, 60 sec: 7031.2, 300 sec: 6914.6). Total num frames: 131170304. Throughput: 0: 1779.1. Samples: 27789172. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:12,940][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 03:47:13,619][42004] Updated weights for policy 0, policy_version 32026 (0.0028) +[2024-11-08 03:47:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7236.2, 300 sec: 6914.6). Total num frames: 131207168. Throughput: 0: 1763.4. Samples: 27794338. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:17,934][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 03:47:19,780][42004] Updated weights for policy 0, policy_version 32036 (0.0041) +[2024-11-08 03:47:24,360][41694] Fps is (10 sec: 5376.8, 60 sec: 6801.4, 300 sec: 6826.0). Total num frames: 131231744. Throughput: 0: 1705.5. Samples: 27804492. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:24,366][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 03:47:27,923][42004] Updated weights for policy 0, policy_version 32046 (0.0028) +[2024-11-08 03:47:27,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6826.8, 300 sec: 6831.3). Total num frames: 131260416. Throughput: 0: 1678.8. Samples: 27810916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:47:27,933][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 03:47:32,931][41694] Fps is (10 sec: 7168.3, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 131293184. Throughput: 0: 1696.9. Samples: 27816300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:47:32,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 03:47:33,592][42004] Updated weights for policy 0, policy_version 32056 (0.0024) +[2024-11-08 03:47:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 131334144. Throughput: 0: 1748.1. Samples: 27827420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:47:37,936][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 03:47:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032064_131334144.pth... +[2024-11-08 03:47:38,097][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031662_129687552.pth +[2024-11-08 03:47:38,971][42004] Updated weights for policy 0, policy_version 32066 (0.0024) +[2024-11-08 03:47:42,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6896.9). Total num frames: 131366912. Throughput: 0: 1742.4. Samples: 27838246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:47:42,933][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 03:47:44,880][42004] Updated weights for policy 0, policy_version 32076 (0.0026) +[2024-11-08 03:47:47,931][41694] Fps is (10 sec: 7373.1, 60 sec: 7031.5, 300 sec: 6914.6). 
Total num frames: 131407872. Throughput: 0: 1733.8. Samples: 27843796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:47,933][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 03:47:49,979][42004] Updated weights for policy 0, policy_version 32086 (0.0030) +[2024-11-08 03:47:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.2, 300 sec: 6914.6). Total num frames: 131444736. Throughput: 0: 1741.5. Samples: 27855688. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:52,934][41694] Avg episode reward: [(0, '4.706')] +[2024-11-08 03:47:55,183][42004] Updated weights for policy 0, policy_version 32096 (0.0028) +[2024-11-08 03:47:58,364][41694] Fps is (10 sec: 6281.9, 60 sec: 6913.4, 300 sec: 6862.9). Total num frames: 131473408. Throughput: 0: 1594.7. Samples: 27861618. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:47:58,365][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 03:48:02,882][42004] Updated weights for policy 0, policy_version 32106 (0.0025) +[2024-11-08 03:48:02,933][41694] Fps is (10 sec: 6142.8, 60 sec: 6894.7, 300 sec: 6859.0). Total num frames: 131506176. Throughput: 0: 1674.3. Samples: 27869686. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:48:02,936][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 03:48:07,931][41694] Fps is (10 sec: 6849.9, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 131538944. Throughput: 0: 1719.9. Samples: 27879428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:48:07,934][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 03:48:08,529][42004] Updated weights for policy 0, policy_version 32116 (0.0045) +[2024-11-08 03:48:12,932][41694] Fps is (10 sec: 7374.2, 60 sec: 6826.9, 300 sec: 6859.1). Total num frames: 131579904. Throughput: 0: 1777.1. Samples: 27890886. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:48:12,935][41694] Avg episode reward: [(0, '4.308')] +[2024-11-08 03:48:13,900][42004] Updated weights for policy 0, policy_version 32126 (0.0030) +[2024-11-08 03:48:17,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6931.1). Total num frames: 131616768. Throughput: 0: 1785.0. Samples: 27896624. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:48:17,933][41694] Avg episode reward: [(0, '4.660')] +[2024-11-08 03:48:19,271][42004] Updated weights for policy 0, policy_version 32136 (0.0027) +[2024-11-08 03:48:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7203.0, 300 sec: 6942.4). Total num frames: 131653632. Throughput: 0: 1785.3. Samples: 27907758. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:48:22,934][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 03:48:24,798][42004] Updated weights for policy 0, policy_version 32146 (0.0024) +[2024-11-08 03:48:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.0, 300 sec: 6928.5). Total num frames: 131690496. Throughput: 0: 1794.4. Samples: 27918992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:48:27,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 03:48:30,567][42004] Updated weights for policy 0, policy_version 32156 (0.0033) +[2024-11-08 03:48:32,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6963.1, 300 sec: 6872.9). Total num frames: 131710976. Throughput: 0: 1788.3. Samples: 27924272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:48:32,937][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 03:48:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 131743744. Throughput: 0: 1668.0. Samples: 27930746. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:48:37,934][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 03:48:38,638][42004] Updated weights for policy 0, policy_version 32166 (0.0029) +[2024-11-08 03:48:42,931][41694] Fps is (10 sec: 6963.7, 60 sec: 6895.0, 300 sec: 6845.2). Total num frames: 131780608. Throughput: 0: 1797.4. Samples: 27941722. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:48:42,934][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 03:48:44,168][42004] Updated weights for policy 0, policy_version 32176 (0.0030) +[2024-11-08 03:48:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 131821568. Throughput: 0: 1726.8. Samples: 27947390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:48:47,934][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 03:48:49,420][42004] Updated weights for policy 0, policy_version 32186 (0.0027) +[2024-11-08 03:48:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6895.0, 300 sec: 6933.4). Total num frames: 131858432. Throughput: 0: 1769.3. Samples: 27959046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:48:52,933][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 03:48:54,775][42004] Updated weights for policy 0, policy_version 32196 (0.0026) +[2024-11-08 03:48:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7082.5, 300 sec: 6942.4). Total num frames: 131895296. Throughput: 0: 1772.0. Samples: 27970624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:48:57,935][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 03:49:00,307][42004] Updated weights for policy 0, policy_version 32206 (0.0026) +[2024-11-08 03:49:02,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.9, 300 sec: 6928.5). Total num frames: 131932160. Throughput: 0: 1768.1. Samples: 27976190. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:49:02,934][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 03:49:07,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6894.8, 300 sec: 6872.9). Total num frames: 131952640. Throughput: 0: 1708.8. Samples: 27984656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:49:07,934][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 03:49:08,509][42004] Updated weights for policy 0, policy_version 32216 (0.0027) +[2024-11-08 03:49:12,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 131985408. Throughput: 0: 1634.1. Samples: 27992528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:49:12,938][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 03:49:14,547][42004] Updated weights for policy 0, policy_version 32226 (0.0035) +[2024-11-08 03:49:17,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 132022272. Throughput: 0: 1635.1. Samples: 27997852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:49:17,934][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 03:49:19,934][42004] Updated weights for policy 0, policy_version 32236 (0.0026) +[2024-11-08 03:49:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 132059136. Throughput: 0: 1751.8. Samples: 28009578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:49:22,933][41694] Avg episode reward: [(0, '4.308')] +[2024-11-08 03:49:25,072][42004] Updated weights for policy 0, policy_version 32246 (0.0024) +[2024-11-08 03:49:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6826.6, 300 sec: 6937.8). Total num frames: 132100096. Throughput: 0: 1765.6. Samples: 28021176. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:49:27,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 03:49:30,618][42004] Updated weights for policy 0, policy_version 32256 (0.0028) +[2024-11-08 03:49:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.8, 300 sec: 6942.4). Total num frames: 132136960. Throughput: 0: 1760.8. Samples: 28026628. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:49:32,933][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 03:49:36,221][42004] Updated weights for policy 0, policy_version 32266 (0.0025) +[2024-11-08 03:49:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 7099.7, 300 sec: 6914.6). Total num frames: 132169728. Throughput: 0: 1745.9. Samples: 28037612. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:49:37,934][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 03:49:37,989][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032269_132173824.pth... +[2024-11-08 03:49:38,084][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000031864_130514944.pth +[2024-11-08 03:49:42,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 132186112. Throughput: 0: 1623.2. Samples: 28043670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:49:42,933][41694] Avg episode reward: [(0, '4.635')] +[2024-11-08 03:49:45,359][42004] Updated weights for policy 0, policy_version 32276 (0.0052) +[2024-11-08 03:49:47,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 132218880. Throughput: 0: 1589.2. Samples: 28047704. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:49:47,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 03:49:51,092][42004] Updated weights for policy 0, policy_version 32286 (0.0031) +[2024-11-08 03:49:52,933][41694] Fps is (10 sec: 6552.3, 60 sec: 6553.4, 300 sec: 6803.5). 
Total num frames: 132251648. Throughput: 0: 1634.7. Samples: 28058218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:49:52,935][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 03:49:56,815][42004] Updated weights for policy 0, policy_version 32296 (0.0047) +[2024-11-08 03:49:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 132292608. Throughput: 0: 1701.1. Samples: 28069076. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:49:57,933][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 03:50:02,931][41694] Fps is (10 sec: 6554.9, 60 sec: 6417.1, 300 sec: 6856.2). Total num frames: 132317184. Throughput: 0: 1689.3. Samples: 28073868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:50:02,933][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 03:50:03,953][42004] Updated weights for policy 0, policy_version 32306 (0.0021) +[2024-11-08 03:50:07,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 132349952. Throughput: 0: 1603.6. Samples: 28081742. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:50:07,934][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 03:50:10,107][42004] Updated weights for policy 0, policy_version 32316 (0.0027) +[2024-11-08 03:50:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 132382720. Throughput: 0: 1578.5. Samples: 28092206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:50:12,933][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 03:50:17,933][41694] Fps is (10 sec: 4505.0, 60 sec: 6212.1, 300 sec: 6748.0). Total num frames: 132395008. Throughput: 0: 1523.2. Samples: 28095174. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:50:17,937][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 03:50:19,763][42004] Updated weights for policy 0, policy_version 32326 (0.0038) +[2024-11-08 03:50:22,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6144.0, 300 sec: 6734.1). Total num frames: 132427776. Throughput: 0: 1415.4. Samples: 28101306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:50:22,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 03:50:25,739][42004] Updated weights for policy 0, policy_version 32336 (0.0025) +[2024-11-08 03:50:27,932][41694] Fps is (10 sec: 6554.8, 60 sec: 6007.5, 300 sec: 6706.3). Total num frames: 132460544. Throughput: 0: 1520.2. Samples: 28112080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:50:27,933][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 03:50:31,249][42004] Updated weights for policy 0, policy_version 32346 (0.0052) +[2024-11-08 03:50:32,933][41694] Fps is (10 sec: 7371.4, 60 sec: 6075.5, 300 sec: 6734.1). Total num frames: 132501504. Throughput: 0: 1551.5. Samples: 28117526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:50:32,936][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 03:50:36,726][42004] Updated weights for policy 0, policy_version 32356 (0.0031) +[2024-11-08 03:50:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6144.0, 300 sec: 6789.6). Total num frames: 132538368. Throughput: 0: 1566.3. Samples: 28128698. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:50:37,934][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 03:50:42,293][42004] Updated weights for policy 0, policy_version 32366 (0.0025) +[2024-11-08 03:50:42,933][41694] Fps is (10 sec: 7373.1, 60 sec: 6485.2, 300 sec: 6803.5). Total num frames: 132575232. Throughput: 0: 1572.3. Samples: 28139832. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:50:42,935][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 03:50:47,671][42004] Updated weights for policy 0, policy_version 32376 (0.0025) +[2024-11-08 03:50:47,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6553.6, 300 sec: 6789.6). Total num frames: 132612096. Throughput: 0: 1588.7. Samples: 28145360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:50:47,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 03:50:52,931][41694] Fps is (10 sec: 5735.3, 60 sec: 6349.0, 300 sec: 6734.1). Total num frames: 132632576. Throughput: 0: 1583.0. Samples: 28152978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:50:52,935][41694] Avg episode reward: [(0, '4.627')] +[2024-11-08 03:50:55,824][42004] Updated weights for policy 0, policy_version 32386 (0.0032) +[2024-11-08 03:50:57,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6212.3, 300 sec: 6720.2). Total num frames: 132665344. Throughput: 0: 1577.9. Samples: 28163210. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:50:57,933][41694] Avg episode reward: [(0, '4.663')] +[2024-11-08 03:51:01,362][42004] Updated weights for policy 0, policy_version 32396 (0.0038) +[2024-11-08 03:51:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6417.0, 300 sec: 6720.2). Total num frames: 132702208. Throughput: 0: 1630.8. Samples: 28168558. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:02,934][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 03:51:07,094][42004] Updated weights for policy 0, policy_version 32406 (0.0035) +[2024-11-08 03:51:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6485.4, 300 sec: 6748.0). Total num frames: 132739072. Throughput: 0: 1731.4. Samples: 28179220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:07,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 03:51:12,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6485.3, 300 sec: 6775.8). 
Total num frames: 132771840. Throughput: 0: 1724.3. Samples: 28189674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:12,933][41694] Avg episode reward: [(0, '4.778')] +[2024-11-08 03:51:13,092][42004] Updated weights for policy 0, policy_version 32416 (0.0039) +[2024-11-08 03:51:17,933][41694] Fps is (10 sec: 6962.5, 60 sec: 6895.0, 300 sec: 6761.9). Total num frames: 132808704. Throughput: 0: 1720.0. Samples: 28194926. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:51:17,935][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 03:51:18,507][42004] Updated weights for policy 0, policy_version 32426 (0.0040) +[2024-11-08 03:51:24,945][41694] Fps is (10 sec: 6136.8, 60 sec: 6737.1, 300 sec: 6716.0). Total num frames: 132845568. Throughput: 0: 1653.7. Samples: 28206446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:51:24,947][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 03:51:26,297][42004] Updated weights for policy 0, policy_version 32436 (0.0027) +[2024-11-08 03:51:27,932][41694] Fps is (10 sec: 5735.0, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 132866048. Throughput: 0: 1627.3. Samples: 28213060. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:51:27,933][41694] Avg episode reward: [(0, '4.283')] +[2024-11-08 03:51:32,121][42004] Updated weights for policy 0, policy_version 32446 (0.0030) +[2024-11-08 03:51:32,932][41694] Fps is (10 sec: 7180.4, 60 sec: 6690.3, 300 sec: 6706.3). Total num frames: 132902912. Throughput: 0: 1616.6. Samples: 28218106. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:51:32,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 03:51:37,548][42004] Updated weights for policy 0, policy_version 32456 (0.0030) +[2024-11-08 03:51:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 132939776. Throughput: 0: 1699.5. Samples: 28229454. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:37,934][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 03:51:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032456_132939776.pth... +[2024-11-08 03:51:38,043][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032064_131334144.pth +[2024-11-08 03:51:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.3, 300 sec: 6748.0). Total num frames: 132976640. Throughput: 0: 1728.1. Samples: 28240974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:42,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 03:51:42,993][42004] Updated weights for policy 0, policy_version 32466 (0.0027) +[2024-11-08 03:51:47,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6758.3, 300 sec: 6789.7). Total num frames: 133017600. Throughput: 0: 1732.8. Samples: 28246536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:47,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 03:51:48,228][42004] Updated weights for policy 0, policy_version 32476 (0.0030) +[2024-11-08 03:51:52,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 133054464. Throughput: 0: 1741.4. Samples: 28257582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:52,933][41694] Avg episode reward: [(0, '4.126')] +[2024-11-08 03:51:53,978][42004] Updated weights for policy 0, policy_version 32486 (0.0037) +[2024-11-08 03:51:59,274][41694] Fps is (10 sec: 5777.9, 60 sec: 6810.8, 300 sec: 6717.4). Total num frames: 133083136. Throughput: 0: 1694.9. Samples: 28268220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:51:59,276][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 03:52:02,414][42004] Updated weights for policy 0, policy_version 32496 (0.0036) +[2024-11-08 03:52:02,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.1, 300 sec: 6678.6). 
Total num frames: 133103616. Throughput: 0: 1662.5. Samples: 28269738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:52:02,934][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 03:52:07,932][41694] Fps is (10 sec: 6623.6, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 133140480. Throughput: 0: 1707.3. Samples: 28279836. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:52:07,935][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 03:52:07,997][42004] Updated weights for policy 0, policy_version 32506 (0.0038) +[2024-11-08 03:52:12,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.6, 300 sec: 6692.5). Total num frames: 133181440. Throughput: 0: 1736.5. Samples: 28291202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:52:12,934][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 03:52:13,428][42004] Updated weights for policy 0, policy_version 32516 (0.0030) +[2024-11-08 03:52:17,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6826.8, 300 sec: 6766.9). Total num frames: 133218304. Throughput: 0: 1748.4. Samples: 28296782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:52:17,933][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 03:52:18,851][42004] Updated weights for policy 0, policy_version 32526 (0.0028) +[2024-11-08 03:52:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7063.7, 300 sec: 6761.9). Total num frames: 133255168. Throughput: 0: 1747.1. Samples: 28308074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:52:22,933][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 03:52:24,406][42004] Updated weights for policy 0, policy_version 32536 (0.0026) +[2024-11-08 03:52:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6775.8). Total num frames: 133292032. Throughput: 0: 1742.8. Samples: 28319398. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:52:27,933][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 03:52:29,890][42004] Updated weights for policy 0, policy_version 32546 (0.0024) +[2024-11-08 03:52:33,611][41694] Fps is (10 sec: 5753.4, 60 sec: 6817.8, 300 sec: 6704.8). Total num frames: 133316608. Throughput: 0: 1714.7. Samples: 28324860. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:52:33,612][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 03:52:37,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 133345280. Throughput: 0: 1634.9. Samples: 28331152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 03:52:37,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 03:52:37,986][42004] Updated weights for policy 0, policy_version 32556 (0.0041) +[2024-11-08 03:52:42,932][41694] Fps is (10 sec: 7470.4, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 133386240. Throughput: 0: 1702.7. Samples: 28342556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:52:42,934][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 03:52:43,412][42004] Updated weights for policy 0, policy_version 32566 (0.0027) +[2024-11-08 03:52:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 133423104. Throughput: 0: 1736.8. Samples: 28347894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:52:47,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 03:52:48,977][42004] Updated weights for policy 0, policy_version 32576 (0.0038) +[2024-11-08 03:52:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6744.0). Total num frames: 133459968. Throughput: 0: 1758.5. Samples: 28358968. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:52:52,936][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 03:52:54,502][42004] Updated weights for policy 0, policy_version 32586 (0.0028) +[2024-11-08 03:52:57,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6982.9, 300 sec: 6734.1). Total num frames: 133492736. Throughput: 0: 1749.1. Samples: 28369912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:52:57,939][41694] Avg episode reward: [(0, '4.313')] +[2024-11-08 03:53:00,189][42004] Updated weights for policy 0, policy_version 32596 (0.0045) +[2024-11-08 03:53:02,932][41694] Fps is (10 sec: 6963.3, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 133529600. Throughput: 0: 1752.5. Samples: 28375644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:53:02,933][41694] Avg episode reward: [(0, '4.641')] +[2024-11-08 03:53:07,965][41694] Fps is (10 sec: 5307.3, 60 sec: 6754.6, 300 sec: 6663.9). Total num frames: 133545984. Throughput: 0: 1613.3. Samples: 28380728. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:07,967][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 03:53:08,608][42004] Updated weights for policy 0, policy_version 32606 (0.0037) +[2024-11-08 03:53:12,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 133578752. Throughput: 0: 1605.2. Samples: 28391630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:12,939][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 03:53:14,945][42004] Updated weights for policy 0, policy_version 32616 (0.0036) +[2024-11-08 03:53:17,932][41694] Fps is (10 sec: 6986.8, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 133615616. Throughput: 0: 1615.8. Samples: 28396476. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:17,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 03:53:20,267][42004] Updated weights for policy 0, policy_version 32626 (0.0022) +[2024-11-08 03:53:22,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 133652480. Throughput: 0: 1706.9. Samples: 28407964. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:22,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 03:53:25,816][42004] Updated weights for policy 0, policy_version 32636 (0.0026) +[2024-11-08 03:53:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 133689344. Throughput: 0: 1694.8. Samples: 28418824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:27,934][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 03:53:31,322][42004] Updated weights for policy 0, policy_version 32646 (0.0033) +[2024-11-08 03:53:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6973.8, 300 sec: 6734.1). Total num frames: 133730304. Throughput: 0: 1699.9. Samples: 28424390. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:32,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 03:53:36,778][42004] Updated weights for policy 0, policy_version 32656 (0.0027) +[2024-11-08 03:53:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 6720.2). Total num frames: 133763072. Throughput: 0: 1708.1. Samples: 28435834. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:37,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 03:53:37,958][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032657_133763072.pth... +[2024-11-08 03:53:38,101][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032269_132173824.pth +[2024-11-08 03:53:42,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.8, 300 sec: 6650.8). 
Total num frames: 133783552. Throughput: 0: 1615.0. Samples: 28442588. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:42,935][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 03:53:45,157][42004] Updated weights for policy 0, policy_version 32666 (0.0033) +[2024-11-08 03:53:47,933][41694] Fps is (10 sec: 5324.4, 60 sec: 6553.5, 300 sec: 6636.9). Total num frames: 133816320. Throughput: 0: 1592.2. Samples: 28447296. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:47,935][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 03:53:50,788][42004] Updated weights for policy 0, policy_version 32676 (0.0029) +[2024-11-08 03:53:52,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 133853184. Throughput: 0: 1721.2. Samples: 28458122. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:52,933][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 03:53:56,410][42004] Updated weights for policy 0, policy_version 32686 (0.0029) +[2024-11-08 03:53:57,931][41694] Fps is (10 sec: 7373.5, 60 sec: 6622.0, 300 sec: 6636.9). Total num frames: 133890048. Throughput: 0: 1724.8. Samples: 28469246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:53:57,933][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 03:54:02,079][42004] Updated weights for policy 0, policy_version 32696 (0.0028) +[2024-11-08 03:54:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6692.5). Total num frames: 133926912. Throughput: 0: 1741.0. Samples: 28474820. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:02,933][41694] Avg episode reward: [(0, '4.232')] +[2024-11-08 03:54:07,370][42004] Updated weights for policy 0, policy_version 32706 (0.0030) +[2024-11-08 03:54:07,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7035.4, 300 sec: 6720.2). Total num frames: 133967872. Throughput: 0: 1734.0. Samples: 28485992. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:07,934][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 03:54:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6706.3). Total num frames: 134000640. Throughput: 0: 1731.6. Samples: 28496748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:12,936][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 03:54:13,334][42004] Updated weights for policy 0, policy_version 32716 (0.0038) +[2024-11-08 03:54:17,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 134021120. Throughput: 0: 1703.9. Samples: 28501066. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:17,933][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 03:54:21,101][42004] Updated weights for policy 0, policy_version 32726 (0.0022) +[2024-11-08 03:54:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 134057984. Throughput: 0: 1623.6. Samples: 28508896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:22,933][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 03:54:26,488][42004] Updated weights for policy 0, policy_version 32736 (0.0023) +[2024-11-08 03:54:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 134094848. Throughput: 0: 1726.8. Samples: 28520292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:27,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 03:54:31,765][42004] Updated weights for policy 0, policy_version 32746 (0.0027) +[2024-11-08 03:54:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 134135808. Throughput: 0: 1748.1. Samples: 28525958. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:32,934][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 03:54:37,188][42004] Updated weights for policy 0, policy_version 32756 (0.0034) +[2024-11-08 03:54:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 134172672. Throughput: 0: 1763.6. Samples: 28537486. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:37,933][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 03:54:42,652][42004] Updated weights for policy 0, policy_version 32766 (0.0029) +[2024-11-08 03:54:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.8, 300 sec: 6748.0). Total num frames: 134209536. Throughput: 0: 1769.1. Samples: 28548856. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:42,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 03:54:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7168.1, 300 sec: 6761.9). Total num frames: 134246400. Throughput: 0: 1762.2. Samples: 28554120. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:54:47,933][41694] Avg episode reward: [(0, '4.282')] +[2024-11-08 03:54:48,462][42004] Updated weights for policy 0, policy_version 32776 (0.0034) +[2024-11-08 03:54:52,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6692.5). Total num frames: 134266880. Throughput: 0: 1663.4. Samples: 28560846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:54:52,933][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 03:54:56,185][42004] Updated weights for policy 0, policy_version 32786 (0.0028) +[2024-11-08 03:54:57,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 134303744. Throughput: 0: 1674.2. Samples: 28572086. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:54:57,933][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 03:55:01,634][42004] Updated weights for policy 0, policy_version 32796 (0.0041) +[2024-11-08 03:55:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 134336512. Throughput: 0: 1700.3. Samples: 28577580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:55:02,933][41694] Avg episode reward: [(0, '4.659')] +[2024-11-08 03:55:07,488][42004] Updated weights for policy 0, policy_version 32806 (0.0021) +[2024-11-08 03:55:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 134373376. Throughput: 0: 1759.2. Samples: 28588058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:55:07,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 03:55:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 134410240. Throughput: 0: 1762.2. Samples: 28599590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 03:55:12,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 03:55:12,987][42004] Updated weights for policy 0, policy_version 32816 (0.0044) +[2024-11-08 03:55:17,933][41694] Fps is (10 sec: 6962.1, 60 sec: 7031.3, 300 sec: 6831.3). Total num frames: 134443008. Throughput: 0: 1737.8. Samples: 28604160. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:17,935][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 03:55:19,587][42004] Updated weights for policy 0, policy_version 32826 (0.0028) +[2024-11-08 03:55:22,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 134475776. Throughput: 0: 1695.5. Samples: 28613782. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:22,933][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 03:55:27,626][42004] Updated weights for policy 0, policy_version 32836 (0.0026) +[2024-11-08 03:55:27,931][41694] Fps is (10 sec: 5325.6, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 134496256. Throughput: 0: 1593.4. Samples: 28620560. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:27,933][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 03:55:32,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 134533120. Throughput: 0: 1602.1. Samples: 28626216. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:32,934][41694] Avg episode reward: [(0, '4.756')] +[2024-11-08 03:55:32,944][42004] Updated weights for policy 0, policy_version 32846 (0.0025) +[2024-11-08 03:55:37,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 134574080. Throughput: 0: 1711.0. Samples: 28637840. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:37,934][41694] Avg episode reward: [(0, '4.605')] +[2024-11-08 03:55:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032855_134574080.pth... +[2024-11-08 03:55:38,060][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032456_132939776.pth +[2024-11-08 03:55:38,202][42004] Updated weights for policy 0, policy_version 32856 (0.0029) +[2024-11-08 03:55:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 134610944. Throughput: 0: 1719.9. Samples: 28649480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:42,933][41694] Avg episode reward: [(0, '4.606')] +[2024-11-08 03:55:43,493][42004] Updated weights for policy 0, policy_version 32866 (0.0029) +[2024-11-08 03:55:47,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6758.4, 300 sec: 6845.2). 
Total num frames: 134651904. Throughput: 0: 1725.0. Samples: 28655204. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:47,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 03:55:48,886][42004] Updated weights for policy 0, policy_version 32876 (0.0026) +[2024-11-08 03:55:52,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6845.2). Total num frames: 134684672. Throughput: 0: 1740.2. Samples: 28666366. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:55:52,933][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 03:55:54,972][42004] Updated weights for policy 0, policy_version 32886 (0.0027) +[2024-11-08 03:55:59,985][41694] Fps is (10 sec: 5437.0, 60 sec: 6666.7, 300 sec: 6784.1). Total num frames: 134717440. Throughput: 0: 1630.1. Samples: 28676294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:55:59,988][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 03:56:02,937][41694] Fps is (10 sec: 5323.4, 60 sec: 6689.9, 300 sec: 6775.7). Total num frames: 134737920. Throughput: 0: 1635.7. Samples: 28677770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:02,938][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 03:56:03,211][42004] Updated weights for policy 0, policy_version 32896 (0.0032) +[2024-11-08 03:56:07,932][41694] Fps is (10 sec: 7216.3, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 134774784. Throughput: 0: 1650.0. Samples: 28688030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:07,934][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 03:56:08,735][42004] Updated weights for policy 0, policy_version 32906 (0.0031) +[2024-11-08 03:56:12,932][41694] Fps is (10 sec: 6964.9, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 134807552. Throughput: 0: 1731.9. Samples: 28698496. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:12,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 03:56:14,825][42004] Updated weights for policy 0, policy_version 32916 (0.0028) +[2024-11-08 03:56:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.3, 300 sec: 6822.3). Total num frames: 134844416. Throughput: 0: 1725.7. Samples: 28703872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:17,933][41694] Avg episode reward: [(0, '4.799')] +[2024-11-08 03:56:20,394][42004] Updated weights for policy 0, policy_version 32926 (0.0029) +[2024-11-08 03:56:22,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 134881280. Throughput: 0: 1713.9. Samples: 28714966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:22,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 03:56:26,402][42004] Updated weights for policy 0, policy_version 32936 (0.0037) +[2024-11-08 03:56:27,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 134914048. Throughput: 0: 1678.1. Samples: 28724996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:27,935][41694] Avg episode reward: [(0, '4.722')] +[2024-11-08 03:56:34,437][41694] Fps is (10 sec: 5340.2, 60 sec: 6659.6, 300 sec: 6755.2). Total num frames: 134942720. Throughput: 0: 1606.5. Samples: 28729916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:34,438][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 03:56:34,483][42004] Updated weights for policy 0, policy_version 32946 (0.0025) +[2024-11-08 03:56:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 134967296. Throughput: 0: 1562.9. Samples: 28736696. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:56:37,933][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 03:56:40,215][42004] Updated weights for policy 0, policy_version 32956 (0.0037) +[2024-11-08 03:56:42,931][41694] Fps is (10 sec: 7232.7, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 135004160. Throughput: 0: 1663.5. Samples: 28747736. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:56:42,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 03:56:45,591][42004] Updated weights for policy 0, policy_version 32966 (0.0032) +[2024-11-08 03:56:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 135045120. Throughput: 0: 1682.8. Samples: 28753494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:56:47,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 03:56:51,226][42004] Updated weights for policy 0, policy_version 32976 (0.0031) +[2024-11-08 03:56:52,932][41694] Fps is (10 sec: 7372.1, 60 sec: 6553.5, 300 sec: 6792.8). Total num frames: 135077888. Throughput: 0: 1705.9. Samples: 28764796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:56:52,934][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 03:56:56,795][42004] Updated weights for policy 0, policy_version 32986 (0.0025) +[2024-11-08 03:56:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6927.2, 300 sec: 6831.3). Total num frames: 135118848. Throughput: 0: 1715.1. Samples: 28775676. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:56:57,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 03:57:02,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6826.9, 300 sec: 6803.5). Total num frames: 135147520. Throughput: 0: 1702.5. Samples: 28780486. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 03:57:02,938][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 03:57:03,195][42004] Updated weights for policy 0, policy_version 32996 (0.0037) +[2024-11-08 03:57:08,902][41694] Fps is (10 sec: 4853.6, 60 sec: 6516.5, 300 sec: 6725.9). Total num frames: 135172096. Throughput: 0: 1635.1. Samples: 28790134. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 03:57:08,908][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 03:57:11,229][42004] Updated weights for policy 0, policy_version 33006 (0.0038) +[2024-11-08 03:57:12,932][41694] Fps is (10 sec: 5325.1, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 135200768. Throughput: 0: 1600.4. Samples: 28797016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 03:57:12,934][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 03:57:16,910][42004] Updated weights for policy 0, policy_version 33016 (0.0030) +[2024-11-08 03:57:17,931][41694] Fps is (10 sec: 7258.2, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 135237632. Throughput: 0: 1666.4. Samples: 28802396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 03:57:17,933][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 03:57:22,099][42004] Updated weights for policy 0, policy_version 33026 (0.0020) +[2024-11-08 03:57:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6621.8, 300 sec: 6734.1). Total num frames: 135278592. Throughput: 0: 1720.6. Samples: 28814122. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:57:22,934][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 03:57:27,245][42004] Updated weights for policy 0, policy_version 33036 (0.0040) +[2024-11-08 03:57:27,931][41694] Fps is (10 sec: 8191.9, 60 sec: 6758.4, 300 sec: 6805.3). Total num frames: 135319552. Throughput: 0: 1739.6. Samples: 28826016. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:57:27,933][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 03:57:32,574][42004] Updated weights for policy 0, policy_version 33046 (0.0028) +[2024-11-08 03:57:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7072.3, 300 sec: 6817.4). Total num frames: 135356416. Throughput: 0: 1736.0. Samples: 28831612. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:57:32,934][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 03:57:37,932][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.5, 300 sec: 6789.6). Total num frames: 135389184. Throughput: 0: 1717.4. Samples: 28842078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:57:37,935][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 03:57:37,954][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033054_135389184.pth... +[2024-11-08 03:57:38,103][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032657_133763072.pth +[2024-11-08 03:57:38,832][42004] Updated weights for policy 0, policy_version 33056 (0.0032) +[2024-11-08 03:57:43,466][41694] Fps is (10 sec: 5443.4, 60 sec: 6766.4, 300 sec: 6735.8). Total num frames: 135413760. Throughput: 0: 1570.9. Samples: 28847208. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:57:43,469][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 03:57:46,987][42004] Updated weights for policy 0, policy_version 33066 (0.0040) +[2024-11-08 03:57:47,934][41694] Fps is (10 sec: 5323.7, 60 sec: 6621.6, 300 sec: 6720.2). Total num frames: 135442432. Throughput: 0: 1629.7. Samples: 28853824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:57:47,936][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 03:57:52,932][41694] Fps is (10 sec: 6491.1, 60 sec: 6622.0, 300 sec: 6720.2). Total num frames: 135475200. Throughput: 0: 1672.2. Samples: 28863760. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:57:52,933][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 03:57:52,948][42004] Updated weights for policy 0, policy_version 33076 (0.0033) +[2024-11-08 03:57:57,931][41694] Fps is (10 sec: 7374.4, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 135516160. Throughput: 0: 1738.6. Samples: 28875254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:57:57,933][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 03:57:58,238][42004] Updated weights for policy 0, policy_version 33086 (0.0036) +[2024-11-08 03:58:02,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.5, 300 sec: 6804.3). Total num frames: 135553024. Throughput: 0: 1743.7. Samples: 28880864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:02,933][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 03:58:03,715][42004] Updated weights for policy 0, policy_version 33096 (0.0030) +[2024-11-08 03:58:07,933][41694] Fps is (10 sec: 7371.9, 60 sec: 7077.6, 300 sec: 6817.4). Total num frames: 135589888. Throughput: 0: 1728.2. Samples: 28891894. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:58:07,935][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 03:58:09,759][42004] Updated weights for policy 0, policy_version 33106 (0.0028) +[2024-11-08 03:58:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6803.5). Total num frames: 135622656. Throughput: 0: 1686.8. Samples: 28901922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:58:12,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 03:58:17,852][42004] Updated weights for policy 0, policy_version 33116 (0.0029) +[2024-11-08 03:58:17,931][41694] Fps is (10 sec: 5325.4, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 135643136. Throughput: 0: 1680.0. Samples: 28907212. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:58:17,933][41694] Avg episode reward: [(0, '4.502')] +[2024-11-08 03:58:22,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.2, 300 sec: 6748.0). Total num frames: 135680000. Throughput: 0: 1603.7. Samples: 28914244. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:58:22,933][41694] Avg episode reward: [(0, '4.691')] +[2024-11-08 03:58:23,364][42004] Updated weights for policy 0, policy_version 33126 (0.0024) +[2024-11-08 03:58:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 135716864. Throughput: 0: 1763.6. Samples: 28925628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:27,934][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 03:58:28,850][42004] Updated weights for policy 0, policy_version 33136 (0.0031) +[2024-11-08 03:58:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 135757824. Throughput: 0: 1716.4. Samples: 28931060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:32,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 03:58:33,980][42004] Updated weights for policy 0, policy_version 33146 (0.0024) +[2024-11-08 03:58:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 135794688. Throughput: 0: 1757.1. Samples: 28942828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:37,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 03:58:39,516][42004] Updated weights for policy 0, policy_version 33156 (0.0026) +[2024-11-08 03:58:42,935][41694] Fps is (10 sec: 6960.9, 60 sec: 6956.6, 300 sec: 6817.4). Total num frames: 135827456. Throughput: 0: 1736.4. Samples: 28953398. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:42,937][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 03:58:45,427][42004] Updated weights for policy 0, policy_version 33166 (0.0034) +[2024-11-08 03:58:47,931][41694] Fps is (10 sec: 6963.2, 60 sec: 7031.7, 300 sec: 6817.4). Total num frames: 135864320. Throughput: 0: 1729.1. Samples: 28958674. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:47,933][41694] Avg episode reward: [(0, '4.619')] +[2024-11-08 03:58:52,932][41694] Fps is (10 sec: 5326.4, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 135880704. Throughput: 0: 1692.0. Samples: 28968032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:52,933][41694] Avg episode reward: [(0, '4.670')] +[2024-11-08 03:58:53,688][42004] Updated weights for policy 0, policy_version 33176 (0.0029) +[2024-11-08 03:58:57,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 135917568. Throughput: 0: 1637.7. Samples: 28975618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:58:57,933][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 03:58:59,340][42004] Updated weights for policy 0, policy_version 33186 (0.0046) +[2024-11-08 03:59:02,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 135954432. Throughput: 0: 1642.6. Samples: 28981130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:59:02,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 03:59:04,878][42004] Updated weights for policy 0, policy_version 33196 (0.0029) +[2024-11-08 03:59:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.2, 300 sec: 6748.0). Total num frames: 135991296. Throughput: 0: 1735.5. Samples: 28992342. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:59:07,935][41694] Avg episode reward: [(0, '4.600')] +[2024-11-08 03:59:10,344][42004] Updated weights for policy 0, policy_version 33206 (0.0028) +[2024-11-08 03:59:12,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 136032256. Throughput: 0: 1738.2. Samples: 29003846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:12,936][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 03:59:16,079][42004] Updated weights for policy 0, policy_version 33216 (0.0047) +[2024-11-08 03:59:17,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6803.5). Total num frames: 136065024. Throughput: 0: 1731.1. Samples: 29008960. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:17,935][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 03:59:21,844][42004] Updated weights for policy 0, policy_version 33226 (0.0032) +[2024-11-08 03:59:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.5, 300 sec: 6803.5). Total num frames: 136101888. Throughput: 0: 1704.7. Samples: 29019542. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:22,934][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 03:59:27,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 136118272. Throughput: 0: 1634.7. Samples: 29026956. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:27,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 03:59:29,842][42004] Updated weights for policy 0, policy_version 33236 (0.0026) +[2024-11-08 03:59:32,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 136155136. Throughput: 0: 1619.8. Samples: 29031566. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:32,934][41694] Avg episode reward: [(0, '4.633')] +[2024-11-08 03:59:35,372][42004] Updated weights for policy 0, policy_version 33246 (0.0032) +[2024-11-08 03:59:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 136192000. Throughput: 0: 1663.1. Samples: 29042870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:37,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 03:59:37,953][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033251_136196096.pth... +[2024-11-08 03:59:38,078][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000032855_134574080.pth +[2024-11-08 03:59:40,688][42004] Updated weights for policy 0, policy_version 33256 (0.0026) +[2024-11-08 03:59:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.8, 300 sec: 6734.1). Total num frames: 136232960. Throughput: 0: 1750.3. Samples: 29054380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:42,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 03:59:46,093][42004] Updated weights for policy 0, policy_version 33266 (0.0034) +[2024-11-08 03:59:47,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 136269824. Throughput: 0: 1747.1. Samples: 29059748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:47,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 03:59:52,367][42004] Updated weights for policy 0, policy_version 33276 (0.0030) +[2024-11-08 03:59:52,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 136298496. Throughput: 0: 1740.3. Samples: 29070654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 03:59:52,934][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 03:59:57,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6963.2, 300 sec: 6775.8). 
Total num frames: 136335360. Throughput: 0: 1716.4. Samples: 29081086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 03:59:57,933][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 03:59:58,056][42004] Updated weights for policy 0, policy_version 33286 (0.0027) +[2024-11-08 04:00:02,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 136364032. Throughput: 0: 1697.6. Samples: 29085354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:02,941][41694] Avg episode reward: [(0, '4.619')] +[2024-11-08 04:00:04,824][42004] Updated weights for policy 0, policy_version 33296 (0.0029) +[2024-11-08 04:00:07,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 136404992. Throughput: 0: 1677.5. Samples: 29095028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:07,934][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 04:00:10,295][42004] Updated weights for policy 0, policy_version 33306 (0.0025) +[2024-11-08 04:00:12,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6758.3, 300 sec: 6761.9). Total num frames: 136437760. Throughput: 0: 1743.7. Samples: 29105422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:12,935][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 04:00:18,107][41694] Fps is (10 sec: 4025.5, 60 sec: 6330.3, 300 sec: 6674.6). Total num frames: 136445952. Throughput: 0: 1634.9. Samples: 29105422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:18,110][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 04:00:20,434][42004] Updated weights for policy 0, policy_version 33316 (0.0058) +[2024-11-08 04:00:22,932][41694] Fps is (10 sec: 3686.4, 60 sec: 6212.2, 300 sec: 6706.3). Total num frames: 136474624. Throughput: 0: 1581.0. Samples: 29114018. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:22,935][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 04:00:27,272][42004] Updated weights for policy 0, policy_version 33326 (0.0040) +[2024-11-08 04:00:27,932][41694] Fps is (10 sec: 5836.4, 60 sec: 6417.0, 300 sec: 6678.5). Total num frames: 136503296. Throughput: 0: 1521.6. Samples: 29122854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:27,937][41694] Avg episode reward: [(0, '4.196')] +[2024-11-08 04:00:32,933][41694] Fps is (10 sec: 6143.7, 60 sec: 6348.7, 300 sec: 6650.8). Total num frames: 136536064. Throughput: 0: 1501.5. Samples: 29127316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:32,935][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 04:00:33,679][42004] Updated weights for policy 0, policy_version 33336 (0.0037) +[2024-11-08 04:00:37,932][41694] Fps is (10 sec: 6963.8, 60 sec: 6348.8, 300 sec: 6650.8). Total num frames: 136572928. Throughput: 0: 1492.4. Samples: 29137812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:37,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 04:00:39,021][42004] Updated weights for policy 0, policy_version 33346 (0.0032) +[2024-11-08 04:00:42,932][41694] Fps is (10 sec: 7783.0, 60 sec: 6348.7, 300 sec: 6650.8). Total num frames: 136613888. Throughput: 0: 1519.5. Samples: 29149466. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:42,934][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 04:00:44,431][42004] Updated weights for policy 0, policy_version 33356 (0.0023) +[2024-11-08 04:00:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6348.8, 300 sec: 6664.7). Total num frames: 136650752. Throughput: 0: 1554.5. Samples: 29155306. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:47,934][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 04:00:49,659][42004] Updated weights for policy 0, policy_version 33366 (0.0022) +[2024-11-08 04:00:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.3, 300 sec: 6725.4). Total num frames: 136687616. Throughput: 0: 1596.1. Samples: 29166852. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:52,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 04:00:54,981][42004] Updated weights for policy 0, policy_version 33376 (0.0031) +[2024-11-08 04:00:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.3, 300 sec: 6734.2). Total num frames: 136724480. Throughput: 0: 1606.2. Samples: 29177698. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:00:57,933][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 04:01:01,086][42004] Updated weights for policy 0, policy_version 33386 (0.0030) +[2024-11-08 04:01:02,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 136761344. Throughput: 0: 1729.7. Samples: 29182954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:02,934][41694] Avg episode reward: [(0, '4.668')] +[2024-11-08 04:01:06,633][42004] Updated weights for policy 0, policy_version 33396 (0.0048) +[2024-11-08 04:01:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 136798208. Throughput: 0: 1773.6. Samples: 29193828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:07,935][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 04:01:12,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.4, 300 sec: 6720.2). Total num frames: 136826880. Throughput: 0: 1784.9. Samples: 29203174. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:12,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 04:01:13,271][42004] Updated weights for policy 0, policy_version 33406 (0.0034) +[2024-11-08 04:01:17,931][41694] Fps is (10 sec: 6553.9, 60 sec: 6983.6, 300 sec: 6720.2). Total num frames: 136863744. Throughput: 0: 1794.8. Samples: 29208078. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:17,933][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 04:01:18,998][42004] Updated weights for policy 0, policy_version 33416 (0.0041) +[2024-11-08 04:01:23,949][41694] Fps is (10 sec: 6320.5, 60 sec: 6914.3, 300 sec: 6697.1). Total num frames: 136896512. Throughput: 0: 1761.8. Samples: 29218886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:23,951][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 04:01:26,190][42004] Updated weights for policy 0, policy_version 33426 (0.0033) +[2024-11-08 04:01:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.3, 300 sec: 6740.7). Total num frames: 136921088. Throughput: 0: 1728.2. Samples: 29227236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:27,934][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 04:01:32,366][42004] Updated weights for policy 0, policy_version 33436 (0.0031) +[2024-11-08 04:01:32,931][41694] Fps is (10 sec: 6383.6, 60 sec: 6963.3, 300 sec: 6734.1). Total num frames: 136953856. Throughput: 0: 1710.8. Samples: 29232294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:32,934][41694] Avg episode reward: [(0, '4.241')] +[2024-11-08 04:01:37,866][42004] Updated weights for policy 0, policy_version 33446 (0.0031) +[2024-11-08 04:01:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.4, 300 sec: 6748.0). Total num frames: 136994816. Throughput: 0: 1684.4. Samples: 29242648. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:37,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 04:01:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033446_136994816.pth... +[2024-11-08 04:01:38,062][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033054_135389184.pth +[2024-11-08 04:01:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.3, 300 sec: 6734.1). Total num frames: 137031680. Throughput: 0: 1704.3. Samples: 29254392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:01:42,933][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 04:01:43,056][42004] Updated weights for policy 0, policy_version 33456 (0.0028) +[2024-11-08 04:01:47,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 137072640. Throughput: 0: 1714.1. Samples: 29260090. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:01:47,933][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 04:01:48,331][42004] Updated weights for policy 0, policy_version 33466 (0.0040) +[2024-11-08 04:01:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6748.0). Total num frames: 137109504. Throughput: 0: 1729.8. Samples: 29271668. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:01:52,935][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 04:01:53,795][42004] Updated weights for policy 0, policy_version 33476 (0.0022) +[2024-11-08 04:01:57,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 137134080. Throughput: 0: 1727.3. Samples: 29280902. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:01:57,934][41694] Avg episode reward: [(0, '4.650')] +[2024-11-08 04:02:01,232][42004] Updated weights for policy 0, policy_version 33486 (0.0023) +[2024-11-08 04:02:02,938][41694] Fps is (10 sec: 5730.7, 60 sec: 6757.7, 300 sec: 6784.0). 
Total num frames: 137166848. Throughput: 0: 1709.8. Samples: 29285030. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:02:02,940][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 04:02:07,629][42004] Updated weights for policy 0, policy_version 33496 (0.0038) +[2024-11-08 04:02:07,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6690.2, 300 sec: 6775.8). Total num frames: 137199616. Throughput: 0: 1718.3. Samples: 29294462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:02:07,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 04:02:12,932][41694] Fps is (10 sec: 6967.7, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 137236480. Throughput: 0: 1739.0. Samples: 29305490. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:12,933][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 04:02:13,059][42004] Updated weights for policy 0, policy_version 33506 (0.0024) +[2024-11-08 04:02:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 137277440. Throughput: 0: 1755.6. Samples: 29311294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:17,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 04:02:18,341][42004] Updated weights for policy 0, policy_version 33516 (0.0027) +[2024-11-08 04:02:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7083.3, 300 sec: 6761.9). Total num frames: 137314304. Throughput: 0: 1787.5. Samples: 29323086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:22,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 04:02:23,616][42004] Updated weights for policy 0, policy_version 33526 (0.0026) +[2024-11-08 04:02:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.0, 300 sec: 6761.9). Total num frames: 137351168. Throughput: 0: 1772.3. Samples: 29334144. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:27,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 04:02:29,422][42004] Updated weights for policy 0, policy_version 33536 (0.0033) +[2024-11-08 04:02:32,931][41694] Fps is (10 sec: 6144.0, 60 sec: 7031.5, 300 sec: 6734.1). Total num frames: 137375744. Throughput: 0: 1763.0. Samples: 29339426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:32,933][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 04:02:36,698][42004] Updated weights for policy 0, policy_version 33546 (0.0032) +[2024-11-08 04:02:37,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6894.9, 300 sec: 6774.1). Total num frames: 137408512. Throughput: 0: 1682.8. Samples: 29347394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:37,934][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 04:02:42,891][42004] Updated weights for policy 0, policy_version 33556 (0.0028) +[2024-11-08 04:02:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6789.7). Total num frames: 137445376. Throughput: 0: 1697.1. Samples: 29357272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:42,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 04:02:47,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 137482240. Throughput: 0: 1729.0. Samples: 29362826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:47,935][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 04:02:48,312][42004] Updated weights for policy 0, policy_version 33566 (0.0026) +[2024-11-08 04:02:52,933][41694] Fps is (10 sec: 7371.8, 60 sec: 6826.5, 300 sec: 6789.6). Total num frames: 137519104. Throughput: 0: 1775.0. Samples: 29374340. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:52,935][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 04:02:53,698][42004] Updated weights for policy 0, policy_version 33576 (0.0022) +[2024-11-08 04:02:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6789.6). Total num frames: 137555968. Throughput: 0: 1771.9. Samples: 29385226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:02:57,933][41694] Avg episode reward: [(0, '4.637')] +[2024-11-08 04:02:59,610][42004] Updated weights for policy 0, policy_version 33586 (0.0028) +[2024-11-08 04:03:02,932][41694] Fps is (10 sec: 6964.0, 60 sec: 7032.2, 300 sec: 6775.8). Total num frames: 137588736. Throughput: 0: 1753.3. Samples: 29390192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:02,933][41694] Avg episode reward: [(0, '4.720')] +[2024-11-08 04:03:07,475][42004] Updated weights for policy 0, policy_version 33596 (0.0034) +[2024-11-08 04:03:07,944][41694] Fps is (10 sec: 5318.2, 60 sec: 6825.2, 300 sec: 6733.8). Total num frames: 137609216. Throughput: 0: 1645.5. Samples: 29397152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:07,946][41694] Avg episode reward: [(0, '4.688')] +[2024-11-08 04:03:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 137641984. Throughput: 0: 1614.4. Samples: 29406792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:12,933][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 04:03:14,085][42004] Updated weights for policy 0, policy_version 33606 (0.0034) +[2024-11-08 04:03:17,931][41694] Fps is (10 sec: 6561.8, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 137674752. Throughput: 0: 1607.0. Samples: 29411740. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:03:17,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 04:03:19,757][42004] Updated weights for policy 0, policy_version 33616 (0.0026) +[2024-11-08 04:03:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 137715712. Throughput: 0: 1677.0. Samples: 29422858. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:03:22,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 04:03:25,178][42004] Updated weights for policy 0, policy_version 33626 (0.0027) +[2024-11-08 04:03:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 137748480. Throughput: 0: 1706.5. Samples: 29434066. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:03:27,934][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 04:03:30,725][42004] Updated weights for policy 0, policy_version 33636 (0.0030) +[2024-11-08 04:03:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 137789440. Throughput: 0: 1702.3. Samples: 29439428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:03:32,933][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 04:03:36,510][42004] Updated weights for policy 0, policy_version 33646 (0.0025) +[2024-11-08 04:03:39,442][41694] Fps is (10 sec: 6405.4, 60 sec: 6725.7, 300 sec: 6727.5). Total num frames: 137822208. Throughput: 0: 1629.9. Samples: 29450144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:39,445][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 04:03:39,461][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033648_137822208.pth... +[2024-11-08 04:03:39,604][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033251_136196096.pth +[2024-11-08 04:03:42,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6706.3). 
Total num frames: 137842688. Throughput: 0: 1600.0. Samples: 29457226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:42,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 04:03:44,440][42004] Updated weights for policy 0, policy_version 33656 (0.0031) +[2024-11-08 04:03:47,932][41694] Fps is (10 sec: 6272.0, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 137875456. Throughput: 0: 1599.2. Samples: 29462154. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:47,936][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 04:03:50,738][42004] Updated weights for policy 0, policy_version 33666 (0.0026) +[2024-11-08 04:03:52,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.5, 300 sec: 6748.0). Total num frames: 137908224. Throughput: 0: 1665.8. Samples: 29472092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:52,936][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 04:03:56,904][42004] Updated weights for policy 0, policy_version 33676 (0.0029) +[2024-11-08 04:03:57,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6734.1). Total num frames: 137940992. Throughput: 0: 1673.2. Samples: 29482086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:03:57,934][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 04:04:02,448][42004] Updated weights for policy 0, policy_version 33686 (0.0051) +[2024-11-08 04:04:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6734.1). Total num frames: 137977856. Throughput: 0: 1690.2. Samples: 29487800. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:02,933][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 04:04:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6759.8, 300 sec: 6720.2). Total num frames: 138014720. Throughput: 0: 1667.2. Samples: 29497882. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:07,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 04:04:08,388][42004] Updated weights for policy 0, policy_version 33696 (0.0041) +[2024-11-08 04:04:13,416][41694] Fps is (10 sec: 5859.9, 60 sec: 6568.8, 300 sec: 6681.5). Total num frames: 138039296. Throughput: 0: 1521.3. Samples: 29503262. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:13,418][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 04:04:16,400][42004] Updated weights for policy 0, policy_version 33706 (0.0029) +[2024-11-08 04:04:17,932][41694] Fps is (10 sec: 5324.3, 60 sec: 6553.5, 300 sec: 6664.7). Total num frames: 138067968. Throughput: 0: 1574.2. Samples: 29510270. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:17,935][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 04:04:22,534][42004] Updated weights for policy 0, policy_version 33716 (0.0030) +[2024-11-08 04:04:22,932][41694] Fps is (10 sec: 6456.8, 60 sec: 6417.0, 300 sec: 6720.2). Total num frames: 138100736. Throughput: 0: 1603.7. Samples: 29519890. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:22,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 04:04:27,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6485.3, 300 sec: 6720.2). Total num frames: 138137600. Throughput: 0: 1633.2. Samples: 29530720. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:27,934][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 04:04:28,142][42004] Updated weights for policy 0, policy_version 33726 (0.0026) +[2024-11-08 04:04:32,933][41694] Fps is (10 sec: 7781.9, 60 sec: 6485.2, 300 sec: 6734.1). Total num frames: 138178560. Throughput: 0: 1648.1. Samples: 29536318. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:32,937][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 04:04:33,420][42004] Updated weights for policy 0, policy_version 33736 (0.0025) +[2024-11-08 04:04:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6722.8, 300 sec: 6720.2). Total num frames: 138215424. Throughput: 0: 1691.8. Samples: 29548224. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:04:37,934][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 04:04:38,699][42004] Updated weights for policy 0, policy_version 33746 (0.0024) +[2024-11-08 04:04:42,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 138252288. Throughput: 0: 1715.4. Samples: 29559278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:04:42,933][41694] Avg episode reward: [(0, '4.319')] +[2024-11-08 04:04:44,363][42004] Updated weights for policy 0, policy_version 33756 (0.0031) +[2024-11-08 04:04:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 138276864. Throughput: 0: 1708.6. Samples: 29564686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:04:47,935][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 04:04:52,092][42004] Updated weights for policy 0, policy_version 33766 (0.0049) +[2024-11-08 04:04:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 138309632. Throughput: 0: 1649.5. Samples: 29572110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:04:52,934][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 04:04:57,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 138342400. Throughput: 0: 1770.5. Samples: 29582076. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:04:57,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 04:04:58,113][42004] Updated weights for policy 0, policy_version 33776 (0.0031) +[2024-11-08 04:05:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 138379264. Throughput: 0: 1725.5. Samples: 29587916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:05:02,933][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 04:05:03,512][42004] Updated weights for policy 0, policy_version 33786 (0.0038) +[2024-11-08 04:05:07,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 138420224. Throughput: 0: 1764.0. Samples: 29599270. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:05:07,933][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 04:05:08,826][42004] Updated weights for policy 0, policy_version 33796 (0.0034) +[2024-11-08 04:05:12,933][41694] Fps is (10 sec: 7372.0, 60 sec: 6951.0, 300 sec: 6807.6). Total num frames: 138452992. Throughput: 0: 1767.2. Samples: 29610244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:05:12,935][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 04:05:15,151][42004] Updated weights for policy 0, policy_version 33806 (0.0032) +[2024-11-08 04:05:17,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6963.3, 300 sec: 6817.4). Total num frames: 138485760. Throughput: 0: 1744.8. Samples: 29614834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:05:17,935][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 04:05:22,790][42004] Updated weights for policy 0, policy_version 33816 (0.0032) +[2024-11-08 04:05:22,931][41694] Fps is (10 sec: 5735.0, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 138510336. Throughput: 0: 1650.9. Samples: 29622516. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:05:22,933][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 04:05:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 138543104. Throughput: 0: 1616.3. Samples: 29632010. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:05:27,934][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 04:05:29,137][42004] Updated weights for policy 0, policy_version 33826 (0.0025) +[2024-11-08 04:05:32,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6622.0, 300 sec: 6789.6). Total num frames: 138575872. Throughput: 0: 1603.3. Samples: 29636836. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:05:32,934][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 04:05:34,711][42004] Updated weights for policy 0, policy_version 33836 (0.0026) +[2024-11-08 04:05:37,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.1, 300 sec: 6789.7). Total num frames: 138616832. Throughput: 0: 1693.5. Samples: 29648316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:05:37,933][41694] Avg episode reward: [(0, '4.308')] +[2024-11-08 04:05:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033842_138616832.pth... +[2024-11-08 04:05:38,042][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033446_136994816.pth +[2024-11-08 04:05:40,121][42004] Updated weights for policy 0, policy_version 33846 (0.0030) +[2024-11-08 04:05:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 138653696. Throughput: 0: 1717.9. Samples: 29659380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:05:42,934][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 04:05:45,558][42004] Updated weights for policy 0, policy_version 33856 (0.0022) +[2024-11-08 04:05:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6789.7). 
Total num frames: 138690560. Throughput: 0: 1712.1. Samples: 29664960. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:05:47,935][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 04:05:51,273][42004] Updated weights for policy 0, policy_version 33866 (0.0040) +[2024-11-08 04:05:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6789.6). Total num frames: 138727424. Throughput: 0: 1706.8. Samples: 29676078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:05:52,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 04:05:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 138747904. Throughput: 0: 1631.5. Samples: 29683658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:05:57,934][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 04:05:58,541][42004] Updated weights for policy 0, policy_version 33876 (0.0032) +[2024-11-08 04:06:02,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 138780672. Throughput: 0: 1643.7. Samples: 29688802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:06:02,935][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 04:06:05,039][42004] Updated weights for policy 0, policy_version 33886 (0.0034) +[2024-11-08 04:06:07,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.8, 300 sec: 6748.0). Total num frames: 138817536. Throughput: 0: 1693.1. Samples: 29698704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:06:07,934][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 04:06:10,694][42004] Updated weights for policy 0, policy_version 33896 (0.0029) +[2024-11-08 04:06:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6622.0, 300 sec: 6734.1). Total num frames: 138850304. Throughput: 0: 1714.5. Samples: 29709164. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:12,936][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 04:06:16,329][42004] Updated weights for policy 0, policy_version 33906 (0.0030) +[2024-11-08 04:06:17,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6785.3). Total num frames: 138891264. Throughput: 0: 1724.7. Samples: 29714446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:17,934][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 04:06:21,815][42004] Updated weights for policy 0, policy_version 33916 (0.0026) +[2024-11-08 04:06:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 138928128. Throughput: 0: 1726.1. Samples: 29725992. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:22,933][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 04:06:27,350][42004] Updated weights for policy 0, policy_version 33926 (0.0027) +[2024-11-08 04:06:29,106][41694] Fps is (10 sec: 6231.3, 60 sec: 6829.5, 300 sec: 6776.5). Total num frames: 138960896. Throughput: 0: 1684.1. Samples: 29737144. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:29,113][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 04:06:32,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6826.6, 300 sec: 6748.0). Total num frames: 138985472. Throughput: 0: 1657.7. Samples: 29739556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:32,934][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 04:06:35,210][42004] Updated weights for policy 0, policy_version 33936 (0.0035) +[2024-11-08 04:06:37,932][41694] Fps is (10 sec: 6033.4, 60 sec: 6621.8, 300 sec: 6720.2). Total num frames: 139014144. Throughput: 0: 1624.2. Samples: 29749166. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:37,933][41694] Avg episode reward: [(0, '4.656')] +[2024-11-08 04:06:41,461][42004] Updated weights for policy 0, policy_version 33946 (0.0040) +[2024-11-08 04:06:42,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 139051008. Throughput: 0: 1682.5. Samples: 29759372. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:42,944][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 04:06:47,029][42004] Updated weights for policy 0, policy_version 33956 (0.0022) +[2024-11-08 04:06:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.8, 300 sec: 6706.3). Total num frames: 139087872. Throughput: 0: 1680.3. Samples: 29764416. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:47,934][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 04:06:52,472][42004] Updated weights for policy 0, policy_version 33966 (0.0025) +[2024-11-08 04:06:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 139124736. Throughput: 0: 1715.3. Samples: 29775894. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:52,933][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 04:06:57,857][42004] Updated weights for policy 0, policy_version 33976 (0.0035) +[2024-11-08 04:06:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6775.9). Total num frames: 139165696. Throughput: 0: 1739.6. Samples: 29787448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:06:57,934][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 04:07:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 139194368. Throughput: 0: 1745.9. Samples: 29793010. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:02,933][41694] Avg episode reward: [(0, '4.734')] +[2024-11-08 04:07:04,237][42004] Updated weights for policy 0, policy_version 33986 (0.0028) +[2024-11-08 04:07:07,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6895.0, 300 sec: 6761.9). Total num frames: 139231232. Throughput: 0: 1692.4. Samples: 29802152. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:07,935][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 04:07:10,265][42004] Updated weights for policy 0, policy_version 33996 (0.0020) +[2024-11-08 04:07:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 139264000. Throughput: 0: 1716.5. Samples: 29812372. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:12,933][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 04:07:15,646][42004] Updated weights for policy 0, policy_version 34006 (0.0028) +[2024-11-08 04:07:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 139304960. Throughput: 0: 1752.0. Samples: 29818394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:17,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 04:07:20,916][42004] Updated weights for policy 0, policy_version 34016 (0.0026) +[2024-11-08 04:07:22,942][41694] Fps is (10 sec: 7774.1, 60 sec: 6893.7, 300 sec: 6747.7). Total num frames: 139341824. Throughput: 0: 1800.0. Samples: 29830186. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:22,944][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 04:07:26,397][42004] Updated weights for policy 0, policy_version 34026 (0.0025) +[2024-11-08 04:07:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7102.2, 300 sec: 6789.6). Total num frames: 139378688. Throughput: 0: 1818.1. Samples: 29841188. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:27,933][41694] Avg episode reward: [(0, '4.209')] +[2024-11-08 04:07:32,121][42004] Updated weights for policy 0, policy_version 34036 (0.0027) +[2024-11-08 04:07:32,931][41694] Fps is (10 sec: 7380.6, 60 sec: 7168.0, 300 sec: 6803.5). Total num frames: 139415552. Throughput: 0: 1820.5. Samples: 29846340. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:32,933][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 04:07:37,931][41694] Fps is (10 sec: 6144.2, 60 sec: 7099.8, 300 sec: 6761.9). Total num frames: 139440128. Throughput: 0: 1745.2. Samples: 29854430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:07:37,933][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 04:07:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034043_139440128.pth... +[2024-11-08 04:07:38,063][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033648_137822208.pth +[2024-11-08 04:07:39,507][42004] Updated weights for policy 0, policy_version 34046 (0.0035) +[2024-11-08 04:07:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 7031.5, 300 sec: 6748.0). Total num frames: 139472896. Throughput: 0: 1712.1. Samples: 29864492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:07:42,933][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 04:07:45,953][42004] Updated weights for policy 0, policy_version 34056 (0.0042) +[2024-11-08 04:07:47,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 139505664. Throughput: 0: 1697.6. Samples: 29869402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:07:47,934][41694] Avg episode reward: [(0, '4.606')] +[2024-11-08 04:07:51,696][42004] Updated weights for policy 0, policy_version 34066 (0.0024) +[2024-11-08 04:07:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6734.1). 
Total num frames: 139542528. Throughput: 0: 1728.3. Samples: 29879928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:07:52,937][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 04:07:57,527][42004] Updated weights for policy 0, policy_version 34076 (0.0031) +[2024-11-08 04:07:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 139575296. Throughput: 0: 1733.6. Samples: 29890386. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:07:57,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 04:08:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6789.9). Total num frames: 139612160. Throughput: 0: 1716.5. Samples: 29895634. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:08:02,933][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 04:08:03,423][42004] Updated weights for policy 0, policy_version 34086 (0.0035) +[2024-11-08 04:08:07,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 139649024. Throughput: 0: 1692.6. Samples: 29906336. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:08:07,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 04:08:09,617][42004] Updated weights for policy 0, policy_version 34096 (0.0030) +[2024-11-08 04:08:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6894.9, 300 sec: 6789.6). Total num frames: 139677696. Throughput: 0: 1664.6. Samples: 29916094. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:08:12,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 04:08:15,764][42004] Updated weights for policy 0, policy_version 34106 (0.0025) +[2024-11-08 04:08:17,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 139710464. Throughput: 0: 1658.9. Samples: 29920990. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:08:17,933][41694] Avg episode reward: [(0, '4.530')] +[2024-11-08 04:08:21,562][42004] Updated weights for policy 0, policy_version 34116 (0.0028) +[2024-11-08 04:08:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6759.6, 300 sec: 6775.8). Total num frames: 139747328. Throughput: 0: 1711.2. Samples: 29931436. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:08:22,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 04:08:27,023][42004] Updated weights for policy 0, policy_version 34126 (0.0023) +[2024-11-08 04:08:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 139784192. Throughput: 0: 1737.1. Samples: 29942662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:08:27,934][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 04:08:32,679][42004] Updated weights for policy 0, policy_version 34136 (0.0039) +[2024-11-08 04:08:32,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6810.6). Total num frames: 139821056. Throughput: 0: 1741.3. Samples: 29947760. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:08:32,934][41694] Avg episode reward: [(0, '4.284')] +[2024-11-08 04:08:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 139857920. Throughput: 0: 1750.6. Samples: 29958706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:08:37,934][41694] Avg episode reward: [(0, '4.555')] +[2024-11-08 04:08:38,275][42004] Updated weights for policy 0, policy_version 34146 (0.0026) +[2024-11-08 04:08:42,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 139882496. Throughput: 0: 1727.8. Samples: 29968136. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:08:42,933][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 04:08:45,676][42004] Updated weights for policy 0, policy_version 34156 (0.0047) +[2024-11-08 04:08:47,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 139915264. Throughput: 0: 1697.3. Samples: 29972012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:08:47,934][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 04:08:51,933][42004] Updated weights for policy 0, policy_version 34166 (0.0031) +[2024-11-08 04:08:52,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 139948032. Throughput: 0: 1680.4. Samples: 29981954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:08:52,934][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 04:08:57,345][42004] Updated weights for policy 0, policy_version 34176 (0.0026) +[2024-11-08 04:08:57,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6895.0, 300 sec: 6817.4). Total num frames: 139988992. Throughput: 0: 1709.9. Samples: 29993040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:08:57,933][41694] Avg episode reward: [(0, '4.637')] +[2024-11-08 04:09:02,925][42004] Updated weights for policy 0, policy_version 34186 (0.0030) +[2024-11-08 04:09:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 140025856. Throughput: 0: 1726.0. Samples: 29998658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:09:02,936][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 04:09:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6870.4). Total num frames: 140062720. Throughput: 0: 1743.6. Samples: 30009900. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:07,934][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 04:09:08,226][42004] Updated weights for policy 0, policy_version 34196 (0.0031) +[2024-11-08 04:09:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6886.9). Total num frames: 140099584. Throughput: 0: 1744.4. Samples: 30021160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:12,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 04:09:13,621][42004] Updated weights for policy 0, policy_version 34206 (0.0038) +[2024-11-08 04:09:17,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 140124160. Throughput: 0: 1760.7. Samples: 30026990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:17,933][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 04:09:20,899][42004] Updated weights for policy 0, policy_version 34216 (0.0036) +[2024-11-08 04:09:22,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6895.0, 300 sec: 6859.1). Total num frames: 140161024. Throughput: 0: 1692.5. Samples: 30034870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:22,933][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 04:09:26,957][42004] Updated weights for policy 0, policy_version 34226 (0.0031) +[2024-11-08 04:09:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 140193792. Throughput: 0: 1709.5. Samples: 30045064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:27,933][41694] Avg episode reward: [(0, '4.261')] +[2024-11-08 04:09:32,169][42004] Updated weights for policy 0, policy_version 34236 (0.0028) +[2024-11-08 04:09:32,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 140234752. Throughput: 0: 1752.1. Samples: 30050856. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:32,935][41694] Avg episode reward: [(0, '4.181')] +[2024-11-08 04:09:37,560][42004] Updated weights for policy 0, policy_version 34246 (0.0027) +[2024-11-08 04:09:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 140271616. Throughput: 0: 1791.1. Samples: 30062556. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:37,935][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 04:09:37,951][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034246_140271616.pth... +[2024-11-08 04:09:38,050][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000033842_138616832.pth +[2024-11-08 04:09:42,931][41694] Fps is (10 sec: 7373.3, 60 sec: 7099.7, 300 sec: 6886.8). Total num frames: 140308480. Throughput: 0: 1789.4. Samples: 30073564. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:42,933][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 04:09:43,161][42004] Updated weights for policy 0, policy_version 34256 (0.0041) +[2024-11-08 04:09:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 6900.7). Total num frames: 140345344. Throughput: 0: 1780.3. Samples: 30078770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:47,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 04:09:48,872][42004] Updated weights for policy 0, policy_version 34266 (0.0031) +[2024-11-08 04:09:52,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 140365824. Throughput: 0: 1697.2. Samples: 30086272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:52,934][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 04:09:57,060][42004] Updated weights for policy 0, policy_version 34276 (0.0025) +[2024-11-08 04:09:57,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6826.6, 300 sec: 6845.2). 
Total num frames: 140398592. Throughput: 0: 1666.4. Samples: 30096146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:09:57,935][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 04:10:02,833][42004] Updated weights for policy 0, policy_version 34286 (0.0024) +[2024-11-08 04:10:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 140435456. Throughput: 0: 1651.2. Samples: 30101296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:10:02,934][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 04:10:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 140472320. Throughput: 0: 1717.3. Samples: 30112150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:10:07,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 04:10:08,261][42004] Updated weights for policy 0, policy_version 34296 (0.0033) +[2024-11-08 04:10:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 140509184. Throughput: 0: 1742.4. Samples: 30123470. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:10:12,933][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 04:10:14,043][42004] Updated weights for policy 0, policy_version 34306 (0.0029) +[2024-11-08 04:10:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6886.8). Total num frames: 140541952. Throughput: 0: 1718.6. Samples: 30128190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:10:17,934][41694] Avg episode reward: [(0, '4.260')] +[2024-11-08 04:10:20,348][42004] Updated weights for policy 0, policy_version 34316 (0.0036) +[2024-11-08 04:10:22,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6894.9, 300 sec: 6886.8). Total num frames: 140574720. Throughput: 0: 1685.6. Samples: 30138410. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:10:22,934][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 04:10:27,722][42004] Updated weights for policy 0, policy_version 34326 (0.0033) +[2024-11-08 04:10:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 140599296. Throughput: 0: 1612.6. Samples: 30146132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:10:27,933][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 04:10:32,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 140632064. Throughput: 0: 1601.3. Samples: 30150828. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:10:32,936][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 04:10:33,663][42004] Updated weights for policy 0, policy_version 34336 (0.0027) +[2024-11-08 04:10:37,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 140668928. Throughput: 0: 1680.9. Samples: 30161912. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:10:37,933][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 04:10:39,075][42004] Updated weights for policy 0, policy_version 34346 (0.0019) +[2024-11-08 04:10:42,933][41694] Fps is (10 sec: 7781.5, 60 sec: 6690.0, 300 sec: 6845.2). Total num frames: 140709888. Throughput: 0: 1719.2. Samples: 30173512. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:10:42,935][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 04:10:44,479][42004] Updated weights for policy 0, policy_version 34356 (0.0025) +[2024-11-08 04:10:47,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 140746752. Throughput: 0: 1725.5. Samples: 30178944. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:10:47,935][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 04:10:49,986][42004] Updated weights for policy 0, policy_version 34366 (0.0023) +[2024-11-08 04:10:52,931][41694] Fps is (10 sec: 7373.6, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 140783616. Throughput: 0: 1735.8. Samples: 30190260. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:10:52,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 04:10:55,666][42004] Updated weights for policy 0, policy_version 34376 (0.0037) +[2024-11-08 04:10:59,060][41694] Fps is (10 sec: 5889.0, 60 sec: 6767.6, 300 sec: 6860.6). Total num frames: 140812288. Throughput: 0: 1679.6. Samples: 30200950. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:10:59,062][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 04:11:02,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 140836864. Throughput: 0: 1658.6. Samples: 30202828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:11:02,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 04:11:03,849][42004] Updated weights for policy 0, policy_version 34386 (0.0038) +[2024-11-08 04:11:07,931][41694] Fps is (10 sec: 6464.3, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 140869632. Throughput: 0: 1636.9. Samples: 30212070. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:11:07,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 04:11:09,924][42004] Updated weights for policy 0, policy_version 34396 (0.0032) +[2024-11-08 04:11:12,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6817.4). Total num frames: 140902400. Throughput: 0: 1684.4. Samples: 30221930. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:11:12,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 04:11:16,161][42004] Updated weights for policy 0, policy_version 34406 (0.0028) +[2024-11-08 04:11:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 140939264. Throughput: 0: 1691.5. Samples: 30226944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:11:17,933][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 04:11:21,785][42004] Updated weights for policy 0, policy_version 34416 (0.0042) +[2024-11-08 04:11:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6858.6). Total num frames: 140976128. Throughput: 0: 1692.3. Samples: 30238066. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:22,933][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 04:11:27,180][42004] Updated weights for policy 0, policy_version 34426 (0.0032) +[2024-11-08 04:11:27,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6894.9, 300 sec: 6872.9). Total num frames: 141012992. Throughput: 0: 1689.0. Samples: 30249514. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:27,935][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 04:11:33,054][41694] Fps is (10 sec: 6069.4, 60 sec: 6744.6, 300 sec: 6856.2). Total num frames: 141037568. Throughput: 0: 1687.7. Samples: 30255098. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:33,056][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 04:11:34,442][42004] Updated weights for policy 0, policy_version 34436 (0.0035) +[2024-11-08 04:11:37,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 141070336. Throughput: 0: 1611.3. Samples: 30262768. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:37,935][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 04:11:37,957][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034441_141070336.pth... +[2024-11-08 04:11:38,105][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034043_139440128.pth +[2024-11-08 04:11:40,600][42004] Updated weights for policy 0, policy_version 34446 (0.0044) +[2024-11-08 04:11:42,931][41694] Fps is (10 sec: 6635.2, 60 sec: 6553.7, 300 sec: 6831.3). Total num frames: 141103104. Throughput: 0: 1634.0. Samples: 30272634. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:42,933][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 04:11:46,137][42004] Updated weights for policy 0, policy_version 34456 (0.0031) +[2024-11-08 04:11:47,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 141144064. Throughput: 0: 1676.7. Samples: 30278282. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:47,934][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 04:11:51,510][42004] Updated weights for policy 0, policy_version 34466 (0.0030) +[2024-11-08 04:11:52,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6621.9, 300 sec: 6831.3). Total num frames: 141180928. Throughput: 0: 1725.1. Samples: 30289698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:52,933][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 04:11:57,039][42004] Updated weights for policy 0, policy_version 34476 (0.0031) +[2024-11-08 04:11:57,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6888.0, 300 sec: 6859.1). Total num frames: 141217792. Throughput: 0: 1756.8. Samples: 30300988. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:11:57,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 04:12:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6845.2). 
Total num frames: 141250560. Throughput: 0: 1765.3. Samples: 30306382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:12:02,936][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 04:12:03,023][42004] Updated weights for policy 0, policy_version 34486 (0.0033) +[2024-11-08 04:12:07,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 141271040. Throughput: 0: 1705.9. Samples: 30314830. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:12:07,936][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 04:12:11,313][42004] Updated weights for policy 0, policy_version 34496 (0.0028) +[2024-11-08 04:12:12,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 141303808. Throughput: 0: 1617.3. Samples: 30322292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:12:12,933][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 04:12:17,558][42004] Updated weights for policy 0, policy_version 34506 (0.0027) +[2024-11-08 04:12:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6762.1). Total num frames: 141336576. Throughput: 0: 1594.7. Samples: 30326664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:12:17,933][41694] Avg episode reward: [(0, '4.740')] +[2024-11-08 04:12:22,712][42004] Updated weights for policy 0, policy_version 34516 (0.0025) +[2024-11-08 04:12:22,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 141377536. Throughput: 0: 1684.2. Samples: 30338558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:12:22,935][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 04:12:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.2, 300 sec: 6775.8). Total num frames: 141414400. Throughput: 0: 1727.6. Samples: 30350378. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:12:27,933][41694] Avg episode reward: [(0, '4.286')] +[2024-11-08 04:12:28,002][42004] Updated weights for policy 0, policy_version 34526 (0.0025) +[2024-11-08 04:12:32,932][41694] Fps is (10 sec: 7782.7, 60 sec: 6977.5, 300 sec: 6831.3). Total num frames: 141455360. Throughput: 0: 1728.7. Samples: 30356072. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:12:32,935][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 04:12:33,264][42004] Updated weights for policy 0, policy_version 34536 (0.0028) +[2024-11-08 04:12:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 141492224. Throughput: 0: 1728.0. Samples: 30367456. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:12:37,933][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 04:12:38,777][42004] Updated weights for policy 0, policy_version 34546 (0.0035) +[2024-11-08 04:12:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.6, 300 sec: 6803.5). Total num frames: 141512704. Throughput: 0: 1636.6. Samples: 30374636. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:12:42,936][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 04:12:47,174][42004] Updated weights for policy 0, policy_version 34556 (0.0026) +[2024-11-08 04:12:47,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6690.2, 300 sec: 6789.6). Total num frames: 141545472. Throughput: 0: 1617.6. Samples: 30379176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:12:47,934][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 04:12:52,933][41694] Fps is (10 sec: 6552.7, 60 sec: 6621.7, 300 sec: 6789.6). Total num frames: 141578240. Throughput: 0: 1660.6. Samples: 30389560. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:12:52,937][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 04:12:52,948][42004] Updated weights for policy 0, policy_version 34566 (0.0028) +[2024-11-08 04:12:57,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 141619200. Throughput: 0: 1741.1. Samples: 30400642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:12:57,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 04:12:58,397][42004] Updated weights for policy 0, policy_version 34576 (0.0030) +[2024-11-08 04:13:02,931][41694] Fps is (10 sec: 7374.0, 60 sec: 6690.2, 300 sec: 6789.6). Total num frames: 141651968. Throughput: 0: 1765.5. Samples: 30406112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:02,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 04:13:04,729][42004] Updated weights for policy 0, policy_version 34586 (0.0033) +[2024-11-08 04:13:07,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 141684736. Throughput: 0: 1716.0. Samples: 30415776. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:07,934][41694] Avg episode reward: [(0, '4.244')] +[2024-11-08 04:13:10,877][42004] Updated weights for policy 0, policy_version 34596 (0.0029) +[2024-11-08 04:13:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 141717504. Throughput: 0: 1680.9. Samples: 30426020. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:12,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 04:13:17,932][41694] Fps is (10 sec: 4915.4, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 141733888. Throughput: 0: 1619.2. Samples: 30428936. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:17,933][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 04:13:19,541][42004] Updated weights for policy 0, policy_version 34606 (0.0031) +[2024-11-08 04:13:22,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6485.4, 300 sec: 6720.2). Total num frames: 141766656. Throughput: 0: 1524.4. Samples: 30436054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:22,933][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 04:13:25,771][42004] Updated weights for policy 0, policy_version 34616 (0.0029) +[2024-11-08 04:13:27,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6706.3). Total num frames: 141799424. Throughput: 0: 1595.3. Samples: 30446424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:27,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 04:13:32,013][42004] Updated weights for policy 0, policy_version 34626 (0.0033) +[2024-11-08 04:13:32,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6692.4). Total num frames: 141832192. Throughput: 0: 1595.4. Samples: 30450968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:32,935][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 04:13:37,541][42004] Updated weights for policy 0, policy_version 34636 (0.0022) +[2024-11-08 04:13:37,933][41694] Fps is (10 sec: 6962.1, 60 sec: 6280.4, 300 sec: 6734.1). Total num frames: 141869056. Throughput: 0: 1608.6. Samples: 30461946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:13:37,936][41694] Avg episode reward: [(0, '4.230')] +[2024-11-08 04:13:38,056][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034637_141873152.pth... 
+[2024-11-08 04:13:38,152][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034246_140271616.pth +[2024-11-08 04:13:42,919][42004] Updated weights for policy 0, policy_version 34646 (0.0032) +[2024-11-08 04:13:42,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 141910016. Throughput: 0: 1619.6. Samples: 30473526. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:13:42,933][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 04:13:47,932][41694] Fps is (10 sec: 7783.7, 60 sec: 6690.2, 300 sec: 6775.8). Total num frames: 141946880. Throughput: 0: 1618.4. Samples: 30478942. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:13:47,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 04:13:50,333][42004] Updated weights for policy 0, policy_version 34656 (0.0025) +[2024-11-08 04:13:52,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6485.5, 300 sec: 6706.3). Total num frames: 141967360. Throughput: 0: 1568.5. Samples: 30486360. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:13:52,934][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 04:13:56,673][42004] Updated weights for policy 0, policy_version 34666 (0.0026) +[2024-11-08 04:13:57,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6348.8, 300 sec: 6692.4). Total num frames: 142000128. Throughput: 0: 1556.3. Samples: 30496052. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:13:57,934][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 04:14:02,267][42004] Updated weights for policy 0, policy_version 34676 (0.0025) +[2024-11-08 04:14:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6417.0, 300 sec: 6692.4). Total num frames: 142036992. Throughput: 0: 1615.1. Samples: 30501616. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:02,934][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 04:14:07,517][42004] Updated weights for policy 0, policy_version 34686 (0.0026) +[2024-11-08 04:14:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.4, 300 sec: 6692.4). Total num frames: 142073856. Throughput: 0: 1708.8. Samples: 30512952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:07,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 04:14:12,762][42004] Updated weights for policy 0, policy_version 34696 (0.0038) +[2024-11-08 04:14:12,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 142114816. Throughput: 0: 1739.3. Samples: 30524694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:12,933][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 04:14:17,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 142151680. Throughput: 0: 1765.0. Samples: 30530392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:17,935][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 04:14:18,282][42004] Updated weights for policy 0, policy_version 34706 (0.0030) +[2024-11-08 04:14:24,551][41694] Fps is (10 sec: 5992.6, 60 sec: 6780.2, 300 sec: 6711.1). Total num frames: 142184448. Throughput: 0: 1707.4. Samples: 30541540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:24,553][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 04:14:25,962][42004] Updated weights for policy 0, policy_version 34716 (0.0040) +[2024-11-08 04:14:27,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6758.3, 300 sec: 6678.6). Total num frames: 142204928. Throughput: 0: 1658.6. Samples: 30548164. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:27,935][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 04:14:32,149][42004] Updated weights for policy 0, policy_version 34726 (0.0022) +[2024-11-08 04:14:32,932][41694] Fps is (10 sec: 6842.5, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 142241792. Throughput: 0: 1645.4. Samples: 30552984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:32,935][41694] Avg episode reward: [(0, '4.674')] +[2024-11-08 04:14:37,286][42004] Updated weights for policy 0, policy_version 34736 (0.0032) +[2024-11-08 04:14:37,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6895.1, 300 sec: 6692.4). Total num frames: 142282752. Throughput: 0: 1741.1. Samples: 30564708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:37,934][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 04:14:42,487][42004] Updated weights for policy 0, policy_version 34746 (0.0024) +[2024-11-08 04:14:42,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.7, 300 sec: 6692.5). Total num frames: 142319616. Throughput: 0: 1788.9. Samples: 30576550. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:14:42,933][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 04:14:47,696][42004] Updated weights for policy 0, policy_version 34756 (0.0029) +[2024-11-08 04:14:47,934][41694] Fps is (10 sec: 7780.3, 60 sec: 6894.6, 300 sec: 6761.8). Total num frames: 142360576. Throughput: 0: 1792.7. Samples: 30582294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:14:47,937][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 04:14:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7168.0, 300 sec: 6775.8). Total num frames: 142397440. Throughput: 0: 1797.6. Samples: 30593844. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:14:52,934][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 04:14:53,431][42004] Updated weights for policy 0, policy_version 34766 (0.0036) +[2024-11-08 04:14:58,936][41694] Fps is (10 sec: 5956.9, 60 sec: 6982.9, 300 sec: 6725.1). Total num frames: 142426112. Throughput: 0: 1616.6. Samples: 30599064. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:14:58,938][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 04:15:01,728][42004] Updated weights for policy 0, policy_version 34776 (0.0028) +[2024-11-08 04:15:02,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 142446592. Throughput: 0: 1673.7. Samples: 30605708. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:15:02,934][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 04:15:07,726][42004] Updated weights for policy 0, policy_version 34786 (0.0030) +[2024-11-08 04:15:07,934][41694] Fps is (10 sec: 6372.9, 60 sec: 6826.4, 300 sec: 6692.4). Total num frames: 142483456. Throughput: 0: 1697.4. Samples: 30615176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:15:07,936][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 04:15:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 142520320. Throughput: 0: 1743.7. Samples: 30626628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:15:12,935][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 04:15:12,990][42004] Updated weights for policy 0, policy_version 34796 (0.0029) +[2024-11-08 04:15:17,932][41694] Fps is (10 sec: 6964.9, 60 sec: 6690.2, 300 sec: 6706.3). Total num frames: 142553088. Throughput: 0: 1746.9. Samples: 30631596. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:15:17,933][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 04:15:19,232][42004] Updated weights for policy 0, policy_version 34806 (0.0022) +[2024-11-08 04:15:22,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6945.9, 300 sec: 6748.0). Total num frames: 142589952. Throughput: 0: 1721.1. Samples: 30642156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:15:22,933][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 04:15:24,505][42004] Updated weights for policy 0, policy_version 34816 (0.0027) +[2024-11-08 04:15:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.6, 300 sec: 6761.9). Total num frames: 142626816. Throughput: 0: 1706.5. Samples: 30653342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:15:27,936][41694] Avg episode reward: [(0, '4.239')] +[2024-11-08 04:15:30,193][42004] Updated weights for policy 0, policy_version 34826 (0.0031) +[2024-11-08 04:15:33,395][41694] Fps is (10 sec: 6263.4, 60 sec: 6842.1, 300 sec: 6723.5). Total num frames: 142655488. Throughput: 0: 1687.5. Samples: 30659010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:15:33,396][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 04:15:37,934][41694] Fps is (10 sec: 5732.8, 60 sec: 6689.8, 300 sec: 6692.4). Total num frames: 142684160. Throughput: 0: 1588.1. Samples: 30665312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:15:37,939][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 04:15:37,971][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034835_142684160.pth... +[2024-11-08 04:15:38,064][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034441_141070336.pth +[2024-11-08 04:15:38,432][42004] Updated weights for policy 0, policy_version 34836 (0.0030) +[2024-11-08 04:15:42,932][41694] Fps is (10 sec: 6871.9, 60 sec: 6690.1, 300 sec: 6692.5). 
Total num frames: 142721024. Throughput: 0: 1763.6. Samples: 30676654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:15:42,933][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 04:15:43,737][42004] Updated weights for policy 0, policy_version 34846 (0.0031) +[2024-11-08 04:15:47,932][41694] Fps is (10 sec: 7374.9, 60 sec: 6622.2, 300 sec: 6692.4). Total num frames: 142757888. Throughput: 0: 1700.4. Samples: 30682226. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:15:47,934][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 04:15:48,984][42004] Updated weights for policy 0, policy_version 34856 (0.0021) +[2024-11-08 04:15:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6760.0). Total num frames: 142798848. Throughput: 0: 1742.8. Samples: 30693598. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:15:52,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 04:15:54,543][42004] Updated weights for policy 0, policy_version 34866 (0.0030) +[2024-11-08 04:15:57,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6942.9, 300 sec: 6775.8). Total num frames: 142835712. Throughput: 0: 1746.5. Samples: 30705220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:15:57,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 04:15:59,979][42004] Updated weights for policy 0, policy_version 34876 (0.0026) +[2024-11-08 04:16:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.8, 300 sec: 6789.6). Total num frames: 142872576. Throughput: 0: 1756.7. Samples: 30710646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:16:02,933][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 04:16:07,550][42004] Updated weights for policy 0, policy_version 34886 (0.0032) +[2024-11-08 04:16:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.9, 300 sec: 6748.0). Total num frames: 142893056. Throughput: 0: 1745.9. Samples: 30720722. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:16:07,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 04:16:12,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 142921728. Throughput: 0: 1646.1. Samples: 30727416. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:16:12,934][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 04:16:14,048][42004] Updated weights for policy 0, policy_version 34896 (0.0027) +[2024-11-08 04:16:17,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 142958592. Throughput: 0: 1651.4. Samples: 30732560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:16:17,934][41694] Avg episode reward: [(0, '4.283')] +[2024-11-08 04:16:19,891][42004] Updated weights for policy 0, policy_version 34906 (0.0037) +[2024-11-08 04:16:22,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 142995456. Throughput: 0: 1737.2. Samples: 30743480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:16:22,935][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 04:16:25,311][42004] Updated weights for policy 0, policy_version 34916 (0.0026) +[2024-11-08 04:16:27,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6764.7). Total num frames: 143032320. Throughput: 0: 1734.3. Samples: 30754698. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:16:27,933][41694] Avg episode reward: [(0, '4.603')] +[2024-11-08 04:16:30,603][42004] Updated weights for policy 0, policy_version 34926 (0.0032) +[2024-11-08 04:16:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7017.4, 300 sec: 6789.6). Total num frames: 143073280. Throughput: 0: 1741.4. Samples: 30760588. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:16:32,934][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 04:16:36,011][42004] Updated weights for policy 0, policy_version 34936 (0.0027) +[2024-11-08 04:16:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7100.1, 300 sec: 6803.5). Total num frames: 143110144. Throughput: 0: 1740.5. Samples: 30771920. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:16:37,933][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 04:16:42,934][41694] Fps is (10 sec: 5733.2, 60 sec: 6826.4, 300 sec: 6734.1). Total num frames: 143130624. Throughput: 0: 1637.9. Samples: 30778930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:16:42,936][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 04:16:44,059][42004] Updated weights for policy 0, policy_version 34946 (0.0023) +[2024-11-08 04:16:47,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 143163392. Throughput: 0: 1625.5. Samples: 30783796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:16:47,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 04:16:49,602][42004] Updated weights for policy 0, policy_version 34956 (0.0028) +[2024-11-08 04:16:52,931][41694] Fps is (10 sec: 7374.6, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 143204352. Throughput: 0: 1654.8. Samples: 30795186. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:16:52,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 04:16:55,036][42004] Updated weights for policy 0, policy_version 34966 (0.0025) +[2024-11-08 04:16:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 143241216. Throughput: 0: 1763.0. Samples: 30806752. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:16:57,933][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 04:17:00,502][42004] Updated weights for policy 0, policy_version 34976 (0.0022) +[2024-11-08 04:17:02,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6758.3, 300 sec: 6803.5). Total num frames: 143278080. Throughput: 0: 1768.1. Samples: 30812126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:02,935][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 04:17:05,905][42004] Updated weights for policy 0, policy_version 34986 (0.0031) +[2024-11-08 04:17:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.4, 300 sec: 6817.4). Total num frames: 143314944. Throughput: 0: 1776.3. Samples: 30823412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:07,934][41694] Avg episode reward: [(0, '4.301')] +[2024-11-08 04:17:11,491][42004] Updated weights for policy 0, policy_version 34996 (0.0031) +[2024-11-08 04:17:12,931][41694] Fps is (10 sec: 7373.3, 60 sec: 7168.0, 300 sec: 6831.3). Total num frames: 143351808. Throughput: 0: 1773.9. Samples: 30834524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:12,933][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 04:17:17,935][41694] Fps is (10 sec: 5323.1, 60 sec: 6826.3, 300 sec: 6747.9). Total num frames: 143368192. Throughput: 0: 1714.5. Samples: 30837748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:17,937][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 04:17:19,805][42004] Updated weights for policy 0, policy_version 35006 (0.0028) +[2024-11-08 04:17:22,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 143405056. Throughput: 0: 1645.9. Samples: 30845986. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:22,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 04:17:25,294][42004] Updated weights for policy 0, policy_version 35016 (0.0021) +[2024-11-08 04:17:27,931][41694] Fps is (10 sec: 7375.4, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 143441920. Throughput: 0: 1730.9. Samples: 30856816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:27,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 04:17:30,822][42004] Updated weights for policy 0, policy_version 35026 (0.0030) +[2024-11-08 04:17:32,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 143478784. Throughput: 0: 1752.2. Samples: 30862646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:32,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 04:17:36,321][42004] Updated weights for policy 0, policy_version 35036 (0.0024) +[2024-11-08 04:17:37,933][41694] Fps is (10 sec: 7372.0, 60 sec: 6758.3, 300 sec: 6789.6). Total num frames: 143515648. Throughput: 0: 1746.5. Samples: 30873782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:37,935][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 04:17:37,993][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035039_143519744.pth... +[2024-11-08 04:17:38,090][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034637_141873152.pth +[2024-11-08 04:17:41,911][42004] Updated weights for policy 0, policy_version 35046 (0.0032) +[2024-11-08 04:17:42,933][41694] Fps is (10 sec: 7371.8, 60 sec: 7031.6, 300 sec: 6803.5). Total num frames: 143552512. Throughput: 0: 1735.1. Samples: 30884832. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:42,935][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 04:17:47,261][42004] Updated weights for policy 0, policy_version 35056 (0.0045) +[2024-11-08 04:17:47,932][41694] Fps is (10 sec: 7783.2, 60 sec: 7168.0, 300 sec: 6831.3). Total num frames: 143593472. Throughput: 0: 1741.3. Samples: 30890484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:47,933][41694] Avg episode reward: [(0, '4.100')] +[2024-11-08 04:17:52,932][41694] Fps is (10 sec: 6144.8, 60 sec: 6826.6, 300 sec: 6761.9). Total num frames: 143613952. Throughput: 0: 1660.0. Samples: 30898110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:52,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 04:17:54,849][42004] Updated weights for policy 0, policy_version 35066 (0.0029) +[2024-11-08 04:17:57,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 143650816. Throughput: 0: 1661.3. Samples: 30909282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:17:57,933][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 04:18:00,169][42004] Updated weights for policy 0, policy_version 35076 (0.0036) +[2024-11-08 04:18:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.5, 300 sec: 6775.8). Total num frames: 143683584. Throughput: 0: 1719.8. Samples: 30915134. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:18:02,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 04:18:06,610][42004] Updated weights for policy 0, policy_version 35086 (0.0026) +[2024-11-08 04:18:07,933][41694] Fps is (10 sec: 6552.5, 60 sec: 6690.0, 300 sec: 6775.7). Total num frames: 143716352. Throughput: 0: 1744.7. Samples: 30924502. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:18:07,935][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 04:18:12,471][42004] Updated weights for policy 0, policy_version 35096 (0.0038) +[2024-11-08 04:18:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 143753216. Throughput: 0: 1735.4. Samples: 30934908. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:18:12,934][41694] Avg episode reward: [(0, '4.721')] +[2024-11-08 04:18:17,932][41694] Fps is (10 sec: 7373.8, 60 sec: 7031.8, 300 sec: 6859.1). Total num frames: 143790080. Throughput: 0: 1726.7. Samples: 30940350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:18:17,934][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 04:18:18,054][42004] Updated weights for policy 0, policy_version 35106 (0.0025) +[2024-11-08 04:18:24,612][41694] Fps is (10 sec: 6312.2, 60 sec: 6839.9, 300 sec: 6834.0). Total num frames: 143826944. Throughput: 0: 1664.8. Samples: 30951494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:18:24,615][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 04:18:25,449][42004] Updated weights for policy 0, policy_version 35116 (0.0038) +[2024-11-08 04:18:27,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 143851520. Throughput: 0: 1638.9. Samples: 30958580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:18:27,934][41694] Avg episode reward: [(0, '4.315')] +[2024-11-08 04:18:31,412][42004] Updated weights for policy 0, policy_version 35126 (0.0031) +[2024-11-08 04:18:32,932][41694] Fps is (10 sec: 6891.9, 60 sec: 6758.3, 300 sec: 6831.3). Total num frames: 143884288. Throughput: 0: 1635.8. Samples: 30964096. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:18:32,935][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 04:18:37,206][42004] Updated weights for policy 0, policy_version 35136 (0.0034) +[2024-11-08 04:18:37,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.5, 300 sec: 6817.4). Total num frames: 143921152. Throughput: 0: 1704.4. Samples: 30974808. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:18:37,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 04:18:42,796][42004] Updated weights for policy 0, policy_version 35146 (0.0025) +[2024-11-08 04:18:42,931][41694] Fps is (10 sec: 7373.4, 60 sec: 6758.5, 300 sec: 6817.4). Total num frames: 143958016. Throughput: 0: 1696.6. Samples: 30985630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:18:42,933][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 04:18:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6873.0). Total num frames: 143994880. Throughput: 0: 1686.8. Samples: 30991038. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:18:47,933][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 04:18:48,165][42004] Updated weights for policy 0, policy_version 35156 (0.0033) +[2024-11-08 04:18:52,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6963.1, 300 sec: 6886.8). Total num frames: 144031744. Throughput: 0: 1729.2. Samples: 31002316. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:18:52,935][41694] Avg episode reward: [(0, '4.786')] +[2024-11-08 04:18:53,646][42004] Updated weights for policy 0, policy_version 35166 (0.0034) +[2024-11-08 04:18:58,741][41694] Fps is (10 sec: 6062.8, 60 sec: 6735.8, 300 sec: 6840.3). Total num frames: 144060416. Throughput: 0: 1600.5. Samples: 31008226. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:18:58,743][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 04:19:01,234][42004] Updated weights for policy 0, policy_version 35176 (0.0033) +[2024-11-08 04:19:02,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 144089088. Throughput: 0: 1669.6. Samples: 31015482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:19:02,935][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 04:19:06,892][42004] Updated weights for policy 0, policy_version 35186 (0.0027) +[2024-11-08 04:19:07,931][41694] Fps is (10 sec: 7130.8, 60 sec: 6826.9, 300 sec: 6817.4). Total num frames: 144125952. Throughput: 0: 1727.1. Samples: 31026312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:19:07,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 04:19:12,868][42004] Updated weights for policy 0, policy_version 35196 (0.0035) +[2024-11-08 04:19:12,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 144162816. Throughput: 0: 1737.9. Samples: 31036786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:19:12,932][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 04:19:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6869.0). Total num frames: 144199680. Throughput: 0: 1731.3. Samples: 31042004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:19:17,933][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 04:19:18,448][42004] Updated weights for policy 0, policy_version 35206 (0.0034) +[2024-11-08 04:19:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7023.4, 300 sec: 6886.9). Total num frames: 144236544. Throughput: 0: 1739.5. Samples: 31053086. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:22,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 04:19:23,945][42004] Updated weights for policy 0, policy_version 35216 (0.0031) +[2024-11-08 04:19:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6886.8). Total num frames: 144273408. Throughput: 0: 1747.8. Samples: 31064280. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:27,933][41694] Avg episode reward: [(0, '4.675')] +[2024-11-08 04:19:29,628][42004] Updated weights for policy 0, policy_version 35226 (0.0026) +[2024-11-08 04:19:32,998][41694] Fps is (10 sec: 5696.4, 60 sec: 6819.2, 300 sec: 6815.9). Total num frames: 144293888. Throughput: 0: 1744.4. Samples: 31069654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:33,000][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 04:19:37,174][42004] Updated weights for policy 0, policy_version 35236 (0.0025) +[2024-11-08 04:19:37,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 144330752. Throughput: 0: 1657.0. Samples: 31076878. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:37,933][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 04:19:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035237_144330752.pth... +[2024-11-08 04:19:38,041][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000034835_142684160.pth +[2024-11-08 04:19:42,838][42004] Updated weights for policy 0, policy_version 35246 (0.0029) +[2024-11-08 04:19:42,932][41694] Fps is (10 sec: 7422.2, 60 sec: 6826.7, 300 sec: 6803.6). Total num frames: 144367616. Throughput: 0: 1809.9. Samples: 31088206. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:42,933][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 04:19:47,933][41694] Fps is (10 sec: 6552.9, 60 sec: 6690.0, 300 sec: 6775.7). 
Total num frames: 144396288. Throughput: 0: 1708.8. Samples: 31092380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:47,934][41694] Avg episode reward: [(0, '4.617')] +[2024-11-08 04:19:49,031][42004] Updated weights for policy 0, policy_version 35256 (0.0041) +[2024-11-08 04:19:52,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.2, 300 sec: 6826.8). Total num frames: 144433152. Throughput: 0: 1703.3. Samples: 31102962. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:52,934][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 04:19:54,760][42004] Updated weights for policy 0, policy_version 35266 (0.0034) +[2024-11-08 04:19:57,932][41694] Fps is (10 sec: 7373.5, 60 sec: 6920.0, 300 sec: 6859.1). Total num frames: 144470016. Throughput: 0: 1722.3. Samples: 31114288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:19:57,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 04:20:00,517][42004] Updated weights for policy 0, policy_version 35276 (0.0022) +[2024-11-08 04:20:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6859.1). Total num frames: 144506880. Throughput: 0: 1717.6. Samples: 31119296. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:20:02,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 04:20:07,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 144527360. Throughput: 0: 1675.4. Samples: 31128478. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:20:07,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 04:20:08,272][42004] Updated weights for policy 0, policy_version 35286 (0.0034) +[2024-11-08 04:20:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 144564224. Throughput: 0: 1623.4. Samples: 31137332. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:20:12,934][41694] Avg episode reward: [(0, '4.634')] +[2024-11-08 04:20:13,922][42004] Updated weights for policy 0, policy_version 35296 (0.0036) +[2024-11-08 04:20:17,933][41694] Fps is (10 sec: 6553.2, 60 sec: 6553.5, 300 sec: 6789.6). Total num frames: 144592896. Throughput: 0: 1615.1. Samples: 31142228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:20:17,934][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 04:20:20,947][42004] Updated weights for policy 0, policy_version 35306 (0.0032) +[2024-11-08 04:20:22,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6485.3, 300 sec: 6775.8). Total num frames: 144625664. Throughput: 0: 1647.7. Samples: 31151024. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:20:22,934][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 04:20:26,583][42004] Updated weights for policy 0, policy_version 35316 (0.0037) +[2024-11-08 04:20:27,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6485.3, 300 sec: 6814.2). Total num frames: 144662528. Throughput: 0: 1639.4. Samples: 31161980. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:20:27,933][41694] Avg episode reward: [(0, '4.638')] +[2024-11-08 04:20:32,472][42004] Updated weights for policy 0, policy_version 35326 (0.0025) +[2024-11-08 04:20:32,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6697.6, 300 sec: 6817.5). Total num frames: 144695296. Throughput: 0: 1661.0. Samples: 31167124. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:20:32,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 04:20:37,884][42004] Updated weights for policy 0, policy_version 35336 (0.0032) +[2024-11-08 04:20:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 144736256. Throughput: 0: 1667.4. Samples: 31177996. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:20:37,934][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 04:20:42,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6761.9). Total num frames: 144752640. Throughput: 0: 1575.2. Samples: 31185170. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:20:42,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 04:20:45,913][42004] Updated weights for policy 0, policy_version 35346 (0.0025) +[2024-11-08 04:20:47,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.7, 300 sec: 6748.0). Total num frames: 144789504. Throughput: 0: 1576.4. Samples: 31190234. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:20:47,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 04:20:51,844][42004] Updated weights for policy 0, policy_version 35356 (0.0028) +[2024-11-08 04:20:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 6734.1). Total num frames: 144822272. Throughput: 0: 1604.9. Samples: 31200700. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:20:52,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 04:20:57,691][42004] Updated weights for policy 0, policy_version 35366 (0.0036) +[2024-11-08 04:20:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6734.1). Total num frames: 144859136. Throughput: 0: 1637.9. Samples: 31211038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:20:57,938][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 04:21:02,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6417.0, 300 sec: 6775.7). Total num frames: 144891904. Throughput: 0: 1646.8. Samples: 31216336. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:02,934][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 04:21:03,614][42004] Updated weights for policy 0, policy_version 35376 (0.0043) +[2024-11-08 04:21:07,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6803.5). 
Total num frames: 144928768. Throughput: 0: 1685.7. Samples: 31226882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:07,933][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 04:21:09,634][42004] Updated weights for policy 0, policy_version 35386 (0.0025) +[2024-11-08 04:21:12,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6621.8, 300 sec: 6789.6). Total num frames: 144961536. Throughput: 0: 1657.1. Samples: 31236548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:12,935][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 04:21:17,845][42004] Updated weights for policy 0, policy_version 35396 (0.0029) +[2024-11-08 04:21:17,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.4, 300 sec: 6734.1). Total num frames: 144982016. Throughput: 0: 1601.8. Samples: 31239204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:17,933][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 04:21:22,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 145018880. Throughput: 0: 1575.8. Samples: 31248908. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:22,934][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 04:21:23,361][42004] Updated weights for policy 0, policy_version 35406 (0.0038) +[2024-11-08 04:21:27,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6706.3). Total num frames: 145051648. Throughput: 0: 1647.1. Samples: 31259288. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:27,934][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 04:21:29,562][42004] Updated weights for policy 0, policy_version 35416 (0.0034) +[2024-11-08 04:21:32,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6485.3, 300 sec: 6692.4). Total num frames: 145084416. Throughput: 0: 1648.2. Samples: 31264404. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:32,933][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 04:21:35,262][42004] Updated weights for policy 0, policy_version 35426 (0.0029) +[2024-11-08 04:21:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6417.1, 300 sec: 6748.0). Total num frames: 145121280. Throughput: 0: 1656.8. Samples: 31275258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:37,934][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 04:21:38,022][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035431_145125376.pth... +[2024-11-08 04:21:38,141][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035039_143519744.pth +[2024-11-08 04:21:40,990][42004] Updated weights for policy 0, policy_version 35436 (0.0027) +[2024-11-08 04:21:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 145158144. Throughput: 0: 1667.2. Samples: 31286060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:42,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 04:21:46,535][42004] Updated weights for policy 0, policy_version 35446 (0.0027) +[2024-11-08 04:21:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 145195008. Throughput: 0: 1669.0. Samples: 31291438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:47,935][41694] Avg episode reward: [(0, '4.758')] +[2024-11-08 04:21:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 145215488. Throughput: 0: 1592.1. Samples: 31298526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:52,935][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 04:21:54,487][42004] Updated weights for policy 0, policy_version 35456 (0.0040) +[2024-11-08 04:21:57,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.4, 300 sec: 6678.6). 
Total num frames: 145248256. Throughput: 0: 1611.7. Samples: 31309072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:21:57,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 04:22:00,455][42004] Updated weights for policy 0, policy_version 35466 (0.0035) +[2024-11-08 04:22:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6485.4, 300 sec: 6664.7). Total num frames: 145281024. Throughput: 0: 1658.0. Samples: 31313814. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:22:02,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 04:22:06,557][42004] Updated weights for policy 0, policy_version 35476 (0.0036) +[2024-11-08 04:22:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 145317888. Throughput: 0: 1669.8. Samples: 31324050. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:22:07,933][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 04:22:12,024][42004] Updated weights for policy 0, policy_version 35486 (0.0033) +[2024-11-08 04:22:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6734.2). Total num frames: 145354752. Throughput: 0: 1689.2. Samples: 31335304. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:22:12,933][41694] Avg episode reward: [(0, '4.725')] +[2024-11-08 04:22:17,470][42004] Updated weights for policy 0, policy_version 35496 (0.0027) +[2024-11-08 04:22:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 145391616. Throughput: 0: 1693.8. Samples: 31340624. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:22:17,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 04:22:24,478][41694] Fps is (10 sec: 6030.6, 60 sec: 6588.6, 300 sec: 6685.2). Total num frames: 145424384. Throughput: 0: 1642.1. Samples: 31351690. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:22:24,480][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 04:22:25,223][42004] Updated weights for policy 0, policy_version 35506 (0.0024) +[2024-11-08 04:22:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.8, 300 sec: 6678.6). Total num frames: 145448960. Throughput: 0: 1624.6. Samples: 31359168. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:22:27,933][41694] Avg episode reward: [(0, '4.601')] +[2024-11-08 04:22:30,654][42004] Updated weights for policy 0, policy_version 35516 (0.0045) +[2024-11-08 04:22:32,931][41694] Fps is (10 sec: 7267.8, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 145485824. Throughput: 0: 1631.1. Samples: 31364838. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:22:32,934][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 04:22:36,802][42004] Updated weights for policy 0, policy_version 35526 (0.0044) +[2024-11-08 04:22:37,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6621.8, 300 sec: 6664.7). Total num frames: 145518592. Throughput: 0: 1690.9. Samples: 31374618. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:22:37,934][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 04:22:42,550][42004] Updated weights for policy 0, policy_version 35536 (0.0038) +[2024-11-08 04:22:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 145555456. Throughput: 0: 1696.9. Samples: 31385432. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:22:42,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 04:22:47,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 145592320. Throughput: 0: 1719.2. Samples: 31391180. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:22:47,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 04:22:47,971][42004] Updated weights for policy 0, policy_version 35546 (0.0024) +[2024-11-08 04:22:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6720.2). Total num frames: 145633280. Throughput: 0: 1740.7. Samples: 31402380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:22:52,933][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 04:22:53,323][42004] Updated weights for policy 0, policy_version 35556 (0.0032) +[2024-11-08 04:22:58,730][41694] Fps is (10 sec: 6448.5, 60 sec: 6804.4, 300 sec: 6688.2). Total num frames: 145661952. Throughput: 0: 1592.5. Samples: 31408240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:22:58,732][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 04:23:00,948][42004] Updated weights for policy 0, policy_version 35566 (0.0026) +[2024-11-08 04:23:02,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6692.5). Total num frames: 145690624. Throughput: 0: 1660.2. Samples: 31415332. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:23:02,933][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 04:23:07,143][42004] Updated weights for policy 0, policy_version 35576 (0.0036) +[2024-11-08 04:23:07,931][41694] Fps is (10 sec: 6677.2, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 145723392. Throughput: 0: 1701.5. Samples: 31425628. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:23:07,933][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 04:23:12,839][42004] Updated weights for policy 0, policy_version 35586 (0.0031) +[2024-11-08 04:23:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 145760256. Throughput: 0: 1709.1. Samples: 31436076. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:23:12,937][41694] Avg episode reward: [(0, '4.632')] +[2024-11-08 04:23:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6716.8). Total num frames: 145797120. Throughput: 0: 1701.1. Samples: 31441386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:23:17,934][41694] Avg episode reward: [(0, '4.652')] +[2024-11-08 04:23:18,252][42004] Updated weights for policy 0, policy_version 35596 (0.0026) +[2024-11-08 04:23:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7007.2, 300 sec: 6720.2). Total num frames: 145833984. Throughput: 0: 1741.8. Samples: 31453000. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:23:22,934][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 04:23:23,639][42004] Updated weights for policy 0, policy_version 35606 (0.0032) +[2024-11-08 04:23:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 145874944. Throughput: 0: 1757.0. Samples: 31464496. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:23:27,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 04:23:29,086][42004] Updated weights for policy 0, policy_version 35616 (0.0031) +[2024-11-08 04:23:32,968][41694] Fps is (10 sec: 5713.7, 60 sec: 6754.3, 300 sec: 6677.7). Total num frames: 145891328. Throughput: 0: 1744.2. Samples: 31469732. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:23:32,969][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 04:23:36,698][42004] Updated weights for policy 0, policy_version 35626 (0.0021) +[2024-11-08 04:23:37,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6895.0, 300 sec: 6692.4). Total num frames: 145932288. Throughput: 0: 1666.3. Samples: 31477366. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:23:37,934][41694] Avg episode reward: [(0, '4.233')] +[2024-11-08 04:23:37,951][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035628_145932288.pth... +[2024-11-08 04:23:38,205][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035237_144330752.pth +[2024-11-08 04:23:42,932][41694] Fps is (10 sec: 6988.5, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 145960960. Throughput: 0: 1781.2. Samples: 31486974. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:23:42,934][41694] Avg episode reward: [(0, '4.265')] +[2024-11-08 04:23:43,087][42004] Updated weights for policy 0, policy_version 35636 (0.0045) +[2024-11-08 04:23:47,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 145997824. Throughput: 0: 1705.2. Samples: 31492068. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:23:47,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 04:23:48,560][42004] Updated weights for policy 0, policy_version 35646 (0.0038) +[2024-11-08 04:23:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6724.8). Total num frames: 146038784. Throughput: 0: 1736.0. Samples: 31503748. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:23:52,934][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 04:23:53,876][42004] Updated weights for policy 0, policy_version 35656 (0.0033) +[2024-11-08 04:23:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6918.7, 300 sec: 6720.2). Total num frames: 146071552. Throughput: 0: 1738.6. Samples: 31514312. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:23:57,933][41694] Avg episode reward: [(0, '4.523')] +[2024-11-08 04:23:59,729][42004] Updated weights for policy 0, policy_version 35666 (0.0029) +[2024-11-08 04:24:02,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6963.1, 300 sec: 6720.2). 
Total num frames: 146108416. Throughput: 0: 1746.4. Samples: 31519976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:02,934][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 04:24:07,378][42004] Updated weights for policy 0, policy_version 35676 (0.0027) +[2024-11-08 04:24:07,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6826.6, 300 sec: 6678.6). Total num frames: 146132992. Throughput: 0: 1686.8. Samples: 31528906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:07,933][41694] Avg episode reward: [(0, '4.606')] +[2024-11-08 04:24:12,906][42004] Updated weights for policy 0, policy_version 35686 (0.0026) +[2024-11-08 04:24:12,932][41694] Fps is (10 sec: 6144.2, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 146169856. Throughput: 0: 1638.4. Samples: 31538226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:12,933][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 04:24:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 146202624. Throughput: 0: 1644.7. Samples: 31543684. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:17,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 04:24:18,714][42004] Updated weights for policy 0, policy_version 35696 (0.0025) +[2024-11-08 04:24:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 146239488. Throughput: 0: 1702.2. Samples: 31553966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:22,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 04:24:24,506][42004] Updated weights for policy 0, policy_version 35706 (0.0035) +[2024-11-08 04:24:27,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6621.9, 300 sec: 6707.8). Total num frames: 146272256. Throughput: 0: 1732.8. Samples: 31564952. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:27,933][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 04:24:30,628][42004] Updated weights for policy 0, policy_version 35716 (0.0045) +[2024-11-08 04:24:32,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6967.4, 300 sec: 6706.3). Total num frames: 146309120. Throughput: 0: 1725.5. Samples: 31569716. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:32,933][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 04:24:35,954][42004] Updated weights for policy 0, policy_version 35726 (0.0030) +[2024-11-08 04:24:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6895.0, 300 sec: 6706.3). Total num frames: 146345984. Throughput: 0: 1719.2. Samples: 31581110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:37,933][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 04:24:42,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6692.5). Total num frames: 146370560. Throughput: 0: 1651.6. Samples: 31588636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:24:42,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 04:24:43,373][42004] Updated weights for policy 0, policy_version 35736 (0.0031) +[2024-11-08 04:24:47,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 146403328. Throughput: 0: 1645.7. Samples: 31594032. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:24:47,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 04:24:49,180][42004] Updated weights for policy 0, policy_version 35746 (0.0029) +[2024-11-08 04:24:52,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 146440192. Throughput: 0: 1677.3. Samples: 31604386. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:24:52,934][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 04:24:54,971][42004] Updated weights for policy 0, policy_version 35756 (0.0024) +[2024-11-08 04:24:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 146477056. Throughput: 0: 1718.3. Samples: 31615550. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:24:57,935][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 04:25:00,368][42004] Updated weights for policy 0, policy_version 35766 (0.0039) +[2024-11-08 04:25:02,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 146513920. Throughput: 0: 1723.9. Samples: 31621258. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:02,934][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 04:25:06,204][42004] Updated weights for policy 0, policy_version 35776 (0.0026) +[2024-11-08 04:25:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 146550784. Throughput: 0: 1731.7. Samples: 31631890. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:07,933][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 04:25:11,860][42004] Updated weights for policy 0, policy_version 35786 (0.0028) +[2024-11-08 04:25:12,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 146583552. Throughput: 0: 1728.8. Samples: 31642748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:12,934][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 04:25:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 146604032. Throughput: 0: 1659.4. Samples: 31644388. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:17,933][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 04:25:20,265][42004] Updated weights for policy 0, policy_version 35796 (0.0031) +[2024-11-08 04:25:22,934][41694] Fps is (10 sec: 5323.9, 60 sec: 6621.7, 300 sec: 6692.4). Total num frames: 146636800. Throughput: 0: 1619.7. Samples: 31654002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:22,935][41694] Avg episode reward: [(0, '4.187')] +[2024-11-08 04:25:26,369][42004] Updated weights for policy 0, policy_version 35806 (0.0026) +[2024-11-08 04:25:27,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 146669568. Throughput: 0: 1674.3. Samples: 31663978. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:27,934][41694] Avg episode reward: [(0, '4.272')] +[2024-11-08 04:25:31,920][42004] Updated weights for policy 0, policy_version 35816 (0.0025) +[2024-11-08 04:25:32,931][41694] Fps is (10 sec: 7374.3, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 146710528. Throughput: 0: 1675.7. Samples: 31669438. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:32,934][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 04:25:37,167][42004] Updated weights for policy 0, policy_version 35826 (0.0035) +[2024-11-08 04:25:37,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 146747392. Throughput: 0: 1707.1. Samples: 31681208. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:37,936][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 04:25:37,954][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035827_146747392.pth... 
+[2024-11-08 04:25:38,066][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035431_145125376.pth +[2024-11-08 04:25:42,465][42004] Updated weights for policy 0, policy_version 35836 (0.0025) +[2024-11-08 04:25:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 146784256. Throughput: 0: 1714.5. Samples: 31692700. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:42,933][41694] Avg episode reward: [(0, '4.230')] +[2024-11-08 04:25:49,238][41694] Fps is (10 sec: 6159.0, 60 sec: 6748.0, 300 sec: 6732.1). Total num frames: 146817024. Throughput: 0: 1658.7. Samples: 31698068. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:49,241][41694] Avg episode reward: [(0, '4.250')] +[2024-11-08 04:25:49,979][42004] Updated weights for policy 0, policy_version 35846 (0.0038) +[2024-11-08 04:25:52,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 146845696. Throughput: 0: 1642.3. Samples: 31705794. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:52,937][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 04:25:55,785][42004] Updated weights for policy 0, policy_version 35856 (0.0030) +[2024-11-08 04:25:57,932][41694] Fps is (10 sec: 7067.1, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 146878464. Throughput: 0: 1630.9. Samples: 31716140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:25:57,933][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 04:26:02,085][42004] Updated weights for policy 0, policy_version 35866 (0.0026) +[2024-11-08 04:26:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 146911232. Throughput: 0: 1702.8. Samples: 31721012. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:02,934][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 04:26:07,599][42004] Updated weights for policy 0, policy_version 35876 (0.0030) +[2024-11-08 04:26:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.8, 300 sec: 6734.1). Total num frames: 146948096. Throughput: 0: 1725.4. Samples: 31731640. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:07,935][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 04:26:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6789.6). Total num frames: 146984960. Throughput: 0: 1741.8. Samples: 31742358. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:12,933][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 04:26:13,371][42004] Updated weights for policy 0, policy_version 35886 (0.0030) +[2024-11-08 04:26:17,933][41694] Fps is (10 sec: 7372.0, 60 sec: 6963.1, 300 sec: 6789.6). Total num frames: 147021824. Throughput: 0: 1738.0. Samples: 31747650. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:17,935][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 04:26:18,886][42004] Updated weights for policy 0, policy_version 35896 (0.0029) +[2024-11-08 04:26:23,495][41694] Fps is (10 sec: 5816.2, 60 sec: 6763.4, 300 sec: 6749.0). Total num frames: 147046400. Throughput: 0: 1701.8. Samples: 31758746. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:23,497][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 04:26:26,873][42004] Updated weights for policy 0, policy_version 35906 (0.0026) +[2024-11-08 04:26:27,932][41694] Fps is (10 sec: 5325.4, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 147075072. Throughput: 0: 1622.1. Samples: 31765696. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:27,933][41694] Avg episode reward: [(0, '4.641')] +[2024-11-08 04:26:32,858][42004] Updated weights for policy 0, policy_version 35916 (0.0038) +[2024-11-08 04:26:32,932][41694] Fps is (10 sec: 6945.0, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 147111936. Throughput: 0: 1664.5. Samples: 31770796. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:32,933][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 04:26:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.2, 300 sec: 6748.0). Total num frames: 147148800. Throughput: 0: 1677.8. Samples: 31781294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:37,934][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 04:26:38,463][42004] Updated weights for policy 0, policy_version 35926 (0.0028) +[2024-11-08 04:26:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 147185664. Throughput: 0: 1695.3. Samples: 31792430. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:42,933][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 04:26:43,840][42004] Updated weights for policy 0, policy_version 35936 (0.0026) +[2024-11-08 04:26:47,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6908.8, 300 sec: 6803.5). Total num frames: 147222528. Throughput: 0: 1713.6. Samples: 31798124. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:47,933][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 04:26:49,399][42004] Updated weights for policy 0, policy_version 35946 (0.0026) +[2024-11-08 04:26:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 147259392. Throughput: 0: 1722.9. Samples: 31809172. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:52,934][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 04:26:55,091][42004] Updated weights for policy 0, policy_version 35956 (0.0034) +[2024-11-08 04:26:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 147279872. Throughput: 0: 1675.6. Samples: 31817762. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:26:57,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 04:27:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 147312640. Throughput: 0: 1650.3. Samples: 31821912. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:27:02,938][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 04:27:02,994][42004] Updated weights for policy 0, policy_version 35966 (0.0043) +[2024-11-08 04:27:07,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 147349504. Throughput: 0: 1636.6. Samples: 31831472. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:27:07,937][41694] Avg episode reward: [(0, '4.684')] +[2024-11-08 04:27:09,043][42004] Updated weights for policy 0, policy_version 35976 (0.0040) +[2024-11-08 04:27:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 147386368. Throughput: 0: 1703.5. Samples: 31842354. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:27:12,935][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 04:27:14,589][42004] Updated weights for policy 0, policy_version 35986 (0.0032) +[2024-11-08 04:27:17,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6690.3, 300 sec: 6811.5). Total num frames: 147423232. Throughput: 0: 1718.1. Samples: 31848110. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:27:17,933][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 04:27:19,991][42004] Updated weights for policy 0, policy_version 35996 (0.0030) +[2024-11-08 04:27:22,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6960.3, 300 sec: 6817.4). Total num frames: 147460096. Throughput: 0: 1736.4. Samples: 31859432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:27:22,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 04:27:25,621][42004] Updated weights for policy 0, policy_version 36006 (0.0036) +[2024-11-08 04:27:27,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 147492864. Throughput: 0: 1716.8. Samples: 31869688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:27:27,935][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 04:27:32,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 147513344. Throughput: 0: 1686.1. Samples: 31873998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:27:32,936][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 04:27:34,181][42004] Updated weights for policy 0, policy_version 36016 (0.0033) +[2024-11-08 04:27:37,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 147542016. Throughput: 0: 1589.4. Samples: 31880696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:27:37,935][41694] Avg episode reward: [(0, '4.237')] +[2024-11-08 04:27:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036021_147542016.pth... +[2024-11-08 04:27:38,077][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035628_145932288.pth +[2024-11-08 04:27:40,582][42004] Updated weights for policy 0, policy_version 36026 (0.0033) +[2024-11-08 04:27:42,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.3, 300 sec: 6720.2). 
Total num frames: 147574784. Throughput: 0: 1615.2. Samples: 31890446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:27:42,934][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 04:27:46,162][42004] Updated weights for policy 0, policy_version 36036 (0.0023) +[2024-11-08 04:27:47,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 147615744. Throughput: 0: 1650.2. Samples: 31896172. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:27:47,933][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 04:27:51,537][42004] Updated weights for policy 0, policy_version 36046 (0.0036) +[2024-11-08 04:27:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6553.6, 300 sec: 6766.3). Total num frames: 147652608. Throughput: 0: 1692.5. Samples: 31907632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:27:52,935][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 04:27:57,381][42004] Updated weights for policy 0, policy_version 36056 (0.0040) +[2024-11-08 04:27:57,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 147685376. Throughput: 0: 1687.7. Samples: 31918300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:27:57,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 04:28:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 147722240. Throughput: 0: 1673.2. Samples: 31923402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:02,938][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 04:28:03,359][42004] Updated weights for policy 0, policy_version 36066 (0.0026) +[2024-11-08 04:28:07,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.7, 300 sec: 6720.2). Total num frames: 147742720. Throughput: 0: 1579.9. Samples: 31930528. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:07,934][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 04:28:11,364][42004] Updated weights for policy 0, policy_version 36076 (0.0038) +[2024-11-08 04:28:12,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6485.3, 300 sec: 6706.3). Total num frames: 147775488. Throughput: 0: 1570.6. Samples: 31940366. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:12,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 04:28:17,506][42004] Updated weights for policy 0, policy_version 36086 (0.0037) +[2024-11-08 04:28:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6692.5). Total num frames: 147808256. Throughput: 0: 1583.8. Samples: 31945270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:28:17,933][41694] Avg episode reward: [(0, '4.628')] +[2024-11-08 04:28:22,719][42004] Updated weights for policy 0, policy_version 36096 (0.0025) +[2024-11-08 04:28:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.4, 300 sec: 6692.5). Total num frames: 147849216. Throughput: 0: 1681.4. Samples: 31956358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:28:22,934][41694] Avg episode reward: [(0, '4.700')] +[2024-11-08 04:28:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6553.6, 300 sec: 6762.7). Total num frames: 147886080. Throughput: 0: 1726.1. Samples: 31968122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:28:27,933][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 04:28:27,980][42004] Updated weights for policy 0, policy_version 36106 (0.0033) +[2024-11-08 04:28:32,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 147922944. Throughput: 0: 1717.4. Samples: 31973454. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:28:32,933][41694] Avg episode reward: [(0, '4.601')] +[2024-11-08 04:28:33,476][42004] Updated weights for policy 0, policy_version 36116 (0.0028) +[2024-11-08 04:28:37,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6775.8). Total num frames: 147959808. Throughput: 0: 1710.8. Samples: 31984618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:37,933][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 04:28:41,155][42004] Updated weights for policy 0, policy_version 36126 (0.0031) +[2024-11-08 04:28:42,934][41694] Fps is (10 sec: 5733.1, 60 sec: 6758.2, 300 sec: 6720.2). Total num frames: 147980288. Throughput: 0: 1634.6. Samples: 31991860. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:42,936][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 04:28:47,549][42004] Updated weights for policy 0, policy_version 36136 (0.0040) +[2024-11-08 04:28:47,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 148013056. Throughput: 0: 1628.8. Samples: 31996698. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:47,933][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 04:28:52,933][41694] Fps is (10 sec: 6963.6, 60 sec: 6621.7, 300 sec: 6706.3). Total num frames: 148049920. Throughput: 0: 1693.9. Samples: 32006758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:52,935][41694] Avg episode reward: [(0, '4.633')] +[2024-11-08 04:28:53,218][42004] Updated weights for policy 0, policy_version 36146 (0.0036) +[2024-11-08 04:28:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 148086784. Throughput: 0: 1728.0. Samples: 32018126. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:28:57,934][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 04:28:58,592][42004] Updated weights for policy 0, policy_version 36156 (0.0026) +[2024-11-08 04:29:02,931][41694] Fps is (10 sec: 7374.1, 60 sec: 6690.2, 300 sec: 6748.0). Total num frames: 148123648. Throughput: 0: 1743.5. Samples: 32023726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:02,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 04:29:04,531][42004] Updated weights for policy 0, policy_version 36166 (0.0026) +[2024-11-08 04:29:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.1, 300 sec: 6748.0). Total num frames: 148160512. Throughput: 0: 1729.9. Samples: 32034204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:07,936][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 04:29:10,098][42004] Updated weights for policy 0, policy_version 36176 (0.0027) +[2024-11-08 04:29:13,330][41694] Fps is (10 sec: 6696.4, 60 sec: 6917.3, 300 sec: 6738.9). Total num frames: 148193280. Throughput: 0: 1699.4. Samples: 32045274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:13,332][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 04:29:16,631][42004] Updated weights for policy 0, policy_version 36186 (0.0029) +[2024-11-08 04:29:17,935][41694] Fps is (10 sec: 6551.4, 60 sec: 6962.7, 300 sec: 6734.0). Total num frames: 148226048. Throughput: 0: 1675.8. Samples: 32048870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:17,940][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 04:29:22,548][42004] Updated weights for policy 0, policy_version 36196 (0.0032) +[2024-11-08 04:29:22,931][41694] Fps is (10 sec: 6825.6, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 148258816. Throughput: 0: 1656.5. Samples: 32059162. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:22,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 04:29:27,847][42004] Updated weights for policy 0, policy_version 36206 (0.0036) +[2024-11-08 04:29:27,932][41694] Fps is (10 sec: 7375.6, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 148299776. Throughput: 0: 1750.9. Samples: 32070648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:29:27,935][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 04:29:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 148336640. Throughput: 0: 1774.6. Samples: 32076554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:29:32,934][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 04:29:33,086][42004] Updated weights for policy 0, policy_version 36216 (0.0031) +[2024-11-08 04:29:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 148377600. Throughput: 0: 1817.2. Samples: 32088530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:29:37,935][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 04:29:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036225_148377600.pth... +[2024-11-08 04:29:38,075][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000035827_146747392.pth +[2024-11-08 04:29:38,451][42004] Updated weights for policy 0, policy_version 36226 (0.0031) +[2024-11-08 04:29:42,933][41694] Fps is (10 sec: 7371.5, 60 sec: 7168.1, 300 sec: 6803.5). Total num frames: 148410368. Throughput: 0: 1800.9. Samples: 32099168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:29:42,937][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 04:29:44,436][42004] Updated weights for policy 0, policy_version 36236 (0.0033) +[2024-11-08 04:29:47,932][41694] Fps is (10 sec: 5734.5, 60 sec: 7031.5, 300 sec: 6761.9). 
Total num frames: 148434944. Throughput: 0: 1786.9. Samples: 32104138. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:47,934][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 04:29:51,907][42004] Updated weights for policy 0, policy_version 36246 (0.0035) +[2024-11-08 04:29:52,932][41694] Fps is (10 sec: 5735.2, 60 sec: 6963.3, 300 sec: 6748.0). Total num frames: 148467712. Throughput: 0: 1726.9. Samples: 32111916. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:52,933][41694] Avg episode reward: [(0, '4.640')] +[2024-11-08 04:29:57,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 148496384. Throughput: 0: 1698.1. Samples: 32121012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:29:57,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 04:29:58,512][42004] Updated weights for policy 0, policy_version 36256 (0.0048) +[2024-11-08 04:30:02,931][41694] Fps is (10 sec: 6553.9, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 148533248. Throughput: 0: 1723.1. Samples: 32126402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:02,933][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 04:30:04,222][42004] Updated weights for policy 0, policy_version 36266 (0.0031) +[2024-11-08 04:30:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 148570112. Throughput: 0: 1734.7. Samples: 32137226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:07,934][41694] Avg episode reward: [(0, '4.251')] +[2024-11-08 04:30:09,898][42004] Updated weights for policy 0, policy_version 36276 (0.0040) +[2024-11-08 04:30:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6941.0, 300 sec: 6789.6). Total num frames: 148606976. Throughput: 0: 1714.1. Samples: 32147782. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:12,934][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 04:30:16,146][42004] Updated weights for policy 0, policy_version 36286 (0.0039) +[2024-11-08 04:30:17,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6827.1, 300 sec: 6775.8). Total num frames: 148635648. Throughput: 0: 1688.7. Samples: 32152544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:17,933][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 04:30:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 148664320. Throughput: 0: 1607.0. Samples: 32160846. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:22,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 04:30:23,027][42004] Updated weights for policy 0, policy_version 36296 (0.0029) +[2024-11-08 04:30:27,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 148701184. Throughput: 0: 1602.4. Samples: 32171272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:27,934][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 04:30:29,098][42004] Updated weights for policy 0, policy_version 36306 (0.0029) +[2024-11-08 04:30:32,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 148733952. Throughput: 0: 1603.1. Samples: 32176276. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:32,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 04:30:34,857][42004] Updated weights for policy 0, policy_version 36316 (0.0027) +[2024-11-08 04:30:37,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 148770816. Throughput: 0: 1673.4. Samples: 32187218. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:37,936][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 04:30:40,561][42004] Updated weights for policy 0, policy_version 36326 (0.0023) +[2024-11-08 04:30:42,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6622.0, 300 sec: 6778.0). Total num frames: 148807680. Throughput: 0: 1709.5. Samples: 32197942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:42,934][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 04:30:46,130][42004] Updated weights for policy 0, policy_version 36336 (0.0034) +[2024-11-08 04:30:47,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 148844544. Throughput: 0: 1711.3. Samples: 32203410. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:47,934][41694] Avg episode reward: [(0, '4.278')] +[2024-11-08 04:30:51,835][42004] Updated weights for policy 0, policy_version 36346 (0.0033) +[2024-11-08 04:30:54,352][41694] Fps is (10 sec: 6097.2, 60 sec: 6668.8, 300 sec: 6743.3). Total num frames: 148877312. Throughput: 0: 1662.9. Samples: 32214418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:54,353][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 04:30:57,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6826.6, 300 sec: 6761.9). Total num frames: 148905984. Throughput: 0: 1660.2. Samples: 32222492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:30:57,935][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 04:30:59,029][42004] Updated weights for policy 0, policy_version 36356 (0.0045) +[2024-11-08 04:31:02,932][41694] Fps is (10 sec: 6683.7, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 148934656. Throughput: 0: 1665.4. Samples: 32227486. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:02,936][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 04:31:05,939][42004] Updated weights for policy 0, policy_version 36366 (0.0038) +[2024-11-08 04:31:07,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 148967424. Throughput: 0: 1680.2. Samples: 32236456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:07,934][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 04:31:12,100][42004] Updated weights for policy 0, policy_version 36376 (0.0026) +[2024-11-08 04:31:12,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6706.4). Total num frames: 149000192. Throughput: 0: 1669.7. Samples: 32246408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:12,933][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 04:31:17,899][42004] Updated weights for policy 0, policy_version 36386 (0.0024) +[2024-11-08 04:31:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6760.9). Total num frames: 149037056. Throughput: 0: 1669.6. Samples: 32251408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:17,933][41694] Avg episode reward: [(0, '4.154')] +[2024-11-08 04:31:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 149073920. Throughput: 0: 1673.9. Samples: 32262544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:22,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 04:31:23,419][42004] Updated weights for policy 0, policy_version 36396 (0.0028) +[2024-11-08 04:31:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6621.8, 300 sec: 6734.1). Total num frames: 149098496. Throughput: 0: 1558.0. Samples: 32268052. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:27,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 04:31:30,459][42004] Updated weights for policy 0, policy_version 36406 (0.0032) +[2024-11-08 04:31:32,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 149135360. Throughput: 0: 1623.6. Samples: 32276474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:32,933][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 04:31:36,586][42004] Updated weights for policy 0, policy_version 36416 (0.0022) +[2024-11-08 04:31:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 149168128. Throughput: 0: 1654.5. Samples: 32286520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:37,934][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 04:31:38,003][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036418_149168128.pth... +[2024-11-08 04:31:38,212][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036021_147542016.pth +[2024-11-08 04:31:42,457][42004] Updated weights for policy 0, policy_version 36426 (0.0038) +[2024-11-08 04:31:42,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 149200896. Throughput: 0: 1651.0. Samples: 32296788. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:31:42,933][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 04:31:47,933][41694] Fps is (10 sec: 6962.2, 60 sec: 6553.4, 300 sec: 6706.3). Total num frames: 149237760. Throughput: 0: 1662.2. Samples: 32302288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:31:47,935][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 04:31:47,983][42004] Updated weights for policy 0, policy_version 36436 (0.0040) +[2024-11-08 04:31:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6782.4, 300 sec: 6761.9). 
Total num frames: 149274624. Throughput: 0: 1711.6. Samples: 32313478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:31:52,934][41694] Avg episode reward: [(0, '4.265')] +[2024-11-08 04:31:53,523][42004] Updated weights for policy 0, policy_version 36446 (0.0029) +[2024-11-08 04:31:57,932][41694] Fps is (10 sec: 7783.2, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 149315584. Throughput: 0: 1742.0. Samples: 32324800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:31:57,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 04:31:58,964][42004] Updated weights for policy 0, policy_version 36456 (0.0029) +[2024-11-08 04:32:02,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 149340160. Throughput: 0: 1753.6. Samples: 32330320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:32:02,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 04:32:06,468][42004] Updated weights for policy 0, policy_version 36466 (0.0037) +[2024-11-08 04:32:07,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 149372928. Throughput: 0: 1674.8. Samples: 32337910. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 04:32:07,933][41694] Avg episode reward: [(0, '4.684')] +[2024-11-08 04:32:12,534][42004] Updated weights for policy 0, policy_version 36476 (0.0029) +[2024-11-08 04:32:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 149405696. Throughput: 0: 1777.7. Samples: 32348050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:12,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 04:32:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 149442560. Throughput: 0: 1709.6. Samples: 32353408. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:17,933][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 04:32:18,031][42004] Updated weights for policy 0, policy_version 36486 (0.0027) +[2024-11-08 04:32:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 149479424. Throughput: 0: 1738.3. Samples: 32364742. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:22,933][41694] Avg episode reward: [(0, '4.272')] +[2024-11-08 04:32:23,521][42004] Updated weights for policy 0, policy_version 36496 (0.0031) +[2024-11-08 04:32:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6803.5). Total num frames: 149520384. Throughput: 0: 1760.3. Samples: 32376002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:27,934][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 04:32:28,943][42004] Updated weights for policy 0, policy_version 36506 (0.0027) +[2024-11-08 04:32:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 149557248. Throughput: 0: 1760.7. Samples: 32381518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:32,934][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 04:32:36,137][42004] Updated weights for policy 0, policy_version 36516 (0.0026) +[2024-11-08 04:32:37,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 149581824. Throughput: 0: 1693.8. Samples: 32389700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:37,933][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 04:32:42,172][42004] Updated weights for policy 0, policy_version 36526 (0.0036) +[2024-11-08 04:32:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 149614592. Throughput: 0: 1667.9. Samples: 32399854. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:42,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 04:32:47,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6826.8, 300 sec: 6761.9). Total num frames: 149647360. Throughput: 0: 1659.1. Samples: 32404978. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:47,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 04:32:48,006][42004] Updated weights for policy 0, policy_version 36536 (0.0034) +[2024-11-08 04:32:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6789.6). Total num frames: 149688320. Throughput: 0: 1740.7. Samples: 32416240. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:32:52,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 04:32:53,171][42004] Updated weights for policy 0, policy_version 36546 (0.0024) +[2024-11-08 04:32:57,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 149725184. Throughput: 0: 1771.6. Samples: 32427772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:32:57,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 04:32:58,574][42004] Updated weights for policy 0, policy_version 36556 (0.0036) +[2024-11-08 04:33:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 149762048. Throughput: 0: 1784.1. Samples: 32433692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:33:02,934][41694] Avg episode reward: [(0, '4.831')] +[2024-11-08 04:33:04,427][42004] Updated weights for policy 0, policy_version 36566 (0.0029) +[2024-11-08 04:33:09,211][41694] Fps is (10 sec: 6173.5, 60 sec: 6884.7, 300 sec: 6815.6). Total num frames: 149794816. Throughput: 0: 1713.1. Samples: 32444022. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:33:09,213][41694] Avg episode reward: [(0, '4.812')] +[2024-11-08 04:33:11,607][42004] Updated weights for policy 0, policy_version 36576 (0.0033) +[2024-11-08 04:33:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 149823488. Throughput: 0: 1693.2. Samples: 32452198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:33:12,934][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 04:33:17,492][42004] Updated weights for policy 0, policy_version 36586 (0.0026) +[2024-11-08 04:33:17,932][41694] Fps is (10 sec: 7045.1, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 149856256. Throughput: 0: 1683.6. Samples: 32457282. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:33:17,933][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 04:33:22,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6894.8, 300 sec: 6803.5). Total num frames: 149893120. Throughput: 0: 1729.7. Samples: 32467536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:33:22,935][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 04:33:23,356][42004] Updated weights for policy 0, policy_version 36596 (0.0030) +[2024-11-08 04:33:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 149929984. Throughput: 0: 1757.3. Samples: 32478934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:33:27,934][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 04:33:28,737][42004] Updated weights for policy 0, policy_version 36606 (0.0023) +[2024-11-08 04:33:32,934][41694] Fps is (10 sec: 7371.9, 60 sec: 6826.4, 300 sec: 6803.5). Total num frames: 149966848. Throughput: 0: 1768.3. Samples: 32484554. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:33:32,936][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 04:33:34,141][42004] Updated weights for policy 0, policy_version 36616 (0.0027) +[2024-11-08 04:33:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7031.5, 300 sec: 6859.1). Total num frames: 150003712. Throughput: 0: 1759.9. Samples: 32495434. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:33:37,936][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 04:33:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036622_150003712.pth... +[2024-11-08 04:33:38,155][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036225_148377600.pth +[2024-11-08 04:33:40,036][42004] Updated weights for policy 0, policy_version 36626 (0.0030) +[2024-11-08 04:33:43,069][41694] Fps is (10 sec: 6062.0, 60 sec: 6879.2, 300 sec: 6828.1). Total num frames: 150028288. Throughput: 0: 1616.7. Samples: 32500744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:33:43,071][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 04:33:47,421][42004] Updated weights for policy 0, policy_version 36636 (0.0029) +[2024-11-08 04:33:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6895.0, 300 sec: 6817.4). Total num frames: 150061056. Throughput: 0: 1659.3. Samples: 32508360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:33:47,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 04:33:52,931][41694] Fps is (10 sec: 7060.2, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 150097920. Throughput: 0: 1706.0. Samples: 32518610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:33:52,933][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 04:33:53,500][42004] Updated weights for policy 0, policy_version 36646 (0.0027) +[2024-11-08 04:33:57,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6817.4). 
Total num frames: 150134784. Throughput: 0: 1724.8. Samples: 32529816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:33:57,934][41694] Avg episode reward: [(0, '4.708')] +[2024-11-08 04:33:58,678][42004] Updated weights for policy 0, policy_version 36656 (0.0028) +[2024-11-08 04:34:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 150167552. Throughput: 0: 1738.0. Samples: 32535490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:34:02,935][41694] Avg episode reward: [(0, '4.313')] +[2024-11-08 04:34:04,990][42004] Updated weights for policy 0, policy_version 36666 (0.0026) +[2024-11-08 04:34:07,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6975.3, 300 sec: 6826.6). Total num frames: 150204416. Throughput: 0: 1733.3. Samples: 32545536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:34:07,935][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 04:34:10,673][42004] Updated weights for policy 0, policy_version 36676 (0.0030) +[2024-11-08 04:34:12,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6895.0, 300 sec: 6817.5). Total num frames: 150237184. Throughput: 0: 1714.9. Samples: 32556106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:34:12,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 04:34:17,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 150261760. Throughput: 0: 1715.8. Samples: 32561762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:34:17,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 04:34:17,962][42004] Updated weights for policy 0, policy_version 36686 (0.0033) +[2024-11-08 04:34:22,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.5, 300 sec: 6775.8). Total num frames: 150298624. Throughput: 0: 1640.7. Samples: 32569264. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:34:22,934][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 04:34:23,893][42004] Updated weights for policy 0, policy_version 36696 (0.0020) +[2024-11-08 04:34:27,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 150331392. Throughput: 0: 1751.3. Samples: 32579310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:34:27,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 04:34:29,970][42004] Updated weights for policy 0, policy_version 36706 (0.0032) +[2024-11-08 04:34:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.4, 300 sec: 6748.0). Total num frames: 150368256. Throughput: 0: 1698.0. Samples: 32584770. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:34:32,937][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 04:34:35,203][42004] Updated weights for policy 0, policy_version 36716 (0.0024) +[2024-11-08 04:34:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 150405120. Throughput: 0: 1719.8. Samples: 32596002. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:34:37,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 04:34:40,943][42004] Updated weights for policy 0, policy_version 36726 (0.0026) +[2024-11-08 04:34:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6910.8, 300 sec: 6803.5). Total num frames: 150441984. Throughput: 0: 1717.6. Samples: 32607106. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:34:42,933][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 04:34:46,678][42004] Updated weights for policy 0, policy_version 36736 (0.0027) +[2024-11-08 04:34:47,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 150478848. Throughput: 0: 1701.4. Samples: 32612054. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:34:47,936][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 04:34:52,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 150499328. Throughput: 0: 1654.4. Samples: 32619984. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:34:52,934][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 04:34:54,209][42004] Updated weights for policy 0, policy_version 36746 (0.0044) +[2024-11-08 04:34:57,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 150536192. Throughput: 0: 1648.4. Samples: 32630286. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:34:57,938][41694] Avg episode reward: [(0, '4.317')] +[2024-11-08 04:35:00,162][42004] Updated weights for policy 0, policy_version 36756 (0.0028) +[2024-11-08 04:35:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 150568960. Throughput: 0: 1635.6. Samples: 32635362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:02,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 04:35:06,004][42004] Updated weights for policy 0, policy_version 36766 (0.0027) +[2024-11-08 04:35:07,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 150605824. Throughput: 0: 1703.2. Samples: 32645910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:07,934][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 04:35:11,613][42004] Updated weights for policy 0, policy_version 36776 (0.0031) +[2024-11-08 04:35:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 150642688. Throughput: 0: 1724.3. Samples: 32656904. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:12,938][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 04:35:17,816][42004] Updated weights for policy 0, policy_version 36786 (0.0047) +[2024-11-08 04:35:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 150675456. Throughput: 0: 1709.4. Samples: 32661694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:17,934][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 04:35:22,933][41694] Fps is (10 sec: 6962.3, 60 sec: 6894.8, 300 sec: 6817.4). Total num frames: 150712320. Throughput: 0: 1693.2. Samples: 32672198. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:22,936][41694] Avg episode reward: [(0, '4.578')] +[2024-11-08 04:35:25,127][42004] Updated weights for policy 0, policy_version 36796 (0.0033) +[2024-11-08 04:35:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 150732800. Throughput: 0: 1619.0. Samples: 32679962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:27,934][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 04:35:31,046][42004] Updated weights for policy 0, policy_version 36806 (0.0026) +[2024-11-08 04:35:32,932][41694] Fps is (10 sec: 5734.7, 60 sec: 6690.0, 300 sec: 6775.7). Total num frames: 150769664. Throughput: 0: 1623.0. Samples: 32685090. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:32,935][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 04:35:36,821][42004] Updated weights for policy 0, policy_version 36816 (0.0031) +[2024-11-08 04:35:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 150806528. Throughput: 0: 1681.6. Samples: 32695654. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:35:37,933][41694] Avg episode reward: [(0, '4.286')] +[2024-11-08 04:35:37,953][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036818_150806528.pth... +[2024-11-08 04:35:38,050][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036418_149168128.pth +[2024-11-08 04:35:42,129][42004] Updated weights for policy 0, policy_version 36826 (0.0026) +[2024-11-08 04:35:42,932][41694] Fps is (10 sec: 7373.4, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 150843392. Throughput: 0: 1711.1. Samples: 32707286. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:35:42,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 04:35:47,351][42004] Updated weights for policy 0, policy_version 36836 (0.0026) +[2024-11-08 04:35:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6836.4). Total num frames: 150884352. Throughput: 0: 1718.7. Samples: 32712702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:35:47,935][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 04:35:52,886][42004] Updated weights for policy 0, policy_version 36846 (0.0023) +[2024-11-08 04:35:52,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7031.5, 300 sec: 6831.3). Total num frames: 150921216. Throughput: 0: 1743.8. Samples: 32724382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:35:52,935][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 04:35:59,049][41694] Fps is (10 sec: 6263.0, 60 sec: 6835.8, 300 sec: 6819.3). Total num frames: 150953984. Throughput: 0: 1711.8. Samples: 32735848. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:35:59,051][41694] Avg episode reward: [(0, '4.219')] +[2024-11-08 04:36:00,077][42004] Updated weights for policy 0, policy_version 36856 (0.0024) +[2024-11-08 04:36:02,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6817.4). 
Total num frames: 150978560. Throughput: 0: 1695.3. Samples: 32737982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:36:02,933][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 04:36:06,511][42004] Updated weights for policy 0, policy_version 36866 (0.0034) +[2024-11-08 04:36:07,932][41694] Fps is (10 sec: 6456.1, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 151011328. Throughput: 0: 1673.0. Samples: 32747482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:36:07,934][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 04:36:12,637][42004] Updated weights for policy 0, policy_version 36876 (0.0043) +[2024-11-08 04:36:12,932][41694] Fps is (10 sec: 6553.2, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 151044096. Throughput: 0: 1719.4. Samples: 32757336. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:36:12,935][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 04:36:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 151080960. Throughput: 0: 1724.6. Samples: 32762696. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:36:17,935][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 04:36:18,226][42004] Updated weights for policy 0, policy_version 36886 (0.0033) +[2024-11-08 04:36:22,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.6, 300 sec: 6845.2). Total num frames: 151117824. Throughput: 0: 1741.7. Samples: 32774032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:36:22,934][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 04:36:23,627][42004] Updated weights for policy 0, policy_version 36896 (0.0030) +[2024-11-08 04:36:27,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 151154688. Throughput: 0: 1734.1. Samples: 32785320. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:36:27,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 04:36:29,116][42004] Updated weights for policy 0, policy_version 36906 (0.0063) +[2024-11-08 04:36:33,228][41694] Fps is (10 sec: 5967.2, 60 sec: 6793.2, 300 sec: 6810.6). Total num frames: 151179264. Throughput: 0: 1727.3. Samples: 32790944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 04:36:33,230][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 04:36:36,953][42004] Updated weights for policy 0, policy_version 36916 (0.0030) +[2024-11-08 04:36:37,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 151212032. Throughput: 0: 1632.2. Samples: 32797832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 04:36:37,934][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 04:36:42,931][41694] Fps is (10 sec: 6753.8, 60 sec: 6690.2, 300 sec: 6803.6). Total num frames: 151244800. Throughput: 0: 1642.0. Samples: 32807904. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 04:36:42,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 04:36:43,030][42004] Updated weights for policy 0, policy_version 36926 (0.0046) +[2024-11-08 04:36:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 151285760. Throughput: 0: 1670.8. Samples: 32813168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 04:36:47,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 04:36:48,404][42004] Updated weights for policy 0, policy_version 36936 (0.0026) +[2024-11-08 04:36:52,932][41694] Fps is (10 sec: 7781.6, 60 sec: 6690.0, 300 sec: 6803.5). Total num frames: 151322624. Throughput: 0: 1716.7. Samples: 32824736. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 04:36:52,935][41694] Avg episode reward: [(0, '4.565')] +[2024-11-08 04:36:53,897][42004] Updated weights for policy 0, policy_version 36946 (0.0031) +[2024-11-08 04:36:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6886.7, 300 sec: 6845.2). Total num frames: 151359488. Throughput: 0: 1751.2. Samples: 32836140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:36:57,935][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 04:36:59,526][42004] Updated weights for policy 0, policy_version 36956 (0.0025) +[2024-11-08 04:37:02,932][41694] Fps is (10 sec: 6963.8, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 151392256. Throughput: 0: 1748.9. Samples: 32841396. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:37:02,935][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 04:37:05,370][42004] Updated weights for policy 0, policy_version 36966 (0.0049) +[2024-11-08 04:37:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 151416832. Throughput: 0: 1709.1. Samples: 32850942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:37:07,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 04:37:12,925][42004] Updated weights for policy 0, policy_version 36976 (0.0031) +[2024-11-08 04:37:12,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 151453696. Throughput: 0: 1646.5. Samples: 32859412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:37:12,933][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 04:37:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 151486464. Throughput: 0: 1644.6. Samples: 32864462. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:37:17,935][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 04:37:18,628][42004] Updated weights for policy 0, policy_version 36986 (0.0026) +[2024-11-08 04:37:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 151527424. Throughput: 0: 1727.5. Samples: 32875568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:37:22,934][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 04:37:23,970][42004] Updated weights for policy 0, policy_version 36996 (0.0029) +[2024-11-08 04:37:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 151560192. Throughput: 0: 1743.1. Samples: 32886344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:37:27,934][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 04:37:29,908][42004] Updated weights for policy 0, policy_version 37006 (0.0035) +[2024-11-08 04:37:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6997.8, 300 sec: 6831.3). Total num frames: 151597056. Throughput: 0: 1752.9. Samples: 32892048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:37:32,934][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 04:37:35,470][42004] Updated weights for policy 0, policy_version 37016 (0.0035) +[2024-11-08 04:37:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6845.2). Total num frames: 151633920. Throughput: 0: 1740.6. Samples: 32903062. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:37:37,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 04:37:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037020_151633920.pth... 
+[2024-11-08 04:37:38,068][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036622_150003712.pth +[2024-11-08 04:37:42,777][42004] Updated weights for policy 0, policy_version 37026 (0.0029) +[2024-11-08 04:37:42,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 151658496. Throughput: 0: 1659.5. Samples: 32910818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:37:42,933][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 04:37:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 151691264. Throughput: 0: 1652.2. Samples: 32915746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:37:47,935][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 04:37:48,862][42004] Updated weights for policy 0, policy_version 37036 (0.0026) +[2024-11-08 04:37:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.5, 300 sec: 6789.6). Total num frames: 151728128. Throughput: 0: 1667.3. Samples: 32925972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:37:52,932][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 04:37:54,502][42004] Updated weights for policy 0, policy_version 37046 (0.0026) +[2024-11-08 04:37:57,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 151760896. Throughput: 0: 1730.0. Samples: 32937264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:37:57,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 04:38:00,274][42004] Updated weights for policy 0, policy_version 37056 (0.0034) +[2024-11-08 04:38:02,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6758.4, 300 sec: 6819.2). Total num frames: 151797760. Throughput: 0: 1732.8. Samples: 32942436. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:38:02,934][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 04:38:05,996][42004] Updated weights for policy 0, policy_version 37066 (0.0024) +[2024-11-08 04:38:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 151834624. Throughput: 0: 1723.5. Samples: 32953124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:38:07,934][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 04:38:11,557][42004] Updated weights for policy 0, policy_version 37076 (0.0034) +[2024-11-08 04:38:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 151871488. Throughput: 0: 1732.0. Samples: 32964282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:38:12,933][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 04:38:17,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 151891968. Throughput: 0: 1651.8. Samples: 32966378. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:38:17,935][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 04:38:19,440][42004] Updated weights for policy 0, policy_version 37086 (0.0043) +[2024-11-08 04:38:22,934][41694] Fps is (10 sec: 5323.6, 60 sec: 6621.6, 300 sec: 6761.8). Total num frames: 151924736. Throughput: 0: 1620.6. Samples: 32975992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:38:22,935][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 04:38:25,500][42004] Updated weights for policy 0, policy_version 37096 (0.0025) +[2024-11-08 04:38:27,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 151961600. Throughput: 0: 1684.5. Samples: 32986620. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:38:27,934][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 04:38:31,446][42004] Updated weights for policy 0, policy_version 37106 (0.0027) +[2024-11-08 04:38:32,931][41694] Fps is (10 sec: 6964.9, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 151994368. Throughput: 0: 1687.6. Samples: 32991690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:38:32,933][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 04:38:37,059][42004] Updated weights for policy 0, policy_version 37116 (0.0032) +[2024-11-08 04:38:37,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6792.8). Total num frames: 152031232. Throughput: 0: 1694.7. Samples: 33002236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:38:37,933][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 04:38:42,593][42004] Updated weights for policy 0, policy_version 37126 (0.0024) +[2024-11-08 04:38:42,933][41694] Fps is (10 sec: 7371.7, 60 sec: 6826.5, 300 sec: 6803.5). Total num frames: 152068096. Throughput: 0: 1693.6. Samples: 33013478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:38:42,935][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 04:38:48,934][41694] Fps is (10 sec: 6328.9, 60 sec: 6714.5, 300 sec: 6766.6). Total num frames: 152100864. Throughput: 0: 1663.3. Samples: 33018952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:38:48,935][41694] Avg episode reward: [(0, '4.317')] +[2024-11-08 04:38:49,774][42004] Updated weights for policy 0, policy_version 37136 (0.0030) +[2024-11-08 04:38:52,931][41694] Fps is (10 sec: 5735.2, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 152125440. Throughput: 0: 1636.5. Samples: 33026764. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:38:52,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 04:38:56,108][42004] Updated weights for policy 0, policy_version 37146 (0.0036) +[2024-11-08 04:38:57,932][41694] Fps is (10 sec: 6828.3, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 152162304. Throughput: 0: 1608.8. Samples: 33036678. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:38:57,934][41694] Avg episode reward: [(0, '4.789')] +[2024-11-08 04:39:01,808][42004] Updated weights for policy 0, policy_version 37156 (0.0024) +[2024-11-08 04:39:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 152195072. Throughput: 0: 1681.3. Samples: 33042036. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:39:02,934][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 04:39:07,388][42004] Updated weights for policy 0, policy_version 37166 (0.0029) +[2024-11-08 04:39:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 152231936. Throughput: 0: 1711.3. Samples: 33052996. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:39:07,935][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 04:39:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6803.5). Total num frames: 152268800. Throughput: 0: 1711.3. Samples: 33063630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:39:12,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 04:39:13,136][42004] Updated weights for policy 0, policy_version 37176 (0.0030) +[2024-11-08 04:39:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6895.0, 300 sec: 6803.5). Total num frames: 152305664. Throughput: 0: 1718.6. Samples: 33069028. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:39:17,936][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 04:39:18,807][42004] Updated weights for policy 0, policy_version 37186 (0.0026) +[2024-11-08 04:39:22,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.4, 300 sec: 6761.9). Total num frames: 152326144. Throughput: 0: 1722.2. Samples: 33079734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:39:22,933][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 04:39:26,644][42004] Updated weights for policy 0, policy_version 37196 (0.0031) +[2024-11-08 04:39:27,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 152358912. Throughput: 0: 1631.8. Samples: 33086908. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:39:27,934][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 04:39:32,583][42004] Updated weights for policy 0, policy_version 37206 (0.0032) +[2024-11-08 04:39:32,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 152395776. Throughput: 0: 1648.6. Samples: 33091486. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:39:32,936][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 04:39:37,912][42004] Updated weights for policy 0, policy_version 37216 (0.0026) +[2024-11-08 04:39:37,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6758.3, 300 sec: 6761.9). Total num frames: 152436736. Throughput: 0: 1694.6. Samples: 33103020. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:39:37,934][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 04:39:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037216_152436736.pth... +[2024-11-08 04:39:38,071][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000036818_150806528.pth +[2024-11-08 04:39:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.6, 300 sec: 6761.9). 
Total num frames: 152473600. Throughput: 0: 1725.5. Samples: 33114326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:39:42,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 04:39:43,555][42004] Updated weights for policy 0, policy_version 37226 (0.0031) +[2024-11-08 04:39:47,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6873.2, 300 sec: 6803.5). Total num frames: 152506368. Throughput: 0: 1721.8. Samples: 33119518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:39:47,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 04:39:49,269][42004] Updated weights for policy 0, policy_version 37236 (0.0028) +[2024-11-08 04:39:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 152543232. Throughput: 0: 1720.1. Samples: 33130398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:39:52,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 04:39:54,759][42004] Updated weights for policy 0, policy_version 37246 (0.0022) +[2024-11-08 04:39:57,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 152567808. Throughput: 0: 1660.6. Samples: 33138356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:39:57,941][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 04:40:02,935][41694] Fps is (10 sec: 5322.8, 60 sec: 6689.7, 300 sec: 6747.9). Total num frames: 152596480. Throughput: 0: 1636.2. Samples: 33142664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:40:02,940][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 04:40:03,152][42004] Updated weights for policy 0, policy_version 37256 (0.0029) +[2024-11-08 04:40:07,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6690.2, 300 sec: 6748.0). Total num frames: 152633344. Throughput: 0: 1624.9. Samples: 33152856. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:07,933][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 04:40:08,630][42004] Updated weights for policy 0, policy_version 37266 (0.0031) +[2024-11-08 04:40:12,932][41694] Fps is (10 sec: 7375.2, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 152670208. Throughput: 0: 1714.9. Samples: 33164080. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:12,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 04:40:14,589][42004] Updated weights for policy 0, policy_version 37276 (0.0032) +[2024-11-08 04:40:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 152702976. Throughput: 0: 1722.2. Samples: 33168986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:17,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 04:40:20,598][42004] Updated weights for policy 0, policy_version 37286 (0.0035) +[2024-11-08 04:40:22,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 152739840. Throughput: 0: 1694.3. Samples: 33179262. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:22,933][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 04:40:26,133][42004] Updated weights for policy 0, policy_version 37296 (0.0024) +[2024-11-08 04:40:27,933][41694] Fps is (10 sec: 7371.4, 60 sec: 6963.0, 300 sec: 6803.5). Total num frames: 152776704. Throughput: 0: 1689.4. Samples: 33190352. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:27,937][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 04:40:32,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 152797184. Throughput: 0: 1652.9. Samples: 33193898. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:32,936][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 04:40:33,717][42004] Updated weights for policy 0, policy_version 37306 (0.0029) +[2024-11-08 04:40:37,932][41694] Fps is (10 sec: 5325.8, 60 sec: 6553.7, 300 sec: 6734.1). Total num frames: 152829952. Throughput: 0: 1600.0. Samples: 33202398. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:37,933][41694] Avg episode reward: [(0, '4.634')] +[2024-11-08 04:40:39,761][42004] Updated weights for policy 0, policy_version 37316 (0.0051) +[2024-11-08 04:40:42,931][41694] Fps is (10 sec: 6963.7, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 152866816. Throughput: 0: 1660.9. Samples: 33213094. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:42,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 04:40:45,354][42004] Updated weights for policy 0, policy_version 37326 (0.0025) +[2024-11-08 04:40:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 152903680. Throughput: 0: 1688.3. Samples: 33218630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:47,933][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 04:40:50,954][42004] Updated weights for policy 0, policy_version 37336 (0.0026) +[2024-11-08 04:40:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6759.7). Total num frames: 152940544. Throughput: 0: 1708.7. Samples: 33229748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:40:52,935][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 04:40:56,390][42004] Updated weights for policy 0, policy_version 37346 (0.0027) +[2024-11-08 04:40:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 152977408. Throughput: 0: 1706.2. Samples: 33240858. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:40:57,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 04:41:02,364][42004] Updated weights for policy 0, policy_version 37356 (0.0034) +[2024-11-08 04:41:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.6, 300 sec: 6789.6). Total num frames: 153014272. Throughput: 0: 1715.2. Samples: 33246172. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:02,933][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 04:41:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 153034752. Throughput: 0: 1644.3. Samples: 33253254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:07,933][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 04:41:10,453][42004] Updated weights for policy 0, policy_version 37366 (0.0050) +[2024-11-08 04:41:12,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 153063424. Throughput: 0: 1603.5. Samples: 33262508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:12,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 04:41:16,743][42004] Updated weights for policy 0, policy_version 37376 (0.0034) +[2024-11-08 04:41:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 153100288. Throughput: 0: 1631.8. Samples: 33267328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:17,934][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 04:41:22,523][42004] Updated weights for policy 0, policy_version 37386 (0.0034) +[2024-11-08 04:41:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 153133056. Throughput: 0: 1679.4. Samples: 33277972. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:22,934][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 04:41:27,913][42004] Updated weights for policy 0, policy_version 37396 (0.0046) +[2024-11-08 04:41:27,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6622.1, 300 sec: 6768.7). Total num frames: 153174016. Throughput: 0: 1693.8. Samples: 33289314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:27,933][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 04:41:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6895.0, 300 sec: 6775.8). Total num frames: 153210880. Throughput: 0: 1693.2. Samples: 33294824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:32,933][41694] Avg episode reward: [(0, '4.712')] +[2024-11-08 04:41:33,451][42004] Updated weights for policy 0, policy_version 37406 (0.0025) +[2024-11-08 04:41:38,813][41694] Fps is (10 sec: 6022.4, 60 sec: 6727.8, 300 sec: 6741.7). Total num frames: 153239552. Throughput: 0: 1662.0. Samples: 33306002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:38,815][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 04:41:38,833][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037412_153239552.pth... +[2024-11-08 04:41:38,966][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037020_151633920.pth +[2024-11-08 04:41:41,104][42004] Updated weights for policy 0, policy_version 37416 (0.0028) +[2024-11-08 04:41:42,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 153264128. Throughput: 0: 1602.8. Samples: 33312986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:42,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 04:41:47,145][42004] Updated weights for policy 0, policy_version 37426 (0.0030) +[2024-11-08 04:41:47,932][41694] Fps is (10 sec: 6738.2, 60 sec: 6621.8, 300 sec: 6706.4). 
Total num frames: 153300992. Throughput: 0: 1593.0. Samples: 33317856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:47,935][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 04:41:52,565][42004] Updated weights for policy 0, policy_version 37436 (0.0028) +[2024-11-08 04:41:52,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 153337856. Throughput: 0: 1683.7. Samples: 33329022. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:52,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 04:41:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.8, 300 sec: 6720.2). Total num frames: 153374720. Throughput: 0: 1725.6. Samples: 33340160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:41:57,934][41694] Avg episode reward: [(0, '4.232')] +[2024-11-08 04:41:58,096][42004] Updated weights for policy 0, policy_version 37446 (0.0022) +[2024-11-08 04:42:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 153411584. Throughput: 0: 1745.6. Samples: 33345882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:42:02,934][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 04:42:04,677][42004] Updated weights for policy 0, policy_version 37456 (0.0023) +[2024-11-08 04:42:07,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 153436160. Throughput: 0: 1695.2. Samples: 33354256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:42:07,934][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 04:42:12,932][41694] Fps is (10 sec: 4505.6, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 153456640. Throughput: 0: 1611.8. Samples: 33361844. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:42:12,934][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 04:42:13,347][42004] Updated weights for policy 0, policy_version 37466 (0.0042) +[2024-11-08 04:42:17,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 153489408. Throughput: 0: 1558.6. Samples: 33364960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:42:17,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 04:42:19,601][42004] Updated weights for policy 0, policy_version 37476 (0.0038) +[2024-11-08 04:42:22,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 153522176. Throughput: 0: 1571.5. Samples: 33375332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:42:22,933][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 04:42:25,231][42004] Updated weights for policy 0, policy_version 37486 (0.0031) +[2024-11-08 04:42:27,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6417.0, 300 sec: 6650.8). Total num frames: 153559040. Throughput: 0: 1631.5. Samples: 33386406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:42:27,934][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 04:42:30,948][42004] Updated weights for policy 0, policy_version 37496 (0.0041) +[2024-11-08 04:42:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.1, 300 sec: 6650.8). Total num frames: 153595904. Throughput: 0: 1641.7. Samples: 33391732. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:42:32,933][41694] Avg episode reward: [(0, '4.617')] +[2024-11-08 04:42:36,503][42004] Updated weights for policy 0, policy_version 37506 (0.0027) +[2024-11-08 04:42:37,933][41694] Fps is (10 sec: 7371.9, 60 sec: 6651.2, 300 sec: 6692.4). Total num frames: 153632768. Throughput: 0: 1643.0. Samples: 33402960. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:42:37,938][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 04:42:41,982][42004] Updated weights for policy 0, policy_version 37516 (0.0031) +[2024-11-08 04:42:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 153669632. Throughput: 0: 1642.0. Samples: 33414050. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:42:42,934][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 04:42:47,932][41694] Fps is (10 sec: 6144.9, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 153694208. Throughput: 0: 1634.8. Samples: 33419450. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:42:47,939][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 04:42:49,771][42004] Updated weights for policy 0, policy_version 37526 (0.0050) +[2024-11-08 04:42:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 153726976. Throughput: 0: 1602.0. Samples: 33426344. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:42:52,933][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 04:42:55,675][42004] Updated weights for policy 0, policy_version 37536 (0.0034) +[2024-11-08 04:42:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 153763840. Throughput: 0: 1669.4. Samples: 33436966. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:42:57,935][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 04:43:01,332][42004] Updated weights for policy 0, policy_version 37546 (0.0034) +[2024-11-08 04:43:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6417.1, 300 sec: 6650.8). Total num frames: 153796608. Throughput: 0: 1724.1. Samples: 33442544. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:43:02,939][41694] Avg episode reward: [(0, '4.349')] +[2024-11-08 04:43:07,636][42004] Updated weights for policy 0, policy_version 37556 (0.0032) +[2024-11-08 04:43:07,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 153829376. Throughput: 0: 1700.8. Samples: 33451870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:43:07,935][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 04:43:12,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6692.5). Total num frames: 153866240. Throughput: 0: 1691.8. Samples: 33462538. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:43:12,934][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 04:43:13,370][42004] Updated weights for policy 0, policy_version 37566 (0.0029) +[2024-11-08 04:43:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6706.4). Total num frames: 153903104. Throughput: 0: 1701.4. Samples: 33468294. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:43:17,934][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 04:43:18,847][42004] Updated weights for policy 0, policy_version 37576 (0.0037) +[2024-11-08 04:43:22,934][41694] Fps is (10 sec: 5733.7, 60 sec: 6690.0, 300 sec: 6650.8). Total num frames: 153923584. Throughput: 0: 1615.9. Samples: 33475676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:43:22,937][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 04:43:27,050][42004] Updated weights for policy 0, policy_version 37586 (0.0063) +[2024-11-08 04:43:27,933][41694] Fps is (10 sec: 5324.3, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 153956352. Throughput: 0: 1591.3. Samples: 33485660. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:43:27,936][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 04:43:32,841][42004] Updated weights for policy 0, policy_version 37596 (0.0038) +[2024-11-08 04:43:32,932][41694] Fps is (10 sec: 6964.0, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 153993216. Throughput: 0: 1581.4. Samples: 33490612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:43:32,934][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 04:43:37,932][41694] Fps is (10 sec: 7373.5, 60 sec: 6622.0, 300 sec: 6650.8). Total num frames: 154030080. Throughput: 0: 1676.1. Samples: 33501768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:43:37,934][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 04:43:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037605_154030080.pth... +[2024-11-08 04:43:38,050][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037216_152436736.pth +[2024-11-08 04:43:38,324][42004] Updated weights for policy 0, policy_version 37606 (0.0027) +[2024-11-08 04:43:42,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6621.8, 300 sec: 6687.4). Total num frames: 154066944. Throughput: 0: 1691.0. Samples: 33513060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:43:42,937][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 04:43:43,802][42004] Updated weights for policy 0, policy_version 37616 (0.0022) +[2024-11-08 04:43:47,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6826.6, 300 sec: 6706.3). Total num frames: 154103808. Throughput: 0: 1683.9. Samples: 33518322. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:43:47,934][41694] Avg episode reward: [(0, '4.130')] +[2024-11-08 04:43:49,529][42004] Updated weights for policy 0, policy_version 37626 (0.0026) +[2024-11-08 04:43:52,931][41694] Fps is (10 sec: 7373.4, 60 sec: 6894.9, 300 sec: 6706.3). 
Total num frames: 154140672. Throughput: 0: 1718.1. Samples: 33529184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:43:52,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 04:43:57,268][42004] Updated weights for policy 0, policy_version 37636 (0.0023) +[2024-11-08 04:43:57,932][41694] Fps is (10 sec: 5325.3, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 154157056. Throughput: 0: 1637.4. Samples: 33536222. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:43:57,934][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 04:44:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 154193920. Throughput: 0: 1622.1. Samples: 33541290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:44:02,933][41694] Avg episode reward: [(0, '4.243')] +[2024-11-08 04:44:03,295][42004] Updated weights for policy 0, policy_version 37646 (0.0028) +[2024-11-08 04:44:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.2, 300 sec: 6650.8). Total num frames: 154230784. Throughput: 0: 1700.8. Samples: 33552208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:44:07,933][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 04:44:08,555][42004] Updated weights for policy 0, policy_version 37656 (0.0028) +[2024-11-08 04:44:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 154267648. Throughput: 0: 1728.8. Samples: 33563456. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:44:12,934][41694] Avg episode reward: [(0, '4.258')] +[2024-11-08 04:44:13,986][42004] Updated weights for policy 0, policy_version 37666 (0.0026) +[2024-11-08 04:44:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 154304512. Throughput: 0: 1748.4. Samples: 33569292. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:44:17,935][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 04:44:19,726][42004] Updated weights for policy 0, policy_version 37676 (0.0027) +[2024-11-08 04:44:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.6, 300 sec: 6734.1). Total num frames: 154345472. Throughput: 0: 1745.2. Samples: 33580300. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:44:22,934][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 04:44:25,181][42004] Updated weights for policy 0, policy_version 37686 (0.0032) +[2024-11-08 04:44:28,855][41694] Fps is (10 sec: 6374.6, 60 sec: 6857.8, 300 sec: 6685.4). Total num frames: 154374144. Throughput: 0: 1697.2. Samples: 33590998. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:44:28,857][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 04:44:32,935][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 154398720. Throughput: 0: 1658.3. Samples: 33592942. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:44:32,938][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 04:44:32,982][42004] Updated weights for policy 0, policy_version 37696 (0.0037) +[2024-11-08 04:44:37,932][41694] Fps is (10 sec: 6769.0, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 154435584. Throughput: 0: 1654.8. Samples: 33603648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:44:37,934][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 04:44:38,610][42004] Updated weights for policy 0, policy_version 37706 (0.0039) +[2024-11-08 04:44:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 154476544. Throughput: 0: 1751.3. Samples: 33615030. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:44:42,934][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 04:44:44,016][42004] Updated weights for policy 0, policy_version 37716 (0.0022) +[2024-11-08 04:44:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.5, 300 sec: 6664.7). Total num frames: 154509312. Throughput: 0: 1755.6. Samples: 33620292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:44:47,933][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 04:44:49,647][42004] Updated weights for policy 0, policy_version 37726 (0.0030) +[2024-11-08 04:44:52,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 154542080. Throughput: 0: 1755.8. Samples: 33631220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:44:52,933][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 04:44:55,817][42004] Updated weights for policy 0, policy_version 37736 (0.0032) +[2024-11-08 04:44:57,932][41694] Fps is (10 sec: 6962.9, 60 sec: 7031.4, 300 sec: 6720.3). Total num frames: 154578944. Throughput: 0: 1731.3. Samples: 33641366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:44:57,936][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 04:45:02,943][41694] Fps is (10 sec: 5728.0, 60 sec: 6757.1, 300 sec: 6664.4). Total num frames: 154599424. Throughput: 0: 1716.6. Samples: 33646560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:45:02,949][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 04:45:03,781][42004] Updated weights for policy 0, policy_version 37746 (0.0032) +[2024-11-08 04:45:07,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 154632192. Throughput: 0: 1609.9. Samples: 33652746. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:45:07,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 04:45:09,772][42004] Updated weights for policy 0, policy_version 37756 (0.0035) +[2024-11-08 04:45:12,931][41694] Fps is (10 sec: 6971.1, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 154669056. Throughput: 0: 1642.6. Samples: 33663396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:45:12,938][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 04:45:15,864][42004] Updated weights for policy 0, policy_version 37766 (0.0040) +[2024-11-08 04:45:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 154701824. Throughput: 0: 1678.7. Samples: 33668482. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 04:45:17,934][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 04:45:21,883][42004] Updated weights for policy 0, policy_version 37776 (0.0041) +[2024-11-08 04:45:22,934][41694] Fps is (10 sec: 6961.6, 60 sec: 6553.3, 300 sec: 6650.8). Total num frames: 154738688. Throughput: 0: 1660.9. Samples: 33678392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:22,939][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 04:45:27,538][42004] Updated weights for policy 0, policy_version 37786 (0.0030) +[2024-11-08 04:45:27,933][41694] Fps is (10 sec: 6962.5, 60 sec: 6725.3, 300 sec: 6692.4). Total num frames: 154771456. Throughput: 0: 1655.0. Samples: 33689508. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:27,936][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 04:45:32,931][41694] Fps is (10 sec: 6555.2, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 154804224. Throughput: 0: 1653.9. Samples: 33694718. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:32,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 04:45:33,586][42004] Updated weights for policy 0, policy_version 37796 (0.0027) +[2024-11-08 04:45:37,931][41694] Fps is (10 sec: 5325.4, 60 sec: 6485.4, 300 sec: 6636.9). Total num frames: 154824704. Throughput: 0: 1593.9. Samples: 33702944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:37,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 04:45:38,043][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037800_154828800.pth... +[2024-11-08 04:45:38,185][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037412_153239552.pth +[2024-11-08 04:45:41,842][42004] Updated weights for policy 0, policy_version 37806 (0.0030) +[2024-11-08 04:45:42,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6636.9). Total num frames: 154861568. Throughput: 0: 1552.5. Samples: 33711228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:42,933][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 04:45:47,369][42004] Updated weights for policy 0, policy_version 37816 (0.0029) +[2024-11-08 04:45:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 154898432. Throughput: 0: 1555.4. Samples: 33716534. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:47,934][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 04:45:52,822][42004] Updated weights for policy 0, policy_version 37826 (0.0037) +[2024-11-08 04:45:52,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 154935296. Throughput: 0: 1672.8. Samples: 33728024. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:52,934][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 04:45:57,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6553.6, 300 sec: 6636.9). 
Total num frames: 154972160. Throughput: 0: 1685.2. Samples: 33739232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:45:57,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 04:45:58,206][42004] Updated weights for policy 0, policy_version 37836 (0.0026) +[2024-11-08 04:46:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6759.7, 300 sec: 6678.6). Total num frames: 155004928. Throughput: 0: 1691.4. Samples: 33744596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:46:02,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 04:46:04,001][42004] Updated weights for policy 0, policy_version 37846 (0.0032) +[2024-11-08 04:46:07,931][41694] Fps is (10 sec: 7373.4, 60 sec: 6894.9, 300 sec: 6720.2). Total num frames: 155045888. Throughput: 0: 1712.4. Samples: 33755446. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:46:07,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 04:46:11,931][42004] Updated weights for policy 0, policy_version 37856 (0.0031) +[2024-11-08 04:46:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 155062272. Throughput: 0: 1615.5. Samples: 33762204. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:46:12,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 04:46:17,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 155095040. Throughput: 0: 1601.7. Samples: 33766794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:46:17,935][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 04:46:18,333][42004] Updated weights for policy 0, policy_version 37866 (0.0029) +[2024-11-08 04:46:22,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.6, 300 sec: 6623.0). Total num frames: 155127808. Throughput: 0: 1648.0. Samples: 33777104. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:46:22,934][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 04:46:24,378][42004] Updated weights for policy 0, policy_version 37876 (0.0044) +[2024-11-08 04:46:27,933][41694] Fps is (10 sec: 6962.6, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 155164672. Throughput: 0: 1684.0. Samples: 33787010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:46:27,934][41694] Avg episode reward: [(0, '4.114')] +[2024-11-08 04:46:30,013][42004] Updated weights for policy 0, policy_version 37886 (0.0025) +[2024-11-08 04:46:32,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6670.7). Total num frames: 155201536. Throughput: 0: 1693.4. Samples: 33792738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:46:32,934][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 04:46:35,686][42004] Updated weights for policy 0, policy_version 37896 (0.0027) +[2024-11-08 04:46:37,932][41694] Fps is (10 sec: 7373.6, 60 sec: 6894.9, 300 sec: 6692.4). Total num frames: 155238400. Throughput: 0: 1684.5. Samples: 33803828. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:46:37,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 04:46:41,252][42004] Updated weights for policy 0, policy_version 37906 (0.0026) +[2024-11-08 04:46:42,939][41694] Fps is (10 sec: 6958.3, 60 sec: 6825.8, 300 sec: 6678.4). Total num frames: 155271168. Throughput: 0: 1676.3. Samples: 33814678. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:46:42,942][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 04:46:47,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 155291648. Throughput: 0: 1596.7. Samples: 33816446. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:46:47,934][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 04:46:49,494][42004] Updated weights for policy 0, policy_version 37916 (0.0030) +[2024-11-08 04:46:52,931][41694] Fps is (10 sec: 5328.6, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 155324416. Throughput: 0: 1562.4. Samples: 33825756. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:46:52,934][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 04:46:55,285][42004] Updated weights for policy 0, policy_version 37926 (0.0030) +[2024-11-08 04:46:57,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6485.4, 300 sec: 6609.1). Total num frames: 155361280. Throughput: 0: 1662.5. Samples: 33837018. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:46:57,933][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 04:47:00,562][42004] Updated weights for policy 0, policy_version 37936 (0.0040) +[2024-11-08 04:47:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 155398144. Throughput: 0: 1691.7. Samples: 33842920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:47:02,933][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 04:47:06,419][42004] Updated weights for policy 0, policy_version 37946 (0.0040) +[2024-11-08 04:47:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6485.3, 300 sec: 6706.3). Total num frames: 155435008. Throughput: 0: 1698.7. Samples: 33853546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:47:07,933][41694] Avg episode reward: [(0, '4.588')] +[2024-11-08 04:47:11,889][42004] Updated weights for policy 0, policy_version 37956 (0.0030) +[2024-11-08 04:47:12,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 155471872. Throughput: 0: 1729.4. Samples: 33864830. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:47:12,933][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 04:47:18,823][41694] Fps is (10 sec: 6393.6, 60 sec: 6726.8, 300 sec: 6700.0). Total num frames: 155504640. Throughput: 0: 1687.2. Samples: 33870164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:47:18,825][41694] Avg episode reward: [(0, '4.229')] +[2024-11-08 04:47:19,308][42004] Updated weights for policy 0, policy_version 37966 (0.0023) +[2024-11-08 04:47:22,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.2, 300 sec: 6678.6). Total num frames: 155529216. Throughput: 0: 1632.3. Samples: 33877280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:47:22,933][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 04:47:25,764][42004] Updated weights for policy 0, policy_version 37976 (0.0045) +[2024-11-08 04:47:27,931][41694] Fps is (10 sec: 6745.0, 60 sec: 6690.3, 300 sec: 6678.6). Total num frames: 155566080. Throughput: 0: 1618.0. Samples: 33887476. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:47:27,933][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 04:47:31,110][42004] Updated weights for policy 0, policy_version 37986 (0.0026) +[2024-11-08 04:47:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 155602944. Throughput: 0: 1704.0. Samples: 33893124. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:47:32,933][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 04:47:36,487][42004] Updated weights for policy 0, policy_version 37996 (0.0027) +[2024-11-08 04:47:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 155639808. Throughput: 0: 1746.9. Samples: 33904368. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:47:37,933][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 04:47:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037998_155639808.pth... +[2024-11-08 04:47:38,096][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037605_154030080.pth +[2024-11-08 04:47:42,148][42004] Updated weights for policy 0, policy_version 38006 (0.0030) +[2024-11-08 04:47:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6759.2, 300 sec: 6720.2). Total num frames: 155676672. Throughput: 0: 1745.7. Samples: 33915574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:47:42,934][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 04:47:47,393][42004] Updated weights for policy 0, policy_version 38016 (0.0031) +[2024-11-08 04:47:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6734.1). Total num frames: 155713536. Throughput: 0: 1742.8. Samples: 33921346. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:47:47,933][41694] Avg episode reward: [(0, '4.306')] +[2024-11-08 04:47:52,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6894.9, 300 sec: 6692.4). Total num frames: 155738112. Throughput: 0: 1752.9. Samples: 33932426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:47:52,934][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 04:47:54,893][42004] Updated weights for policy 0, policy_version 38026 (0.0027) +[2024-11-08 04:47:57,933][41694] Fps is (10 sec: 6143.1, 60 sec: 6894.7, 300 sec: 6706.3). Total num frames: 155774976. Throughput: 0: 1663.6. Samples: 33939694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:47:57,936][41694] Avg episode reward: [(0, '4.689')] +[2024-11-08 04:48:00,695][42004] Updated weights for policy 0, policy_version 38036 (0.0025) +[2024-11-08 04:48:02,932][41694] Fps is (10 sec: 6962.6, 60 sec: 6826.6, 300 sec: 6706.3). 
Total num frames: 155807744. Throughput: 0: 1700.1. Samples: 33945154. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:48:02,936][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 04:48:06,968][42004] Updated weights for policy 0, policy_version 38046 (0.0031) +[2024-11-08 04:48:07,932][41694] Fps is (10 sec: 6554.6, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 155840512. Throughput: 0: 1727.4. Samples: 33955014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:48:07,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 04:48:12,247][42004] Updated weights for policy 0, policy_version 38056 (0.0023) +[2024-11-08 04:48:12,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6826.6, 300 sec: 6706.3). Total num frames: 155881472. Throughput: 0: 1756.5. Samples: 33966518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:48:12,935][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 04:48:17,531][42004] Updated weights for policy 0, policy_version 38066 (0.0025) +[2024-11-08 04:48:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6998.9, 300 sec: 6761.9). Total num frames: 155918336. Throughput: 0: 1759.2. Samples: 33972290. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:48:17,934][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 04:48:22,742][42004] Updated weights for policy 0, policy_version 38076 (0.0033) +[2024-11-08 04:48:22,931][41694] Fps is (10 sec: 7782.7, 60 sec: 7168.0, 300 sec: 6789.7). Total num frames: 155959296. Throughput: 0: 1772.5. Samples: 33984128. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:48:22,936][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 04:48:27,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 155983872. Throughput: 0: 1700.0. Samples: 33992076. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:48:27,934][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 04:48:29,971][42004] Updated weights for policy 0, policy_version 38086 (0.0025) +[2024-11-08 04:48:32,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 156016640. Throughput: 0: 1695.2. Samples: 33997632. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:48:32,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 04:48:36,011][42004] Updated weights for policy 0, policy_version 38096 (0.0039) +[2024-11-08 04:48:37,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 156049408. Throughput: 0: 1673.3. Samples: 34007724. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:48:37,937][41694] Avg episode reward: [(0, '4.292')] +[2024-11-08 04:48:41,852][42004] Updated weights for policy 0, policy_version 38106 (0.0026) +[2024-11-08 04:48:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 156090368. Throughput: 0: 1750.3. Samples: 34018454. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:48:42,933][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 04:48:47,038][42004] Updated weights for policy 0, policy_version 38116 (0.0025) +[2024-11-08 04:48:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 156127232. Throughput: 0: 1757.7. Samples: 34024250. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:48:47,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 04:48:52,344][42004] Updated weights for policy 0, policy_version 38126 (0.0040) +[2024-11-08 04:48:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 7168.0, 300 sec: 6817.4). Total num frames: 156168192. Throughput: 0: 1799.2. Samples: 34035978. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:48:52,934][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 04:48:57,711][42004] Updated weights for policy 0, policy_version 38136 (0.0030) +[2024-11-08 04:48:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.2, 300 sec: 6817.4). Total num frames: 156205056. Throughput: 0: 1799.4. Samples: 34047490. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:48:57,933][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 04:49:02,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6963.3, 300 sec: 6761.9). Total num frames: 156225536. Throughput: 0: 1756.2. Samples: 34051320. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:49:02,933][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 04:49:05,634][42004] Updated weights for policy 0, policy_version 38146 (0.0030) +[2024-11-08 04:49:07,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 156258304. Throughput: 0: 1679.7. Samples: 34059714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:49:07,933][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 04:49:11,518][42004] Updated weights for policy 0, policy_version 38156 (0.0028) +[2024-11-08 04:49:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6895.0, 300 sec: 6748.0). Total num frames: 156295168. Throughput: 0: 1732.0. Samples: 34070014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:49:12,934][41694] Avg episode reward: [(0, '4.269')] +[2024-11-08 04:49:17,180][42004] Updated weights for policy 0, policy_version 38166 (0.0024) +[2024-11-08 04:49:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 156332032. Throughput: 0: 1723.9. Samples: 34075206. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:49:17,933][41694] Avg episode reward: [(0, '4.286')] +[2024-11-08 04:49:22,259][42004] Updated weights for policy 0, policy_version 38176 (0.0023) +[2024-11-08 04:49:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6797.0). Total num frames: 156372992. Throughput: 0: 1767.0. Samples: 34087240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:49:22,935][41694] Avg episode reward: [(0, '4.646')] +[2024-11-08 04:49:27,670][42004] Updated weights for policy 0, policy_version 38186 (0.0027) +[2024-11-08 04:49:27,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7099.7, 300 sec: 6817.4). Total num frames: 156409856. Throughput: 0: 1789.4. Samples: 34098978. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:49:27,936][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 04:49:32,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7168.0, 300 sec: 6817.4). Total num frames: 156446720. Throughput: 0: 1783.3. Samples: 34104498. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:49:32,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 04:49:33,119][42004] Updated weights for policy 0, policy_version 38196 (0.0025) +[2024-11-08 04:49:37,932][41694] Fps is (10 sec: 6144.3, 60 sec: 7031.4, 300 sec: 6761.9). Total num frames: 156471296. Throughput: 0: 1693.5. Samples: 34112186. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:49:37,933][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 04:49:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038201_156471296.pth... +[2024-11-08 04:49:38,067][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037800_154828800.pth +[2024-11-08 04:49:40,642][42004] Updated weights for policy 0, policy_version 38206 (0.0027) +[2024-11-08 04:49:42,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6963.2, 300 sec: 6775.8). 
Total num frames: 156508160. Throughput: 0: 1676.3. Samples: 34122922. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:49:42,933][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 04:49:46,699][42004] Updated weights for policy 0, policy_version 38216 (0.0031) +[2024-11-08 04:49:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 156540928. Throughput: 0: 1699.2. Samples: 34127786. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:49:47,933][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 04:49:52,204][42004] Updated weights for policy 0, policy_version 38226 (0.0033) +[2024-11-08 04:49:52,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6826.6, 300 sec: 6775.8). Total num frames: 156577792. Throughput: 0: 1753.9. Samples: 34138642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:49:52,935][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 04:49:57,374][42004] Updated weights for policy 0, policy_version 38236 (0.0028) +[2024-11-08 04:49:57,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6894.9, 300 sec: 6845.4). Total num frames: 156618752. Throughput: 0: 1788.4. Samples: 34150492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:49:57,933][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 04:50:02,868][42004] Updated weights for policy 0, policy_version 38246 (0.0030) +[2024-11-08 04:50:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7168.0, 300 sec: 6859.1). Total num frames: 156655616. Throughput: 0: 1801.2. Samples: 34156258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:50:02,934][41694] Avg episode reward: [(0, '4.253')] +[2024-11-08 04:50:09,338][41694] Fps is (10 sec: 6104.7, 60 sec: 7003.9, 300 sec: 6812.7). Total num frames: 156688384. Throughput: 0: 1722.4. Samples: 34167168. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:50:09,340][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 04:50:10,198][42004] Updated weights for policy 0, policy_version 38256 (0.0022) +[2024-11-08 04:50:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 6817.4). Total num frames: 156712960. Throughput: 0: 1675.2. Samples: 34174360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:50:12,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 04:50:16,504][42004] Updated weights for policy 0, policy_version 38266 (0.0038) +[2024-11-08 04:50:17,932][41694] Fps is (10 sec: 6196.0, 60 sec: 6826.7, 300 sec: 6789.7). Total num frames: 156741632. Throughput: 0: 1665.5. Samples: 34179444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:50:17,933][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 04:50:22,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6690.1, 300 sec: 6789.7). Total num frames: 156774400. Throughput: 0: 1693.3. Samples: 34188384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:50:22,937][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 04:50:23,129][42004] Updated weights for policy 0, policy_version 38276 (0.0035) +[2024-11-08 04:50:27,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 156815360. Throughput: 0: 1711.6. Samples: 34199946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:50:27,935][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 04:50:28,310][42004] Updated weights for policy 0, policy_version 38286 (0.0026) +[2024-11-08 04:50:32,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 156852224. Throughput: 0: 1721.6. Samples: 34205260. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:50:32,934][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 04:50:33,954][42004] Updated weights for policy 0, policy_version 38296 (0.0034) +[2024-11-08 04:50:37,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6963.2, 300 sec: 6872.9). Total num frames: 156889088. Throughput: 0: 1732.6. Samples: 34216608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:50:37,934][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 04:50:39,566][42004] Updated weights for policy 0, policy_version 38306 (0.0028) +[2024-11-08 04:50:43,356][41694] Fps is (10 sec: 5893.7, 60 sec: 6710.9, 300 sec: 6821.5). Total num frames: 156913664. Throughput: 0: 1578.9. Samples: 34222214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:50:43,358][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 04:50:47,022][42004] Updated weights for policy 0, policy_version 38316 (0.0034) +[2024-11-08 04:50:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 156946432. Throughput: 0: 1624.4. Samples: 34229354. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:50:47,934][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 04:50:52,652][42004] Updated weights for policy 0, policy_version 38326 (0.0027) +[2024-11-08 04:50:52,932][41694] Fps is (10 sec: 7272.2, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 156983296. Throughput: 0: 1681.8. Samples: 34240486. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:50:52,933][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 04:50:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6831.3). Total num frames: 157020160. Throughput: 0: 1707.6. Samples: 34251204. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:50:57,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 04:50:58,163][42004] Updated weights for policy 0, policy_version 38336 (0.0029) +[2024-11-08 04:51:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6817.4). Total num frames: 157057024. Throughput: 0: 1724.8. Samples: 34257058. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:51:02,934][41694] Avg episode reward: [(0, '4.263')] +[2024-11-08 04:51:03,612][42004] Updated weights for policy 0, policy_version 38346 (0.0027) +[2024-11-08 04:51:07,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6990.5, 300 sec: 6900.7). Total num frames: 157097984. Throughput: 0: 1783.9. Samples: 34268660. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:51:07,933][41694] Avg episode reward: [(0, '4.631')] +[2024-11-08 04:51:08,878][42004] Updated weights for policy 0, policy_version 38356 (0.0030) +[2024-11-08 04:51:12,933][41694] Fps is (10 sec: 7372.0, 60 sec: 6963.1, 300 sec: 6900.7). Total num frames: 157130752. Throughput: 0: 1752.6. Samples: 34278814. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:12,936][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 04:51:15,261][42004] Updated weights for policy 0, policy_version 38366 (0.0030) +[2024-11-08 04:51:17,934][41694] Fps is (10 sec: 5324.4, 60 sec: 6826.6, 300 sec: 6859.0). Total num frames: 157151232. Throughput: 0: 1744.6. Samples: 34283768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:17,937][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 04:51:22,693][42004] Updated weights for policy 0, policy_version 38376 (0.0024) +[2024-11-08 04:51:22,932][41694] Fps is (10 sec: 5735.0, 60 sec: 6895.0, 300 sec: 6859.1). Total num frames: 157188096. Throughput: 0: 1659.8. Samples: 34291300. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:22,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 04:51:27,932][41694] Fps is (10 sec: 6963.8, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 157220864. Throughput: 0: 1790.8. Samples: 34302038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:27,934][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 04:51:28,503][42004] Updated weights for policy 0, policy_version 38386 (0.0022) +[2024-11-08 04:51:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6859.1). Total num frames: 157261824. Throughput: 0: 1734.4. Samples: 34307400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:32,933][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 04:51:33,817][42004] Updated weights for policy 0, policy_version 38396 (0.0024) +[2024-11-08 04:51:37,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6873.1). Total num frames: 157298688. Throughput: 0: 1747.2. Samples: 34319110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:37,933][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 04:51:38,017][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038404_157302784.pth... +[2024-11-08 04:51:38,108][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000037998_155639808.pth +[2024-11-08 04:51:39,102][42004] Updated weights for policy 0, policy_version 38406 (0.0034) +[2024-11-08 04:51:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7081.6, 300 sec: 6928.5). Total num frames: 157335552. Throughput: 0: 1762.1. Samples: 34330498. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:42,933][41694] Avg episode reward: [(0, '4.164')] +[2024-11-08 04:51:44,704][42004] Updated weights for policy 0, policy_version 38416 (0.0029) +[2024-11-08 04:51:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.7, 300 sec: 6942.4). 
Total num frames: 157372416. Throughput: 0: 1755.8. Samples: 34336068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:47,934][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 04:51:52,451][42004] Updated weights for policy 0, policy_version 38426 (0.0028) +[2024-11-08 04:51:52,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 157392896. Throughput: 0: 1679.6. Samples: 34344240. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:52,934][41694] Avg episode reward: [(0, '4.195')] +[2024-11-08 04:51:57,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 157429760. Throughput: 0: 1671.7. Samples: 34354040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:51:57,933][41694] Avg episode reward: [(0, '4.239')] +[2024-11-08 04:51:58,118][42004] Updated weights for policy 0, policy_version 38436 (0.0030) +[2024-11-08 04:52:02,937][41694] Fps is (10 sec: 6959.1, 60 sec: 6757.7, 300 sec: 6872.8). Total num frames: 157462528. Throughput: 0: 1667.5. Samples: 34358816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:02,940][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 04:52:04,952][42004] Updated weights for policy 0, policy_version 38446 (0.0045) +[2024-11-08 04:52:07,932][41694] Fps is (10 sec: 6553.1, 60 sec: 6621.8, 300 sec: 6859.0). Total num frames: 157495296. Throughput: 0: 1711.0. Samples: 34368296. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:07,934][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 04:52:11,109][42004] Updated weights for policy 0, policy_version 38456 (0.0031) +[2024-11-08 04:52:12,931][41694] Fps is (10 sec: 6147.7, 60 sec: 6553.7, 300 sec: 6865.9). Total num frames: 157523968. Throughput: 0: 1688.2. Samples: 34378006. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:12,933][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 04:52:16,986][42004] Updated weights for policy 0, policy_version 38466 (0.0029) +[2024-11-08 04:52:17,931][41694] Fps is (10 sec: 6554.1, 60 sec: 6826.8, 300 sec: 6886.8). Total num frames: 157560832. Throughput: 0: 1682.4. Samples: 34383110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:17,934][41694] Avg episode reward: [(0, '4.271')] +[2024-11-08 04:52:22,721][42004] Updated weights for policy 0, policy_version 38476 (0.0031) +[2024-11-08 04:52:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 157597696. Throughput: 0: 1659.5. Samples: 34393786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:22,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 04:52:27,814][42004] Updated weights for policy 0, policy_version 38486 (0.0029) +[2024-11-08 04:52:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6900.7). Total num frames: 157638656. Throughput: 0: 1671.0. Samples: 34405694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:27,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 04:52:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6894.9, 300 sec: 6900.7). Total num frames: 157675520. Throughput: 0: 1665.3. Samples: 34411008. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:32,933][41694] Avg episode reward: [(0, '4.605')] +[2024-11-08 04:52:33,452][42004] Updated weights for policy 0, policy_version 38496 (0.0024) +[2024-11-08 04:52:37,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6886.8). Total num frames: 157708288. Throughput: 0: 1721.2. Samples: 34421694. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:37,934][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 04:52:39,106][42004] Updated weights for policy 0, policy_version 38506 (0.0037) +[2024-11-08 04:52:42,934][41694] Fps is (10 sec: 7370.6, 60 sec: 6894.6, 300 sec: 6900.7). Total num frames: 157749248. Throughput: 0: 1755.9. Samples: 34433062. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:42,938][41694] Avg episode reward: [(0, '4.199')] +[2024-11-08 04:52:44,578][42004] Updated weights for policy 0, policy_version 38516 (0.0026) +[2024-11-08 04:52:47,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6826.6, 300 sec: 6928.5). Total num frames: 157782016. Throughput: 0: 1779.1. Samples: 34438868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:47,934][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 04:52:50,520][42004] Updated weights for policy 0, policy_version 38526 (0.0033) +[2024-11-08 04:52:52,932][41694] Fps is (10 sec: 6964.9, 60 sec: 7099.7, 300 sec: 6928.5). Total num frames: 157818880. Throughput: 0: 1795.7. Samples: 34449102. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:52,935][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 04:52:57,489][42004] Updated weights for policy 0, policy_version 38536 (0.0028) +[2024-11-08 04:52:57,934][41694] Fps is (10 sec: 6142.7, 60 sec: 6894.6, 300 sec: 6900.7). Total num frames: 157843456. Throughput: 0: 1766.6. Samples: 34457506. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:52:57,939][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 04:53:02,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6963.9, 300 sec: 6914.6). Total num frames: 157880320. Throughput: 0: 1769.2. Samples: 34462722. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:53:02,934][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 04:53:03,512][42004] Updated weights for policy 0, policy_version 38546 (0.0034) +[2024-11-08 04:53:07,932][41694] Fps is (10 sec: 6965.0, 60 sec: 6963.3, 300 sec: 6886.8). Total num frames: 157913088. Throughput: 0: 1761.4. Samples: 34473048. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:53:07,934][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 04:53:09,488][42004] Updated weights for policy 0, policy_version 38556 (0.0032) +[2024-11-08 04:53:12,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7099.7, 300 sec: 6886.8). Total num frames: 157949952. Throughput: 0: 1730.6. Samples: 34483572. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:53:12,934][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 04:53:15,032][42004] Updated weights for policy 0, policy_version 38566 (0.0026) +[2024-11-08 04:53:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7099.7, 300 sec: 6872.9). Total num frames: 157986816. Throughput: 0: 1735.2. Samples: 34489092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:53:17,934][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 04:53:20,785][42004] Updated weights for policy 0, policy_version 38576 (0.0021) +[2024-11-08 04:53:22,932][41694] Fps is (10 sec: 6963.4, 60 sec: 7031.5, 300 sec: 6900.7). Total num frames: 158019584. Throughput: 0: 1737.6. Samples: 34499886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:53:22,933][41694] Avg episode reward: [(0, '4.340')] +[2024-11-08 04:53:26,472][42004] Updated weights for policy 0, policy_version 38586 (0.0038) +[2024-11-08 04:53:27,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6963.2, 300 sec: 6914.6). Total num frames: 158056448. Throughput: 0: 1724.5. Samples: 34510658. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:53:27,934][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 04:53:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6758.4, 300 sec: 6886.8). Total num frames: 158081024. Throughput: 0: 1677.4. Samples: 34514352. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:53:32,933][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 04:53:33,883][42004] Updated weights for policy 0, policy_version 38596 (0.0029) +[2024-11-08 04:53:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 158113792. Throughput: 0: 1653.5. Samples: 34523508. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:53:37,934][41694] Avg episode reward: [(0, '4.602')] +[2024-11-08 04:53:38,075][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038603_158117888.pth... +[2024-11-08 04:53:38,226][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038201_156471296.pth +[2024-11-08 04:53:40,224][42004] Updated weights for policy 0, policy_version 38606 (0.0034) +[2024-11-08 04:53:42,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6622.2, 300 sec: 6845.2). Total num frames: 158146560. Throughput: 0: 1673.0. Samples: 34532786. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:53:42,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 04:53:46,242][42004] Updated weights for policy 0, policy_version 38616 (0.0030) +[2024-11-08 04:53:47,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6622.0, 300 sec: 6817.4). Total num frames: 158179328. Throughput: 0: 1671.3. Samples: 34537930. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:53:47,934][41694] Avg episode reward: [(0, '4.225')] +[2024-11-08 04:53:51,793][42004] Updated weights for policy 0, policy_version 38626 (0.0031) +[2024-11-08 04:53:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6831.3). 
Total num frames: 158220288. Throughput: 0: 1692.9. Samples: 34549230. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:53:52,934][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 04:53:57,287][42004] Updated weights for policy 0, policy_version 38636 (0.0026) +[2024-11-08 04:53:57,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6895.3, 300 sec: 6886.8). Total num frames: 158257152. Throughput: 0: 1704.9. Samples: 34560294. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 04:53:57,933][41694] Avg episode reward: [(0, '4.578')] +[2024-11-08 04:54:03,896][41694] Fps is (10 sec: 6350.9, 60 sec: 6718.7, 300 sec: 6864.4). Total num frames: 158289920. Throughput: 0: 1668.7. Samples: 34565792. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:54:03,898][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 04:54:04,064][42004] Updated weights for policy 0, policy_version 38646 (0.0024) +[2024-11-08 04:54:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 158322688. Throughput: 0: 1663.7. Samples: 34574754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:54:07,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 04:54:09,474][42004] Updated weights for policy 0, policy_version 38656 (0.0023) +[2024-11-08 04:54:12,932][41694] Fps is (10 sec: 7706.1, 60 sec: 6826.7, 300 sec: 6873.0). Total num frames: 158359552. Throughput: 0: 1677.4. Samples: 34586140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:54:12,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 04:54:15,310][42004] Updated weights for policy 0, policy_version 38666 (0.0030) +[2024-11-08 04:54:17,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6758.4, 300 sec: 6845.2). Total num frames: 158392320. Throughput: 0: 1702.2. Samples: 34590952. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:54:17,935][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 04:54:20,844][42004] Updated weights for policy 0, policy_version 38676 (0.0028) +[2024-11-08 04:54:22,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 158429184. Throughput: 0: 1748.8. Samples: 34602204. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:54:22,933][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 04:54:26,271][42004] Updated weights for policy 0, policy_version 38686 (0.0027) +[2024-11-08 04:54:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 158466048. Throughput: 0: 1790.0. Samples: 34613336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:54:27,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 04:54:31,843][42004] Updated weights for policy 0, policy_version 38696 (0.0023) +[2024-11-08 04:54:32,932][41694] Fps is (10 sec: 7782.1, 60 sec: 7099.7, 300 sec: 6900.7). Total num frames: 158507008. Throughput: 0: 1793.0. Samples: 34618616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:54:32,935][41694] Avg episode reward: [(0, '4.242')] +[2024-11-08 04:54:38,017][41694] Fps is (10 sec: 6091.9, 60 sec: 6885.1, 300 sec: 6843.2). Total num frames: 158527488. Throughput: 0: 1787.9. Samples: 34629840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:54:38,019][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 04:54:39,448][42004] Updated weights for policy 0, policy_version 38706 (0.0034) +[2024-11-08 04:54:42,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6894.9, 300 sec: 6845.2). Total num frames: 158560256. Throughput: 0: 1704.0. Samples: 34636972. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:54:42,934][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 04:54:45,316][42004] Updated weights for policy 0, policy_version 38716 (0.0039) +[2024-11-08 04:54:47,934][41694] Fps is (10 sec: 7021.9, 60 sec: 6962.9, 300 sec: 6845.1). Total num frames: 158597120. Throughput: 0: 1733.7. Samples: 34642142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:54:47,936][41694] Avg episode reward: [(0, '4.593')] +[2024-11-08 04:54:51,537][42004] Updated weights for policy 0, policy_version 38726 (0.0024) +[2024-11-08 04:54:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 158629888. Throughput: 0: 1720.1. Samples: 34652158. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:54:52,959][41694] Avg episode reward: [(0, '4.666')] +[2024-11-08 04:54:56,885][42004] Updated weights for policy 0, policy_version 38736 (0.0030) +[2024-11-08 04:54:57,932][41694] Fps is (10 sec: 6964.5, 60 sec: 6826.6, 300 sec: 6817.4). Total num frames: 158666752. Throughput: 0: 1716.4. Samples: 34663380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:54:57,934][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 04:55:02,754][42004] Updated weights for policy 0, policy_version 38746 (0.0025) +[2024-11-08 04:55:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7007.5, 300 sec: 6864.0). Total num frames: 158703616. Throughput: 0: 1731.5. Samples: 34668870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:55:02,933][41694] Avg episode reward: [(0, '4.648')] +[2024-11-08 04:55:07,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6894.9, 300 sec: 6859.1). Total num frames: 158736384. Throughput: 0: 1701.1. Samples: 34678754. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:55:07,934][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 04:55:08,529][42004] Updated weights for policy 0, policy_version 38756 (0.0031) +[2024-11-08 04:55:12,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6859.1). Total num frames: 158765056. Throughput: 0: 1650.9. Samples: 34687628. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:55:12,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 04:55:15,914][42004] Updated weights for policy 0, policy_version 38766 (0.0020) +[2024-11-08 04:55:17,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6845.2). Total num frames: 158793728. Throughput: 0: 1645.9. Samples: 34692680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:17,934][41694] Avg episode reward: [(0, '4.633')] +[2024-11-08 04:55:22,933][41694] Fps is (10 sec: 4914.8, 60 sec: 6416.9, 300 sec: 6775.7). Total num frames: 158814208. Throughput: 0: 1554.3. Samples: 34699654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:22,936][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 04:55:24,503][42004] Updated weights for policy 0, policy_version 38776 (0.0041) +[2024-11-08 04:55:27,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6775.8). Total num frames: 158851072. Throughput: 0: 1592.5. Samples: 34708636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:27,934][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 04:55:30,148][42004] Updated weights for policy 0, policy_version 38786 (0.0022) +[2024-11-08 04:55:32,932][41694] Fps is (10 sec: 7373.4, 60 sec: 6348.8, 300 sec: 6775.7). Total num frames: 158887936. Throughput: 0: 1600.0. Samples: 34714140. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:32,935][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 04:55:35,814][42004] Updated weights for policy 0, policy_version 38796 (0.0030) +[2024-11-08 04:55:37,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6563.0, 300 sec: 6813.3). Total num frames: 158920704. Throughput: 0: 1617.0. Samples: 34724924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:37,935][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 04:55:38,077][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038800_158924800.pth... +[2024-11-08 04:55:38,197][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038404_157302784.pth +[2024-11-08 04:55:41,546][42004] Updated weights for policy 0, policy_version 38806 (0.0038) +[2024-11-08 04:55:42,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6621.9, 300 sec: 6817.4). Total num frames: 158957568. Throughput: 0: 1604.4. Samples: 34735576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:42,933][41694] Avg episode reward: [(0, '4.777')] +[2024-11-08 04:55:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6349.0, 300 sec: 6761.9). Total num frames: 158978048. Throughput: 0: 1536.2. Samples: 34738000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:47,933][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 04:55:49,077][42004] Updated weights for policy 0, policy_version 38816 (0.0041) +[2024-11-08 04:55:52,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6417.1, 300 sec: 6761.9). Total num frames: 159014912. Throughput: 0: 1543.0. Samples: 34748188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:52,933][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 04:55:54,736][42004] Updated weights for policy 0, policy_version 38826 (0.0031) +[2024-11-08 04:55:57,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6348.8, 300 sec: 6748.0). 
Total num frames: 159047680. Throughput: 0: 1582.2. Samples: 34758826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:55:57,936][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 04:56:01,427][42004] Updated weights for policy 0, policy_version 38836 (0.0035) +[2024-11-08 04:56:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6720.2). Total num frames: 159080448. Throughput: 0: 1566.4. Samples: 34763166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:56:02,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 04:56:07,399][42004] Updated weights for policy 0, policy_version 38846 (0.0030) +[2024-11-08 04:56:07,932][41694] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6720.2). Total num frames: 159113216. Throughput: 0: 1634.4. Samples: 34773200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:07,934][41694] Avg episode reward: [(0, '4.621')] +[2024-11-08 04:56:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6417.1, 300 sec: 6775.8). Total num frames: 159150080. Throughput: 0: 1677.6. Samples: 34784128. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:12,934][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 04:56:13,088][42004] Updated weights for policy 0, policy_version 38856 (0.0044) +[2024-11-08 04:56:18,817][41694] Fps is (10 sec: 6396.7, 60 sec: 6391.0, 300 sec: 6741.6). Total num frames: 159182848. Throughput: 0: 1641.0. Samples: 34789436. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:18,819][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 04:56:20,286][42004] Updated weights for policy 0, policy_version 38866 (0.0027) +[2024-11-08 04:56:22,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6622.0, 300 sec: 6748.0). Total num frames: 159211520. Throughput: 0: 1618.4. Samples: 34797752. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:22,935][41694] Avg episode reward: [(0, '4.260')] +[2024-11-08 04:56:25,868][42004] Updated weights for policy 0, policy_version 38876 (0.0026) +[2024-11-08 04:56:27,931][41694] Fps is (10 sec: 7190.4, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 159248384. Throughput: 0: 1627.0. Samples: 34808792. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:27,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 04:56:31,875][42004] Updated weights for policy 0, policy_version 38886 (0.0033) +[2024-11-08 04:56:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 159281152. Throughput: 0: 1681.8. Samples: 34813680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:32,934][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 04:56:37,399][42004] Updated weights for policy 0, policy_version 38896 (0.0023) +[2024-11-08 04:56:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 159318016. Throughput: 0: 1691.6. Samples: 34824310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:37,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 04:56:42,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 159354880. Throughput: 0: 1705.7. Samples: 34835580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:42,934][41694] Avg episode reward: [(0, '4.600')] +[2024-11-08 04:56:42,940][42004] Updated weights for policy 0, policy_version 38906 (0.0031) +[2024-11-08 04:56:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 159391744. Throughput: 0: 1726.4. Samples: 34840854. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:47,933][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 04:56:48,500][42004] Updated weights for policy 0, policy_version 38916 (0.0037) +[2024-11-08 04:56:52,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 159416320. Throughput: 0: 1751.0. Samples: 34851994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:52,933][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 04:56:55,729][42004] Updated weights for policy 0, policy_version 38926 (0.0037) +[2024-11-08 04:56:57,935][41694] Fps is (10 sec: 6141.8, 60 sec: 6758.0, 300 sec: 6748.0). Total num frames: 159453184. Throughput: 0: 1690.4. Samples: 34860200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:56:57,939][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 04:57:01,395][42004] Updated weights for policy 0, policy_version 38936 (0.0033) +[2024-11-08 04:57:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 159490048. Throughput: 0: 1723.8. Samples: 34865480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:02,933][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 04:57:07,810][42004] Updated weights for policy 0, policy_version 38946 (0.0023) +[2024-11-08 04:57:07,931][41694] Fps is (10 sec: 6965.7, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 159522816. Throughput: 0: 1717.2. Samples: 34875024. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:07,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 04:57:12,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 159559680. Throughput: 0: 1720.2. Samples: 34886202. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:12,933][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 04:57:13,281][42004] Updated weights for policy 0, policy_version 38956 (0.0032) +[2024-11-08 04:57:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6998.2, 300 sec: 6775.8). Total num frames: 159596544. Throughput: 0: 1731.8. Samples: 34891612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:17,933][41694] Avg episode reward: [(0, '4.616')] +[2024-11-08 04:57:18,547][42004] Updated weights for policy 0, policy_version 38966 (0.0028) +[2024-11-08 04:57:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 159633408. Throughput: 0: 1748.1. Samples: 34902976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:22,933][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 04:57:24,099][42004] Updated weights for policy 0, policy_version 38976 (0.0029) +[2024-11-08 04:57:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 159657984. Throughput: 0: 1675.8. Samples: 34910992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:27,934][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 04:57:31,582][42004] Updated weights for policy 0, policy_version 38986 (0.0028) +[2024-11-08 04:57:32,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6895.0, 300 sec: 6734.1). Total num frames: 159694848. Throughput: 0: 1674.9. Samples: 34916224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:32,934][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 04:57:37,094][42004] Updated weights for policy 0, policy_version 38996 (0.0043) +[2024-11-08 04:57:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6720.3). Total num frames: 159731712. Throughput: 0: 1681.9. Samples: 34927678. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:37,933][41694] Avg episode reward: [(0, '4.647')] +[2024-11-08 04:57:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038997_159731712.pth... +[2024-11-08 04:57:38,060][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038603_158117888.pth +[2024-11-08 04:57:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 159764480. Throughput: 0: 1726.9. Samples: 34937902. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:42,932][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 04:57:42,977][42004] Updated weights for policy 0, policy_version 39006 (0.0035) +[2024-11-08 04:57:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 159805440. Throughput: 0: 1733.3. Samples: 34943478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:47,934][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 04:57:48,338][42004] Updated weights for policy 0, policy_version 39016 (0.0025) +[2024-11-08 04:57:52,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7099.7, 300 sec: 6775.8). Total num frames: 159842304. Throughput: 0: 1774.9. Samples: 34954896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:52,934][41694] Avg episode reward: [(0, '4.245')] +[2024-11-08 04:57:53,753][42004] Updated weights for policy 0, policy_version 39026 (0.0032) +[2024-11-08 04:57:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7100.2, 300 sec: 6775.8). Total num frames: 159879168. Throughput: 0: 1772.2. Samples: 34965950. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:57:57,933][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 04:58:01,045][42004] Updated weights for policy 0, policy_version 39036 (0.0026) +[2024-11-08 04:58:02,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6826.6, 300 sec: 6734.1). 
Total num frames: 159899648. Throughput: 0: 1724.3. Samples: 34969208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:58:02,934][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 04:58:06,839][42004] Updated weights for policy 0, policy_version 39046 (0.0039) +[2024-11-08 04:58:07,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 159936512. Throughput: 0: 1688.8. Samples: 34978974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:58:07,933][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 04:58:12,932][41694] Fps is (10 sec: 6554.1, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 159965184. Throughput: 0: 1721.1. Samples: 34988442. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:58:12,933][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 04:58:13,467][42004] Updated weights for policy 0, policy_version 39056 (0.0033) +[2024-11-08 04:58:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 160006144. Throughput: 0: 1713.9. Samples: 34993350. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:58:17,934][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 04:58:18,822][42004] Updated weights for policy 0, policy_version 39066 (0.0031) +[2024-11-08 04:58:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 160043008. Throughput: 0: 1712.5. Samples: 35004740. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:58:22,934][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 04:58:24,390][42004] Updated weights for policy 0, policy_version 39076 (0.0026) +[2024-11-08 04:58:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 160079872. Throughput: 0: 1740.4. Samples: 35016222. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:58:27,934][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 04:58:29,998][42004] Updated weights for policy 0, policy_version 39086 (0.0029) +[2024-11-08 04:58:34,327][41694] Fps is (10 sec: 6110.7, 60 sec: 6805.0, 300 sec: 6743.9). Total num frames: 160112640. Throughput: 0: 1686.1. Samples: 35021704. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 04:58:34,328][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 04:58:37,581][42004] Updated weights for policy 0, policy_version 39096 (0.0033) +[2024-11-08 04:58:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 160137216. Throughput: 0: 1648.5. Samples: 35029080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:58:37,939][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 04:58:42,931][41694] Fps is (10 sec: 7140.2, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 160174080. Throughput: 0: 1648.0. Samples: 35040110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:58:42,934][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 04:58:42,998][42004] Updated weights for policy 0, policy_version 39106 (0.0034) +[2024-11-08 04:58:47,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 160210944. Throughput: 0: 1692.2. Samples: 35045354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:58:47,933][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 04:58:49,009][42004] Updated weights for policy 0, policy_version 39116 (0.0025) +[2024-11-08 04:58:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 160247808. Throughput: 0: 1713.3. Samples: 35056074. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:58:52,934][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 04:58:54,396][42004] Updated weights for policy 0, policy_version 39126 (0.0036) +[2024-11-08 04:58:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6784.0). Total num frames: 160284672. Throughput: 0: 1755.2. Samples: 35067426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:58:57,937][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 04:59:00,006][42004] Updated weights for policy 0, policy_version 39136 (0.0029) +[2024-11-08 04:59:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.3, 300 sec: 6761.9). Total num frames: 160317440. Throughput: 0: 1764.8. Samples: 35072768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:59:02,933][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 04:59:06,312][42004] Updated weights for policy 0, policy_version 39146 (0.0026) +[2024-11-08 04:59:08,475][41694] Fps is (10 sec: 5438.9, 60 sec: 6697.8, 300 sec: 6707.9). Total num frames: 160342016. Throughput: 0: 1708.9. Samples: 35082568. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:59:08,478][41694] Avg episode reward: [(0, '4.676')] +[2024-11-08 04:59:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 160374784. Throughput: 0: 1642.6. Samples: 35090140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:59:12,935][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 04:59:13,737][42004] Updated weights for policy 0, policy_version 39156 (0.0037) +[2024-11-08 04:59:17,931][41694] Fps is (10 sec: 7363.2, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 160411648. Throughput: 0: 1694.6. Samples: 35095598. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:59:17,933][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 04:59:19,618][42004] Updated weights for policy 0, policy_version 39166 (0.0024) +[2024-11-08 04:59:22,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 160444416. Throughput: 0: 1708.6. Samples: 35105966. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 04:59:22,934][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 04:59:25,379][42004] Updated weights for policy 0, policy_version 39176 (0.0027) +[2024-11-08 04:59:27,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 160481280. Throughput: 0: 1703.2. Samples: 35116754. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:59:27,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 04:59:30,866][42004] Updated weights for policy 0, policy_version 39186 (0.0020) +[2024-11-08 04:59:32,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6919.3, 300 sec: 6749.9). Total num frames: 160518144. Throughput: 0: 1707.7. Samples: 35122202. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:59:32,934][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 04:59:36,407][42004] Updated weights for policy 0, policy_version 39196 (0.0026) +[2024-11-08 04:59:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 160555008. Throughput: 0: 1721.4. Samples: 35133538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:59:37,933][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 04:59:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039198_160555008.pth... +[2024-11-08 04:59:38,096][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038800_158924800.pth +[2024-11-08 04:59:42,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6758.3, 300 sec: 6720.3). 
Total num frames: 160579584. Throughput: 0: 1656.1. Samples: 35141950. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:59:42,935][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 04:59:43,871][42004] Updated weights for policy 0, policy_version 39206 (0.0033) +[2024-11-08 04:59:47,936][41694] Fps is (10 sec: 6141.1, 60 sec: 6757.9, 300 sec: 6734.0). Total num frames: 160616448. Throughput: 0: 1638.8. Samples: 35146524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:59:47,939][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 04:59:49,482][42004] Updated weights for policy 0, policy_version 39216 (0.0030) +[2024-11-08 04:59:52,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 160649216. Throughput: 0: 1681.8. Samples: 35157338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:59:52,934][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 04:59:55,892][42004] Updated weights for policy 0, policy_version 39226 (0.0035) +[2024-11-08 04:59:57,932][41694] Fps is (10 sec: 6556.2, 60 sec: 6621.8, 300 sec: 6706.3). Total num frames: 160681984. Throughput: 0: 1710.9. Samples: 35167132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 04:59:57,934][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 05:00:01,426][42004] Updated weights for policy 0, policy_version 39236 (0.0026) +[2024-11-08 05:00:02,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 160718848. Throughput: 0: 1713.5. Samples: 35172708. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:00:02,934][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 05:00:07,119][42004] Updated weights for policy 0, policy_version 39246 (0.0040) +[2024-11-08 05:00:07,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6957.9, 300 sec: 6748.0). Total num frames: 160755712. Throughput: 0: 1719.0. Samples: 35183322. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:00:07,934][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 05:00:12,864][42004] Updated weights for policy 0, policy_version 39256 (0.0031) +[2024-11-08 05:00:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6775.8). Total num frames: 160792576. Throughput: 0: 1722.1. Samples: 35194248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:00:12,934][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 05:00:17,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 160808960. Throughput: 0: 1690.4. Samples: 35198272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:00:17,934][41694] Avg episode reward: [(0, '4.683')] +[2024-11-08 05:00:21,180][42004] Updated weights for policy 0, policy_version 39266 (0.0035) +[2024-11-08 05:00:22,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 160845824. Throughput: 0: 1603.4. Samples: 35205692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:00:22,935][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 05:00:26,862][42004] Updated weights for policy 0, policy_version 39276 (0.0029) +[2024-11-08 05:00:27,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 160878592. Throughput: 0: 1651.9. Samples: 35216284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:00:27,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 05:00:32,383][42004] Updated weights for policy 0, policy_version 39286 (0.0027) +[2024-11-08 05:00:32,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 160915456. Throughput: 0: 1666.8. Samples: 35221524. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:00:32,933][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 05:00:37,811][42004] Updated weights for policy 0, policy_version 39296 (0.0033) +[2024-11-08 05:00:37,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 160956416. Throughput: 0: 1685.0. Samples: 35233162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:00:37,932][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 05:00:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6895.0, 300 sec: 6831.3). Total num frames: 160993280. Throughput: 0: 1723.0. Samples: 35244664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:00:42,934][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 05:00:43,226][42004] Updated weights for policy 0, policy_version 39306 (0.0031) +[2024-11-08 05:00:47,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6895.4, 300 sec: 6831.3). Total num frames: 161030144. Throughput: 0: 1714.4. Samples: 35249856. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:00:47,936][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 05:00:48,772][42004] Updated weights for policy 0, policy_version 39316 (0.0031) +[2024-11-08 05:00:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.2, 300 sec: 6789.6). Total num frames: 161050624. Throughput: 0: 1652.5. Samples: 35257686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:00:52,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 05:00:56,350][42004] Updated weights for policy 0, policy_version 39326 (0.0023) +[2024-11-08 05:00:57,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6758.5, 300 sec: 6803.5). Total num frames: 161087488. Throughput: 0: 1652.4. Samples: 35268606. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:00:57,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 05:01:02,580][42004] Updated weights for policy 0, policy_version 39336 (0.0038) +[2024-11-08 05:01:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.2, 300 sec: 6803.5). Total num frames: 161120256. Throughput: 0: 1672.8. Samples: 35273548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:02,933][41694] Avg episode reward: [(0, '4.233')] +[2024-11-08 05:01:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 161157120. Throughput: 0: 1738.9. Samples: 35283942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:07,937][41694] Avg episode reward: [(0, '4.617')] +[2024-11-08 05:01:08,020][42004] Updated weights for policy 0, policy_version 39346 (0.0037) +[2024-11-08 05:01:12,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.9, 300 sec: 6824.0). Total num frames: 161189888. Throughput: 0: 1728.8. Samples: 35294080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:12,935][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 05:01:14,334][42004] Updated weights for policy 0, policy_version 39356 (0.0037) +[2024-11-08 05:01:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 161226752. Throughput: 0: 1728.9. Samples: 35299326. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:17,934][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 05:01:20,200][42004] Updated weights for policy 0, policy_version 39366 (0.0024) +[2024-11-08 05:01:22,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 161263616. Throughput: 0: 1708.4. Samples: 35310040. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:22,933][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 05:01:27,480][42004] Updated weights for policy 0, policy_version 39376 (0.0043) +[2024-11-08 05:01:27,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 161284096. Throughput: 0: 1620.3. Samples: 35317578. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:27,934][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 05:01:32,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 161320960. Throughput: 0: 1626.5. Samples: 35323050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:32,933][41694] Avg episode reward: [(0, '4.265')] +[2024-11-08 05:01:33,131][42004] Updated weights for policy 0, policy_version 39386 (0.0031) +[2024-11-08 05:01:37,932][41694] Fps is (10 sec: 7372.1, 60 sec: 6690.0, 300 sec: 6789.6). Total num frames: 161357824. Throughput: 0: 1686.9. Samples: 35333596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:37,934][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 05:01:37,964][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039394_161357824.pth... +[2024-11-08 05:01:38,084][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000038997_159731712.pth +[2024-11-08 05:01:38,869][42004] Updated weights for policy 0, policy_version 39396 (0.0042) +[2024-11-08 05:01:42,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 161394688. Throughput: 0: 1693.8. Samples: 35344826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:42,933][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 05:01:44,330][42004] Updated weights for policy 0, policy_version 39406 (0.0031) +[2024-11-08 05:01:47,931][41694] Fps is (10 sec: 7373.5, 60 sec: 6690.2, 300 sec: 6831.3). 
Total num frames: 161431552. Throughput: 0: 1712.1. Samples: 35350592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:47,933][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 05:01:49,836][42004] Updated weights for policy 0, policy_version 39416 (0.0026) +[2024-11-08 05:01:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6831.4). Total num frames: 161468416. Throughput: 0: 1727.8. Samples: 35361692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:52,935][41694] Avg episode reward: [(0, '4.269')] +[2024-11-08 05:01:55,472][42004] Updated weights for policy 0, policy_version 39426 (0.0026) +[2024-11-08 05:01:59,041][41694] Fps is (10 sec: 5899.2, 60 sec: 6702.7, 300 sec: 6778.0). Total num frames: 161497088. Throughput: 0: 1700.4. Samples: 35372482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:01:59,042][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 05:02:02,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 161525760. Throughput: 0: 1667.4. Samples: 35374360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:02:02,933][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 05:02:03,411][42004] Updated weights for policy 0, policy_version 39436 (0.0027) +[2024-11-08 05:02:07,932][41694] Fps is (10 sec: 6910.4, 60 sec: 6690.1, 300 sec: 6775.7). Total num frames: 161558528. Throughput: 0: 1656.6. Samples: 35384588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:02:07,934][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 05:02:09,428][42004] Updated weights for policy 0, policy_version 39446 (0.0035) +[2024-11-08 05:02:12,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6758.4, 300 sec: 6775.7). Total num frames: 161595392. Throughput: 0: 1713.2. Samples: 35394674. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:02:12,935][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 05:02:15,052][42004] Updated weights for policy 0, policy_version 39456 (0.0045) +[2024-11-08 05:02:17,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 161632256. Throughput: 0: 1719.3. Samples: 35400418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:02:17,934][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 05:02:20,611][42004] Updated weights for policy 0, policy_version 39466 (0.0028) +[2024-11-08 05:02:22,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 161669120. Throughput: 0: 1731.7. Samples: 35411522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:02:22,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 05:02:26,153][42004] Updated weights for policy 0, policy_version 39476 (0.0028) +[2024-11-08 05:02:27,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 161701888. Throughput: 0: 1726.7. Samples: 35422526. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:02:27,934][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 05:02:33,215][41694] Fps is (10 sec: 5576.1, 60 sec: 6726.6, 300 sec: 6755.4). Total num frames: 161726464. Throughput: 0: 1701.6. Samples: 35427646. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:02:33,218][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 05:02:33,881][42004] Updated weights for policy 0, policy_version 39486 (0.0035) +[2024-11-08 05:02:37,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.5, 300 sec: 6775.8). Total num frames: 161763328. Throughput: 0: 1629.4. Samples: 35435014. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:02:37,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 05:02:39,538][42004] Updated weights for policy 0, policy_version 39496 (0.0032) +[2024-11-08 05:02:42,932][41694] Fps is (10 sec: 7166.5, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 161796096. Throughput: 0: 1663.2. Samples: 35445480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:02:42,935][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 05:02:45,588][42004] Updated weights for policy 0, policy_version 39506 (0.0033) +[2024-11-08 05:02:47,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 161832960. Throughput: 0: 1690.9. Samples: 35450450. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:02:47,934][41694] Avg episode reward: [(0, '4.267')] +[2024-11-08 05:02:51,368][42004] Updated weights for policy 0, policy_version 39516 (0.0037) +[2024-11-08 05:02:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 161865728. Throughput: 0: 1703.9. Samples: 35461262. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:02:52,933][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 05:02:56,689][42004] Updated weights for policy 0, policy_version 39526 (0.0023) +[2024-11-08 05:02:57,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6955.2, 300 sec: 6803.5). Total num frames: 161906688. Throughput: 0: 1737.7. Samples: 35472868. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:02:57,935][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 05:03:02,714][42004] Updated weights for policy 0, policy_version 39536 (0.0031) +[2024-11-08 05:03:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6789.6). Total num frames: 161939456. Throughput: 0: 1721.0. Samples: 35477864. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:02,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 05:03:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 161964032. Throughput: 0: 1687.6. Samples: 35487464. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:07,933][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 05:03:10,241][42004] Updated weights for policy 0, policy_version 39546 (0.0033) +[2024-11-08 05:03:12,939][41694] Fps is (10 sec: 5730.8, 60 sec: 6689.5, 300 sec: 6747.8). Total num frames: 161996800. Throughput: 0: 1632.3. Samples: 35495988. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:12,941][41694] Avg episode reward: [(0, '4.599')] +[2024-11-08 05:03:16,118][42004] Updated weights for policy 0, policy_version 39556 (0.0037) +[2024-11-08 05:03:17,933][41694] Fps is (10 sec: 6553.0, 60 sec: 6621.7, 300 sec: 6734.1). Total num frames: 162029568. Throughput: 0: 1640.7. Samples: 35501014. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:17,937][41694] Avg episode reward: [(0, '4.687')] +[2024-11-08 05:03:22,033][42004] Updated weights for policy 0, policy_version 39566 (0.0030) +[2024-11-08 05:03:22,931][41694] Fps is (10 sec: 6967.6, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 162066432. Throughput: 0: 1693.6. Samples: 35511228. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:22,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 05:03:27,369][42004] Updated weights for policy 0, policy_version 39576 (0.0025) +[2024-11-08 05:03:27,931][41694] Fps is (10 sec: 7783.3, 60 sec: 6758.4, 300 sec: 6794.0). Total num frames: 162107392. Throughput: 0: 1715.4. Samples: 35522672. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:27,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 05:03:32,858][42004] Updated weights for policy 0, policy_version 39586 (0.0030) +[2024-11-08 05:03:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6996.3, 300 sec: 6803.5). Total num frames: 162144256. Throughput: 0: 1729.7. Samples: 35528284. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:32,961][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 05:03:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 162181120. Throughput: 0: 1737.2. Samples: 35539436. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:37,934][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 05:03:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039595_162181120.pth... +[2024-11-08 05:03:38,078][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039198_160555008.pth +[2024-11-08 05:03:38,338][42004] Updated weights for policy 0, policy_version 39596 (0.0022) +[2024-11-08 05:03:42,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 162201600. Throughput: 0: 1648.3. Samples: 35547042. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:42,933][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 05:03:46,120][42004] Updated weights for policy 0, policy_version 39606 (0.0031) +[2024-11-08 05:03:47,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6690.2, 300 sec: 6734.1). Total num frames: 162234368. Throughput: 0: 1648.3. Samples: 35552036. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:47,933][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 05:03:52,387][42004] Updated weights for policy 0, policy_version 39616 (0.0027) +[2024-11-08 05:03:52,933][41694] Fps is (10 sec: 6552.5, 60 sec: 6690.0, 300 sec: 6720.2). 
Total num frames: 162267136. Throughput: 0: 1654.9. Samples: 35561938. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:52,936][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 05:03:57,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 162299904. Throughput: 0: 1672.7. Samples: 35571250. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:03:57,934][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 05:03:58,790][42004] Updated weights for policy 0, policy_version 39626 (0.0038) +[2024-11-08 05:04:02,932][41694] Fps is (10 sec: 6144.7, 60 sec: 6485.3, 300 sec: 6746.5). Total num frames: 162328576. Throughput: 0: 1675.0. Samples: 35576390. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:04:02,934][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 05:04:05,673][42004] Updated weights for policy 0, policy_version 39636 (0.0036) +[2024-11-08 05:04:07,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 162361344. Throughput: 0: 1646.3. Samples: 35585310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:04:07,936][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 05:04:11,542][42004] Updated weights for policy 0, policy_version 39646 (0.0027) +[2024-11-08 05:04:12,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6690.8, 300 sec: 6734.1). Total num frames: 162398208. Throughput: 0: 1630.0. Samples: 35596022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:04:12,938][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 05:04:17,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.4, 300 sec: 6692.5). Total num frames: 162418688. Throughput: 0: 1562.5. Samples: 35598598. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:04:17,934][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 05:04:19,184][42004] Updated weights for policy 0, policy_version 39656 (0.0027) +[2024-11-08 05:04:22,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6692.5). Total num frames: 162455552. Throughput: 0: 1534.5. Samples: 35608488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:04:22,933][41694] Avg episode reward: [(0, '4.678')] +[2024-11-08 05:04:25,335][42004] Updated weights for policy 0, policy_version 39666 (0.0036) +[2024-11-08 05:04:27,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6348.8, 300 sec: 6678.6). Total num frames: 162488320. Throughput: 0: 1585.6. Samples: 35618396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:04:27,935][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 05:04:30,895][42004] Updated weights for policy 0, policy_version 39676 (0.0030) +[2024-11-08 05:04:32,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6348.8, 300 sec: 6678.6). Total num frames: 162525184. Throughput: 0: 1600.0. Samples: 35624034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:04:32,933][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 05:04:36,514][42004] Updated weights for policy 0, policy_version 39686 (0.0020) +[2024-11-08 05:04:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6348.8, 300 sec: 6720.2). Total num frames: 162562048. Throughput: 0: 1626.2. Samples: 35635116. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:04:37,933][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 05:04:42,276][42004] Updated weights for policy 0, policy_version 39696 (0.0039) +[2024-11-08 05:04:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6720.3). Total num frames: 162598912. Throughput: 0: 1658.1. Samples: 35645864. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:04:42,934][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 05:04:49,275][41694] Fps is (10 sec: 6138.4, 60 sec: 6476.8, 300 sec: 6689.8). Total num frames: 162631680. Throughput: 0: 1610.7. Samples: 35651036. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:04:49,278][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 05:04:49,671][42004] Updated weights for policy 0, policy_version 39706 (0.0030) +[2024-11-08 05:04:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.5, 300 sec: 6692.5). Total num frames: 162656256. Throughput: 0: 1635.1. Samples: 35658888. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:04:52,933][41694] Avg episode reward: [(0, '4.657')] +[2024-11-08 05:04:55,523][42004] Updated weights for policy 0, policy_version 39716 (0.0026) +[2024-11-08 05:04:57,932][41694] Fps is (10 sec: 6624.5, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 162689024. Throughput: 0: 1617.1. Samples: 35668792. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:04:57,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 05:05:02,162][42004] Updated weights for policy 0, policy_version 39726 (0.0048) +[2024-11-08 05:05:02,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 162721792. Throughput: 0: 1656.6. Samples: 35673144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:02,934][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 05:05:07,693][42004] Updated weights for policy 0, policy_version 39736 (0.0034) +[2024-11-08 05:05:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 162758656. Throughput: 0: 1678.7. Samples: 35684028. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:07,936][41694] Avg episode reward: [(0, '4.643')] +[2024-11-08 05:05:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6734.1). 
Total num frames: 162795520. Throughput: 0: 1699.5. Samples: 35694872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:12,934][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 05:05:13,638][42004] Updated weights for policy 0, policy_version 39746 (0.0030) +[2024-11-08 05:05:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 162824192. Throughput: 0: 1677.5. Samples: 35699520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:17,936][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 05:05:19,835][42004] Updated weights for policy 0, policy_version 39756 (0.0038) +[2024-11-08 05:05:23,336][41694] Fps is (10 sec: 5118.0, 60 sec: 6509.7, 300 sec: 6669.4). Total num frames: 162848768. Throughput: 0: 1648.5. Samples: 35709964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:23,337][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 05:05:27,438][42004] Updated weights for policy 0, policy_version 39766 (0.0044) +[2024-11-08 05:05:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 162881536. Throughput: 0: 1585.5. Samples: 35717210. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:27,935][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 05:05:32,932][41694] Fps is (10 sec: 7256.4, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 162918400. Throughput: 0: 1629.5. Samples: 35722174. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:32,934][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 05:05:33,466][42004] Updated weights for policy 0, policy_version 39776 (0.0032) +[2024-11-08 05:05:37,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6553.5, 300 sec: 6650.8). Total num frames: 162955264. Throughput: 0: 1648.2. Samples: 35733056. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:37,934][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 05:05:37,942][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039784_162955264.pth... +[2024-11-08 05:05:38,073][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039394_161357824.pth +[2024-11-08 05:05:38,736][42004] Updated weights for policy 0, policy_version 39786 (0.0027) +[2024-11-08 05:05:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 162992128. Throughput: 0: 1675.9. Samples: 35744206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:42,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 05:05:44,718][42004] Updated weights for policy 0, policy_version 39796 (0.0031) +[2024-11-08 05:05:47,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6703.7, 300 sec: 6692.4). Total num frames: 163024896. Throughput: 0: 1690.1. Samples: 35749200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:05:47,934][41694] Avg episode reward: [(0, '4.215')] +[2024-11-08 05:05:50,322][42004] Updated weights for policy 0, policy_version 39806 (0.0034) +[2024-11-08 05:05:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6692.5). Total num frames: 163061760. Throughput: 0: 1690.8. Samples: 35760114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:05:52,935][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 05:05:57,762][42004] Updated weights for policy 0, policy_version 39816 (0.0025) +[2024-11-08 05:05:57,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 163086336. Throughput: 0: 1631.0. Samples: 35768268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:05:57,934][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 05:06:02,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6621.8, 300 sec: 6650.8). 
Total num frames: 163119104. Throughput: 0: 1634.3. Samples: 35773064. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:06:02,938][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 05:06:03,754][42004] Updated weights for policy 0, policy_version 39826 (0.0038) +[2024-11-08 05:06:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 163151872. Throughput: 0: 1633.4. Samples: 35782806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:06:07,936][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 05:06:10,089][42004] Updated weights for policy 0, policy_version 39836 (0.0031) +[2024-11-08 05:06:12,932][41694] Fps is (10 sec: 6553.9, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 163184640. Throughput: 0: 1671.7. Samples: 35792438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:06:12,933][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 05:06:16,330][42004] Updated weights for policy 0, policy_version 39846 (0.0027) +[2024-11-08 05:06:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 163217408. Throughput: 0: 1670.8. Samples: 35797362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:17,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 05:06:21,984][42004] Updated weights for policy 0, policy_version 39856 (0.0026) +[2024-11-08 05:06:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6804.2, 300 sec: 6678.6). Total num frames: 163254272. Throughput: 0: 1673.0. Samples: 35808340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:22,934][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 05:06:27,553][42004] Updated weights for policy 0, policy_version 39866 (0.0033) +[2024-11-08 05:06:27,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 163291136. Throughput: 0: 1669.7. Samples: 35819344. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:27,933][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 05:06:32,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 163315712. Throughput: 0: 1665.6. Samples: 35824150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:32,935][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 05:06:35,092][42004] Updated weights for policy 0, policy_version 39876 (0.0035) +[2024-11-08 05:06:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.7, 300 sec: 6623.0). Total num frames: 163348480. Throughput: 0: 1603.7. Samples: 35832280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:37,935][41694] Avg episode reward: [(0, '4.336')] +[2024-11-08 05:06:41,350][42004] Updated weights for policy 0, policy_version 39886 (0.0035) +[2024-11-08 05:06:42,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 163381248. Throughput: 0: 1640.1. Samples: 35842072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:42,934][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 05:06:47,050][42004] Updated weights for policy 0, policy_version 39896 (0.0027) +[2024-11-08 05:06:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6553.6, 300 sec: 6609.1). Total num frames: 163418112. Throughput: 0: 1648.6. Samples: 35847248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:47,934][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 05:06:52,736][42004] Updated weights for policy 0, policy_version 39906 (0.0038) +[2024-11-08 05:06:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6662.0). Total num frames: 163454976. Throughput: 0: 1667.5. Samples: 35857844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:52,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 05:06:57,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6664.7). 
Total num frames: 163491840. Throughput: 0: 1706.0. Samples: 35869208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:06:57,933][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 05:06:58,175][42004] Updated weights for policy 0, policy_version 39916 (0.0029) +[2024-11-08 05:07:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 163528704. Throughput: 0: 1720.5. Samples: 35874782. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:07:02,933][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 05:07:05,643][42004] Updated weights for policy 0, policy_version 39926 (0.0032) +[2024-11-08 05:07:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 163549184. Throughput: 0: 1642.1. Samples: 35882236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:07:07,933][41694] Avg episode reward: [(0, '4.674')] +[2024-11-08 05:07:11,799][42004] Updated weights for policy 0, policy_version 39936 (0.0045) +[2024-11-08 05:07:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 163581952. Throughput: 0: 1618.3. Samples: 35892168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:07:12,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 05:07:17,577][42004] Updated weights for policy 0, policy_version 39946 (0.0022) +[2024-11-08 05:07:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6609.1). Total num frames: 163618816. Throughput: 0: 1625.3. Samples: 35897290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:07:17,935][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 05:07:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 163655680. Throughput: 0: 1696.6. Samples: 35908626. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:07:22,934][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 05:07:22,991][42004] Updated weights for policy 0, policy_version 39956 (0.0032) +[2024-11-08 05:07:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6671.1). Total num frames: 163692544. Throughput: 0: 1728.6. Samples: 35919860. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:07:27,933][41694] Avg episode reward: [(0, '4.680')] +[2024-11-08 05:07:28,483][42004] Updated weights for policy 0, policy_version 39966 (0.0025) +[2024-11-08 05:07:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6678.6). Total num frames: 163733504. Throughput: 0: 1738.3. Samples: 35925472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:07:32,934][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 05:07:33,927][42004] Updated weights for policy 0, policy_version 39976 (0.0025) +[2024-11-08 05:07:39,574][41694] Fps is (10 sec: 6332.8, 60 sec: 6777.7, 300 sec: 6641.6). Total num frames: 163766272. Throughput: 0: 1687.9. Samples: 35936570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:07:39,576][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 05:07:39,588][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039982_163766272.pth... +[2024-11-08 05:07:39,729][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039595_162181120.pth +[2024-11-08 05:07:41,589][42004] Updated weights for policy 0, policy_version 39986 (0.0050) +[2024-11-08 05:07:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6826.7, 300 sec: 6636.9). Total num frames: 163790848. Throughput: 0: 1664.2. Samples: 35944096. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:07:42,933][41694] Avg episode reward: [(0, '4.644')] +[2024-11-08 05:07:47,749][42004] Updated weights for policy 0, policy_version 39996 (0.0033) +[2024-11-08 05:07:47,937][41694] Fps is (10 sec: 6856.8, 60 sec: 6757.8, 300 sec: 6636.8). Total num frames: 163823616. Throughput: 0: 1644.2. Samples: 35948782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:07:47,946][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 05:07:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 163860480. Throughput: 0: 1719.0. Samples: 35959590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:07:52,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 05:07:53,123][42004] Updated weights for policy 0, policy_version 40006 (0.0029) +[2024-11-08 05:07:57,932][41694] Fps is (10 sec: 7376.7, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 163897344. Throughput: 0: 1751.7. Samples: 35970996. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:07:57,933][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 05:07:58,620][42004] Updated weights for policy 0, policy_version 40016 (0.0028) +[2024-11-08 05:08:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 163934208. Throughput: 0: 1757.3. Samples: 35976366. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:02,933][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 05:08:04,514][42004] Updated weights for policy 0, policy_version 40026 (0.0031) +[2024-11-08 05:08:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6963.2, 300 sec: 6678.7). Total num frames: 163966976. Throughput: 0: 1725.2. Samples: 35986262. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:07,934][41694] Avg episode reward: [(0, '4.714')] +[2024-11-08 05:08:10,644][42004] Updated weights for policy 0, policy_version 40036 (0.0033) +[2024-11-08 05:08:13,629][41694] Fps is (10 sec: 5743.2, 60 sec: 6815.7, 300 sec: 6649.0). Total num frames: 163995648. Throughput: 0: 1570.8. Samples: 35991644. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:13,631][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 05:08:17,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 164024320. Throughput: 0: 1622.5. Samples: 35998484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:17,934][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 05:08:18,625][42004] Updated weights for policy 0, policy_version 40046 (0.0029) +[2024-11-08 05:08:22,931][41694] Fps is (10 sec: 6605.0, 60 sec: 6690.1, 300 sec: 6609.1). Total num frames: 164057088. Throughput: 0: 1660.8. Samples: 36008580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:22,935][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 05:08:24,531][42004] Updated weights for policy 0, policy_version 40056 (0.0041) +[2024-11-08 05:08:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6609.1). Total num frames: 164093952. Throughput: 0: 1679.9. Samples: 36019692. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:27,933][41694] Avg episode reward: [(0, '4.359')] +[2024-11-08 05:08:30,011][42004] Updated weights for policy 0, policy_version 40066 (0.0034) +[2024-11-08 05:08:32,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 164130816. Throughput: 0: 1695.4. Samples: 36025064. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:32,933][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 05:08:35,869][42004] Updated weights for policy 0, policy_version 40076 (0.0037) +[2024-11-08 05:08:37,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6808.2, 300 sec: 6650.8). Total num frames: 164163584. Throughput: 0: 1688.4. Samples: 36035570. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:37,933][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 05:08:41,547][42004] Updated weights for policy 0, policy_version 40086 (0.0027) +[2024-11-08 05:08:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 164200448. Throughput: 0: 1675.8. Samples: 36046408. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:42,934][41694] Avg episode reward: [(0, '4.271')] +[2024-11-08 05:08:47,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6622.4, 300 sec: 6623.0). Total num frames: 164220928. Throughput: 0: 1671.4. Samples: 36051580. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:47,937][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 05:08:49,477][42004] Updated weights for policy 0, policy_version 40096 (0.0042) +[2024-11-08 05:08:52,931][41694] Fps is (10 sec: 5325.0, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 164253696. Throughput: 0: 1595.2. Samples: 36058046. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:52,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 05:08:56,106][42004] Updated weights for policy 0, policy_version 40106 (0.0063) +[2024-11-08 05:08:57,932][41694] Fps is (10 sec: 6554.0, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 164286464. Throughput: 0: 1722.2. Samples: 36067942. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:08:57,933][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 05:09:01,634][42004] Updated weights for policy 0, policy_version 40116 (0.0033) +[2024-11-08 05:09:02,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 164323328. Throughput: 0: 1665.6. Samples: 36073438. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:09:02,934][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 05:09:07,271][42004] Updated weights for policy 0, policy_version 40126 (0.0029) +[2024-11-08 05:09:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 164360192. Throughput: 0: 1675.1. Samples: 36083962. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:09:07,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 05:09:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6699.8, 300 sec: 6692.4). Total num frames: 164392960. Throughput: 0: 1672.8. Samples: 36094968. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:09:12,934][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 05:09:13,051][42004] Updated weights for policy 0, policy_version 40136 (0.0035) +[2024-11-08 05:09:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 164433920. Throughput: 0: 1674.1. Samples: 36100398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:09:17,933][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 05:09:18,511][42004] Updated weights for policy 0, policy_version 40146 (0.0024) +[2024-11-08 05:09:22,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6621.8, 300 sec: 6664.7). Total num frames: 164454400. Throughput: 0: 1639.9. Samples: 36109364. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:09:22,934][41694] Avg episode reward: [(0, '4.313')] +[2024-11-08 05:09:26,558][42004] Updated weights for policy 0, policy_version 40156 (0.0034) +[2024-11-08 05:09:27,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 164487168. Throughput: 0: 1591.9. Samples: 36118044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:09:27,935][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 05:09:32,510][42004] Updated weights for policy 0, policy_version 40166 (0.0031) +[2024-11-08 05:09:32,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6485.4, 300 sec: 6636.9). Total num frames: 164519936. Throughput: 0: 1580.9. Samples: 36122720. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:09:32,933][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 05:09:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 164556800. Throughput: 0: 1686.6. Samples: 36133944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:09:37,934][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 05:09:38,029][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040176_164560896.pth... +[2024-11-08 05:09:38,033][42004] Updated weights for policy 0, policy_version 40176 (0.0027) +[2024-11-08 05:09:38,231][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039784_162955264.pth +[2024-11-08 05:09:42,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6553.6, 300 sec: 6681.2). Total num frames: 164593664. Throughput: 0: 1707.5. Samples: 36144782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:09:42,935][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 05:09:43,727][42004] Updated weights for policy 0, policy_version 40186 (0.0031) +[2024-11-08 05:09:47,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6758.5, 300 sec: 6678.6). 
Total num frames: 164626432. Throughput: 0: 1701.1. Samples: 36149988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:09:47,933][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 05:09:49,679][42004] Updated weights for policy 0, policy_version 40196 (0.0024) +[2024-11-08 05:09:52,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6826.6, 300 sec: 6692.4). Total num frames: 164663296. Throughput: 0: 1704.3. Samples: 36160656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:09:52,935][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 05:09:57,230][42004] Updated weights for policy 0, policy_version 40206 (0.0032) +[2024-11-08 05:09:57,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 164687872. Throughput: 0: 1623.8. Samples: 36168040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:09:57,934][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 05:10:02,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 164716544. Throughput: 0: 1604.0. Samples: 36172576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:10:02,934][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 05:10:03,775][42004] Updated weights for policy 0, policy_version 40216 (0.0032) +[2024-11-08 05:10:07,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 164753408. Throughput: 0: 1626.4. Samples: 36182552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:10:07,933][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 05:10:09,447][42004] Updated weights for policy 0, policy_version 40226 (0.0029) +[2024-11-08 05:10:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 164786176. Throughput: 0: 1668.9. Samples: 36193144. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:10:12,933][41694] Avg episode reward: [(0, '4.593')] +[2024-11-08 05:10:15,756][42004] Updated weights for policy 0, policy_version 40236 (0.0023) +[2024-11-08 05:10:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6687.7). Total num frames: 164818944. Throughput: 0: 1673.0. Samples: 36198006. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:10:17,937][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 05:10:21,799][42004] Updated weights for policy 0, policy_version 40246 (0.0031) +[2024-11-08 05:10:22,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 164855808. Throughput: 0: 1646.4. Samples: 36208032. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:10:22,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 05:10:27,433][42004] Updated weights for policy 0, policy_version 40256 (0.0041) +[2024-11-08 05:10:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 164888576. Throughput: 0: 1648.1. Samples: 36218946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:10:27,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 05:10:32,936][41694] Fps is (10 sec: 5322.6, 60 sec: 6484.9, 300 sec: 6622.9). Total num frames: 164909056. Throughput: 0: 1574.4. Samples: 36220844. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:10:32,939][41694] Avg episode reward: [(0, '4.612')] +[2024-11-08 05:10:35,622][42004] Updated weights for policy 0, policy_version 40266 (0.0023) +[2024-11-08 05:10:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6417.1, 300 sec: 6609.1). Total num frames: 164941824. Throughput: 0: 1552.5. Samples: 36230518. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:10:37,935][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 05:10:41,286][42004] Updated weights for policy 0, policy_version 40276 (0.0051) +[2024-11-08 05:10:42,931][41694] Fps is (10 sec: 6966.1, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 164978688. Throughput: 0: 1634.3. Samples: 36241584. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:10:42,933][41694] Avg episode reward: [(0, '4.784')] +[2024-11-08 05:10:46,861][42004] Updated weights for policy 0, policy_version 40286 (0.0029) +[2024-11-08 05:10:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 165019648. Throughput: 0: 1652.4. Samples: 36246936. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:10:47,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 05:10:52,232][42004] Updated weights for policy 0, policy_version 40296 (0.0023) +[2024-11-08 05:10:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 165056512. Throughput: 0: 1685.3. Samples: 36258390. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:10:52,933][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 05:10:57,751][42004] Updated weights for policy 0, policy_version 40306 (0.0022) +[2024-11-08 05:10:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 165093376. Throughput: 0: 1698.3. Samples: 36269570. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:10:57,935][41694] Avg episode reward: [(0, '4.578')] +[2024-11-08 05:11:04,128][41694] Fps is (10 sec: 5853.2, 60 sec: 6626.3, 300 sec: 6651.6). Total num frames: 165122048. Throughput: 0: 1667.8. Samples: 36275054. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:11:04,131][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 05:11:05,482][42004] Updated weights for policy 0, policy_version 40316 (0.0029) +[2024-11-08 05:11:07,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 165146624. Throughput: 0: 1642.3. Samples: 36281934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:11:07,934][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 05:11:12,003][42004] Updated weights for policy 0, policy_version 40326 (0.0031) +[2024-11-08 05:11:12,931][41694] Fps is (10 sec: 6513.9, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 165179392. Throughput: 0: 1612.0. Samples: 36291484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:11:12,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 05:11:17,612][42004] Updated weights for policy 0, policy_version 40336 (0.0019) +[2024-11-08 05:11:17,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 165216256. Throughput: 0: 1683.0. Samples: 36296574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:11:17,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 05:11:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 165253120. Throughput: 0: 1724.5. Samples: 36308120. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:11:22,933][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 05:11:22,991][42004] Updated weights for policy 0, policy_version 40346 (0.0032) +[2024-11-08 05:11:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 165289984. Throughput: 0: 1723.2. Samples: 36319128. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:11:27,935][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 05:11:28,611][42004] Updated weights for policy 0, policy_version 40356 (0.0037) +[2024-11-08 05:11:32,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7031.9, 300 sec: 6720.2). Total num frames: 165330944. Throughput: 0: 1727.1. Samples: 36324656. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:11:32,934][41694] Avg episode reward: [(0, '4.617')] +[2024-11-08 05:11:33,943][42004] Updated weights for policy 0, policy_version 40366 (0.0029) +[2024-11-08 05:11:38,336][41694] Fps is (10 sec: 5905.3, 60 sec: 6781.0, 300 sec: 6669.4). Total num frames: 165351424. Throughput: 0: 1700.5. Samples: 36335600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:11:38,341][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 05:11:38,456][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040370_165355520.pth... +[2024-11-08 05:11:38,553][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000039982_163766272.pth +[2024-11-08 05:11:42,399][42004] Updated weights for policy 0, policy_version 40376 (0.0025) +[2024-11-08 05:11:42,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 165380096. Throughput: 0: 1609.9. Samples: 36342014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:11:42,933][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 05:11:47,933][41694] Fps is (10 sec: 6828.7, 60 sec: 6621.7, 300 sec: 6650.8). Total num frames: 165416960. Throughput: 0: 1636.0. Samples: 36346720. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:11:47,935][41694] Avg episode reward: [(0, '4.627')] +[2024-11-08 05:11:48,383][42004] Updated weights for policy 0, policy_version 40386 (0.0034) +[2024-11-08 05:11:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6650.8). 
Total num frames: 165453824. Throughput: 0: 1688.0. Samples: 36357894. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:11:52,933][41694] Avg episode reward: [(0, '4.664')] +[2024-11-08 05:11:53,532][42004] Updated weights for policy 0, policy_version 40396 (0.0027) +[2024-11-08 05:11:57,931][41694] Fps is (10 sec: 7783.5, 60 sec: 6690.2, 300 sec: 6664.7). Total num frames: 165494784. Throughput: 0: 1739.6. Samples: 36369764. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:11:57,933][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 05:11:58,803][42004] Updated weights for policy 0, policy_version 40406 (0.0028) +[2024-11-08 05:12:02,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6895.9, 300 sec: 6706.3). Total num frames: 165527552. Throughput: 0: 1751.9. Samples: 36375408. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:12:02,933][41694] Avg episode reward: [(0, '4.315')] +[2024-11-08 05:12:05,333][42004] Updated weights for policy 0, policy_version 40416 (0.0030) +[2024-11-08 05:12:07,934][41694] Fps is (10 sec: 6142.7, 60 sec: 6826.5, 300 sec: 6692.4). Total num frames: 165556224. Throughput: 0: 1695.9. Samples: 36384440. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:12:07,938][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 05:12:12,933][41694] Fps is (10 sec: 4915.0, 60 sec: 6621.8, 300 sec: 6636.9). Total num frames: 165576704. Throughput: 0: 1608.4. Samples: 36391506. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:12:12,935][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 05:12:13,674][42004] Updated weights for policy 0, policy_version 40426 (0.0030) +[2024-11-08 05:12:17,932][41694] Fps is (10 sec: 5325.5, 60 sec: 6553.5, 300 sec: 6623.0). Total num frames: 165609472. Throughput: 0: 1577.1. Samples: 36395626. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:17,935][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 05:12:20,449][42004] Updated weights for policy 0, policy_version 40436 (0.0039) +[2024-11-08 05:12:22,931][41694] Fps is (10 sec: 6553.9, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 165642240. Throughput: 0: 1561.9. Samples: 36405256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:22,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 05:12:25,678][42004] Updated weights for policy 0, policy_version 40446 (0.0025) +[2024-11-08 05:12:27,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6485.3, 300 sec: 6595.2). Total num frames: 165679104. Throughput: 0: 1651.1. Samples: 36416312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:27,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 05:12:31,391][42004] Updated weights for policy 0, policy_version 40456 (0.0033) +[2024-11-08 05:12:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.1, 300 sec: 6646.1). Total num frames: 165715968. Throughput: 0: 1671.3. Samples: 36421928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:32,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 05:12:36,826][42004] Updated weights for policy 0, policy_version 40466 (0.0041) +[2024-11-08 05:12:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6735.5, 300 sec: 6650.8). Total num frames: 165752832. Throughput: 0: 1671.5. Samples: 36433112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:37,933][41694] Avg episode reward: [(0, '4.619')] +[2024-11-08 05:12:42,385][42004] Updated weights for policy 0, policy_version 40476 (0.0028) +[2024-11-08 05:12:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6664.8). Total num frames: 165789696. Throughput: 0: 1659.5. Samples: 36444440. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:42,933][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 05:12:47,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 165814272. Throughput: 0: 1633.4. Samples: 36448910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:47,935][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 05:12:50,071][42004] Updated weights for policy 0, policy_version 40486 (0.0041) +[2024-11-08 05:12:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 6609.1). Total num frames: 165847040. Throughput: 0: 1606.8. Samples: 36456742. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:52,933][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 05:12:56,391][42004] Updated weights for policy 0, policy_version 40496 (0.0028) +[2024-11-08 05:12:57,932][41694] Fps is (10 sec: 6554.0, 60 sec: 6417.0, 300 sec: 6595.3). Total num frames: 165879808. Throughput: 0: 1673.1. Samples: 36466796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:12:57,933][41694] Avg episode reward: [(0, '4.289')] +[2024-11-08 05:13:02,358][42004] Updated weights for policy 0, policy_version 40506 (0.0024) +[2024-11-08 05:13:02,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 165916672. Throughput: 0: 1695.8. Samples: 36471934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:02,935][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 05:13:07,685][42004] Updated weights for policy 0, policy_version 40516 (0.0032) +[2024-11-08 05:13:07,933][41694] Fps is (10 sec: 7371.9, 60 sec: 6621.9, 300 sec: 6652.6). Total num frames: 165953536. Throughput: 0: 1719.8. Samples: 36482650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:07,935][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 05:13:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6664.7). 
Total num frames: 165990400. Throughput: 0: 1725.2. Samples: 36493944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:12,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 05:13:13,283][42004] Updated weights for policy 0, policy_version 40526 (0.0030) +[2024-11-08 05:13:17,931][41694] Fps is (10 sec: 6964.1, 60 sec: 6895.0, 300 sec: 6664.7). Total num frames: 166023168. Throughput: 0: 1713.7. Samples: 36499046. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:17,934][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 05:13:21,040][42004] Updated weights for policy 0, policy_version 40536 (0.0032) +[2024-11-08 05:13:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 166047744. Throughput: 0: 1626.7. Samples: 36506314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:22,934][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 05:13:27,381][42004] Updated weights for policy 0, policy_version 40546 (0.0046) +[2024-11-08 05:13:27,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.9, 300 sec: 6595.3). Total num frames: 166076416. Throughput: 0: 1591.8. Samples: 36516070. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:27,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 05:13:32,800][42004] Updated weights for policy 0, policy_version 40556 (0.0030) +[2024-11-08 05:13:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 166117376. Throughput: 0: 1609.5. Samples: 36521338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:32,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 05:13:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 166154240. Throughput: 0: 1687.5. Samples: 36532678. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:37,934][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 05:13:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040565_166154240.pth... +[2024-11-08 05:13:38,037][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040176_164560896.pth +[2024-11-08 05:13:38,347][42004] Updated weights for policy 0, policy_version 40566 (0.0026) +[2024-11-08 05:13:42,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6690.0, 300 sec: 6678.6). Total num frames: 166191104. Throughput: 0: 1721.1. Samples: 36544248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:42,935][41694] Avg episode reward: [(0, '4.199')] +[2024-11-08 05:13:43,753][42004] Updated weights for policy 0, policy_version 40576 (0.0034) +[2024-11-08 05:13:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.3, 300 sec: 6706.3). Total num frames: 166232064. Throughput: 0: 1731.2. Samples: 36549838. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:47,935][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 05:13:48,948][42004] Updated weights for policy 0, policy_version 40586 (0.0032) +[2024-11-08 05:13:54,815][41694] Fps is (10 sec: 6204.8, 60 sec: 6751.3, 300 sec: 6663.8). Total num frames: 166264832. Throughput: 0: 1663.7. Samples: 36560648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:54,818][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 05:13:57,462][42004] Updated weights for policy 0, policy_version 40596 (0.0034) +[2024-11-08 05:13:57,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 166281216. Throughput: 0: 1630.5. Samples: 36567316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:13:57,934][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 05:14:02,931][41694] Fps is (10 sec: 6055.6, 60 sec: 6621.9, 300 sec: 6623.0). 
Total num frames: 166313984. Throughput: 0: 1614.3. Samples: 36571690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:02,934][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 05:14:03,872][42004] Updated weights for policy 0, policy_version 40606 (0.0029) +[2024-11-08 05:14:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6622.0, 300 sec: 6636.9). Total num frames: 166350848. Throughput: 0: 1673.9. Samples: 36581640. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:07,934][41694] Avg episode reward: [(0, '4.597')] +[2024-11-08 05:14:09,409][42004] Updated weights for policy 0, policy_version 40616 (0.0020) +[2024-11-08 05:14:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 166387712. Throughput: 0: 1702.1. Samples: 36592664. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:12,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 05:14:15,133][42004] Updated weights for policy 0, policy_version 40626 (0.0035) +[2024-11-08 05:14:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 166424576. Throughput: 0: 1710.5. Samples: 36598312. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:17,933][41694] Avg episode reward: [(0, '4.276')] +[2024-11-08 05:14:20,531][42004] Updated weights for policy 0, policy_version 40636 (0.0025) +[2024-11-08 05:14:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6895.0, 300 sec: 6692.5). Total num frames: 166461440. Throughput: 0: 1715.7. Samples: 36609882. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:22,933][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 05:14:26,284][42004] Updated weights for policy 0, policy_version 40646 (0.0027) +[2024-11-08 05:14:28,998][41694] Fps is (10 sec: 5922.0, 60 sec: 6774.5, 300 sec: 6654.5). Total num frames: 166490112. Throughput: 0: 1656.4. Samples: 36620552. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:29,000][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 05:14:32,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 166514688. Throughput: 0: 1614.4. Samples: 36622486. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:32,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 05:14:34,322][42004] Updated weights for policy 0, policy_version 40656 (0.0027) +[2024-11-08 05:14:37,931][41694] Fps is (10 sec: 6877.5, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 166551552. Throughput: 0: 1647.9. Samples: 36631702. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:37,936][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 05:14:40,071][42004] Updated weights for policy 0, policy_version 40666 (0.0039) +[2024-11-08 05:14:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 166588416. Throughput: 0: 1687.2. Samples: 36643240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:42,933][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 05:14:45,481][42004] Updated weights for policy 0, policy_version 40676 (0.0031) +[2024-11-08 05:14:47,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 166625280. Throughput: 0: 1712.3. Samples: 36648746. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:14:47,934][41694] Avg episode reward: [(0, '4.647')] +[2024-11-08 05:14:50,973][42004] Updated weights for policy 0, policy_version 40686 (0.0036) +[2024-11-08 05:14:52,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6766.0, 300 sec: 6678.6). Total num frames: 166658048. Throughput: 0: 1739.8. Samples: 36659930. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:14:52,933][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 05:14:56,876][42004] Updated weights for policy 0, policy_version 40696 (0.0027) +[2024-11-08 05:14:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6720.2). Total num frames: 166699008. Throughput: 0: 1731.5. Samples: 36670580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:14:57,933][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 05:15:03,218][41694] Fps is (10 sec: 5972.8, 60 sec: 6726.3, 300 sec: 6658.2). Total num frames: 166719488. Throughput: 0: 1707.9. Samples: 36675658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:15:03,221][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 05:15:05,053][42004] Updated weights for policy 0, policy_version 40706 (0.0031) +[2024-11-08 05:15:07,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 166748160. Throughput: 0: 1602.9. Samples: 36682012. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:15:07,934][41694] Avg episode reward: [(0, '4.587')] +[2024-11-08 05:15:11,336][42004] Updated weights for policy 0, policy_version 40716 (0.0034) +[2024-11-08 05:15:12,931][41694] Fps is (10 sec: 6325.4, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 166780928. Throughput: 0: 1630.2. Samples: 36692172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:15:12,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 05:15:17,366][42004] Updated weights for policy 0, policy_version 40726 (0.0035) +[2024-11-08 05:15:17,933][41694] Fps is (10 sec: 6962.4, 60 sec: 6553.5, 300 sec: 6650.8). Total num frames: 166817792. Throughput: 0: 1658.6. Samples: 36697126. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:15:17,935][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 05:15:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6650.8). 
Total num frames: 166850560. Throughput: 0: 1676.9. Samples: 36707162. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:15:22,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 05:15:23,019][42004] Updated weights for policy 0, policy_version 40736 (0.0023) +[2024-11-08 05:15:27,932][41694] Fps is (10 sec: 7373.5, 60 sec: 6811.2, 300 sec: 6720.3). Total num frames: 166891520. Throughput: 0: 1674.5. Samples: 36718592. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:15:27,937][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 05:15:28,473][42004] Updated weights for policy 0, policy_version 40746 (0.0022) +[2024-11-08 05:15:32,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 166928384. Throughput: 0: 1676.4. Samples: 36724184. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:15:32,936][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 05:15:33,930][42004] Updated weights for policy 0, policy_version 40756 (0.0029) +[2024-11-08 05:15:37,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.8, 300 sec: 6678.6). Total num frames: 166948864. Throughput: 0: 1645.5. Samples: 36733976. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:15:37,934][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 05:15:37,953][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040759_166948864.pth... +[2024-11-08 05:15:38,101][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040370_165355520.pth +[2024-11-08 05:15:42,240][42004] Updated weights for policy 0, policy_version 40766 (0.0032) +[2024-11-08 05:15:42,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 166977536. Throughput: 0: 1577.8. Samples: 36741582. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:15:42,934][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 05:15:47,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 167014400. Throughput: 0: 1577.7. Samples: 36746204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:15:47,934][41694] Avg episode reward: [(0, '4.859')] +[2024-11-08 05:15:48,000][42004] Updated weights for policy 0, policy_version 40776 (0.0043) +[2024-11-08 05:15:52,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 167055360. Throughput: 0: 1690.9. Samples: 36758102. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:15:52,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 05:15:53,222][42004] Updated weights for policy 0, policy_version 40786 (0.0027) +[2024-11-08 05:15:57,931][41694] Fps is (10 sec: 8192.4, 60 sec: 6621.9, 300 sec: 6719.7). Total num frames: 167096320. Throughput: 0: 1728.0. Samples: 36769932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:15:57,933][41694] Avg episode reward: [(0, '4.554')] +[2024-11-08 05:15:58,407][42004] Updated weights for policy 0, policy_version 40796 (0.0028) +[2024-11-08 05:16:02,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6928.0, 300 sec: 6734.1). Total num frames: 167133184. Throughput: 0: 1747.2. Samples: 36775748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:16:02,934][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 05:16:03,844][42004] Updated weights for policy 0, policy_version 40806 (0.0027) +[2024-11-08 05:16:07,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 167165952. Throughput: 0: 1763.8. Samples: 36786532. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:16:07,934][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 05:16:12,211][42004] Updated weights for policy 0, policy_version 40816 (0.0030) +[2024-11-08 05:16:12,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 167182336. Throughput: 0: 1649.2. Samples: 36792806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:16:12,933][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 05:16:17,931][41694] Fps is (10 sec: 4505.7, 60 sec: 6553.7, 300 sec: 6636.9). Total num frames: 167211008. Throughput: 0: 1611.5. Samples: 36796700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:16:17,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 05:16:19,521][42004] Updated weights for policy 0, policy_version 40826 (0.0030) +[2024-11-08 05:16:22,935][41694] Fps is (10 sec: 6551.6, 60 sec: 6621.5, 300 sec: 6636.8). Total num frames: 167247872. Throughput: 0: 1604.9. Samples: 36806200. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:16:22,937][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 05:16:25,066][42004] Updated weights for policy 0, policy_version 40836 (0.0022) +[2024-11-08 05:16:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 167284736. Throughput: 0: 1677.6. Samples: 36817072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:16:27,933][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 05:16:30,484][42004] Updated weights for policy 0, policy_version 40846 (0.0023) +[2024-11-08 05:16:32,932][41694] Fps is (10 sec: 7375.0, 60 sec: 6553.6, 300 sec: 6687.7). Total num frames: 167321600. Throughput: 0: 1703.7. Samples: 36822870. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:16:32,934][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 05:16:36,026][42004] Updated weights for policy 0, policy_version 40856 (0.0028) +[2024-11-08 05:16:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 167358464. Throughput: 0: 1686.0. Samples: 36833972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:16:37,933][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 05:16:41,970][42004] Updated weights for policy 0, policy_version 40866 (0.0026) +[2024-11-08 05:16:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6692.5). Total num frames: 167391232. Throughput: 0: 1653.7. Samples: 36844348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:16:42,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 05:16:47,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.2, 300 sec: 6650.8). Total num frames: 167415808. Throughput: 0: 1586.7. Samples: 36847150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:16:47,934][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 05:16:49,769][42004] Updated weights for policy 0, policy_version 40876 (0.0039) +[2024-11-08 05:16:52,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 167448576. Throughput: 0: 1553.5. Samples: 36856440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:16:52,934][41694] Avg episode reward: [(0, '4.639')] +[2024-11-08 05:16:55,452][42004] Updated weights for policy 0, policy_version 40886 (0.0026) +[2024-11-08 05:16:57,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 167485440. Throughput: 0: 1661.1. Samples: 36867554. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:16:57,933][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 05:17:00,929][42004] Updated weights for policy 0, policy_version 40896 (0.0030) +[2024-11-08 05:17:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 167522304. Throughput: 0: 1700.9. Samples: 36873240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:17:02,933][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 05:17:06,634][42004] Updated weights for policy 0, policy_version 40906 (0.0036) +[2024-11-08 05:17:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 167559168. Throughput: 0: 1733.2. Samples: 36884190. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:07,934][41694] Avg episode reward: [(0, '4.682')] +[2024-11-08 05:17:12,159][42004] Updated weights for policy 0, policy_version 40916 (0.0033) +[2024-11-08 05:17:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 167596032. Throughput: 0: 1741.1. Samples: 36895420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:12,933][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 05:17:19,370][41694] Fps is (10 sec: 6087.3, 60 sec: 6800.1, 300 sec: 6701.4). Total num frames: 167628800. Throughput: 0: 1671.5. Samples: 36900492. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:19,373][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 05:17:19,837][42004] Updated weights for policy 0, policy_version 40926 (0.0026) +[2024-11-08 05:17:22,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.5, 300 sec: 6678.6). Total num frames: 167649280. Throughput: 0: 1634.6. Samples: 36907528. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:22,934][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 05:17:26,238][42004] Updated weights for policy 0, policy_version 40936 (0.0033) +[2024-11-08 05:17:27,931][41694] Fps is (10 sec: 6698.4, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 167686144. Throughput: 0: 1627.4. Samples: 36917582. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:27,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 05:17:31,640][42004] Updated weights for policy 0, policy_version 40946 (0.0028) +[2024-11-08 05:17:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 167723008. Throughput: 0: 1686.7. Samples: 36923054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:32,934][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 05:17:37,108][42004] Updated weights for policy 0, policy_version 40956 (0.0023) +[2024-11-08 05:17:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 167759872. Throughput: 0: 1729.2. Samples: 36934252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:37,934][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 05:17:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040957_167759872.pth... +[2024-11-08 05:17:38,091][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040565_166154240.pth +[2024-11-08 05:17:42,705][42004] Updated weights for policy 0, policy_version 40966 (0.0023) +[2024-11-08 05:17:42,933][41694] Fps is (10 sec: 7372.1, 60 sec: 6758.3, 300 sec: 6720.2). Total num frames: 167796736. Throughput: 0: 1728.1. Samples: 36945318. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:42,936][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 05:17:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6734.1). 
Total num frames: 167833600. Throughput: 0: 1726.4. Samples: 36950926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:47,933][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 05:17:48,149][42004] Updated weights for policy 0, policy_version 40976 (0.0029) +[2024-11-08 05:17:53,248][41694] Fps is (10 sec: 6353.3, 60 sec: 6858.8, 300 sec: 6713.0). Total num frames: 167862272. Throughput: 0: 1717.5. Samples: 36962020. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:17:53,250][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 05:17:55,469][42004] Updated weights for policy 0, policy_version 40986 (0.0046) +[2024-11-08 05:17:57,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 167895040. Throughput: 0: 1655.6. Samples: 36969922. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:17:57,933][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 05:18:01,033][42004] Updated weights for policy 0, policy_version 40996 (0.0024) +[2024-11-08 05:18:02,932][41694] Fps is (10 sec: 6767.4, 60 sec: 6758.4, 300 sec: 6692.5). Total num frames: 167927808. Throughput: 0: 1720.4. Samples: 36975436. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:18:02,934][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 05:18:07,567][42004] Updated weights for policy 0, policy_version 41006 (0.0030) +[2024-11-08 05:18:07,932][41694] Fps is (10 sec: 6553.2, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 167960576. Throughput: 0: 1725.0. Samples: 36985154. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:18:07,934][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 05:18:12,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 167997440. Throughput: 0: 1741.6. Samples: 36995952. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:18:12,933][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 05:18:13,043][42004] Updated weights for policy 0, policy_version 41016 (0.0027) +[2024-11-08 05:18:17,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6924.5, 300 sec: 6734.1). Total num frames: 168034304. Throughput: 0: 1738.5. Samples: 37001286. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:18:17,934][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 05:18:18,542][42004] Updated weights for policy 0, policy_version 41026 (0.0023) +[2024-11-08 05:18:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6761.9). Total num frames: 168071168. Throughput: 0: 1744.5. Samples: 37012756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:22,936][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 05:18:24,011][42004] Updated weights for policy 0, policy_version 41036 (0.0039) +[2024-11-08 05:18:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 168095744. Throughput: 0: 1673.2. Samples: 37020612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:27,933][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 05:18:31,516][42004] Updated weights for policy 0, policy_version 41046 (0.0030) +[2024-11-08 05:18:32,936][41694] Fps is (10 sec: 6141.4, 60 sec: 6826.2, 300 sec: 6706.2). Total num frames: 168132608. Throughput: 0: 1668.1. Samples: 37025996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:32,952][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 05:18:37,697][42004] Updated weights for policy 0, policy_version 41056 (0.0030) +[2024-11-08 05:18:37,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6758.3, 300 sec: 6692.5). Total num frames: 168165376. Throughput: 0: 1656.6. Samples: 37036046. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:37,935][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 05:18:42,931][41694] Fps is (10 sec: 6556.5, 60 sec: 6690.3, 300 sec: 6664.7). Total num frames: 168198144. Throughput: 0: 1695.6. Samples: 37046222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:42,933][41694] Avg episode reward: [(0, '4.630')] +[2024-11-08 05:18:43,490][42004] Updated weights for policy 0, policy_version 41066 (0.0024) +[2024-11-08 05:18:47,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6758.4, 300 sec: 6735.4). Total num frames: 168239104. Throughput: 0: 1697.7. Samples: 37051832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:47,933][41694] Avg episode reward: [(0, '4.235')] +[2024-11-08 05:18:48,885][42004] Updated weights for policy 0, policy_version 41076 (0.0030) +[2024-11-08 05:18:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6931.5, 300 sec: 6761.9). Total num frames: 168275968. Throughput: 0: 1731.1. Samples: 37063052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:52,933][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 05:18:54,582][42004] Updated weights for policy 0, policy_version 41086 (0.0037) +[2024-11-08 05:18:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 168308736. Throughput: 0: 1728.7. Samples: 37073744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:18:57,933][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 05:19:02,454][42004] Updated weights for policy 0, policy_version 41096 (0.0034) +[2024-11-08 05:19:02,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.2, 300 sec: 6706.3). Total num frames: 168329216. Throughput: 0: 1696.1. Samples: 37077612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:02,935][41694] Avg episode reward: [(0, '4.676')] +[2024-11-08 05:19:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6706.3). 
Total num frames: 168366080. Throughput: 0: 1626.6. Samples: 37085952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:07,935][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 05:19:08,167][42004] Updated weights for policy 0, policy_version 41106 (0.0024) +[2024-11-08 05:19:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 168402944. Throughput: 0: 1686.4. Samples: 37096500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:12,933][41694] Avg episode reward: [(0, '4.700')] +[2024-11-08 05:19:14,188][42004] Updated weights for policy 0, policy_version 41116 (0.0022) +[2024-11-08 05:19:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 168435712. Throughput: 0: 1680.5. Samples: 37101612. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:17,934][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 05:19:19,848][42004] Updated weights for policy 0, policy_version 41126 (0.0035) +[2024-11-08 05:19:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6744.6). Total num frames: 168472576. Throughput: 0: 1708.0. Samples: 37112906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:22,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 05:19:25,316][42004] Updated weights for policy 0, policy_version 41136 (0.0035) +[2024-11-08 05:19:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 168509440. Throughput: 0: 1720.1. Samples: 37123626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:27,935][41694] Avg episode reward: [(0, '4.593')] +[2024-11-08 05:19:31,077][42004] Updated weights for policy 0, policy_version 41146 (0.0033) +[2024-11-08 05:19:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6895.4, 300 sec: 6761.9). Total num frames: 168546304. Throughput: 0: 1715.7. Samples: 37129038. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:32,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 05:19:37,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 168570880. Throughput: 0: 1641.9. Samples: 37136938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:37,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 05:19:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041155_168570880.pth... +[2024-11-08 05:19:38,051][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040759_166948864.pth +[2024-11-08 05:19:38,219][42004] Updated weights for policy 0, policy_version 41156 (0.0031) +[2024-11-08 05:19:42,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 168607744. Throughput: 0: 1651.4. Samples: 37148056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:42,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 05:19:44,044][42004] Updated weights for policy 0, policy_version 41166 (0.0026) +[2024-11-08 05:19:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 168640512. Throughput: 0: 1672.6. Samples: 37152878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:47,934][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 05:19:50,253][42004] Updated weights for policy 0, policy_version 41176 (0.0023) +[2024-11-08 05:19:52,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 168673280. Throughput: 0: 1718.2. Samples: 37163272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:52,934][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 05:19:55,675][42004] Updated weights for policy 0, policy_version 41186 (0.0028) +[2024-11-08 05:19:57,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6754.5). 
Total num frames: 168710144. Throughput: 0: 1732.6. Samples: 37174466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:19:57,934][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 05:20:01,404][42004] Updated weights for policy 0, policy_version 41196 (0.0034) +[2024-11-08 05:20:02,933][41694] Fps is (10 sec: 7372.1, 60 sec: 6963.1, 300 sec: 6775.7). Total num frames: 168747008. Throughput: 0: 1740.6. Samples: 37179940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:20:02,935][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 05:20:07,138][42004] Updated weights for policy 0, policy_version 41206 (0.0028) +[2024-11-08 05:20:08,895][41694] Fps is (10 sec: 6351.1, 60 sec: 6785.9, 300 sec: 6753.7). Total num frames: 168779776. Throughput: 0: 1685.9. Samples: 37190396. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:08,897][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 05:20:12,932][41694] Fps is (10 sec: 6144.6, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 168808448. Throughput: 0: 1665.6. Samples: 37198576. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:12,933][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 05:20:14,519][42004] Updated weights for policy 0, policy_version 41216 (0.0033) +[2024-11-08 05:20:17,932][41694] Fps is (10 sec: 6345.9, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 168837120. Throughput: 0: 1655.2. Samples: 37203522. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:17,934][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 05:20:21,419][42004] Updated weights for policy 0, policy_version 41226 (0.0051) +[2024-11-08 05:20:22,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 168869888. Throughput: 0: 1670.1. Samples: 37212092. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:22,933][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 05:20:27,455][42004] Updated weights for policy 0, policy_version 41236 (0.0026) +[2024-11-08 05:20:27,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 168902656. Throughput: 0: 1649.7. Samples: 37222292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:27,934][41694] Avg episode reward: [(0, '4.301')] +[2024-11-08 05:20:32,856][42004] Updated weights for policy 0, policy_version 41246 (0.0027) +[2024-11-08 05:20:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 168943616. Throughput: 0: 1669.9. Samples: 37228024. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:32,935][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 05:20:37,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 168980480. Throughput: 0: 1695.0. Samples: 37239546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:37,933][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 05:20:38,153][42004] Updated weights for policy 0, policy_version 41256 (0.0029) +[2024-11-08 05:20:42,934][41694] Fps is (10 sec: 6142.8, 60 sec: 6621.7, 300 sec: 6747.9). Total num frames: 169005056. Throughput: 0: 1657.5. Samples: 37249056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:42,936][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 05:20:45,755][42004] Updated weights for policy 0, policy_version 41266 (0.0026) +[2024-11-08 05:20:47,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 169037824. Throughput: 0: 1613.3. Samples: 37252538. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:47,933][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 05:20:51,383][42004] Updated weights for policy 0, policy_version 41276 (0.0027) +[2024-11-08 05:20:52,932][41694] Fps is (10 sec: 6964.2, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 169074688. Throughput: 0: 1662.2. Samples: 37263594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:52,935][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 05:20:57,019][42004] Updated weights for policy 0, policy_version 41286 (0.0024) +[2024-11-08 05:20:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 169111552. Throughput: 0: 1684.2. Samples: 37274366. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:20:57,934][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 05:21:02,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6622.0, 300 sec: 6706.3). Total num frames: 169144320. Throughput: 0: 1683.6. Samples: 37279286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:02,934][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 05:21:03,088][42004] Updated weights for policy 0, policy_version 41296 (0.0039) +[2024-11-08 05:21:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6799.4, 300 sec: 6775.8). Total num frames: 169181184. Throughput: 0: 1732.6. Samples: 37290058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:07,933][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 05:21:08,510][42004] Updated weights for policy 0, policy_version 41306 (0.0033) +[2024-11-08 05:21:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 169213952. Throughput: 0: 1725.5. Samples: 37299940. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:12,933][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 05:21:15,011][42004] Updated weights for policy 0, policy_version 41316 (0.0039) +[2024-11-08 05:21:17,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.2, 300 sec: 6748.1). Total num frames: 169238528. Throughput: 0: 1714.2. Samples: 37305162. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:17,933][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 05:21:22,205][42004] Updated weights for policy 0, policy_version 41326 (0.0027) +[2024-11-08 05:21:22,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 169275392. Throughput: 0: 1632.4. Samples: 37313004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:22,934][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 05:21:27,642][42004] Updated weights for policy 0, policy_version 41336 (0.0033) +[2024-11-08 05:21:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 169312256. Throughput: 0: 1671.2. Samples: 37324256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:27,935][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 05:21:32,933][41694] Fps is (10 sec: 7372.2, 60 sec: 6758.3, 300 sec: 6748.0). Total num frames: 169349120. Throughput: 0: 1714.1. Samples: 37329674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:32,935][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 05:21:33,455][42004] Updated weights for policy 0, policy_version 41346 (0.0036) +[2024-11-08 05:21:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.3, 300 sec: 6761.9). Total num frames: 169385984. Throughput: 0: 1709.8. Samples: 37340534. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:37,934][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 05:21:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041354_169385984.pth... +[2024-11-08 05:21:38,072][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000040957_167759872.pth +[2024-11-08 05:21:38,954][42004] Updated weights for policy 0, policy_version 41356 (0.0029) +[2024-11-08 05:21:42,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6895.2, 300 sec: 6789.6). Total num frames: 169418752. Throughput: 0: 1713.7. Samples: 37351484. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:42,933][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 05:21:44,699][42004] Updated weights for policy 0, policy_version 41366 (0.0031) +[2024-11-08 05:21:47,931][41694] Fps is (10 sec: 6963.6, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 169455616. Throughput: 0: 1727.1. Samples: 37357004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:21:47,934][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 05:21:51,949][42004] Updated weights for policy 0, policy_version 41376 (0.0034) +[2024-11-08 05:21:52,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.5, 300 sec: 6761.9). Total num frames: 169480192. Throughput: 0: 1660.9. Samples: 37364798. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:21:52,936][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 05:21:57,440][42004] Updated weights for policy 0, policy_version 41386 (0.0028) +[2024-11-08 05:21:57,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 169517056. Throughput: 0: 1692.4. Samples: 37376098. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:21:57,933][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 05:22:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6761.9). 
Total num frames: 169553920. Throughput: 0: 1698.5. Samples: 37381594. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:22:02,934][41694] Avg episode reward: [(0, '4.231')] +[2024-11-08 05:22:03,524][42004] Updated weights for policy 0, policy_version 41396 (0.0030) +[2024-11-08 05:22:07,935][41694] Fps is (10 sec: 6551.9, 60 sec: 6689.8, 300 sec: 6734.0). Total num frames: 169582592. Throughput: 0: 1724.1. Samples: 37390594. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:22:07,943][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 05:22:09,835][42004] Updated weights for policy 0, policy_version 41406 (0.0038) +[2024-11-08 05:22:12,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6758.4, 300 sec: 6781.1). Total num frames: 169619456. Throughput: 0: 1712.7. Samples: 37401328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:22:12,933][41694] Avg episode reward: [(0, '4.217')] +[2024-11-08 05:22:15,399][42004] Updated weights for policy 0, policy_version 41416 (0.0028) +[2024-11-08 05:22:17,931][41694] Fps is (10 sec: 7374.8, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 169656320. Throughput: 0: 1713.1. Samples: 37406764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:22:17,933][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 05:22:20,910][42004] Updated weights for policy 0, policy_version 41426 (0.0024) +[2024-11-08 05:22:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 169693184. Throughput: 0: 1721.1. Samples: 37417982. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:22,934][41694] Avg episode reward: [(0, '4.666')] +[2024-11-08 05:22:27,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 169717760. Throughput: 0: 1652.0. Samples: 37425824. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:27,934][41694] Avg episode reward: [(0, '4.650')] +[2024-11-08 05:22:28,223][42004] Updated weights for policy 0, policy_version 41436 (0.0040) +[2024-11-08 05:22:32,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.5, 300 sec: 6761.9). Total num frames: 169754624. Throughput: 0: 1646.7. Samples: 37431108. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:32,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 05:22:33,774][42004] Updated weights for policy 0, policy_version 41446 (0.0032) +[2024-11-08 05:22:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 169791488. Throughput: 0: 1720.6. Samples: 37442224. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:37,939][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 05:22:39,921][42004] Updated weights for policy 0, policy_version 41456 (0.0049) +[2024-11-08 05:22:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 169824256. Throughput: 0: 1690.2. Samples: 37452158. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:42,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 05:22:45,613][42004] Updated weights for policy 0, policy_version 41466 (0.0023) +[2024-11-08 05:22:47,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6758.3, 300 sec: 6783.0). Total num frames: 169861120. Throughput: 0: 1690.6. Samples: 37457670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:47,934][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 05:22:51,140][42004] Updated weights for policy 0, policy_version 41476 (0.0030) +[2024-11-08 05:22:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6789.6). Total num frames: 169897984. Throughput: 0: 1739.6. Samples: 37468870. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:52,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 05:22:56,526][42004] Updated weights for policy 0, policy_version 41486 (0.0026) +[2024-11-08 05:22:58,602][41694] Fps is (10 sec: 6142.0, 60 sec: 6751.2, 300 sec: 6760.4). Total num frames: 169926656. Throughput: 0: 1597.6. Samples: 37474292. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:22:58,607][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 05:23:02,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 169959424. Throughput: 0: 1679.2. Samples: 37482330. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:23:02,935][41694] Avg episode reward: [(0, '4.347')] +[2024-11-08 05:23:04,031][42004] Updated weights for policy 0, policy_version 41496 (0.0036) +[2024-11-08 05:23:07,932][41694] Fps is (10 sec: 7463.7, 60 sec: 6895.2, 300 sec: 6775.8). Total num frames: 169996288. Throughput: 0: 1666.9. Samples: 37492994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:23:07,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 05:23:09,575][42004] Updated weights for policy 0, policy_version 41506 (0.0031) +[2024-11-08 05:23:12,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 170029056. Throughput: 0: 1729.1. Samples: 37503634. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:23:12,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 05:23:15,764][42004] Updated weights for policy 0, policy_version 41516 (0.0026) +[2024-11-08 05:23:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 170061824. Throughput: 0: 1723.5. Samples: 37508666. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:23:17,934][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 05:23:21,716][42004] Updated weights for policy 0, policy_version 41526 (0.0023) +[2024-11-08 05:23:22,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 170098688. Throughput: 0: 1706.1. Samples: 37519000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:23:22,934][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 05:23:27,278][42004] Updated weights for policy 0, policy_version 41536 (0.0025) +[2024-11-08 05:23:27,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6963.2, 300 sec: 6789.7). Total num frames: 170135552. Throughput: 0: 1729.8. Samples: 37530000. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:23:27,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 05:23:32,931][41694] Fps is (10 sec: 6144.3, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 170160128. Throughput: 0: 1728.6. Samples: 37535458. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:23:32,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 05:23:34,360][42004] Updated weights for policy 0, policy_version 41546 (0.0037) +[2024-11-08 05:23:37,932][41694] Fps is (10 sec: 6144.2, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 170196992. Throughput: 0: 1656.7. Samples: 37543420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:23:37,934][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 05:23:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041552_170196992.pth... +[2024-11-08 05:23:38,162][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041155_168570880.pth +[2024-11-08 05:23:40,095][42004] Updated weights for policy 0, policy_version 41556 (0.0030) +[2024-11-08 05:23:42,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6826.6, 300 sec: 6761.9). 
Total num frames: 170233856. Throughput: 0: 1806.1. Samples: 37554358. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:23:42,934][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 05:23:45,858][42004] Updated weights for policy 0, policy_version 41566 (0.0023) +[2024-11-08 05:23:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 170266624. Throughput: 0: 1718.2. Samples: 37559648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:23:47,935][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 05:23:51,840][42004] Updated weights for policy 0, policy_version 41576 (0.0034) +[2024-11-08 05:23:52,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 170303488. Throughput: 0: 1706.2. Samples: 37569772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:23:52,933][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 05:23:57,231][42004] Updated weights for policy 0, policy_version 41586 (0.0026) +[2024-11-08 05:23:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6972.9, 300 sec: 6817.4). Total num frames: 170340352. Throughput: 0: 1723.2. Samples: 37581176. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:23:57,934][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 05:24:02,936][41694] Fps is (10 sec: 6960.1, 60 sec: 6894.5, 300 sec: 6803.4). Total num frames: 170373120. Throughput: 0: 1732.7. Samples: 37586644. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:24:02,938][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 05:24:03,294][42004] Updated weights for policy 0, policy_version 41596 (0.0022) +[2024-11-08 05:24:07,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 170397696. Throughput: 0: 1667.8. Samples: 37594050. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:07,934][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 05:24:10,595][42004] Updated weights for policy 0, policy_version 41606 (0.0028) +[2024-11-08 05:24:12,931][41694] Fps is (10 sec: 6146.8, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 170434560. Throughput: 0: 1658.5. Samples: 37604630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:12,936][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 05:24:16,098][42004] Updated weights for policy 0, policy_version 41616 (0.0023) +[2024-11-08 05:24:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.6, 300 sec: 6775.7). Total num frames: 170471424. Throughput: 0: 1660.6. Samples: 37610186. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:17,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 05:24:22,070][42004] Updated weights for policy 0, policy_version 41626 (0.0032) +[2024-11-08 05:24:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.5, 300 sec: 6761.9). Total num frames: 170504192. Throughput: 0: 1711.7. Samples: 37620448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:22,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 05:24:27,823][42004] Updated weights for policy 0, policy_version 41636 (0.0031) +[2024-11-08 05:24:27,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 170541056. Throughput: 0: 1713.8. Samples: 37631480. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:27,933][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 05:24:32,932][41694] Fps is (10 sec: 7372.1, 60 sec: 6963.1, 300 sec: 6803.5). Total num frames: 170577920. Throughput: 0: 1721.7. Samples: 37637128. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:32,934][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 05:24:33,069][42004] Updated weights for policy 0, policy_version 41646 (0.0034) +[2024-11-08 05:24:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 170614784. Throughput: 0: 1749.3. Samples: 37648490. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:37,938][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 05:24:40,169][42004] Updated weights for policy 0, policy_version 41656 (0.0031) +[2024-11-08 05:24:42,932][41694] Fps is (10 sec: 6144.6, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 170639360. Throughput: 0: 1674.2. Samples: 37656514. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:42,934][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 05:24:45,842][42004] Updated weights for policy 0, policy_version 41666 (0.0023) +[2024-11-08 05:24:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 170676224. Throughput: 0: 1673.7. Samples: 37661954. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:47,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 05:24:51,242][42004] Updated weights for policy 0, policy_version 41676 (0.0025) +[2024-11-08 05:24:52,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6826.6, 300 sec: 6789.6). Total num frames: 170713088. Throughput: 0: 1762.1. Samples: 37673344. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:52,935][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 05:24:57,213][42004] Updated weights for policy 0, policy_version 41686 (0.0036) +[2024-11-08 05:24:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6789.7). Total num frames: 170749952. Throughput: 0: 1756.2. Samples: 37683658. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:24:57,933][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 05:25:02,773][42004] Updated weights for policy 0, policy_version 41696 (0.0028) +[2024-11-08 05:25:02,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6895.5, 300 sec: 6825.8). Total num frames: 170786816. Throughput: 0: 1761.0. Samples: 37689432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:25:02,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 05:25:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.7, 300 sec: 6831.3). Total num frames: 170823680. Throughput: 0: 1775.2. Samples: 37700334. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:25:07,935][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 05:25:08,149][42004] Updated weights for policy 0, policy_version 41706 (0.0044) +[2024-11-08 05:25:14,126][41694] Fps is (10 sec: 6220.3, 60 sec: 6894.2, 300 sec: 6817.6). Total num frames: 170856448. Throughput: 0: 1732.1. Samples: 37711494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:25:14,130][41694] Avg episode reward: [(0, '4.682')] +[2024-11-08 05:25:16,076][42004] Updated weights for policy 0, policy_version 41716 (0.0032) +[2024-11-08 05:25:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 170876928. Throughput: 0: 1690.3. Samples: 37713190. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:25:17,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 05:25:21,936][42004] Updated weights for policy 0, policy_version 41726 (0.0036) +[2024-11-08 05:25:22,931][41694] Fps is (10 sec: 6512.2, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 170913792. Throughput: 0: 1657.6. Samples: 37723082. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:25:22,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 05:25:27,505][42004] Updated weights for policy 0, policy_version 41736 (0.0023) +[2024-11-08 05:25:27,933][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 170950656. Throughput: 0: 1731.0. Samples: 37734410. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:25:27,934][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 05:25:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.8, 300 sec: 6803.5). Total num frames: 170987520. Throughput: 0: 1717.2. Samples: 37739230. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:25:32,933][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 05:25:33,212][42004] Updated weights for policy 0, policy_version 41746 (0.0022) +[2024-11-08 05:25:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.6, 300 sec: 6845.2). Total num frames: 171024384. Throughput: 0: 1720.3. Samples: 37750756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:25:37,933][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 05:25:38,059][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041755_171028480.pth... +[2024-11-08 05:25:38,157][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041354_169385984.pth +[2024-11-08 05:25:38,613][42004] Updated weights for policy 0, policy_version 41756 (0.0028) +[2024-11-08 05:25:42,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7031.4, 300 sec: 6859.1). Total num frames: 171061248. Throughput: 0: 1739.7. Samples: 37761946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:25:42,937][41694] Avg episode reward: [(0, '4.628')] +[2024-11-08 05:25:44,102][42004] Updated weights for policy 0, policy_version 41766 (0.0028) +[2024-11-08 05:25:47,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6826.7, 300 sec: 6817.4). 
Total num frames: 171085824. Throughput: 0: 1732.9. Samples: 37767412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:25:47,935][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 05:25:51,703][42004] Updated weights for policy 0, policy_version 41776 (0.0035) +[2024-11-08 05:25:52,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6826.8, 300 sec: 6817.4). Total num frames: 171122688. Throughput: 0: 1656.7. Samples: 37774884. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:25:52,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 05:25:57,101][42004] Updated weights for policy 0, policy_version 41786 (0.0026) +[2024-11-08 05:25:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 171159552. Throughput: 0: 1708.4. Samples: 37786332. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:25:57,934][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 05:26:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 171192320. Throughput: 0: 1739.5. Samples: 37791466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:26:02,934][41694] Avg episode reward: [(0, '4.310')] +[2024-11-08 05:26:03,184][42004] Updated weights for policy 0, policy_version 41796 (0.0022) +[2024-11-08 05:26:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6831.3). Total num frames: 171229184. Throughput: 0: 1742.0. Samples: 37801472. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:26:07,933][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 05:26:08,993][42004] Updated weights for policy 0, policy_version 41806 (0.0034) +[2024-11-08 05:26:12,934][41694] Fps is (10 sec: 6961.4, 60 sec: 6895.4, 300 sec: 6859.0). Total num frames: 171261952. Throughput: 0: 1708.4. Samples: 37811292. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:26:12,936][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 05:26:15,334][42004] Updated weights for policy 0, policy_version 41816 (0.0036) +[2024-11-08 05:26:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6963.3, 300 sec: 6845.2). Total num frames: 171294720. Throughput: 0: 1715.2. Samples: 37816412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:17,933][41694] Avg episode reward: [(0, '4.659')] +[2024-11-08 05:26:22,932][41694] Fps is (10 sec: 5326.1, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 171315200. Throughput: 0: 1666.9. Samples: 37825766. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:22,935][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 05:26:22,995][42004] Updated weights for policy 0, policy_version 41826 (0.0035) +[2024-11-08 05:26:27,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 171356160. Throughput: 0: 1618.6. Samples: 37834784. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:27,933][41694] Avg episode reward: [(0, '4.215')] +[2024-11-08 05:26:28,420][42004] Updated weights for policy 0, policy_version 41836 (0.0027) +[2024-11-08 05:26:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 171393024. Throughput: 0: 1622.5. Samples: 37840424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:32,934][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 05:26:34,147][42004] Updated weights for policy 0, policy_version 41846 (0.0030) +[2024-11-08 05:26:37,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 171421696. Throughput: 0: 1682.1. Samples: 37850580. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:37,933][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 05:26:40,308][42004] Updated weights for policy 0, policy_version 41856 (0.0046) +[2024-11-08 05:26:42,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 171458560. Throughput: 0: 1662.9. Samples: 37861164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:42,935][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 05:26:45,838][42004] Updated weights for policy 0, policy_version 41866 (0.0021) +[2024-11-08 05:26:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6831.3). Total num frames: 171495424. Throughput: 0: 1673.5. Samples: 37866774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:47,932][41694] Avg episode reward: [(0, '4.644')] +[2024-11-08 05:26:51,585][42004] Updated weights for policy 0, policy_version 41876 (0.0047) +[2024-11-08 05:26:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.6, 300 sec: 6831.3). Total num frames: 171532288. Throughput: 0: 1692.2. Samples: 37877620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:52,936][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 05:26:57,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 171556864. Throughput: 0: 1649.2. Samples: 37885502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:26:57,934][41694] Avg episode reward: [(0, '4.231')] +[2024-11-08 05:26:58,957][42004] Updated weights for policy 0, policy_version 41886 (0.0027) +[2024-11-08 05:27:02,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6621.9, 300 sec: 6803.6). Total num frames: 171589632. Throughput: 0: 1651.3. Samples: 37890722. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:27:02,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 05:27:04,877][42004] Updated weights for policy 0, policy_version 41896 (0.0027) +[2024-11-08 05:27:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.8, 300 sec: 6803.5). Total num frames: 171626496. Throughput: 0: 1677.1. Samples: 37901234. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:27:07,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 05:27:11,131][42004] Updated weights for policy 0, policy_version 41906 (0.0028) +[2024-11-08 05:27:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.9, 300 sec: 6775.8). Total num frames: 171655168. Throughput: 0: 1684.8. Samples: 37910598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:27:12,934][41694] Avg episode reward: [(0, '4.313')] +[2024-11-08 05:27:17,018][42004] Updated weights for policy 0, policy_version 41916 (0.0034) +[2024-11-08 05:27:17,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.8, 300 sec: 6775.8). Total num frames: 171692032. Throughput: 0: 1669.3. Samples: 37915542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:27:17,934][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 05:27:22,493][42004] Updated weights for policy 0, policy_version 41926 (0.0030) +[2024-11-08 05:27:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6895.0, 300 sec: 6817.4). Total num frames: 171728896. Throughput: 0: 1697.0. Samples: 37926944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:27:22,933][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 05:27:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 171765760. Throughput: 0: 1708.8. Samples: 37938058. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:27:27,932][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 05:27:28,026][42004] Updated weights for policy 0, policy_version 41936 (0.0029) +[2024-11-08 05:27:32,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 171790336. Throughput: 0: 1639.6. Samples: 37940558. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:27:32,935][41694] Avg episode reward: [(0, '4.637')] +[2024-11-08 05:27:35,265][42004] Updated weights for policy 0, policy_version 41946 (0.0031) +[2024-11-08 05:27:37,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 171827200. Throughput: 0: 1647.6. Samples: 37951760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:27:37,933][41694] Avg episode reward: [(0, '4.171')] +[2024-11-08 05:27:37,976][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041951_171831296.pth... +[2024-11-08 05:27:38,072][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041552_170196992.pth +[2024-11-08 05:27:40,841][42004] Updated weights for policy 0, policy_version 41956 (0.0025) +[2024-11-08 05:27:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6789.7). Total num frames: 171864064. Throughput: 0: 1708.5. Samples: 37962386. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:27:42,933][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 05:27:46,958][42004] Updated weights for policy 0, policy_version 41966 (0.0027) +[2024-11-08 05:27:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 171896832. Throughput: 0: 1697.5. Samples: 37967112. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:27:47,935][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 05:27:52,470][42004] Updated weights for policy 0, policy_version 41976 (0.0023) +[2024-11-08 05:27:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.2, 300 sec: 6819.0). Total num frames: 171933696. Throughput: 0: 1711.7. Samples: 37978258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:27:52,932][41694] Avg episode reward: [(0, '4.593')] +[2024-11-08 05:27:57,815][42004] Updated weights for policy 0, policy_version 41986 (0.0032) +[2024-11-08 05:27:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 171974656. Throughput: 0: 1757.5. Samples: 37989688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:27:57,933][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 05:28:03,983][41694] Fps is (10 sec: 6300.4, 60 sec: 6776.1, 300 sec: 6779.4). Total num frames: 172003328. Throughput: 0: 1729.0. Samples: 37995166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:28:03,985][41694] Avg episode reward: [(0, '4.669')] +[2024-11-08 05:28:05,685][42004] Updated weights for policy 0, policy_version 41996 (0.0027) +[2024-11-08 05:28:07,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 172032000. Throughput: 0: 1669.6. Samples: 38002078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:28:07,934][41694] Avg episode reward: [(0, '4.876')] +[2024-11-08 05:28:11,342][42004] Updated weights for policy 0, policy_version 42006 (0.0033) +[2024-11-08 05:28:12,931][41694] Fps is (10 sec: 6866.3, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 172064768. Throughput: 0: 1665.8. Samples: 38013018. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:28:12,934][41694] Avg episode reward: [(0, '4.599')] +[2024-11-08 05:28:17,429][42004] Updated weights for policy 0, policy_version 42016 (0.0046) +[2024-11-08 05:28:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6789.7). Total num frames: 172101632. Throughput: 0: 1720.9. Samples: 38017998. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:28:17,932][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 05:28:22,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 172130304. Throughput: 0: 1686.8. Samples: 38027664. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:28:22,948][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 05:28:23,593][42004] Updated weights for policy 0, policy_version 42026 (0.0029) +[2024-11-08 05:28:27,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 172167168. Throughput: 0: 1688.3. Samples: 38038360. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:28:27,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 05:28:29,099][42004] Updated weights for policy 0, policy_version 42036 (0.0029) +[2024-11-08 05:28:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 172204032. Throughput: 0: 1709.0. Samples: 38044018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:28:32,933][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 05:28:34,844][42004] Updated weights for policy 0, policy_version 42046 (0.0045) +[2024-11-08 05:28:37,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 172228608. Throughput: 0: 1698.4. Samples: 38054686. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:28:37,934][41694] Avg episode reward: [(0, '4.645')] +[2024-11-08 05:28:41,992][42004] Updated weights for policy 0, policy_version 42056 (0.0026) +[2024-11-08 05:28:42,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 172265472. Throughput: 0: 1626.5. Samples: 38062880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:28:42,933][41694] Avg episode reward: [(0, '4.259')] +[2024-11-08 05:28:47,338][42004] Updated weights for policy 0, policy_version 42066 (0.0036) +[2024-11-08 05:28:47,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6826.6, 300 sec: 6789.6). Total num frames: 172306432. Throughput: 0: 1671.6. Samples: 38068628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:28:47,934][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 05:28:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 172335104. Throughput: 0: 1694.4. Samples: 38078324. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:28:52,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 05:28:54,024][42004] Updated weights for policy 0, policy_version 42076 (0.0028) +[2024-11-08 05:28:57,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6553.6, 300 sec: 6762.0). Total num frames: 172367872. Throughput: 0: 1681.3. Samples: 38088678. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:28:57,933][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 05:28:59,622][42004] Updated weights for policy 0, policy_version 42086 (0.0025) +[2024-11-08 05:29:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6809.5, 300 sec: 6803.5). Total num frames: 172404736. Throughput: 0: 1697.1. Samples: 38094368. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:29:02,934][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 05:29:05,624][42004] Updated weights for policy 0, policy_version 42096 (0.0031) +[2024-11-08 05:29:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 172437504. Throughput: 0: 1710.8. Samples: 38104652. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:29:07,935][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 05:29:12,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.8, 300 sec: 6748.0). Total num frames: 172462080. Throughput: 0: 1637.6. Samples: 38112054. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:29:12,934][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 05:29:13,096][42004] Updated weights for policy 0, policy_version 42106 (0.0032) +[2024-11-08 05:29:17,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 172503040. Throughput: 0: 1634.2. Samples: 38117558. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:29:17,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 05:29:18,359][42004] Updated weights for policy 0, policy_version 42116 (0.0031) +[2024-11-08 05:29:22,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 172539904. Throughput: 0: 1654.0. Samples: 38129114. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:29:22,934][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 05:29:24,281][42004] Updated weights for policy 0, policy_version 42126 (0.0032) +[2024-11-08 05:29:27,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 172572672. Throughput: 0: 1695.5. Samples: 38139176. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:29:27,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 05:29:30,229][42004] Updated weights for policy 0, policy_version 42136 (0.0038) +[2024-11-08 05:29:32,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 172605440. Throughput: 0: 1684.1. Samples: 38144410. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:29:32,934][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 05:29:35,866][42004] Updated weights for policy 0, policy_version 42146 (0.0028) +[2024-11-08 05:29:37,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6895.0, 300 sec: 6789.6). Total num frames: 172642304. Throughput: 0: 1713.9. Samples: 38155448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:29:37,934][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 05:29:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042149_172642304.pth... +[2024-11-08 05:29:38,136][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041755_171028480.pth +[2024-11-08 05:29:41,724][42004] Updated weights for policy 0, policy_version 42156 (0.0024) +[2024-11-08 05:29:42,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6826.6, 300 sec: 6775.7). Total num frames: 172675072. Throughput: 0: 1710.0. Samples: 38165630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:29:42,935][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 05:29:47,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 172699648. Throughput: 0: 1662.3. Samples: 38169170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:29:47,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 05:29:49,482][42004] Updated weights for policy 0, policy_version 42166 (0.0060) +[2024-11-08 05:29:52,932][41694] Fps is (10 sec: 6144.2, 60 sec: 6690.1, 300 sec: 6734.1). 
Total num frames: 172736512. Throughput: 0: 1641.0. Samples: 38178496. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:29:52,933][41694] Avg episode reward: [(0, '4.323')] +[2024-11-08 05:29:55,474][42004] Updated weights for policy 0, policy_version 42176 (0.0026) +[2024-11-08 05:29:57,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 172765184. Throughput: 0: 1689.4. Samples: 38188078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:29:57,937][41694] Avg episode reward: [(0, '4.248')] +[2024-11-08 05:30:01,793][42004] Updated weights for policy 0, policy_version 42186 (0.0024) +[2024-11-08 05:30:02,933][41694] Fps is (10 sec: 6143.1, 60 sec: 6553.4, 300 sec: 6692.4). Total num frames: 172797952. Throughput: 0: 1676.1. Samples: 38192986. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:30:02,938][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 05:30:07,337][42004] Updated weights for policy 0, policy_version 42196 (0.0025) +[2024-11-08 05:30:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6747.5). Total num frames: 172838912. Throughput: 0: 1653.0. Samples: 38203498. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:30:07,933][41694] Avg episode reward: [(0, '4.601')] +[2024-11-08 05:30:12,932][41694] Fps is (10 sec: 7373.7, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 172871680. Throughput: 0: 1669.7. Samples: 38214314. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:30:12,937][41694] Avg episode reward: [(0, '4.251')] +[2024-11-08 05:30:13,277][42004] Updated weights for policy 0, policy_version 42206 (0.0036) +[2024-11-08 05:30:17,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 172904448. Throughput: 0: 1650.8. Samples: 38218696. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:30:17,935][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 05:30:21,623][42004] Updated weights for policy 0, policy_version 42216 (0.0042) +[2024-11-08 05:30:22,935][41694] Fps is (10 sec: 5322.9, 60 sec: 6416.6, 300 sec: 6692.4). Total num frames: 172924928. Throughput: 0: 1562.8. Samples: 38225778. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:30:22,938][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 05:30:27,259][42004] Updated weights for policy 0, policy_version 42226 (0.0040) +[2024-11-08 05:30:27,933][41694] Fps is (10 sec: 5734.3, 60 sec: 6485.3, 300 sec: 6692.4). Total num frames: 172961792. Throughput: 0: 1578.1. Samples: 38236642. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:30:27,936][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 05:30:32,932][41694] Fps is (10 sec: 6965.7, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 172994560. Throughput: 0: 1611.6. Samples: 38241694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:30:32,934][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 05:30:33,063][42004] Updated weights for policy 0, policy_version 42236 (0.0026) +[2024-11-08 05:30:37,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6553.5, 300 sec: 6692.4). Total num frames: 173035520. Throughput: 0: 1646.2. Samples: 38252576. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:30:37,935][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 05:30:38,447][42004] Updated weights for policy 0, policy_version 42246 (0.0025) +[2024-11-08 05:30:42,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 173072384. Throughput: 0: 1695.4. Samples: 38264372. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:30:42,935][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 05:30:43,794][42004] Updated weights for policy 0, policy_version 42256 (0.0029) +[2024-11-08 05:30:47,931][41694] Fps is (10 sec: 7373.5, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 173109248. Throughput: 0: 1708.5. Samples: 38269868. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:30:47,933][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 05:30:49,522][42004] Updated weights for policy 0, policy_version 42266 (0.0029) +[2024-11-08 05:30:54,404][41694] Fps is (10 sec: 5712.3, 60 sec: 6529.8, 300 sec: 6673.0). Total num frames: 173137920. Throughput: 0: 1660.1. Samples: 38280648. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:30:54,406][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 05:30:57,287][42004] Updated weights for policy 0, policy_version 42276 (0.0030) +[2024-11-08 05:30:57,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 173166592. Throughput: 0: 1633.1. Samples: 38287802. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:30:57,936][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 05:31:02,907][42004] Updated weights for policy 0, policy_version 42286 (0.0024) +[2024-11-08 05:31:02,931][41694] Fps is (10 sec: 7685.7, 60 sec: 6758.6, 300 sec: 6692.4). Total num frames: 173203456. Throughput: 0: 1658.6. Samples: 38293334. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:31:02,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 05:31:07,932][41694] Fps is (10 sec: 6963.5, 60 sec: 6621.9, 300 sec: 6692.5). Total num frames: 173236224. Throughput: 0: 1726.2. Samples: 38303448. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:31:07,934][41694] Avg episode reward: [(0, '4.348')] +[2024-11-08 05:31:08,629][42004] Updated weights for policy 0, policy_version 42296 (0.0028) +[2024-11-08 05:31:12,934][41694] Fps is (10 sec: 6961.7, 60 sec: 6689.9, 300 sec: 6706.3). Total num frames: 173273088. Throughput: 0: 1718.6. Samples: 38313980. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:31:12,935][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 05:31:14,809][42004] Updated weights for policy 0, policy_version 42306 (0.0043) +[2024-11-08 05:31:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 173305856. Throughput: 0: 1723.5. Samples: 38319250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:17,934][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 05:31:20,454][42004] Updated weights for policy 0, policy_version 42316 (0.0029) +[2024-11-08 05:31:22,931][41694] Fps is (10 sec: 6964.7, 60 sec: 6963.7, 300 sec: 6734.1). Total num frames: 173342720. Throughput: 0: 1728.4. Samples: 38330352. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:22,933][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 05:31:25,895][42004] Updated weights for policy 0, policy_version 42326 (0.0030) +[2024-11-08 05:31:28,768][41694] Fps is (10 sec: 6048.1, 60 sec: 6732.9, 300 sec: 6687.4). Total num frames: 173371392. Throughput: 0: 1560.4. Samples: 38335896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:28,770][41694] Avg episode reward: [(0, '4.268')] +[2024-11-08 05:31:32,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 173404160. Throughput: 0: 1625.4. Samples: 38343012. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:32,933][41694] Avg episode reward: [(0, '4.179')] +[2024-11-08 05:31:33,471][42004] Updated weights for policy 0, policy_version 42336 (0.0037) +[2024-11-08 05:31:37,932][41694] Fps is (10 sec: 7151.4, 60 sec: 6690.2, 300 sec: 6706.3). Total num frames: 173436928. Throughput: 0: 1692.3. Samples: 38354308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:37,933][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 05:31:38,054][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042344_173441024.pth... +[2024-11-08 05:31:38,174][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000041951_171831296.pth +[2024-11-08 05:31:39,310][42004] Updated weights for policy 0, policy_version 42346 (0.0029) +[2024-11-08 05:31:42,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 173469696. Throughput: 0: 1699.0. Samples: 38364258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:42,934][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 05:31:45,100][42004] Updated weights for policy 0, policy_version 42356 (0.0032) +[2024-11-08 05:31:47,935][41694] Fps is (10 sec: 7369.9, 60 sec: 6689.7, 300 sec: 6706.2). Total num frames: 173510656. Throughput: 0: 1702.5. Samples: 38369952. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:47,938][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 05:31:50,207][42004] Updated weights for policy 0, policy_version 42366 (0.0031) +[2024-11-08 05:31:52,931][41694] Fps is (10 sec: 8192.4, 60 sec: 7068.5, 300 sec: 6761.9). Total num frames: 173551616. Throughput: 0: 1742.8. Samples: 38381874. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:52,933][41694] Avg episode reward: [(0, '4.309')] +[2024-11-08 05:31:55,592][42004] Updated weights for policy 0, policy_version 42376 (0.0029) +[2024-11-08 05:31:57,931][41694] Fps is (10 sec: 7785.5, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 173588480. Throughput: 0: 1758.2. Samples: 38393096. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:31:57,934][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 05:32:03,089][41694] Fps is (10 sec: 5645.2, 60 sec: 6740.7, 300 sec: 6716.6). Total num frames: 173608960. Throughput: 0: 1757.2. Samples: 38398600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:03,092][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 05:32:03,427][42004] Updated weights for policy 0, policy_version 42386 (0.0035) +[2024-11-08 05:32:07,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 173641728. Throughput: 0: 1663.7. Samples: 38405220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:07,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 05:32:09,080][42004] Updated weights for policy 0, policy_version 42396 (0.0047) +[2024-11-08 05:32:12,931][41694] Fps is (10 sec: 7075.0, 60 sec: 6758.6, 300 sec: 6734.1). Total num frames: 173678592. Throughput: 0: 1807.1. Samples: 38415704. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:12,933][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 05:32:15,133][42004] Updated weights for policy 0, policy_version 42406 (0.0030) +[2024-11-08 05:32:17,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 173711360. Throughput: 0: 1731.2. Samples: 38420916. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:17,933][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 05:32:20,592][42004] Updated weights for policy 0, policy_version 42416 (0.0025) +[2024-11-08 05:32:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 173752320. Throughput: 0: 1731.1. Samples: 38432206. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:22,933][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 05:32:26,114][42004] Updated weights for policy 0, policy_version 42426 (0.0027) +[2024-11-08 05:32:27,932][41694] Fps is (10 sec: 7781.9, 60 sec: 7061.5, 300 sec: 6775.7). Total num frames: 173789184. Throughput: 0: 1765.9. Samples: 38443724. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:27,935][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 05:32:31,541][42004] Updated weights for policy 0, policy_version 42436 (0.0038) +[2024-11-08 05:32:32,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 173826048. Throughput: 0: 1757.2. Samples: 38449018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:32,934][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 05:32:37,932][41694] Fps is (10 sec: 5325.1, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 173842432. Throughput: 0: 1708.8. Samples: 38458770. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:37,933][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 05:32:39,940][42004] Updated weights for policy 0, policy_version 42446 (0.0036) +[2024-11-08 05:32:42,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 173875200. Throughput: 0: 1610.5. Samples: 38465570. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:42,936][41694] Avg episode reward: [(0, '4.313')] +[2024-11-08 05:32:46,077][42004] Updated weights for policy 0, policy_version 42456 (0.0033) +[2024-11-08 05:32:47,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6622.3, 300 sec: 6692.4). Total num frames: 173907968. Throughput: 0: 1609.6. Samples: 38470780. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:47,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 05:32:51,808][42004] Updated weights for policy 0, policy_version 42466 (0.0041) +[2024-11-08 05:32:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6692.5). Total num frames: 173948928. Throughput: 0: 1687.6. Samples: 38481160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:52,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 05:32:57,055][42004] Updated weights for policy 0, policy_version 42476 (0.0020) +[2024-11-08 05:32:57,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6744.3). Total num frames: 173985792. Throughput: 0: 1712.5. Samples: 38492768. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:32:57,933][41694] Avg episode reward: [(0, '4.636')] +[2024-11-08 05:33:02,743][42004] Updated weights for policy 0, policy_version 42486 (0.0031) +[2024-11-08 05:33:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6913.1, 300 sec: 6748.0). Total num frames: 174022656. Throughput: 0: 1714.8. Samples: 38498082. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:33:02,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 05:33:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 174059520. Throughput: 0: 1709.1. Samples: 38509114. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:07,933][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 05:33:08,239][42004] Updated weights for policy 0, policy_version 42496 (0.0028) +[2024-11-08 05:33:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 174080000. Throughput: 0: 1616.7. Samples: 38516476. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:12,933][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 05:33:16,304][42004] Updated weights for policy 0, policy_version 42506 (0.0029) +[2024-11-08 05:33:17,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 174116864. Throughput: 0: 1604.9. Samples: 38521240. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:17,934][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 05:33:22,245][42004] Updated weights for policy 0, policy_version 42516 (0.0029) +[2024-11-08 05:33:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 174149632. Throughput: 0: 1615.4. Samples: 38531462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:22,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 05:33:27,556][42004] Updated weights for policy 0, policy_version 42526 (0.0024) +[2024-11-08 05:33:27,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 174186496. Throughput: 0: 1722.3. Samples: 38543072. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:27,933][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 05:33:32,633][42004] Updated weights for policy 0, policy_version 42536 (0.0036) +[2024-11-08 05:33:32,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 174227456. Throughput: 0: 1738.0. Samples: 38548990. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:32,933][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 05:33:37,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 174264320. Throughput: 0: 1767.8. Samples: 38560712. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:37,934][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 05:33:38,087][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042546_174268416.pth... +[2024-11-08 05:33:38,091][42004] Updated weights for policy 0, policy_version 42546 (0.0030) +[2024-11-08 05:33:38,205][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042149_172642304.pth +[2024-11-08 05:33:42,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.7, 300 sec: 6761.9). Total num frames: 174301184. Throughput: 0: 1764.7. Samples: 38572180. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:42,933][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 05:33:43,489][42004] Updated weights for policy 0, policy_version 42556 (0.0030) +[2024-11-08 05:33:47,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 174321664. Throughput: 0: 1748.1. Samples: 38576748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:47,935][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 05:33:51,887][42004] Updated weights for policy 0, policy_version 42566 (0.0032) +[2024-11-08 05:33:52,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 174354432. Throughput: 0: 1652.8. Samples: 38583492. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:52,934][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 05:33:57,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 174387200. Throughput: 0: 1709.6. Samples: 38593410. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:33:57,934][41694] Avg episode reward: [(0, '4.737')] +[2024-11-08 05:33:57,973][42004] Updated weights for policy 0, policy_version 42576 (0.0034) +[2024-11-08 05:34:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 174428160. Throughput: 0: 1732.2. Samples: 38599188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:02,932][41694] Avg episode reward: [(0, '4.321')] +[2024-11-08 05:34:03,420][42004] Updated weights for policy 0, policy_version 42586 (0.0027) +[2024-11-08 05:34:07,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 174465024. Throughput: 0: 1748.2. Samples: 38610132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:07,935][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 05:34:08,863][42004] Updated weights for policy 0, policy_version 42596 (0.0028) +[2024-11-08 05:34:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 174497792. Throughput: 0: 1727.1. Samples: 38620790. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:12,933][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 05:34:14,909][42004] Updated weights for policy 0, policy_version 42606 (0.0030) +[2024-11-08 05:34:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 174534656. Throughput: 0: 1718.4. Samples: 38626318. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:17,933][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 05:34:22,778][42004] Updated weights for policy 0, policy_version 42616 (0.0027) +[2024-11-08 05:34:22,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 174555136. Throughput: 0: 1629.4. Samples: 38634034. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:22,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 05:34:27,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 174587904. Throughput: 0: 1584.4. Samples: 38643478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:27,934][41694] Avg episode reward: [(0, '4.246')] +[2024-11-08 05:34:28,975][42004] Updated weights for policy 0, policy_version 42626 (0.0035) +[2024-11-08 05:34:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 174624768. Throughput: 0: 1588.1. Samples: 38648214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:32,934][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 05:34:34,553][42004] Updated weights for policy 0, policy_version 42636 (0.0038) +[2024-11-08 05:34:37,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 174661632. Throughput: 0: 1691.4. Samples: 38659604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:37,933][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 05:34:39,776][42004] Updated weights for policy 0, policy_version 42646 (0.0028) +[2024-11-08 05:34:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 174698496. Throughput: 0: 1723.6. Samples: 38670974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:42,933][41694] Avg episode reward: [(0, '4.301')] +[2024-11-08 05:34:45,351][42004] Updated weights for policy 0, policy_version 42656 (0.0027) +[2024-11-08 05:34:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6775.8). Total num frames: 174735360. Throughput: 0: 1724.3. Samples: 38676780. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:47,934][41694] Avg episode reward: [(0, '4.253')] +[2024-11-08 05:34:50,962][42004] Updated weights for policy 0, policy_version 42666 (0.0035) +[2024-11-08 05:34:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 174772224. Throughput: 0: 1723.8. Samples: 38687704. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:34:52,933][41694] Avg episode reward: [(0, '4.475')] +[2024-11-08 05:34:57,933][41694] Fps is (10 sec: 5324.2, 60 sec: 6690.0, 300 sec: 6748.0). Total num frames: 174788608. Throughput: 0: 1622.2. Samples: 38693790. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:34:57,935][41694] Avg episode reward: [(0, '4.319')] +[2024-11-08 05:34:59,399][42004] Updated weights for policy 0, policy_version 42676 (0.0030) +[2024-11-08 05:35:02,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6485.3, 300 sec: 6706.3). Total num frames: 174817280. Throughput: 0: 1607.2. Samples: 38698644. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:35:02,933][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 05:35:06,228][42004] Updated weights for policy 0, policy_version 42686 (0.0032) +[2024-11-08 05:35:07,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6417.0, 300 sec: 6706.3). Total num frames: 174850048. Throughput: 0: 1635.8. Samples: 38707644. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:35:07,934][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 05:35:12,031][42004] Updated weights for policy 0, policy_version 42696 (0.0036) +[2024-11-08 05:35:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6720.2). Total num frames: 174886912. Throughput: 0: 1659.6. Samples: 38718158. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:35:12,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 05:35:17,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6417.0, 300 sec: 6762.0). 
Total num frames: 174919680. Throughput: 0: 1663.5. Samples: 38723072. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:35:17,933][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 05:35:18,229][42004] Updated weights for policy 0, policy_version 42706 (0.0033) +[2024-11-08 05:35:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 174956544. Throughput: 0: 1636.3. Samples: 38733236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:22,934][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 05:35:23,784][42004] Updated weights for policy 0, policy_version 42716 (0.0031) +[2024-11-08 05:35:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 174993408. Throughput: 0: 1627.0. Samples: 38744188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:27,934][41694] Avg episode reward: [(0, '4.516')] +[2024-11-08 05:35:30,944][42004] Updated weights for policy 0, policy_version 42726 (0.0038) +[2024-11-08 05:35:32,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 175017984. Throughput: 0: 1568.7. Samples: 38747372. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:32,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 05:35:37,023][42004] Updated weights for policy 0, policy_version 42736 (0.0045) +[2024-11-08 05:35:37,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6485.3, 300 sec: 6706.3). Total num frames: 175050752. Throughput: 0: 1547.9. Samples: 38757360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:37,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 05:35:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042737_175050752.pth... 
+[2024-11-08 05:35:38,070][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042344_173441024.pth +[2024-11-08 05:35:42,661][42004] Updated weights for policy 0, policy_version 42746 (0.0026) +[2024-11-08 05:35:42,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6485.3, 300 sec: 6706.3). Total num frames: 175087616. Throughput: 0: 1652.7. Samples: 38768158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:42,933][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 05:35:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.3, 300 sec: 6767.9). Total num frames: 175124480. Throughput: 0: 1661.6. Samples: 38773414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:47,934][41694] Avg episode reward: [(0, '4.299')] +[2024-11-08 05:35:48,192][42004] Updated weights for policy 0, policy_version 42756 (0.0028) +[2024-11-08 05:35:52,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6761.9). Total num frames: 175161344. Throughput: 0: 1716.9. Samples: 38784902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:52,935][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 05:35:53,503][42004] Updated weights for policy 0, policy_version 42766 (0.0029) +[2024-11-08 05:35:57,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6895.0, 300 sec: 6775.7). Total num frames: 175202304. Throughput: 0: 1739.6. Samples: 38796440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:35:57,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 05:35:58,952][42004] Updated weights for policy 0, policy_version 42776 (0.0033) +[2024-11-08 05:36:04,242][41694] Fps is (10 sec: 5794.1, 60 sec: 6680.7, 300 sec: 6718.1). Total num frames: 175226880. Throughput: 0: 1702.8. Samples: 38801930. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:36:04,244][41694] Avg episode reward: [(0, '4.670')] +[2024-11-08 05:36:07,549][42004] Updated weights for policy 0, policy_version 42786 (0.0024) +[2024-11-08 05:36:07,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.2, 300 sec: 6706.4). Total num frames: 175251456. Throughput: 0: 1667.8. Samples: 38808286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:36:07,935][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 05:36:12,932][41694] Fps is (10 sec: 6128.0, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 175280128. Throughput: 0: 1617.6. Samples: 38816980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:36:12,934][41694] Avg episode reward: [(0, '4.267')] +[2024-11-08 05:36:14,191][42004] Updated weights for policy 0, policy_version 42796 (0.0029) +[2024-11-08 05:36:17,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6621.8, 300 sec: 6692.4). Total num frames: 175316992. Throughput: 0: 1657.8. Samples: 38821972. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:17,934][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 05:36:19,918][42004] Updated weights for policy 0, policy_version 42806 (0.0027) +[2024-11-08 05:36:22,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.9, 300 sec: 6739.3). Total num frames: 175353856. Throughput: 0: 1678.8. Samples: 38832906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:22,933][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 05:36:25,393][42004] Updated weights for policy 0, policy_version 42816 (0.0022) +[2024-11-08 05:36:27,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 175390720. Throughput: 0: 1687.4. Samples: 38844092. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:27,933][41694] Avg episode reward: [(0, '4.601')] +[2024-11-08 05:36:30,909][42004] Updated weights for policy 0, policy_version 42826 (0.0026) +[2024-11-08 05:36:32,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 175427584. Throughput: 0: 1693.8. Samples: 38849636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:32,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 05:36:38,923][41694] Fps is (10 sec: 5589.7, 60 sec: 6581.4, 300 sec: 6697.7). Total num frames: 175452160. Throughput: 0: 1643.3. Samples: 38860478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:38,928][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 05:36:39,060][42004] Updated weights for policy 0, policy_version 42836 (0.0041) +[2024-11-08 05:36:42,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6485.3, 300 sec: 6664.8). Total num frames: 175476736. Throughput: 0: 1551.8. Samples: 38866272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:42,934][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 05:36:45,509][42004] Updated weights for policy 0, policy_version 42846 (0.0041) +[2024-11-08 05:36:47,932][41694] Fps is (10 sec: 6820.4, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 175513600. Throughput: 0: 1580.6. Samples: 38870986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:47,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 05:36:51,132][42004] Updated weights for policy 0, policy_version 42856 (0.0030) +[2024-11-08 05:36:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 175550464. Throughput: 0: 1635.2. Samples: 38881870. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:52,933][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 05:36:56,497][42004] Updated weights for policy 0, policy_version 42866 (0.0021) +[2024-11-08 05:36:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.1, 300 sec: 6709.9). Total num frames: 175587328. Throughput: 0: 1697.4. Samples: 38893362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:36:57,933][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 05:37:01,983][42004] Updated weights for policy 0, policy_version 42876 (0.0039) +[2024-11-08 05:37:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6769.8, 300 sec: 6720.2). Total num frames: 175624192. Throughput: 0: 1713.8. Samples: 38899094. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:02,934][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 05:37:07,451][42004] Updated weights for policy 0, policy_version 42886 (0.0024) +[2024-11-08 05:37:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 175661056. Throughput: 0: 1716.1. Samples: 38910130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:07,934][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 05:37:13,310][41694] Fps is (10 sec: 5920.0, 60 sec: 6716.1, 300 sec: 6683.9). Total num frames: 175685632. Throughput: 0: 1577.6. Samples: 38915680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:13,313][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 05:37:15,519][42004] Updated weights for policy 0, policy_version 42896 (0.0036) +[2024-11-08 05:37:17,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 175714304. Throughput: 0: 1614.3. Samples: 38922280. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:17,935][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 05:37:21,543][42004] Updated weights for policy 0, policy_version 42906 (0.0019) +[2024-11-08 05:37:22,932][41694] Fps is (10 sec: 6810.8, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 175751168. Throughput: 0: 1628.2. Samples: 38932132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:22,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 05:37:26,901][42004] Updated weights for policy 0, policy_version 42916 (0.0019) +[2024-11-08 05:37:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 175788032. Throughput: 0: 1722.6. Samples: 38943790. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:27,933][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 05:37:32,130][42004] Updated weights for policy 0, policy_version 42926 (0.0029) +[2024-11-08 05:37:32,932][41694] Fps is (10 sec: 7782.9, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 175828992. Throughput: 0: 1746.3. Samples: 38949568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:32,934][41694] Avg episode reward: [(0, '4.656')] +[2024-11-08 05:37:37,531][42004] Updated weights for policy 0, policy_version 42936 (0.0028) +[2024-11-08 05:37:37,933][41694] Fps is (10 sec: 7781.4, 60 sec: 7010.7, 300 sec: 6748.0). Total num frames: 175865856. Throughput: 0: 1768.8. Samples: 38961466. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:37,936][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 05:37:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042936_175865856.pth... +[2024-11-08 05:37:38,084][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042546_174268416.pth +[2024-11-08 05:37:42,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.7, 300 sec: 6761.9). 
Total num frames: 175902720. Throughput: 0: 1752.8. Samples: 38972240. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:42,933][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 05:37:43,085][42004] Updated weights for policy 0, policy_version 42946 (0.0031) +[2024-11-08 05:37:47,982][41694] Fps is (10 sec: 5706.4, 60 sec: 6820.9, 300 sec: 6691.3). Total num frames: 175923200. Throughput: 0: 1741.7. Samples: 38977560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:47,989][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 05:37:51,778][42004] Updated weights for policy 0, policy_version 42956 (0.0041) +[2024-11-08 05:37:52,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 175955968. Throughput: 0: 1623.1. Samples: 38983168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:52,940][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 05:37:57,308][42004] Updated weights for policy 0, policy_version 42966 (0.0025) +[2024-11-08 05:37:57,931][41694] Fps is (10 sec: 6998.5, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 175992832. Throughput: 0: 1760.9. Samples: 38994256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:37:57,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 05:38:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 176025600. Throughput: 0: 1724.7. Samples: 38999892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:02,933][41694] Avg episode reward: [(0, '4.742')] +[2024-11-08 05:38:02,952][42004] Updated weights for policy 0, policy_version 42976 (0.0025) +[2024-11-08 05:38:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 176066560. Throughput: 0: 1755.2. Samples: 39011116. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:07,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 05:38:08,066][42004] Updated weights for policy 0, policy_version 42986 (0.0032) +[2024-11-08 05:38:12,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7076.1, 300 sec: 6748.0). Total num frames: 176107520. Throughput: 0: 1761.0. Samples: 39023034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:12,937][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 05:38:13,347][42004] Updated weights for policy 0, policy_version 42996 (0.0036) +[2024-11-08 05:38:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6761.9). Total num frames: 176144384. Throughput: 0: 1755.2. Samples: 39028552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:17,933][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 05:38:18,832][42004] Updated weights for policy 0, policy_version 43006 (0.0028) +[2024-11-08 05:38:22,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 176160768. Throughput: 0: 1695.2. Samples: 39037746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:22,934][41694] Avg episode reward: [(0, '4.714')] +[2024-11-08 05:38:27,635][42004] Updated weights for policy 0, policy_version 43016 (0.0035) +[2024-11-08 05:38:27,932][41694] Fps is (10 sec: 4915.0, 60 sec: 6758.3, 300 sec: 6664.7). Total num frames: 176193536. Throughput: 0: 1614.6. Samples: 39044896. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:27,935][41694] Avg episode reward: [(0, '4.674')] +[2024-11-08 05:38:32,929][42004] Updated weights for policy 0, policy_version 43026 (0.0040) +[2024-11-08 05:38:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 176234496. Throughput: 0: 1618.5. Samples: 39050312. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:32,934][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 05:38:37,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6758.5, 300 sec: 6678.6). Total num frames: 176271360. Throughput: 0: 1755.7. Samples: 39062176. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:37,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 05:38:38,251][42004] Updated weights for policy 0, policy_version 43036 (0.0032) +[2024-11-08 05:38:42,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 176312320. Throughput: 0: 1771.3. Samples: 39073966. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:42,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 05:38:43,388][42004] Updated weights for policy 0, policy_version 43046 (0.0027) +[2024-11-08 05:38:47,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7105.7, 300 sec: 6761.9). Total num frames: 176349184. Throughput: 0: 1771.7. Samples: 39079620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:47,934][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 05:38:48,762][42004] Updated weights for policy 0, policy_version 43056 (0.0028) +[2024-11-08 05:38:52,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7099.7, 300 sec: 6761.9). Total num frames: 176381952. Throughput: 0: 1759.6. Samples: 39090300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:52,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 05:38:57,494][42004] Updated weights for policy 0, policy_version 43066 (0.0035) +[2024-11-08 05:38:57,932][41694] Fps is (10 sec: 4914.8, 60 sec: 6758.3, 300 sec: 6678.5). Total num frames: 176398336. Throughput: 0: 1634.4. Samples: 39096584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:38:57,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 05:39:02,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 6664.7). 
Total num frames: 176431104. Throughput: 0: 1607.6. Samples: 39100894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:39:02,935][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 05:39:03,869][42004] Updated weights for policy 0, policy_version 43076 (0.0048) +[2024-11-08 05:39:07,931][41694] Fps is (10 sec: 6963.8, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 176467968. Throughput: 0: 1630.2. Samples: 39111106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:39:07,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 05:39:09,428][42004] Updated weights for policy 0, policy_version 43086 (0.0037) +[2024-11-08 05:39:12,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.8, 300 sec: 6678.6). Total num frames: 176504832. Throughput: 0: 1720.1. Samples: 39122300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:39:12,935][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 05:39:15,147][42004] Updated weights for policy 0, policy_version 43096 (0.0034) +[2024-11-08 05:39:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 176541696. Throughput: 0: 1719.7. Samples: 39127698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:39:17,937][41694] Avg episode reward: [(0, '4.700')] +[2024-11-08 05:39:20,457][42004] Updated weights for policy 0, policy_version 43106 (0.0040) +[2024-11-08 05:39:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 176578560. Throughput: 0: 1706.0. Samples: 39138948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:39:22,934][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 05:39:26,177][42004] Updated weights for policy 0, policy_version 43116 (0.0034) +[2024-11-08 05:39:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 176611328. Throughput: 0: 1686.0. Samples: 39149836. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:39:27,933][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 05:39:32,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 176627712. Throughput: 0: 1633.9. Samples: 39153144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:39:32,933][41694] Avg episode reward: [(0, '4.304')] +[2024-11-08 05:39:35,021][42004] Updated weights for policy 0, policy_version 43126 (0.0028) +[2024-11-08 05:39:37,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 176664576. Throughput: 0: 1556.0. Samples: 39160320. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:39:37,933][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 05:39:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043131_176664576.pth... +[2024-11-08 05:39:38,043][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042737_175050752.pth +[2024-11-08 05:39:40,775][42004] Updated weights for policy 0, policy_version 43136 (0.0042) +[2024-11-08 05:39:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 6650.8). Total num frames: 176697344. Throughput: 0: 1652.4. Samples: 39170942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:39:42,934][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 05:39:46,546][42004] Updated weights for policy 0, policy_version 43146 (0.0032) +[2024-11-08 05:39:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6417.1, 300 sec: 6650.8). Total num frames: 176734208. Throughput: 0: 1670.2. Samples: 39176054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:39:47,934][41694] Avg episode reward: [(0, '4.604')] +[2024-11-08 05:39:52,175][42004] Updated weights for policy 0, policy_version 43156 (0.0021) +[2024-11-08 05:39:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.4, 300 sec: 6720.2). 
Total num frames: 176771072. Throughput: 0: 1686.3. Samples: 39186988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:39:52,933][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 05:39:57,686][42004] Updated weights for policy 0, policy_version 43166 (0.0036) +[2024-11-08 05:39:57,935][41694] Fps is (10 sec: 7370.4, 60 sec: 6826.4, 300 sec: 6747.9). Total num frames: 176807936. Throughput: 0: 1687.5. Samples: 39198244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:39:57,937][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 05:40:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 176840704. Throughput: 0: 1687.3. Samples: 39203628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:02,933][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 05:40:03,879][42004] Updated weights for policy 0, policy_version 43176 (0.0033) +[2024-11-08 05:40:07,932][41694] Fps is (10 sec: 5326.3, 60 sec: 6553.5, 300 sec: 6692.4). Total num frames: 176861184. Throughput: 0: 1589.5. Samples: 39210476. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:07,934][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 05:40:11,492][42004] Updated weights for policy 0, policy_version 43186 (0.0038) +[2024-11-08 05:40:12,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 176898048. Throughput: 0: 1578.3. Samples: 39220858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:12,933][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 05:40:17,662][42004] Updated weights for policy 0, policy_version 43196 (0.0028) +[2024-11-08 05:40:17,933][41694] Fps is (10 sec: 6962.8, 60 sec: 6485.2, 300 sec: 6692.4). Total num frames: 176930816. Throughput: 0: 1610.1. Samples: 39225600. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:17,935][41694] Avg episode reward: [(0, '4.712')] +[2024-11-08 05:40:22,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.4, 300 sec: 6692.5). Total num frames: 176967680. Throughput: 0: 1681.9. Samples: 39236004. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:22,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 05:40:23,502][42004] Updated weights for policy 0, policy_version 43206 (0.0037) +[2024-11-08 05:40:27,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6553.5, 300 sec: 6734.1). Total num frames: 177004544. Throughput: 0: 1687.6. Samples: 39246884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:27,934][41694] Avg episode reward: [(0, '4.640')] +[2024-11-08 05:40:28,975][42004] Updated weights for policy 0, policy_version 43216 (0.0032) +[2024-11-08 05:40:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 177041408. Throughput: 0: 1698.0. Samples: 39252462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:32,933][41694] Avg episode reward: [(0, '4.621')] +[2024-11-08 05:40:34,645][42004] Updated weights for policy 0, policy_version 43226 (0.0038) +[2024-11-08 05:40:39,810][41694] Fps is (10 sec: 5862.5, 60 sec: 6619.4, 300 sec: 6691.5). Total num frames: 177074176. Throughput: 0: 1623.9. Samples: 39263112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:39,813][41694] Avg episode reward: [(0, '4.588')] +[2024-11-08 05:40:42,913][42004] Updated weights for policy 0, policy_version 43236 (0.0036) +[2024-11-08 05:40:42,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.8, 300 sec: 6678.6). Total num frames: 177094656. Throughput: 0: 1581.1. Samples: 39269390. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:42,934][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 05:40:47,932][41694] Fps is (10 sec: 7060.3, 60 sec: 6621.9, 300 sec: 6678.6). 
Total num frames: 177131520. Throughput: 0: 1583.6. Samples: 39274892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:40:47,934][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 05:40:48,377][42004] Updated weights for policy 0, policy_version 43246 (0.0024) +[2024-11-08 05:40:52,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.8, 300 sec: 6664.7). Total num frames: 177168384. Throughput: 0: 1682.2. Samples: 39286174. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:40:52,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 05:40:53,870][42004] Updated weights for policy 0, policy_version 43256 (0.0038) +[2024-11-08 05:40:57,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6622.2, 300 sec: 6736.3). Total num frames: 177205248. Throughput: 0: 1695.0. Samples: 39297134. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:40:57,934][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 05:40:59,668][42004] Updated weights for policy 0, policy_version 43266 (0.0029) +[2024-11-08 05:41:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 177238016. Throughput: 0: 1713.0. Samples: 39302682. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:41:02,933][41694] Avg episode reward: [(0, '4.118')] +[2024-11-08 05:41:05,411][42004] Updated weights for policy 0, policy_version 43276 (0.0049) +[2024-11-08 05:41:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 177274880. Throughput: 0: 1717.7. Samples: 39313300. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:41:07,935][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 05:41:11,878][42004] Updated weights for policy 0, policy_version 43286 (0.0038) +[2024-11-08 05:41:14,031][41694] Fps is (10 sec: 5535.1, 60 sec: 6569.7, 300 sec: 6695.3). Total num frames: 177299456. Throughput: 0: 1554.5. Samples: 39318544. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:41:14,034][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 05:41:17,931][41694] Fps is (10 sec: 4915.4, 60 sec: 6553.7, 300 sec: 6678.6). Total num frames: 177324032. Throughput: 0: 1598.0. Samples: 39324374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:41:17,933][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 05:41:19,850][42004] Updated weights for policy 0, policy_version 43296 (0.0042) +[2024-11-08 05:41:22,932][41694] Fps is (10 sec: 6903.2, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 177360896. Throughput: 0: 1662.6. Samples: 39334808. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:41:22,933][41694] Avg episode reward: [(0, '4.274')] +[2024-11-08 05:41:25,823][42004] Updated weights for policy 0, policy_version 43306 (0.0033) +[2024-11-08 05:41:27,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6485.4, 300 sec: 6664.7). Total num frames: 177393664. Throughput: 0: 1684.2. Samples: 39345180. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:41:27,934][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 05:41:31,337][42004] Updated weights for policy 0, policy_version 43316 (0.0031) +[2024-11-08 05:41:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 6729.0). Total num frames: 177430528. Throughput: 0: 1682.8. Samples: 39350616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:41:32,934][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 05:41:36,779][42004] Updated weights for policy 0, policy_version 43326 (0.0029) +[2024-11-08 05:41:37,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6835.8, 300 sec: 6761.9). Total num frames: 177471488. Throughput: 0: 1684.0. Samples: 39361956. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:41:37,934][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 05:41:37,956][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043328_177471488.pth... +[2024-11-08 05:41:38,106][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000042936_175865856.pth +[2024-11-08 05:41:42,373][42004] Updated weights for policy 0, policy_version 43336 (0.0031) +[2024-11-08 05:41:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6748.0). Total num frames: 177504256. Throughput: 0: 1684.8. Samples: 39372950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:41:42,936][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 05:41:48,281][41694] Fps is (10 sec: 5540.8, 60 sec: 6583.5, 300 sec: 6698.4). Total num frames: 177528832. Throughput: 0: 1658.1. Samples: 39377876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:41:48,284][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 05:41:50,504][42004] Updated weights for policy 0, policy_version 43346 (0.0024) +[2024-11-08 05:41:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 177561600. Throughput: 0: 1592.1. Samples: 39384942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:41:52,933][41694] Avg episode reward: [(0, '4.220')] +[2024-11-08 05:41:55,954][42004] Updated weights for policy 0, policy_version 43356 (0.0026) +[2024-11-08 05:41:57,931][41694] Fps is (10 sec: 7215.4, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 177598464. Throughput: 0: 1770.8. Samples: 39396284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:41:57,933][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 05:42:01,223][42004] Updated weights for policy 0, policy_version 43366 (0.0026) +[2024-11-08 05:42:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6692.4). 
Total num frames: 177635328. Throughput: 0: 1721.9. Samples: 39401858. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:02,933][41694] Avg episode reward: [(0, '4.448')] +[2024-11-08 05:42:06,929][42004] Updated weights for policy 0, policy_version 43376 (0.0033) +[2024-11-08 05:42:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6742.7). Total num frames: 177672192. Throughput: 0: 1730.9. Samples: 39412700. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:07,933][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 05:42:12,270][42004] Updated weights for policy 0, policy_version 43386 (0.0037) +[2024-11-08 05:42:12,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7023.7, 300 sec: 6775.8). Total num frames: 177713152. Throughput: 0: 1759.6. Samples: 39424362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:12,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 05:42:17,827][42004] Updated weights for policy 0, policy_version 43396 (0.0026) +[2024-11-08 05:42:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 6775.8). Total num frames: 177750016. Throughput: 0: 1761.7. Samples: 39429892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:17,933][41694] Avg episode reward: [(0, '4.606')] +[2024-11-08 05:42:22,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 177766400. Throughput: 0: 1719.9. Samples: 39439350. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:22,934][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 05:42:25,800][42004] Updated weights for policy 0, policy_version 43406 (0.0026) +[2024-11-08 05:42:27,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 177803264. Throughput: 0: 1663.2. Samples: 39447792. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:27,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 05:42:31,170][42004] Updated weights for policy 0, policy_version 43416 (0.0032) +[2024-11-08 05:42:32,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6894.9, 300 sec: 6706.4). Total num frames: 177844224. Throughput: 0: 1686.6. Samples: 39453184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:32,933][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 05:42:36,780][42004] Updated weights for policy 0, policy_version 43426 (0.0027) +[2024-11-08 05:42:37,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 177876992. Throughput: 0: 1764.2. Samples: 39464332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:37,933][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 05:42:42,237][42004] Updated weights for policy 0, policy_version 43436 (0.0028) +[2024-11-08 05:42:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6763.0). Total num frames: 177917952. Throughput: 0: 1763.6. Samples: 39475646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:42,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 05:42:47,573][42004] Updated weights for policy 0, policy_version 43446 (0.0028) +[2024-11-08 05:42:47,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7141.3, 300 sec: 6775.8). Total num frames: 177954816. Throughput: 0: 1763.3. Samples: 39481208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:47,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 05:42:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7168.0, 300 sec: 6775.8). Total num frames: 177991680. Throughput: 0: 1765.7. Samples: 39492158. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:52,933][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 05:42:53,465][42004] Updated weights for policy 0, policy_version 43456 (0.0035) +[2024-11-08 05:42:57,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 178008064. Throughput: 0: 1649.6. Samples: 39498596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:42:57,933][41694] Avg episode reward: [(0, '4.201')] +[2024-11-08 05:43:01,274][42004] Updated weights for policy 0, policy_version 43466 (0.0048) +[2024-11-08 05:43:02,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 178044928. Throughput: 0: 1650.2. Samples: 39504152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:02,934][41694] Avg episode reward: [(0, '4.661')] +[2024-11-08 05:43:07,114][42004] Updated weights for policy 0, policy_version 43476 (0.0021) +[2024-11-08 05:43:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 178081792. Throughput: 0: 1677.7. Samples: 39514844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:07,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 05:43:12,408][42004] Updated weights for policy 0, policy_version 43486 (0.0025) +[2024-11-08 05:43:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6692.5). Total num frames: 178118656. Throughput: 0: 1748.9. Samples: 39526490. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:12,933][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 05:43:17,689][42004] Updated weights for policy 0, policy_version 43496 (0.0039) +[2024-11-08 05:43:17,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6826.7, 300 sec: 6775.8). Total num frames: 178159616. Throughput: 0: 1752.5. Samples: 39532048. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:17,933][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 05:43:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7168.0, 300 sec: 6789.7). Total num frames: 178196480. Throughput: 0: 1758.1. Samples: 39543448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:22,933][41694] Avg episode reward: [(0, '4.612')] +[2024-11-08 05:43:23,260][42004] Updated weights for policy 0, policy_version 43506 (0.0026) +[2024-11-08 05:43:27,933][41694] Fps is (10 sec: 6962.5, 60 sec: 7099.6, 300 sec: 6761.8). Total num frames: 178229248. Throughput: 0: 1726.9. Samples: 39553360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:27,935][41694] Avg episode reward: [(0, '4.701')] +[2024-11-08 05:43:31,830][42004] Updated weights for policy 0, policy_version 43516 (0.0040) +[2024-11-08 05:43:32,932][41694] Fps is (10 sec: 5324.4, 60 sec: 6758.3, 300 sec: 6706.3). Total num frames: 178249728. Throughput: 0: 1675.7. Samples: 39556614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:32,934][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 05:43:37,249][42004] Updated weights for policy 0, policy_version 43526 (0.0032) +[2024-11-08 05:43:37,932][41694] Fps is (10 sec: 5735.0, 60 sec: 6826.6, 300 sec: 6692.4). Total num frames: 178286592. Throughput: 0: 1638.0. Samples: 39565870. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:37,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 05:43:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043527_178286592.pth... +[2024-11-08 05:43:38,052][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043131_176664576.pth +[2024-11-08 05:43:42,776][42004] Updated weights for policy 0, policy_version 43536 (0.0026) +[2024-11-08 05:43:42,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6758.4, 300 sec: 6692.4). 
Total num frames: 178323456. Throughput: 0: 1740.4. Samples: 39576912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:42,933][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 05:43:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 178360320. Throughput: 0: 1741.6. Samples: 39582524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:47,934][41694] Avg episode reward: [(0, '4.317')] +[2024-11-08 05:43:48,123][42004] Updated weights for policy 0, policy_version 43546 (0.0022) +[2024-11-08 05:43:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 178397184. Throughput: 0: 1757.4. Samples: 39593926. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:52,935][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 05:43:53,566][42004] Updated weights for policy 0, policy_version 43556 (0.0030) +[2024-11-08 05:43:57,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.7, 300 sec: 6789.6). Total num frames: 178434048. Throughput: 0: 1748.0. Samples: 39605152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:43:57,933][41694] Avg episode reward: [(0, '4.588')] +[2024-11-08 05:43:59,482][42004] Updated weights for policy 0, policy_version 43566 (0.0032) +[2024-11-08 05:44:02,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 178462720. Throughput: 0: 1728.1. Samples: 39609814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:44:02,934][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 05:44:07,932][41694] Fps is (10 sec: 4915.3, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 178483200. Throughput: 0: 1616.9. Samples: 39616208. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:44:07,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 05:44:08,093][42004] Updated weights for policy 0, policy_version 43576 (0.0032) +[2024-11-08 05:44:12,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 178520064. Throughput: 0: 1632.3. Samples: 39626810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:44:12,933][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 05:44:13,468][42004] Updated weights for policy 0, policy_version 43586 (0.0026) +[2024-11-08 05:44:17,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 178561024. Throughput: 0: 1681.3. Samples: 39632270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:44:17,935][41694] Avg episode reward: [(0, '4.297')] +[2024-11-08 05:44:19,021][42004] Updated weights for policy 0, policy_version 43596 (0.0032) +[2024-11-08 05:44:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.8, 300 sec: 6720.2). Total num frames: 178593792. Throughput: 0: 1715.0. Samples: 39643046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:44:22,934][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 05:44:24,836][42004] Updated weights for policy 0, policy_version 43606 (0.0025) +[2024-11-08 05:44:27,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6690.3, 300 sec: 6789.6). Total num frames: 178630656. Throughput: 0: 1715.2. Samples: 39654096. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:44:27,933][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 05:44:30,367][42004] Updated weights for policy 0, policy_version 43616 (0.0037) +[2024-11-08 05:44:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6895.0, 300 sec: 6775.8). Total num frames: 178663424. Throughput: 0: 1714.8. Samples: 39659690. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 05:44:32,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 05:44:36,648][42004] Updated weights for policy 0, policy_version 43626 (0.0037) +[2024-11-08 05:44:39,647][41694] Fps is (10 sec: 5593.8, 60 sec: 6636.9, 300 sec: 6736.6). Total num frames: 178696192. Throughput: 0: 1620.4. Samples: 39669626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 05:44:39,650][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 05:44:42,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 178720768. Throughput: 0: 1593.1. Samples: 39676840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 05:44:42,933][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 05:44:44,341][42004] Updated weights for policy 0, policy_version 43636 (0.0044) +[2024-11-08 05:44:47,932][41694] Fps is (10 sec: 7416.3, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 178757632. Throughput: 0: 1607.0. Samples: 39682128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 05:44:47,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 05:44:49,887][42004] Updated weights for policy 0, policy_version 43646 (0.0036) +[2024-11-08 05:44:52,934][41694] Fps is (10 sec: 7370.9, 60 sec: 6621.6, 300 sec: 6734.1). Total num frames: 178794496. Throughput: 0: 1709.6. Samples: 39693144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 05:44:52,937][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 05:44:55,773][42004] Updated weights for policy 0, policy_version 43656 (0.0037) +[2024-11-08 05:44:57,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 178827264. Throughput: 0: 1707.0. Samples: 39703624. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:44:57,934][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 05:45:01,426][42004] Updated weights for policy 0, policy_version 43666 (0.0031) +[2024-11-08 05:45:02,932][41694] Fps is (10 sec: 6964.4, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 178864128. Throughput: 0: 1708.1. Samples: 39709136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:45:02,934][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 05:45:07,549][42004] Updated weights for policy 0, policy_version 43676 (0.0027) +[2024-11-08 05:45:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6895.0, 300 sec: 6775.8). Total num frames: 178896896. Throughput: 0: 1694.9. Samples: 39719314. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:45:07,936][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 05:45:13,947][41694] Fps is (10 sec: 5578.1, 60 sec: 6645.9, 300 sec: 6738.7). Total num frames: 178925568. Throughput: 0: 1527.4. Samples: 39724380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:45:13,949][41694] Avg episode reward: [(0, '4.338')] +[2024-11-08 05:45:15,989][42004] Updated weights for policy 0, policy_version 43686 (0.0043) +[2024-11-08 05:45:17,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6417.1, 300 sec: 6706.3). Total num frames: 178946048. Throughput: 0: 1579.6. Samples: 39730772. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:45:17,934][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 05:45:21,976][42004] Updated weights for policy 0, policy_version 43696 (0.0031) +[2024-11-08 05:45:22,933][41694] Fps is (10 sec: 6381.5, 60 sec: 6485.2, 300 sec: 6706.3). Total num frames: 178982912. Throughput: 0: 1630.6. Samples: 39740206. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:45:22,935][41694] Avg episode reward: [(0, '4.598')] +[2024-11-08 05:45:27,535][42004] Updated weights for policy 0, policy_version 43706 (0.0025) +[2024-11-08 05:45:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.3, 300 sec: 6706.3). Total num frames: 179019776. Throughput: 0: 1657.2. Samples: 39751416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:45:27,932][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 05:45:32,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6553.5, 300 sec: 6763.3). Total num frames: 179056640. Throughput: 0: 1667.0. Samples: 39757144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:45:32,935][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 05:45:33,059][42004] Updated weights for policy 0, policy_version 43716 (0.0026) +[2024-11-08 05:45:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6816.8, 300 sec: 6775.8). Total num frames: 179093504. Throughput: 0: 1666.8. Samples: 39768148. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:45:37,933][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 05:45:38,024][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043725_179097600.pth... +[2024-11-08 05:45:38,118][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043328_177471488.pth +[2024-11-08 05:45:38,688][42004] Updated weights for policy 0, policy_version 43726 (0.0034) +[2024-11-08 05:45:42,932][41694] Fps is (10 sec: 6963.7, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 179126272. Throughput: 0: 1665.3. Samples: 39778562. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:45:42,936][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 05:45:44,792][42004] Updated weights for policy 0, policy_version 43736 (0.0022) +[2024-11-08 05:45:48,273][41694] Fps is (10 sec: 5545.2, 60 sec: 6516.6, 300 sec: 6712.5). 
Total num frames: 179150848. Throughput: 0: 1643.5. Samples: 39783654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:45:48,279][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 05:45:52,420][42004] Updated weights for policy 0, policy_version 43746 (0.0033) +[2024-11-08 05:45:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.6, 300 sec: 6706.3). Total num frames: 179183616. Throughput: 0: 1587.1. Samples: 39790736. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:45:52,934][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 05:45:57,919][42004] Updated weights for policy 0, policy_version 43756 (0.0036) +[2024-11-08 05:45:57,932][41694] Fps is (10 sec: 7633.1, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 179224576. Throughput: 0: 1766.3. Samples: 39802070. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:45:57,934][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 05:46:02,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 179257344. Throughput: 0: 1708.3. Samples: 39807648. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:02,934][41694] Avg episode reward: [(0, '4.673')] +[2024-11-08 05:46:03,875][42004] Updated weights for policy 0, policy_version 43766 (0.0041) +[2024-11-08 05:46:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6787.2). Total num frames: 179294208. Throughput: 0: 1725.4. Samples: 39817846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:07,933][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 05:46:09,668][42004] Updated weights for policy 0, policy_version 43776 (0.0025) +[2024-11-08 05:46:12,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6805.3, 300 sec: 6789.6). Total num frames: 179326976. Throughput: 0: 1697.5. Samples: 39827806. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:12,934][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 05:46:16,283][42004] Updated weights for policy 0, policy_version 43786 (0.0032) +[2024-11-08 05:46:17,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 179355648. Throughput: 0: 1664.7. Samples: 39832056. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:17,933][41694] Avg episode reward: [(0, '4.383')] +[2024-11-08 05:46:22,932][41694] Fps is (10 sec: 4915.3, 60 sec: 6553.7, 300 sec: 6720.2). Total num frames: 179376128. Throughput: 0: 1624.2. Samples: 39841236. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:22,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 05:46:24,606][42004] Updated weights for policy 0, policy_version 43796 (0.0048) +[2024-11-08 05:46:27,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 179412992. Throughput: 0: 1569.7. Samples: 39849200. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:27,934][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 05:46:29,970][42004] Updated weights for policy 0, policy_version 43806 (0.0031) +[2024-11-08 05:46:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.7, 300 sec: 6706.3). Total num frames: 179449856. Throughput: 0: 1594.3. Samples: 39854854. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:32,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 05:46:35,544][42004] Updated weights for policy 0, policy_version 43816 (0.0033) +[2024-11-08 05:46:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 179486720. Throughput: 0: 1672.2. Samples: 39865984. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:37,933][41694] Avg episode reward: [(0, '4.399')] +[2024-11-08 05:46:41,210][42004] Updated weights for policy 0, policy_version 43826 (0.0033) +[2024-11-08 05:46:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6756.0). Total num frames: 179519488. Throughput: 0: 1663.1. Samples: 39876908. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:42,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 05:46:46,931][42004] Updated weights for policy 0, policy_version 43836 (0.0039) +[2024-11-08 05:46:47,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6797.1, 300 sec: 6761.9). Total num frames: 179556352. Throughput: 0: 1656.6. Samples: 39882194. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:46:47,933][41694] Avg episode reward: [(0, '4.271')] +[2024-11-08 05:46:52,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 179589120. Throughput: 0: 1653.4. Samples: 39892250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:46:52,935][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 05:46:52,990][42004] Updated weights for policy 0, policy_version 43846 (0.0039) +[2024-11-08 05:46:57,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6417.0, 300 sec: 6692.4). Total num frames: 179609600. Throughput: 0: 1587.8. Samples: 39899256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:46:57,934][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 05:47:00,626][42004] Updated weights for policy 0, policy_version 43856 (0.0051) +[2024-11-08 05:47:02,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.4, 300 sec: 6692.4). Total num frames: 179646464. Throughput: 0: 1617.1. Samples: 39904826. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:47:02,936][41694] Avg episode reward: [(0, '4.661')] +[2024-11-08 05:47:06,798][42004] Updated weights for policy 0, policy_version 43866 (0.0027) +[2024-11-08 05:47:07,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 179683328. Throughput: 0: 1638.9. Samples: 39914988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:47:07,933][41694] Avg episode reward: [(0, '4.660')] +[2024-11-08 05:47:12,138][42004] Updated weights for policy 0, policy_version 43876 (0.0035) +[2024-11-08 05:47:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 179720192. Throughput: 0: 1712.2. Samples: 39926250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:47:12,933][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 05:47:17,529][42004] Updated weights for policy 0, policy_version 43886 (0.0023) +[2024-11-08 05:47:17,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 179757056. Throughput: 0: 1710.0. Samples: 39931806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:47:17,934][41694] Avg episode reward: [(0, '4.327')] +[2024-11-08 05:47:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 179793920. Throughput: 0: 1709.7. Samples: 39942920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:47:22,933][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 05:47:23,266][42004] Updated weights for policy 0, policy_version 43896 (0.0023) +[2024-11-08 05:47:27,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 179830784. Throughput: 0: 1713.2. Samples: 39954002. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:47:27,935][41694] Avg episode reward: [(0, '4.311')] +[2024-11-08 05:47:28,724][42004] Updated weights for policy 0, policy_version 43906 (0.0028) +[2024-11-08 05:47:32,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 179851264. Throughput: 0: 1680.5. Samples: 39957818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:47:32,933][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 05:47:36,610][42004] Updated weights for policy 0, policy_version 43916 (0.0024) +[2024-11-08 05:47:37,934][41694] Fps is (10 sec: 5732.9, 60 sec: 6689.8, 300 sec: 6678.5). Total num frames: 179888128. Throughput: 0: 1649.1. Samples: 39966464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:47:37,945][41694] Avg episode reward: [(0, '4.416')] +[2024-11-08 05:47:37,963][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043918_179888128.pth... +[2024-11-08 05:47:38,196][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043527_178286592.pth +[2024-11-08 05:47:42,077][42004] Updated weights for policy 0, policy_version 43926 (0.0025) +[2024-11-08 05:47:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 179924992. Throughput: 0: 1741.8. Samples: 39977636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:47:42,934][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 05:47:47,295][42004] Updated weights for policy 0, policy_version 43936 (0.0030) +[2024-11-08 05:47:47,931][41694] Fps is (10 sec: 7784.5, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 179965952. Throughput: 0: 1744.9. Samples: 39983344. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:47:47,933][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 05:47:52,565][42004] Updated weights for policy 0, policy_version 43946 (0.0036) +[2024-11-08 05:47:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 180002816. Throughput: 0: 1779.6. Samples: 39995072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:47:52,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 05:47:57,932][41694] Fps is (10 sec: 7372.5, 60 sec: 7168.0, 300 sec: 6761.9). Total num frames: 180039680. Throughput: 0: 1786.0. Samples: 40006620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:47:57,933][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 05:47:57,965][42004] Updated weights for policy 0, policy_version 43956 (0.0028) +[2024-11-08 05:48:02,932][41694] Fps is (10 sec: 7372.3, 60 sec: 7167.9, 300 sec: 6761.9). Total num frames: 180076544. Throughput: 0: 1784.4. Samples: 40012106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:48:02,937][41694] Avg episode reward: [(0, '4.734')] +[2024-11-08 05:48:06,405][42004] Updated weights for policy 0, policy_version 43966 (0.0031) +[2024-11-08 05:48:07,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 180092928. Throughput: 0: 1671.7. Samples: 40018148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:48:07,934][41694] Avg episode reward: [(0, '4.715')] +[2024-11-08 05:48:12,047][42004] Updated weights for policy 0, policy_version 43976 (0.0028) +[2024-11-08 05:48:12,932][41694] Fps is (10 sec: 5325.1, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 180129792. Throughput: 0: 1661.9. Samples: 40028788. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:48:12,934][41694] Avg episode reward: [(0, '4.328')] +[2024-11-08 05:48:17,449][42004] Updated weights for policy 0, policy_version 43986 (0.0031) +[2024-11-08 05:48:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 180166656. Throughput: 0: 1699.5. Samples: 40034294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:48:17,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 05:48:22,792][42004] Updated weights for policy 0, policy_version 43996 (0.0038) +[2024-11-08 05:48:22,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6895.0, 300 sec: 6706.4). Total num frames: 180207616. Throughput: 0: 1767.3. Samples: 40045988. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:48:22,932][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 05:48:27,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 180244480. Throughput: 0: 1777.5. Samples: 40057624. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:48:27,933][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 05:48:28,048][42004] Updated weights for policy 0, policy_version 44006 (0.0025) +[2024-11-08 05:48:32,932][41694] Fps is (10 sec: 7781.6, 60 sec: 7236.2, 300 sec: 6775.7). Total num frames: 180285440. Throughput: 0: 1776.3. Samples: 40063278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:48:32,933][41694] Avg episode reward: [(0, '4.660')] +[2024-11-08 05:48:33,301][42004] Updated weights for policy 0, policy_version 44016 (0.0025) +[2024-11-08 05:48:37,933][41694] Fps is (10 sec: 7371.3, 60 sec: 7168.1, 300 sec: 6761.8). Total num frames: 180318208. Throughput: 0: 1756.7. Samples: 40074126. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:48:37,936][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 05:48:41,683][42004] Updated weights for policy 0, policy_version 44026 (0.0030) +[2024-11-08 05:48:42,931][41694] Fps is (10 sec: 5325.3, 60 sec: 6895.0, 300 sec: 6706.3). Total num frames: 180338688. Throughput: 0: 1643.6. Samples: 40080582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:48:42,933][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 05:48:47,365][42004] Updated weights for policy 0, policy_version 44036 (0.0034) +[2024-11-08 05:48:47,931][41694] Fps is (10 sec: 5735.6, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 180375552. Throughput: 0: 1638.6. Samples: 40085840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:48:47,933][41694] Avg episode reward: [(0, '4.627')] +[2024-11-08 05:48:52,827][42004] Updated weights for policy 0, policy_version 44046 (0.0035) +[2024-11-08 05:48:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 180412416. Throughput: 0: 1755.5. Samples: 40097146. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:48:52,934][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 05:48:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6734.1). Total num frames: 180449280. Throughput: 0: 1772.4. Samples: 40108544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:48:57,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 05:48:58,325][42004] Updated weights for policy 0, policy_version 44056 (0.0037) +[2024-11-08 05:49:02,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 180486144. Throughput: 0: 1769.9. Samples: 40113938. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:49:02,933][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 05:49:04,107][42004] Updated weights for policy 0, policy_version 44066 (0.0028) +[2024-11-08 05:49:07,932][41694] Fps is (10 sec: 6962.9, 60 sec: 7099.7, 300 sec: 6775.7). Total num frames: 180518912. Throughput: 0: 1742.2. Samples: 40124390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:49:07,936][41694] Avg episode reward: [(0, '4.184')] +[2024-11-08 05:49:10,031][42004] Updated weights for policy 0, policy_version 44076 (0.0031) +[2024-11-08 05:49:14,469][41694] Fps is (10 sec: 5325.4, 60 sec: 6789.3, 300 sec: 6699.2). Total num frames: 180547584. Throughput: 0: 1649.3. Samples: 40134378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:49:14,470][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 05:49:17,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 180572160. Throughput: 0: 1614.0. Samples: 40135906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:49:17,933][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 05:49:18,164][42004] Updated weights for policy 0, policy_version 44086 (0.0030) +[2024-11-08 05:49:22,932][41694] Fps is (10 sec: 7259.8, 60 sec: 6690.1, 300 sec: 6706.3). Total num frames: 180609024. Throughput: 0: 1620.8. Samples: 40147060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:49:22,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 05:49:23,604][42004] Updated weights for policy 0, policy_version 44096 (0.0031) +[2024-11-08 05:49:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 180645888. Throughput: 0: 1721.4. Samples: 40158046. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:49:27,934][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 05:49:29,157][42004] Updated weights for policy 0, policy_version 44106 (0.0040) +[2024-11-08 05:49:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.2, 300 sec: 6787.5). Total num frames: 180686848. Throughput: 0: 1731.4. Samples: 40163752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:49:32,934][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 05:49:34,597][42004] Updated weights for policy 0, policy_version 44116 (0.0026) +[2024-11-08 05:49:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.6, 300 sec: 6789.6). Total num frames: 180723712. Throughput: 0: 1729.1. Samples: 40174954. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:49:37,933][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 05:49:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044122_180723712.pth... +[2024-11-08 05:49:38,057][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043725_179097600.pth +[2024-11-08 05:49:40,113][42004] Updated weights for policy 0, policy_version 44126 (0.0029) +[2024-11-08 05:49:42,934][41694] Fps is (10 sec: 6961.3, 60 sec: 6962.9, 300 sec: 6775.7). Total num frames: 180756480. Throughput: 0: 1719.6. Samples: 40185932. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:49:42,940][41694] Avg episode reward: [(0, '4.694')] +[2024-11-08 05:49:46,186][42004] Updated weights for policy 0, policy_version 44136 (0.0029) +[2024-11-08 05:49:48,886][41694] Fps is (10 sec: 5235.2, 60 sec: 6652.7, 300 sec: 6712.5). Total num frames: 180781056. Throughput: 0: 1667.6. Samples: 40190570. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:49:48,888][41694] Avg episode reward: [(0, '4.499')] +[2024-11-08 05:49:52,931][41694] Fps is (10 sec: 5736.1, 60 sec: 6690.1, 300 sec: 6734.1). 
Total num frames: 180813824. Throughput: 0: 1624.9. Samples: 40197508. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:49:52,932][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 05:49:53,774][42004] Updated weights for policy 0, policy_version 44146 (0.0022) +[2024-11-08 05:49:57,932][41694] Fps is (10 sec: 7697.1, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 180850688. Throughput: 0: 1714.4. Samples: 40208890. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:49:57,933][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 05:49:59,459][42004] Updated weights for policy 0, policy_version 44156 (0.0028) +[2024-11-08 05:50:02,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 180883456. Throughput: 0: 1745.3. Samples: 40214444. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:02,934][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 05:50:05,170][42004] Updated weights for policy 0, policy_version 44166 (0.0053) +[2024-11-08 05:50:07,933][41694] Fps is (10 sec: 7371.8, 60 sec: 6758.3, 300 sec: 6799.1). Total num frames: 180924416. Throughput: 0: 1739.4. Samples: 40225336. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:07,936][41694] Avg episode reward: [(0, '4.575')] +[2024-11-08 05:50:10,708][42004] Updated weights for policy 0, policy_version 44176 (0.0028) +[2024-11-08 05:50:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7006.2, 300 sec: 6817.4). Total num frames: 180957184. Throughput: 0: 1741.7. Samples: 40236422. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:12,935][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 05:50:16,660][42004] Updated weights for policy 0, policy_version 44186 (0.0022) +[2024-11-08 05:50:17,932][41694] Fps is (10 sec: 6963.9, 60 sec: 7031.4, 300 sec: 6817.4). Total num frames: 180994048. Throughput: 0: 1712.8. Samples: 40240830. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:17,935][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 05:50:23,280][41694] Fps is (10 sec: 5541.4, 60 sec: 6719.4, 300 sec: 6753.9). Total num frames: 181014528. Throughput: 0: 1681.6. Samples: 40251212. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:23,283][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 05:50:24,830][42004] Updated weights for policy 0, policy_version 44196 (0.0031) +[2024-11-08 05:50:27,932][41694] Fps is (10 sec: 5325.0, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 181047296. Throughput: 0: 1609.0. Samples: 40258334. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:27,933][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 05:50:30,351][42004] Updated weights for policy 0, policy_version 44206 (0.0023) +[2024-11-08 05:50:32,932][41694] Fps is (10 sec: 7214.3, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 181084160. Throughput: 0: 1666.8. Samples: 40263988. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:32,933][41694] Avg episode reward: [(0, '4.505')] +[2024-11-08 05:50:35,641][42004] Updated weights for policy 0, policy_version 44216 (0.0020) +[2024-11-08 05:50:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6621.8, 300 sec: 6761.9). Total num frames: 181121024. Throughput: 0: 1730.2. Samples: 40275366. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:37,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 05:50:41,789][42004] Updated weights for policy 0, policy_version 44226 (0.0031) +[2024-11-08 05:50:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.5, 300 sec: 6811.4). Total num frames: 181157888. Throughput: 0: 1700.1. Samples: 40285394. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:42,935][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 05:50:47,161][42004] Updated weights for policy 0, policy_version 44236 (0.0026) +[2024-11-08 05:50:47,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7006.3, 300 sec: 6817.4). Total num frames: 181194752. Throughput: 0: 1698.3. Samples: 40290866. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:47,934][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 05:50:52,597][42004] Updated weights for policy 0, policy_version 44246 (0.0032) +[2024-11-08 05:50:52,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 6803.5). Total num frames: 181231616. Throughput: 0: 1709.4. Samples: 40302256. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:52,934][41694] Avg episode reward: [(0, '4.341')] +[2024-11-08 05:50:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6761.9). Total num frames: 181252096. Throughput: 0: 1655.9. Samples: 40310940. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:50:57,934][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 05:51:00,365][42004] Updated weights for policy 0, policy_version 44256 (0.0027) +[2024-11-08 05:51:02,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6758.4, 300 sec: 6761.9). Total num frames: 181288960. Throughput: 0: 1648.5. Samples: 40315014. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:51:02,934][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 05:51:06,099][42004] Updated weights for policy 0, policy_version 44266 (0.0029) +[2024-11-08 05:51:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.3, 300 sec: 6775.8). Total num frames: 181325824. Throughput: 0: 1670.1. Samples: 40325784. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:51:07,933][41694] Avg episode reward: [(0, '4.225')] +[2024-11-08 05:51:12,536][42004] Updated weights for policy 0, policy_version 44276 (0.0035) +[2024-11-08 05:51:12,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 181354496. Throughput: 0: 1709.4. Samples: 40335256. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:51:12,934][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 05:51:17,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6553.7, 300 sec: 6817.4). Total num frames: 181387264. Throughput: 0: 1680.9. Samples: 40339630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:51:17,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 05:51:18,730][42004] Updated weights for policy 0, policy_version 44286 (0.0033) +[2024-11-08 05:51:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6866.5, 300 sec: 6817.4). Total num frames: 181424128. Throughput: 0: 1666.1. Samples: 40350342. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:51:22,934][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 05:51:24,676][42004] Updated weights for policy 0, policy_version 44296 (0.0052) +[2024-11-08 05:51:27,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 181460992. Throughput: 0: 1681.1. Samples: 40361044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:51:27,934][41694] Avg episode reward: [(0, '4.433')] +[2024-11-08 05:51:32,296][42004] Updated weights for policy 0, policy_version 44306 (0.0026) +[2024-11-08 05:51:32,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 181481472. Throughput: 0: 1681.9. Samples: 40366550. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:51:32,933][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 05:51:37,897][42004] Updated weights for policy 0, policy_version 44316 (0.0030) +[2024-11-08 05:51:37,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 181518336. Throughput: 0: 1583.7. Samples: 40373520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:51:37,933][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 05:51:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044316_181518336.pth... +[2024-11-08 05:51:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000043918_179888128.pth +[2024-11-08 05:51:42,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 181555200. Throughput: 0: 1636.8. Samples: 40384596. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:51:42,933][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 05:51:43,461][42004] Updated weights for policy 0, policy_version 44326 (0.0035) +[2024-11-08 05:51:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6775.8). Total num frames: 181587968. Throughput: 0: 1662.6. Samples: 40389832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:51:47,934][41694] Avg episode reward: [(0, '4.402')] +[2024-11-08 05:51:49,655][42004] Updated weights for policy 0, policy_version 44336 (0.0026) +[2024-11-08 05:51:52,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6831.3). Total num frames: 181624832. Throughput: 0: 1647.7. Samples: 40399932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:51:52,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 05:51:55,217][42004] Updated weights for policy 0, policy_version 44346 (0.0036) +[2024-11-08 05:51:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6817.4). 
Total num frames: 181657600. Throughput: 0: 1684.8. Samples: 40411072. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:51:57,933][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 05:52:01,049][42004] Updated weights for policy 0, policy_version 44356 (0.0025) +[2024-11-08 05:52:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 181694464. Throughput: 0: 1708.3. Samples: 40416504. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:52:02,933][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 05:52:07,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6761.9). Total num frames: 181714944. Throughput: 0: 1636.0. Samples: 40423962. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:52:07,933][41694] Avg episode reward: [(0, '4.639')] +[2024-11-08 05:52:08,971][42004] Updated weights for policy 0, policy_version 44366 (0.0031) +[2024-11-08 05:52:12,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 181751808. Throughput: 0: 1616.9. Samples: 40433806. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:52:12,933][41694] Avg episode reward: [(0, '4.355')] +[2024-11-08 05:52:14,669][42004] Updated weights for policy 0, policy_version 44376 (0.0031) +[2024-11-08 05:52:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.8, 300 sec: 6748.0). Total num frames: 181784576. Throughput: 0: 1612.9. Samples: 40439130. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:52:17,933][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 05:52:20,686][42004] Updated weights for policy 0, policy_version 44386 (0.0035) +[2024-11-08 05:52:22,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 181817344. Throughput: 0: 1684.2. Samples: 40449308. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:52:22,936][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 05:52:26,521][42004] Updated weights for policy 0, policy_version 44396 (0.0031) +[2024-11-08 05:52:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6789.6). Total num frames: 181854208. Throughput: 0: 1673.0. Samples: 40459880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:52:27,934][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 05:52:32,087][42004] Updated weights for policy 0, policy_version 44406 (0.0031) +[2024-11-08 05:52:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.6, 300 sec: 6789.7). Total num frames: 181891072. Throughput: 0: 1674.1. Samples: 40465168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:52:32,934][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 05:52:37,700][42004] Updated weights for policy 0, policy_version 44416 (0.0034) +[2024-11-08 05:52:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.6, 300 sec: 6789.6). Total num frames: 181927936. Throughput: 0: 1699.7. Samples: 40476420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:52:37,934][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 05:52:42,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 181948416. Throughput: 0: 1604.1. Samples: 40483258. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:52:42,933][41694] Avg episode reward: [(0, '4.638')] +[2024-11-08 05:52:45,552][42004] Updated weights for policy 0, policy_version 44426 (0.0027) +[2024-11-08 05:52:47,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 181985280. Throughput: 0: 1606.8. Samples: 40488808. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:52:47,933][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 05:52:51,101][42004] Updated weights for policy 0, policy_version 44436 (0.0035) +[2024-11-08 05:52:52,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6553.6, 300 sec: 6706.3). Total num frames: 182018048. Throughput: 0: 1686.0. Samples: 40499830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:52:52,935][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 05:52:57,385][42004] Updated weights for policy 0, policy_version 44446 (0.0031) +[2024-11-08 05:52:57,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 182054912. Throughput: 0: 1683.6. Samples: 40509568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:52:57,933][41694] Avg episode reward: [(0, '4.272')] +[2024-11-08 05:53:02,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 182087680. Throughput: 0: 1686.4. Samples: 40515016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:53:02,932][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 05:53:03,060][42004] Updated weights for policy 0, policy_version 44456 (0.0031) +[2024-11-08 05:53:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6895.0, 300 sec: 6775.8). Total num frames: 182128640. Throughput: 0: 1703.7. Samples: 40525976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:53:07,933][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 05:53:08,510][42004] Updated weights for policy 0, policy_version 44466 (0.0030) +[2024-11-08 05:53:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 182161408. Throughput: 0: 1716.3. Samples: 40537112. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:53:12,933][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 05:53:16,213][42004] Updated weights for policy 0, policy_version 44476 (0.0036) +[2024-11-08 05:53:17,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 182181888. Throughput: 0: 1659.3. Samples: 40539834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:53:17,934][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 05:53:21,874][42004] Updated weights for policy 0, policy_version 44486 (0.0042) +[2024-11-08 05:53:22,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.2, 300 sec: 6692.4). Total num frames: 182218752. Throughput: 0: 1627.5. Samples: 40549656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:53:22,936][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 05:53:27,742][42004] Updated weights for policy 0, policy_version 44496 (0.0025) +[2024-11-08 05:53:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6678.6). Total num frames: 182255616. Throughput: 0: 1710.0. Samples: 40560208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:53:27,933][41694] Avg episode reward: [(0, '4.665')] +[2024-11-08 05:53:32,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 182292480. Throughput: 0: 1699.2. Samples: 40565274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:53:32,935][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 05:53:33,323][42004] Updated weights for policy 0, policy_version 44506 (0.0019) +[2024-11-08 05:53:37,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 182325248. Throughput: 0: 1703.8. Samples: 40576500. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:53:37,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 05:53:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044514_182329344.pth... +[2024-11-08 05:53:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044122_180723712.pth +[2024-11-08 05:53:39,011][42004] Updated weights for policy 0, policy_version 44516 (0.0026) +[2024-11-08 05:53:42,931][41694] Fps is (10 sec: 7373.4, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 182366208. Throughput: 0: 1729.0. Samples: 40587374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:53:42,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 05:53:44,627][42004] Updated weights for policy 0, policy_version 44526 (0.0030) +[2024-11-08 05:53:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 182398976. Throughput: 0: 1730.9. Samples: 40592906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:53:47,933][41694] Avg episode reward: [(0, '4.671')] +[2024-11-08 05:53:52,496][42004] Updated weights for policy 0, policy_version 44536 (0.0032) +[2024-11-08 05:53:52,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.2, 300 sec: 6678.6). Total num frames: 182419456. Throughput: 0: 1638.4. Samples: 40599706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:53:52,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 05:53:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 182456320. Throughput: 0: 1638.6. Samples: 40610850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:53:57,935][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 05:53:57,941][42004] Updated weights for policy 0, policy_version 44546 (0.0027) +[2024-11-08 05:54:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6692.5). 
Total num frames: 182493184. Throughput: 0: 1697.7. Samples: 40616230. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:02,934][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 05:54:03,988][42004] Updated weights for policy 0, policy_version 44556 (0.0026) +[2024-11-08 05:54:07,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6755.4). Total num frames: 182530048. Throughput: 0: 1704.6. Samples: 40626364. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:07,934][41694] Avg episode reward: [(0, '4.729')] +[2024-11-08 05:54:09,671][42004] Updated weights for policy 0, policy_version 44566 (0.0034) +[2024-11-08 05:54:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 182562816. Throughput: 0: 1714.7. Samples: 40637370. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:12,934][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 05:54:15,109][42004] Updated weights for policy 0, policy_version 44576 (0.0025) +[2024-11-08 05:54:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6748.0). Total num frames: 182599680. Throughput: 0: 1727.6. Samples: 40643014. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:17,935][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 05:54:20,725][42004] Updated weights for policy 0, policy_version 44586 (0.0032) +[2024-11-08 05:54:25,105][41694] Fps is (10 sec: 6056.5, 60 sec: 6719.8, 300 sec: 6698.6). Total num frames: 182636544. Throughput: 0: 1646.8. Samples: 40654186. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:25,106][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 05:54:27,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 182657024. Throughput: 0: 1621.6. Samples: 40660348. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:27,934][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 05:54:28,861][42004] Updated weights for policy 0, policy_version 44596 (0.0023) +[2024-11-08 05:54:32,931][41694] Fps is (10 sec: 6803.5, 60 sec: 6622.0, 300 sec: 6664.7). Total num frames: 182689792. Throughput: 0: 1614.9. Samples: 40665578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:32,933][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 05:54:35,036][42004] Updated weights for policy 0, policy_version 44606 (0.0037) +[2024-11-08 05:54:37,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 182722560. Throughput: 0: 1689.3. Samples: 40675726. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:37,933][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 05:54:40,995][42004] Updated weights for policy 0, policy_version 44616 (0.0028) +[2024-11-08 05:54:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6728.1). Total num frames: 182759424. Throughput: 0: 1676.6. Samples: 40686296. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:54:42,934][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 05:54:46,373][42004] Updated weights for policy 0, policy_version 44626 (0.0029) +[2024-11-08 05:54:47,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 182796288. Throughput: 0: 1676.9. Samples: 40691690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:54:47,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 05:54:51,636][42004] Updated weights for policy 0, policy_version 44636 (0.0021) +[2024-11-08 05:54:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6734.1). Total num frames: 182837248. Throughput: 0: 1714.8. Samples: 40703528. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:54:52,938][41694] Avg episode reward: [(0, '4.184')] +[2024-11-08 05:54:57,130][42004] Updated weights for policy 0, policy_version 44646 (0.0024) +[2024-11-08 05:54:59,894][41694] Fps is (10 sec: 6163.1, 60 sec: 6676.5, 300 sec: 6689.6). Total num frames: 182870016. Throughput: 0: 1652.7. Samples: 40714986. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:54:59,896][41694] Avg episode reward: [(0, '4.242')] +[2024-11-08 05:55:02,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 182890496. Throughput: 0: 1617.9. Samples: 40715818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:02,934][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 05:55:05,887][42004] Updated weights for policy 0, policy_version 44656 (0.0036) +[2024-11-08 05:55:07,932][41694] Fps is (10 sec: 6625.1, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 182923264. Throughput: 0: 1666.2. Samples: 40725546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:07,934][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 05:55:12,045][42004] Updated weights for policy 0, policy_version 44666 (0.0052) +[2024-11-08 05:55:12,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 182956032. Throughput: 0: 1668.8. Samples: 40735444. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:12,932][41694] Avg episode reward: [(0, '4.612')] +[2024-11-08 05:55:17,760][42004] Updated weights for policy 0, policy_version 44676 (0.0026) +[2024-11-08 05:55:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6553.6, 300 sec: 6714.3). Total num frames: 182992896. Throughput: 0: 1665.0. Samples: 40740502. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:17,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 05:55:22,881][42004] Updated weights for policy 0, policy_version 44686 (0.0030) +[2024-11-08 05:55:22,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6870.7, 300 sec: 6734.1). Total num frames: 183033856. Throughput: 0: 1693.6. Samples: 40751940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:22,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 05:55:27,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 183070720. Throughput: 0: 1726.9. Samples: 40764006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:27,936][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 05:55:28,212][42004] Updated weights for policy 0, policy_version 44696 (0.0037) +[2024-11-08 05:55:34,587][41694] Fps is (10 sec: 5974.3, 60 sec: 6709.8, 300 sec: 6682.7). Total num frames: 183103488. Throughput: 0: 1672.9. Samples: 40769738. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:34,590][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 05:55:35,911][42004] Updated weights for policy 0, policy_version 44706 (0.0028) +[2024-11-08 05:55:37,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 183128064. Throughput: 0: 1622.9. Samples: 40776560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:37,934][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 05:55:37,950][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044709_183128064.pth... +[2024-11-08 05:55:38,077][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044316_181518336.pth +[2024-11-08 05:55:41,790][42004] Updated weights for policy 0, policy_version 44716 (0.0026) +[2024-11-08 05:55:42,935][41694] Fps is (10 sec: 6869.1, 60 sec: 6689.8, 300 sec: 6664.6). 
Total num frames: 183160832. Throughput: 0: 1671.0. Samples: 40786908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 05:55:42,937][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 05:55:47,589][42004] Updated weights for policy 0, policy_version 44726 (0.0033) +[2024-11-08 05:55:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 183197696. Throughput: 0: 1689.9. Samples: 40791864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:55:47,933][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 05:55:52,706][42004] Updated weights for policy 0, policy_version 44736 (0.0029) +[2024-11-08 05:55:52,932][41694] Fps is (10 sec: 7784.9, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 183238656. Throughput: 0: 1737.4. Samples: 40803730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:55:52,935][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 05:55:57,872][42004] Updated weights for policy 0, policy_version 44746 (0.0025) +[2024-11-08 05:55:57,932][41694] Fps is (10 sec: 8191.9, 60 sec: 7057.5, 300 sec: 6748.0). Total num frames: 183279616. Throughput: 0: 1785.6. Samples: 40815796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:55:57,933][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 05:56:02,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 183316480. Throughput: 0: 1797.9. Samples: 40821406. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:02,933][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 05:56:03,366][42004] Updated weights for policy 0, policy_version 44756 (0.0030) +[2024-11-08 05:56:09,415][41694] Fps is (10 sec: 5707.2, 60 sec: 6861.9, 300 sec: 6714.2). Total num frames: 183345152. Throughput: 0: 1728.5. Samples: 40832288. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:09,417][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 05:56:12,370][42004] Updated weights for policy 0, policy_version 44766 (0.0040) +[2024-11-08 05:56:12,932][41694] Fps is (10 sec: 4505.6, 60 sec: 6758.4, 300 sec: 6692.4). Total num frames: 183361536. Throughput: 0: 1633.4. Samples: 40837510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:12,934][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 05:56:17,932][41694] Fps is (10 sec: 5290.2, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 183390208. Throughput: 0: 1657.7. Samples: 40841590. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:17,933][41694] Avg episode reward: [(0, '4.615')] +[2024-11-08 05:56:19,433][42004] Updated weights for policy 0, policy_version 44776 (0.0029) +[2024-11-08 05:56:22,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6553.5, 300 sec: 6664.7). Total num frames: 183427072. Throughput: 0: 1654.6. Samples: 40851016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:22,934][41694] Avg episode reward: [(0, '4.465')] +[2024-11-08 05:56:24,878][42004] Updated weights for policy 0, policy_version 44786 (0.0029) +[2024-11-08 05:56:27,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 183468032. Throughput: 0: 1680.8. Samples: 40862538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:27,934][41694] Avg episode reward: [(0, '4.721')] +[2024-11-08 05:56:30,192][42004] Updated weights for policy 0, policy_version 44796 (0.0031) +[2024-11-08 05:56:32,932][41694] Fps is (10 sec: 7782.7, 60 sec: 6879.9, 300 sec: 6734.1). Total num frames: 183504896. Throughput: 0: 1700.8. Samples: 40868400. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:32,934][41694] Avg episode reward: [(0, '4.551')] +[2024-11-08 05:56:35,516][42004] Updated weights for policy 0, policy_version 44806 (0.0025) +[2024-11-08 05:56:37,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6894.9, 300 sec: 6734.1). Total num frames: 183541760. Throughput: 0: 1690.9. Samples: 40879820. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:56:37,933][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 05:56:41,078][42004] Updated weights for policy 0, policy_version 44816 (0.0030) +[2024-11-08 05:56:44,128][41694] Fps is (10 sec: 5853.5, 60 sec: 6693.6, 300 sec: 6693.1). Total num frames: 183570432. Throughput: 0: 1506.4. Samples: 40885388. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:56:44,129][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 05:56:47,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 183595008. Throughput: 0: 1568.8. Samples: 40892002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:56:47,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 05:56:49,357][42004] Updated weights for policy 0, policy_version 44826 (0.0023) +[2024-11-08 05:56:52,931][41694] Fps is (10 sec: 6513.6, 60 sec: 6485.4, 300 sec: 6678.6). Total num frames: 183627776. Throughput: 0: 1597.2. Samples: 40901792. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:56:52,937][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 05:56:55,357][42004] Updated weights for policy 0, policy_version 44836 (0.0036) +[2024-11-08 05:56:57,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6485.3, 300 sec: 6692.4). Total num frames: 183668736. Throughput: 0: 1672.4. Samples: 40912770. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:56:57,934][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 05:57:00,584][42004] Updated weights for policy 0, policy_version 44846 (0.0024) +[2024-11-08 05:57:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6417.1, 300 sec: 6734.1). Total num frames: 183701504. Throughput: 0: 1711.6. Samples: 40918614. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:57:02,933][41694] Avg episode reward: [(0, '4.599')] +[2024-11-08 05:57:06,232][42004] Updated weights for policy 0, policy_version 44856 (0.0023) +[2024-11-08 05:57:07,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6789.7, 300 sec: 6748.0). Total num frames: 183742464. Throughput: 0: 1749.7. Samples: 40929752. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:57:07,933][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 05:57:11,712][42004] Updated weights for policy 0, policy_version 44866 (0.0032) +[2024-11-08 05:57:12,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6761.9). Total num frames: 183779328. Throughput: 0: 1744.4. Samples: 40941034. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:57:12,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 05:57:18,968][41694] Fps is (10 sec: 5567.0, 60 sec: 6777.9, 300 sec: 6710.5). Total num frames: 183803904. Throughput: 0: 1695.6. Samples: 40946460. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:57:18,969][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 05:57:19,810][42004] Updated weights for policy 0, policy_version 44876 (0.0019) +[2024-11-08 05:57:22,933][41694] Fps is (10 sec: 4914.5, 60 sec: 6690.0, 300 sec: 6692.4). Total num frames: 183828480. Throughput: 0: 1612.0. Samples: 40952360. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:57:22,935][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 05:57:26,230][42004] Updated weights for policy 0, policy_version 44886 (0.0046) +[2024-11-08 05:57:27,932][41694] Fps is (10 sec: 6396.8, 60 sec: 6553.6, 300 sec: 6678.5). Total num frames: 183861248. Throughput: 0: 1750.7. Samples: 40962076. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:57:27,935][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 05:57:31,924][42004] Updated weights for policy 0, policy_version 44896 (0.0032) +[2024-11-08 05:57:32,932][41694] Fps is (10 sec: 6964.1, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 183898112. Throughput: 0: 1673.2. Samples: 40967298. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:57:32,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 05:57:37,136][42004] Updated weights for policy 0, policy_version 44906 (0.0024) +[2024-11-08 05:57:37,932][41694] Fps is (10 sec: 7783.0, 60 sec: 6621.9, 300 sec: 6748.0). Total num frames: 183939072. Throughput: 0: 1716.8. Samples: 40979048. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:57:37,933][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 05:57:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044907_183939072.pth... +[2024-11-08 05:57:38,099][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044514_182329344.pth +[2024-11-08 05:57:42,331][42004] Updated weights for policy 0, policy_version 44916 (0.0026) +[2024-11-08 05:57:42,932][41694] Fps is (10 sec: 8191.9, 60 sec: 6965.5, 300 sec: 6761.9). Total num frames: 183980032. Throughput: 0: 1737.9. Samples: 40990976. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:57:42,933][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 05:57:47,594][42004] Updated weights for policy 0, policy_version 44926 (0.0034) +[2024-11-08 05:57:47,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 184016896. Throughput: 0: 1735.1. Samples: 40996692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:57:47,933][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 05:57:53,540][41694] Fps is (10 sec: 5791.4, 60 sec: 6825.7, 300 sec: 6720.2). Total num frames: 184041472. Throughput: 0: 1720.4. Samples: 41008218. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:57:53,543][41694] Avg episode reward: [(0, '4.634')] +[2024-11-08 05:57:55,517][42004] Updated weights for policy 0, policy_version 44936 (0.0033) +[2024-11-08 05:57:57,936][41694] Fps is (10 sec: 5322.4, 60 sec: 6689.7, 300 sec: 6720.1). Total num frames: 184070144. Throughput: 0: 1628.2. Samples: 41014312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:57:57,938][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 05:58:01,782][42004] Updated weights for policy 0, policy_version 44946 (0.0035) +[2024-11-08 05:58:02,931][41694] Fps is (10 sec: 6542.4, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 184102912. Throughput: 0: 1653.1. Samples: 41019134. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:58:02,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 05:58:07,254][42004] Updated weights for policy 0, policy_version 44956 (0.0023) +[2024-11-08 05:58:07,932][41694] Fps is (10 sec: 7376.0, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 184143872. Throughput: 0: 1725.5. Samples: 41030006. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:07,933][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 05:58:12,329][42004] Updated weights for policy 0, policy_version 44966 (0.0035) +[2024-11-08 05:58:12,931][41694] Fps is (10 sec: 8192.0, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 184184832. Throughput: 0: 1777.8. Samples: 41042074. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:12,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 05:58:17,470][42004] Updated weights for policy 0, policy_version 44976 (0.0026) +[2024-11-08 05:58:17,932][41694] Fps is (10 sec: 7782.3, 60 sec: 7085.6, 300 sec: 6789.6). Total num frames: 184221696. Throughput: 0: 1793.2. Samples: 41047994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:17,934][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 05:58:22,811][42004] Updated weights for policy 0, policy_version 44986 (0.0023) +[2024-11-08 05:58:22,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7236.4, 300 sec: 6803.5). Total num frames: 184262656. Throughput: 0: 1791.5. Samples: 41059664. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:22,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 05:58:28,371][41694] Fps is (10 sec: 5885.1, 60 sec: 6980.4, 300 sec: 6738.0). Total num frames: 184283136. Throughput: 0: 1638.9. Samples: 41065446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:28,373][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 05:58:31,499][42004] Updated weights for policy 0, policy_version 44996 (0.0028) +[2024-11-08 05:58:32,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 184307712. Throughput: 0: 1657.4. Samples: 41071276. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:32,933][41694] Avg episode reward: [(0, '4.747')] +[2024-11-08 05:58:37,774][42004] Updated weights for policy 0, policy_version 45006 (0.0029) +[2024-11-08 05:58:37,931][41694] Fps is (10 sec: 6426.8, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 184344576. Throughput: 0: 1620.2. Samples: 41080140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:37,932][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 05:58:42,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 184381440. Throughput: 0: 1719.9. Samples: 41091700. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:42,934][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 05:58:43,043][42004] Updated weights for policy 0, policy_version 45016 (0.0026) +[2024-11-08 05:58:47,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6789.6). Total num frames: 184422400. Throughput: 0: 1744.4. Samples: 41097634. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:47,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 05:58:48,382][42004] Updated weights for policy 0, policy_version 45026 (0.0023) +[2024-11-08 05:58:52,932][41694] Fps is (10 sec: 7782.8, 60 sec: 7034.6, 300 sec: 6789.6). Total num frames: 184459264. Throughput: 0: 1765.0. Samples: 41109432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:52,934][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 05:58:53,518][42004] Updated weights for policy 0, policy_version 45036 (0.0030) +[2024-11-08 05:58:57,932][41694] Fps is (10 sec: 7782.0, 60 sec: 7168.5, 300 sec: 6803.5). Total num frames: 184500224. Throughput: 0: 1760.1. Samples: 41121278. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:58:57,936][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 05:58:58,788][42004] Updated weights for policy 0, policy_version 45046 (0.0026) +[2024-11-08 05:59:03,090][41694] Fps is (10 sec: 5644.7, 60 sec: 6876.7, 300 sec: 6730.5). Total num frames: 184516608. Throughput: 0: 1739.2. Samples: 41126536. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 05:59:03,093][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 05:59:07,709][42004] Updated weights for policy 0, policy_version 45056 (0.0033) +[2024-11-08 05:59:07,932][41694] Fps is (10 sec: 4915.4, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 184549376. Throughput: 0: 1601.4. Samples: 41131728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:59:07,938][41694] Avg episode reward: [(0, '4.315')] +[2024-11-08 05:59:12,932][41694] Fps is (10 sec: 6659.3, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 184582144. Throughput: 0: 1727.5. Samples: 41142424. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:59:12,933][41694] Avg episode reward: [(0, '4.200')] +[2024-11-08 05:59:13,583][42004] Updated weights for policy 0, policy_version 45066 (0.0031) +[2024-11-08 05:59:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6784.1). Total num frames: 184623104. Throughput: 0: 1703.4. Samples: 41147928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:59:17,935][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 05:59:18,759][42004] Updated weights for policy 0, policy_version 45076 (0.0034) +[2024-11-08 05:59:22,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6789.6). Total num frames: 184659968. Throughput: 0: 1762.8. Samples: 41159464. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:59:22,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 05:59:24,167][42004] Updated weights for policy 0, policy_version 45086 (0.0027) +[2024-11-08 05:59:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6945.9, 300 sec: 6803.5). Total num frames: 184696832. Throughput: 0: 1760.6. Samples: 41170928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 05:59:27,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 05:59:29,917][42004] Updated weights for policy 0, policy_version 45096 (0.0026) +[2024-11-08 05:59:32,932][41694] Fps is (10 sec: 7372.6, 60 sec: 7099.7, 300 sec: 6817.4). Total num frames: 184733696. Throughput: 0: 1745.0. Samples: 41176158. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:59:32,935][41694] Avg episode reward: [(0, '4.501')] +[2024-11-08 05:59:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 184750080. Throughput: 0: 1694.5. Samples: 41185684. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:59:37,934][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 05:59:37,964][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045105_184750080.pth... +[2024-11-08 05:59:38,158][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044709_183128064.pth +[2024-11-08 05:59:38,331][42004] Updated weights for policy 0, policy_version 45106 (0.0029) +[2024-11-08 05:59:42,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 184782848. Throughput: 0: 1572.7. Samples: 41192048. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:59:42,934][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 05:59:44,610][42004] Updated weights for policy 0, policy_version 45116 (0.0020) +[2024-11-08 05:59:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6720.2). 
Total num frames: 184819712. Throughput: 0: 1575.3. Samples: 41197176. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:59:47,933][41694] Avg episode reward: [(0, '4.395')] +[2024-11-08 05:59:49,871][42004] Updated weights for policy 0, policy_version 45126 (0.0022) +[2024-11-08 05:59:52,933][41694] Fps is (10 sec: 7372.4, 60 sec: 6621.8, 300 sec: 6779.2). Total num frames: 184856576. Throughput: 0: 1714.8. Samples: 41208894. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:59:52,935][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 05:59:55,635][42004] Updated weights for policy 0, policy_version 45136 (0.0022) +[2024-11-08 05:59:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6789.6). Total num frames: 184893440. Throughput: 0: 1714.4. Samples: 41219574. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 05:59:57,933][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 06:00:01,047][42004] Updated weights for policy 0, policy_version 45146 (0.0031) +[2024-11-08 06:00:02,932][41694] Fps is (10 sec: 6963.9, 60 sec: 6844.8, 300 sec: 6789.6). Total num frames: 184926208. Throughput: 0: 1720.5. Samples: 41225352. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:00:02,933][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 06:00:06,840][42004] Updated weights for policy 0, policy_version 45156 (0.0031) +[2024-11-08 06:00:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 184963072. Throughput: 0: 1697.9. Samples: 41235868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:00:07,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 06:00:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 184979456. Throughput: 0: 1586.9. Samples: 41242338. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:00:12,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 06:00:16,352][42004] Updated weights for policy 0, policy_version 45166 (0.0031) +[2024-11-08 06:00:17,932][41694] Fps is (10 sec: 4505.5, 60 sec: 6417.1, 300 sec: 6692.4). Total num frames: 185008128. Throughput: 0: 1542.4. Samples: 41245564. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:00:17,936][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 06:00:22,308][42004] Updated weights for policy 0, policy_version 45176 (0.0029) +[2024-11-08 06:00:22,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6692.5). Total num frames: 185044992. Throughput: 0: 1540.3. Samples: 41254998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:00:22,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 06:00:27,475][42004] Updated weights for policy 0, policy_version 45186 (0.0027) +[2024-11-08 06:00:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6417.1, 300 sec: 6744.2). Total num frames: 185081856. Throughput: 0: 1665.5. Samples: 41266994. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:00:27,934][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 06:00:32,571][42004] Updated weights for policy 0, policy_version 45196 (0.0027) +[2024-11-08 06:00:32,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6485.4, 300 sec: 6761.9). Total num frames: 185122816. Throughput: 0: 1684.4. Samples: 41272976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:00:32,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 06:00:37,709][42004] Updated weights for policy 0, policy_version 45206 (0.0024) +[2024-11-08 06:00:37,933][41694] Fps is (10 sec: 8190.5, 60 sec: 6894.7, 300 sec: 6789.7). Total num frames: 185163776. Throughput: 0: 1694.8. Samples: 41285162. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:00:37,937][41694] Avg episode reward: [(0, '4.764')] +[2024-11-08 06:00:42,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6963.3, 300 sec: 6789.6). Total num frames: 185200640. Throughput: 0: 1709.8. Samples: 41296514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:00:42,934][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 06:00:43,287][42004] Updated weights for policy 0, policy_version 45216 (0.0029) +[2024-11-08 06:00:47,932][41694] Fps is (10 sec: 5325.7, 60 sec: 6621.8, 300 sec: 6706.3). Total num frames: 185217024. Throughput: 0: 1682.1. Samples: 41301046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:00:47,933][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 06:00:52,140][42004] Updated weights for policy 0, policy_version 45226 (0.0032) +[2024-11-08 06:00:52,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6553.7, 300 sec: 6678.6). Total num frames: 185249792. Throughput: 0: 1577.8. Samples: 41306868. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:00:52,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 06:00:57,339][42004] Updated weights for policy 0, policy_version 45236 (0.0033) +[2024-11-08 06:00:57,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 185290752. Throughput: 0: 1694.8. Samples: 41318604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:00:57,933][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 06:01:02,679][42004] Updated weights for policy 0, policy_version 45246 (0.0028) +[2024-11-08 06:01:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6754.2). Total num frames: 185327616. Throughput: 0: 1754.8. Samples: 41324532. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:01:02,934][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 06:01:07,784][42004] Updated weights for policy 0, policy_version 45256 (0.0028) +[2024-11-08 06:01:07,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6803.5). Total num frames: 185368576. Throughput: 0: 1801.4. Samples: 41336062. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:01:07,933][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 06:01:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7031.5, 300 sec: 6817.4). Total num frames: 185401344. Throughput: 0: 1778.4. Samples: 41347024. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:01:12,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 06:01:13,771][42004] Updated weights for policy 0, policy_version 45266 (0.0029) +[2024-11-08 06:01:17,932][41694] Fps is (10 sec: 6962.7, 60 sec: 7167.9, 300 sec: 6817.4). Total num frames: 185438208. Throughput: 0: 1756.3. Samples: 41352010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:01:17,935][41694] Avg episode reward: [(0, '4.201')] +[2024-11-08 06:01:22,340][42004] Updated weights for policy 0, policy_version 45276 (0.0032) +[2024-11-08 06:01:22,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 185450496. Throughput: 0: 1662.1. Samples: 41359954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:01:22,934][41694] Avg episode reward: [(0, '4.244')] +[2024-11-08 06:01:27,932][41694] Fps is (10 sec: 4915.5, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 185487360. Throughput: 0: 1588.6. Samples: 41368000. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) +[2024-11-08 06:01:27,934][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 06:01:28,256][42004] Updated weights for policy 0, policy_version 45286 (0.0041) +[2024-11-08 06:01:32,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6758.4, 300 sec: 6734.1). 
Total num frames: 185528320. Throughput: 0: 1615.3. Samples: 41373736. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) +[2024-11-08 06:01:32,933][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 06:01:33,381][42004] Updated weights for policy 0, policy_version 45296 (0.0030) +[2024-11-08 06:01:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.3, 300 sec: 6789.4). Total num frames: 185565184. Throughput: 0: 1755.4. Samples: 41385862. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) +[2024-11-08 06:01:37,934][41694] Avg episode reward: [(0, '4.642')] +[2024-11-08 06:01:37,969][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045305_185569280.pth... +[2024-11-08 06:01:38,058][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000044907_183939072.pth +[2024-11-08 06:01:38,484][42004] Updated weights for policy 0, policy_version 45306 (0.0022) +[2024-11-08 06:01:42,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6817.4). Total num frames: 185606144. Throughput: 0: 1756.3. Samples: 41397636. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) +[2024-11-08 06:01:42,933][41694] Avg episode reward: [(0, '4.691')] +[2024-11-08 06:01:43,776][42004] Updated weights for policy 0, policy_version 45316 (0.0026) +[2024-11-08 06:01:47,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7099.8, 300 sec: 6831.3). Total num frames: 185643008. Throughput: 0: 1753.2. Samples: 41403426. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) +[2024-11-08 06:01:47,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 06:01:49,350][42004] Updated weights for policy 0, policy_version 45326 (0.0024) +[2024-11-08 06:01:52,932][41694] Fps is (10 sec: 6963.0, 60 sec: 7099.7, 300 sec: 6803.5). Total num frames: 185675776. Throughput: 0: 1742.0. Samples: 41414452. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) +[2024-11-08 06:01:52,934][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 06:01:57,932][41694] Fps is (10 sec: 4915.0, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 185692160. Throughput: 0: 1621.2. Samples: 41419978. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:01:57,934][41694] Avg episode reward: [(0, '4.671')] +[2024-11-08 06:01:58,016][42004] Updated weights for policy 0, policy_version 45336 (0.0025) +[2024-11-08 06:02:02,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.1, 300 sec: 6734.1). Total num frames: 185729024. Throughput: 0: 1620.8. Samples: 41424944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:02:02,933][41694] Avg episode reward: [(0, '4.599')] +[2024-11-08 06:02:03,730][42004] Updated weights for policy 0, policy_version 45346 (0.0037) +[2024-11-08 06:02:07,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 185765888. Throughput: 0: 1685.3. Samples: 41435790. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:02:07,933][41694] Avg episode reward: [(0, '4.630')] +[2024-11-08 06:02:09,056][42004] Updated weights for policy 0, policy_version 45356 (0.0031) +[2024-11-08 06:02:12,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6758.4, 300 sec: 6813.6). Total num frames: 185806848. Throughput: 0: 1770.9. Samples: 41447690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:02:12,934][41694] Avg episode reward: [(0, '4.600')] +[2024-11-08 06:02:14,391][42004] Updated weights for policy 0, policy_version 45366 (0.0027) +[2024-11-08 06:02:17,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.5, 300 sec: 6831.3). Total num frames: 185843712. Throughput: 0: 1774.2. Samples: 41453576. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:02:17,934][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 06:02:19,655][42004] Updated weights for policy 0, policy_version 45376 (0.0029) +[2024-11-08 06:02:22,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7236.3, 300 sec: 6859.1). Total num frames: 185884672. Throughput: 0: 1760.3. Samples: 41465076. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:02:22,935][41694] Avg episode reward: [(0, '4.429')] +[2024-11-08 06:02:25,299][42004] Updated weights for policy 0, policy_version 45386 (0.0034) +[2024-11-08 06:02:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7099.7, 300 sec: 6831.3). Total num frames: 185913344. Throughput: 0: 1726.9. Samples: 41475348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:02:27,934][41694] Avg episode reward: [(0, '4.342')] +[2024-11-08 06:02:32,932][41694] Fps is (10 sec: 4505.7, 60 sec: 6690.1, 300 sec: 6748.0). Total num frames: 185929728. Throughput: 0: 1673.7. Samples: 41478742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:02:32,936][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 06:02:34,404][42004] Updated weights for policy 0, policy_version 45396 (0.0033) +[2024-11-08 06:02:37,932][41694] Fps is (10 sec: 5324.3, 60 sec: 6690.0, 300 sec: 6734.1). Total num frames: 185966592. Throughput: 0: 1582.6. Samples: 41485670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:02:37,934][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 06:02:40,162][42004] Updated weights for policy 0, policy_version 45406 (0.0037) +[2024-11-08 06:02:42,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6621.9, 300 sec: 6734.1). Total num frames: 186003456. Throughput: 0: 1707.8. Samples: 41496830. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:02:42,934][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 06:02:45,410][42004] Updated weights for policy 0, policy_version 45416 (0.0023) +[2024-11-08 06:02:47,931][41694] Fps is (10 sec: 7373.6, 60 sec: 6621.9, 300 sec: 6789.8). Total num frames: 186040320. Throughput: 0: 1728.4. Samples: 41502722. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:02:47,934][41694] Avg episode reward: [(0, '4.271')] +[2024-11-08 06:02:50,714][42004] Updated weights for policy 0, policy_version 45426 (0.0020) +[2024-11-08 06:02:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6817.5). Total num frames: 186081280. Throughput: 0: 1745.0. Samples: 41514316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:02:52,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 06:02:56,329][42004] Updated weights for policy 0, policy_version 45436 (0.0024) +[2024-11-08 06:02:57,931][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6817.4). Total num frames: 186114048. Throughput: 0: 1728.4. Samples: 41525468. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:02:57,933][41694] Avg episode reward: [(0, '4.632')] +[2024-11-08 06:03:02,908][42004] Updated weights for policy 0, policy_version 45446 (0.0039) +[2024-11-08 06:03:02,932][41694] Fps is (10 sec: 6553.1, 60 sec: 6963.1, 300 sec: 6789.6). Total num frames: 186146816. Throughput: 0: 1696.5. Samples: 41529920. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:03:02,935][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 06:03:07,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6621.8, 300 sec: 6706.3). Total num frames: 186163200. Throughput: 0: 1578.0. Samples: 41536084. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:03:07,934][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 06:03:11,325][42004] Updated weights for policy 0, policy_version 45456 (0.0039) +[2024-11-08 06:03:12,932][41694] Fps is (10 sec: 4915.5, 60 sec: 6485.4, 300 sec: 6692.4). Total num frames: 186195968. Throughput: 0: 1566.0. Samples: 41545818. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:03:12,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 06:03:16,570][42004] Updated weights for policy 0, policy_version 45466 (0.0043) +[2024-11-08 06:03:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 186236928. Throughput: 0: 1616.8. Samples: 41551496. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:03:17,938][41694] Avg episode reward: [(0, '4.612')] +[2024-11-08 06:03:21,797][42004] Updated weights for policy 0, policy_version 45476 (0.0030) +[2024-11-08 06:03:22,932][41694] Fps is (10 sec: 8191.8, 60 sec: 6553.6, 300 sec: 6772.0). Total num frames: 186277888. Throughput: 0: 1726.9. Samples: 41563380. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:03:22,933][41694] Avg episode reward: [(0, '4.270')] +[2024-11-08 06:03:27,015][42004] Updated weights for policy 0, policy_version 45486 (0.0027) +[2024-11-08 06:03:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6690.1, 300 sec: 6803.5). Total num frames: 186314752. Throughput: 0: 1740.9. Samples: 41575170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:03:27,934][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 06:03:32,471][42004] Updated weights for policy 0, policy_version 45496 (0.0029) +[2024-11-08 06:03:32,932][41694] Fps is (10 sec: 7373.0, 60 sec: 7031.5, 300 sec: 6803.5). Total num frames: 186351616. Throughput: 0: 1729.2. Samples: 41580536. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:03:32,934][41694] Avg episode reward: [(0, '4.278')] +[2024-11-08 06:03:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.3, 300 sec: 6789.6). Total num frames: 186384384. Throughput: 0: 1700.2. Samples: 41590828. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:03:37,934][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 06:03:37,955][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045504_186384384.pth... +[2024-11-08 06:03:38,091][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045105_184750080.pth +[2024-11-08 06:03:41,656][42004] Updated weights for policy 0, policy_version 45506 (0.0030) +[2024-11-08 06:03:42,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6621.8, 300 sec: 6706.3). Total num frames: 186400768. Throughput: 0: 1573.1. Samples: 41596260. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:03:42,935][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 06:03:47,522][42004] Updated weights for policy 0, policy_version 45516 (0.0034) +[2024-11-08 06:03:47,931][41694] Fps is (10 sec: 4915.4, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 186433536. Throughput: 0: 1582.6. Samples: 41601136. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:03:47,933][41694] Avg episode reward: [(0, '4.277')] +[2024-11-08 06:03:52,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 186470400. Throughput: 0: 1696.3. Samples: 41612416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:03:52,933][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 06:03:53,085][42004] Updated weights for policy 0, policy_version 45526 (0.0026) +[2024-11-08 06:03:57,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6553.5, 300 sec: 6751.6). Total num frames: 186507264. Throughput: 0: 1709.6. Samples: 41622750. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:03:57,934][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 06:03:58,925][42004] Updated weights for policy 0, policy_version 45536 (0.0030) +[2024-11-08 06:04:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.7, 300 sec: 6748.0). Total num frames: 186540032. Throughput: 0: 1706.3. Samples: 41628278. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:02,934][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 06:04:05,170][42004] Updated weights for policy 0, policy_version 45546 (0.0033) +[2024-11-08 06:04:07,934][41694] Fps is (10 sec: 6552.7, 60 sec: 6826.5, 300 sec: 6747.9). Total num frames: 186572800. Throughput: 0: 1661.8. Samples: 41638162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:07,941][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 06:04:11,465][42004] Updated weights for policy 0, policy_version 45556 (0.0031) +[2024-11-08 06:04:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 186605568. Throughput: 0: 1613.1. Samples: 41647758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:12,934][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 06:04:17,932][41694] Fps is (10 sec: 5735.5, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 186630144. Throughput: 0: 1545.6. Samples: 41650086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:17,936][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 06:04:18,776][42004] Updated weights for policy 0, policy_version 45566 (0.0022) +[2024-11-08 06:04:22,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6485.4, 300 sec: 6678.6). Total num frames: 186667008. Throughput: 0: 1559.3. Samples: 41660994. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:22,933][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 06:04:24,066][42004] Updated weights for policy 0, policy_version 45576 (0.0024) +[2024-11-08 06:04:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.7, 300 sec: 6692.5). Total num frames: 186707968. Throughput: 0: 1700.4. Samples: 41672776. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:27,933][41694] Avg episode reward: [(0, '4.605')] +[2024-11-08 06:04:29,590][42004] Updated weights for policy 0, policy_version 45586 (0.0031) +[2024-11-08 06:04:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 186744832. Throughput: 0: 1720.4. Samples: 41678552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:32,933][41694] Avg episode reward: [(0, '4.620')] +[2024-11-08 06:04:34,945][42004] Updated weights for policy 0, policy_version 45596 (0.0030) +[2024-11-08 06:04:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6775.8). Total num frames: 186781696. Throughput: 0: 1726.0. Samples: 41690088. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:37,933][41694] Avg episode reward: [(0, '4.330')] +[2024-11-08 06:04:40,659][42004] Updated weights for policy 0, policy_version 45606 (0.0035) +[2024-11-08 06:04:42,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6894.9, 300 sec: 6761.9). Total num frames: 186814464. Throughput: 0: 1716.5. Samples: 41699992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:42,935][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 06:04:49,618][41694] Fps is (10 sec: 4906.8, 60 sec: 6573.6, 300 sec: 6682.0). Total num frames: 186839040. Throughput: 0: 1635.4. Samples: 41704628. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:49,622][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 06:04:49,659][42004] Updated weights for policy 0, policy_version 45616 (0.0041) +[2024-11-08 06:04:52,932][41694] Fps is (10 sec: 4915.3, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 186863616. Throughput: 0: 1612.4. Samples: 41710716. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:52,935][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 06:04:55,744][42004] Updated weights for policy 0, policy_version 45626 (0.0031) +[2024-11-08 06:04:57,932][41694] Fps is (10 sec: 6897.7, 60 sec: 6485.4, 300 sec: 6678.6). Total num frames: 186896384. Throughput: 0: 1628.9. Samples: 41721060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:04:57,933][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 06:05:01,189][42004] Updated weights for policy 0, policy_version 45636 (0.0029) +[2024-11-08 06:05:02,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 186933248. Throughput: 0: 1696.9. Samples: 41726448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:05:02,934][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 06:05:06,910][42004] Updated weights for policy 0, policy_version 45646 (0.0036) +[2024-11-08 06:05:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6622.1, 300 sec: 6748.0). Total num frames: 186970112. Throughput: 0: 1692.7. Samples: 41737166. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:05:07,933][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 06:05:12,539][42004] Updated weights for policy 0, policy_version 45656 (0.0044) +[2024-11-08 06:05:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6775.8). Total num frames: 187006976. Throughput: 0: 1680.6. Samples: 41748402. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:05:12,933][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 06:05:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6761.9). Total num frames: 187039744. Throughput: 0: 1660.8. Samples: 41753290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:05:17,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 06:05:19,177][42004] Updated weights for policy 0, policy_version 45666 (0.0034) +[2024-11-08 06:05:23,740][41694] Fps is (10 sec: 5305.3, 60 sec: 6533.8, 300 sec: 6701.8). Total num frames: 187064320. Throughput: 0: 1580.0. Samples: 41762464. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:23,743][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 06:05:26,747][42004] Updated weights for policy 0, policy_version 45676 (0.0021) +[2024-11-08 06:05:27,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6692.4). Total num frames: 187097088. Throughput: 0: 1557.1. Samples: 41770062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:27,935][41694] Avg episode reward: [(0, '4.562')] +[2024-11-08 06:05:32,693][42004] Updated weights for policy 0, policy_version 45686 (0.0024) +[2024-11-08 06:05:32,939][41694] Fps is (10 sec: 7124.3, 60 sec: 6416.2, 300 sec: 6664.5). Total num frames: 187129856. Throughput: 0: 1623.2. Samples: 41774946. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:32,945][41694] Avg episode reward: [(0, '4.652')] +[2024-11-08 06:05:37,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 6664.7). Total num frames: 187166720. Throughput: 0: 1677.9. Samples: 41786220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:37,944][41694] Avg episode reward: [(0, '4.414')] +[2024-11-08 06:05:38,072][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045696_187170816.pth... 
+[2024-11-08 06:05:38,074][42004] Updated weights for policy 0, policy_version 45696 (0.0028) +[2024-11-08 06:05:38,208][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045305_185569280.pth +[2024-11-08 06:05:42,932][41694] Fps is (10 sec: 7378.5, 60 sec: 6485.4, 300 sec: 6734.1). Total num frames: 187203584. Throughput: 0: 1696.8. Samples: 41797418. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:42,935][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 06:05:43,650][42004] Updated weights for policy 0, policy_version 45706 (0.0029) +[2024-11-08 06:05:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6883.6, 300 sec: 6748.0). Total num frames: 187240448. Throughput: 0: 1700.2. Samples: 41802958. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:47,933][41694] Avg episode reward: [(0, '4.597')] +[2024-11-08 06:05:49,404][42004] Updated weights for policy 0, policy_version 45716 (0.0024) +[2024-11-08 06:05:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 187273216. Throughput: 0: 1676.2. Samples: 41812594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:52,934][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 06:05:55,736][42004] Updated weights for policy 0, policy_version 45726 (0.0029) +[2024-11-08 06:05:58,195][41694] Fps is (10 sec: 5188.0, 60 sec: 6592.9, 300 sec: 6658.7). Total num frames: 187293696. Throughput: 0: 1529.5. Samples: 41817634. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:05:58,199][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 06:06:02,933][41694] Fps is (10 sec: 5324.1, 60 sec: 6553.5, 300 sec: 6636.9). Total num frames: 187326464. Throughput: 0: 1581.4. Samples: 41824456. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:06:02,935][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 06:06:03,956][42004] Updated weights for policy 0, policy_version 45736 (0.0034) +[2024-11-08 06:06:07,931][41694] Fps is (10 sec: 6731.3, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 187359232. Throughput: 0: 1627.7. Samples: 41834394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:06:07,933][41694] Avg episode reward: [(0, '4.601')] +[2024-11-08 06:06:10,077][42004] Updated weights for policy 0, policy_version 45746 (0.0025) +[2024-11-08 06:06:12,931][41694] Fps is (10 sec: 6554.4, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 187392000. Throughput: 0: 1653.9. Samples: 41844488. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:06:12,933][41694] Avg episode reward: [(0, '4.754')] +[2024-11-08 06:06:16,096][42004] Updated weights for policy 0, policy_version 45756 (0.0028) +[2024-11-08 06:06:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.4, 300 sec: 6706.3). Total num frames: 187428864. Throughput: 0: 1656.2. Samples: 41849460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:06:17,933][41694] Avg episode reward: [(0, '4.623')] +[2024-11-08 06:06:21,817][42004] Updated weights for policy 0, policy_version 45766 (0.0036) +[2024-11-08 06:06:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6781.6, 300 sec: 6706.3). Total num frames: 187465728. Throughput: 0: 1651.9. Samples: 41860556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:06:22,933][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 06:06:27,931][42004] Updated weights for policy 0, policy_version 45776 (0.0038) +[2024-11-08 06:06:27,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 187498496. Throughput: 0: 1627.4. Samples: 41870652. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:06:27,934][41694] Avg episode reward: [(0, '4.359')] +[2024-11-08 06:06:32,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6486.2, 300 sec: 6623.0). Total num frames: 187518976. Throughput: 0: 1624.8. Samples: 41876074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:06:32,934][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 06:06:35,498][42004] Updated weights for policy 0, policy_version 45786 (0.0028) +[2024-11-08 06:06:37,931][41694] Fps is (10 sec: 5734.7, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 187555840. Throughput: 0: 1573.2. Samples: 41883390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:06:37,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 06:06:41,125][42004] Updated weights for policy 0, policy_version 45796 (0.0027) +[2024-11-08 06:06:42,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 187592704. Throughput: 0: 1714.5. Samples: 41894336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:06:42,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 06:06:46,785][42004] Updated weights for policy 0, policy_version 45806 (0.0026) +[2024-11-08 06:06:47,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.4, 300 sec: 6623.0). Total num frames: 187629568. Throughput: 0: 1668.2. Samples: 41899522. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:06:47,933][41694] Avg episode reward: [(0, '4.715')] +[2024-11-08 06:06:52,158][42004] Updated weights for policy 0, policy_version 45816 (0.0034) +[2024-11-08 06:06:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6692.5). Total num frames: 187666432. Throughput: 0: 1701.0. Samples: 41910938. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:06:52,934][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 06:06:57,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6788.3, 300 sec: 6678.6). 
Total num frames: 187699200. Throughput: 0: 1709.0. Samples: 41921392. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:06:57,934][41694] Avg episode reward: [(0, '4.316')] +[2024-11-08 06:06:58,202][42004] Updated weights for policy 0, policy_version 45826 (0.0033) +[2024-11-08 06:07:02,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6758.5, 300 sec: 6664.7). Total num frames: 187731968. Throughput: 0: 1706.3. Samples: 41926246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:02,934][41694] Avg episode reward: [(0, '4.682')] +[2024-11-08 06:07:04,417][42004] Updated weights for policy 0, policy_version 45836 (0.0043) +[2024-11-08 06:07:07,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6553.6, 300 sec: 6595.3). Total num frames: 187752448. Throughput: 0: 1640.6. Samples: 41934382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:07,934][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 06:07:12,326][42004] Updated weights for policy 0, policy_version 45846 (0.0035) +[2024-11-08 06:07:12,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.9, 300 sec: 6595.3). Total num frames: 187789312. Throughput: 0: 1617.7. Samples: 41943446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:12,934][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 06:07:17,729][42004] Updated weights for policy 0, policy_version 45856 (0.0033) +[2024-11-08 06:07:17,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6581.4). Total num frames: 187826176. Throughput: 0: 1618.3. Samples: 41948898. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:17,934][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 06:07:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 187863040. Throughput: 0: 1707.4. Samples: 41960222. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:22,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 06:07:23,090][42004] Updated weights for policy 0, policy_version 45866 (0.0030) +[2024-11-08 06:07:27,933][41694] Fps is (10 sec: 7372.1, 60 sec: 6690.1, 300 sec: 6678.5). Total num frames: 187899904. Throughput: 0: 1717.0. Samples: 41971602. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:27,935][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 06:07:28,522][42004] Updated weights for policy 0, policy_version 45876 (0.0031) +[2024-11-08 06:07:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6678.6). Total num frames: 187936768. Throughput: 0: 1722.2. Samples: 41977022. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:32,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 06:07:34,671][42004] Updated weights for policy 0, policy_version 45886 (0.0034) +[2024-11-08 06:07:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.8, 300 sec: 6664.7). Total num frames: 187969536. Throughput: 0: 1693.7. Samples: 41987154. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:37,937][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 06:07:37,991][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045892_187973632.pth... +[2024-11-08 06:07:38,114][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045504_186384384.pth +[2024-11-08 06:07:42,471][42004] Updated weights for policy 0, policy_version 45896 (0.0028) +[2024-11-08 06:07:42,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.8, 300 sec: 6609.1). Total num frames: 187990016. Throughput: 0: 1619.1. Samples: 41994254. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:42,934][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 06:07:47,931][41694] Fps is (10 sec: 5734.9, 60 sec: 6621.9, 300 sec: 6595.3). 
Total num frames: 188026880. Throughput: 0: 1622.3. Samples: 41999248. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:47,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 06:07:48,498][42004] Updated weights for policy 0, policy_version 45906 (0.0039) +[2024-11-08 06:07:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 188063744. Throughput: 0: 1680.8. Samples: 42010018. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:52,935][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 06:07:53,797][42004] Updated weights for policy 0, policy_version 45916 (0.0029) +[2024-11-08 06:07:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 188100608. Throughput: 0: 1723.3. Samples: 42020994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:07:57,933][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 06:07:59,525][42004] Updated weights for policy 0, policy_version 45926 (0.0026) +[2024-11-08 06:08:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.2, 300 sec: 6678.6). Total num frames: 188133376. Throughput: 0: 1725.7. Samples: 42026556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:08:02,936][41694] Avg episode reward: [(0, '4.278')] +[2024-11-08 06:08:05,858][42004] Updated weights for policy 0, policy_version 45936 (0.0029) +[2024-11-08 06:08:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6895.0, 300 sec: 6678.6). Total num frames: 188166144. Throughput: 0: 1690.9. Samples: 42036314. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:08:07,936][41694] Avg episode reward: [(0, '4.782')] +[2024-11-08 06:08:11,738][42004] Updated weights for policy 0, policy_version 45946 (0.0028) +[2024-11-08 06:08:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 188203008. Throughput: 0: 1669.1. Samples: 42046710. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:08:12,933][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 06:08:17,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 188219392. Throughput: 0: 1608.5. Samples: 42049404. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:08:17,933][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 06:08:19,715][42004] Updated weights for policy 0, policy_version 45956 (0.0022) +[2024-11-08 06:08:22,932][41694] Fps is (10 sec: 5324.3, 60 sec: 6553.5, 300 sec: 6581.4). Total num frames: 188256256. Throughput: 0: 1595.7. Samples: 42058962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:08:22,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 06:08:25,405][42004] Updated weights for policy 0, policy_version 45966 (0.0032) +[2024-11-08 06:08:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.7, 300 sec: 6581.4). Total num frames: 188293120. Throughput: 0: 1676.4. Samples: 42069690. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:08:27,934][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 06:08:30,898][42004] Updated weights for policy 0, policy_version 45976 (0.0023) +[2024-11-08 06:08:32,931][41694] Fps is (10 sec: 7373.5, 60 sec: 6553.6, 300 sec: 6595.3). Total num frames: 188329984. Throughput: 0: 1689.9. Samples: 42075294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:08:32,934][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 06:08:36,388][42004] Updated weights for policy 0, policy_version 45986 (0.0025) +[2024-11-08 06:08:37,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 188366848. Throughput: 0: 1699.1. Samples: 42086480. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:08:37,937][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 06:08:42,742][42004] Updated weights for policy 0, policy_version 45996 (0.0036) +[2024-11-08 06:08:42,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 188399616. Throughput: 0: 1673.6. Samples: 42096304. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:08:42,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 06:08:47,931][41694] Fps is (10 sec: 6963.7, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 188436480. Throughput: 0: 1670.7. Samples: 42101738. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:08:47,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 06:08:50,362][42004] Updated weights for policy 0, policy_version 46006 (0.0032) +[2024-11-08 06:08:52,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 6609.2). Total num frames: 188456960. Throughput: 0: 1618.3. Samples: 42109136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:08:52,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 06:08:55,849][42004] Updated weights for policy 0, policy_version 46016 (0.0054) +[2024-11-08 06:08:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 188493824. Throughput: 0: 1633.3. Samples: 42120210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:08:57,936][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 06:09:01,392][42004] Updated weights for policy 0, policy_version 46026 (0.0030) +[2024-11-08 06:09:02,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6621.8, 300 sec: 6636.9). Total num frames: 188530688. Throughput: 0: 1692.6. Samples: 42125572. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:09:02,934][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 06:09:07,077][42004] Updated weights for policy 0, policy_version 46036 (0.0031) +[2024-11-08 06:09:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 188567552. Throughput: 0: 1720.2. Samples: 42136372. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:09:07,934][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 06:09:12,781][42004] Updated weights for policy 0, policy_version 46046 (0.0030) +[2024-11-08 06:09:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 188604416. Throughput: 0: 1725.6. Samples: 42147340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:12,934][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 06:09:17,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6963.2, 300 sec: 6678.6). Total num frames: 188637184. Throughput: 0: 1710.2. Samples: 42152254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:17,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 06:09:18,701][42004] Updated weights for policy 0, policy_version 46056 (0.0027) +[2024-11-08 06:09:24,522][41694] Fps is (10 sec: 5654.3, 60 sec: 6717.0, 300 sec: 6615.1). Total num frames: 188669952. Throughput: 0: 1645.2. Samples: 42163132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:24,526][41694] Avg episode reward: [(0, '4.530')] +[2024-11-08 06:09:26,462][42004] Updated weights for policy 0, policy_version 46066 (0.0026) +[2024-11-08 06:09:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.1, 300 sec: 6609.1). Total num frames: 188694528. Throughput: 0: 1643.1. Samples: 42170242. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:27,935][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 06:09:32,187][42004] Updated weights for policy 0, policy_version 46076 (0.0026) +[2024-11-08 06:09:32,932][41694] Fps is (10 sec: 7306.2, 60 sec: 6690.1, 300 sec: 6609.1). Total num frames: 188731392. Throughput: 0: 1631.4. Samples: 42175150. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:32,933][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 06:09:37,749][42004] Updated weights for policy 0, policy_version 46086 (0.0028) +[2024-11-08 06:09:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.2, 300 sec: 6623.0). Total num frames: 188768256. Throughput: 0: 1714.3. Samples: 42186282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:37,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 06:09:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046086_188768256.pth... +[2024-11-08 06:09:38,086][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045696_187170816.pth +[2024-11-08 06:09:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6689.0). Total num frames: 188801024. Throughput: 0: 1700.1. Samples: 42196716. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:42,933][41694] Avg episode reward: [(0, '4.654')] +[2024-11-08 06:09:43,713][42004] Updated weights for policy 0, policy_version 46096 (0.0038) +[2024-11-08 06:09:47,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 188833792. Throughput: 0: 1696.9. Samples: 42201930. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:47,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 06:09:49,998][42004] Updated weights for policy 0, policy_version 46106 (0.0035) +[2024-11-08 06:09:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6692.5). 
Total num frames: 188870656. Throughput: 0: 1676.6. Samples: 42211818. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:52,933][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 06:09:55,744][42004] Updated weights for policy 0, policy_version 46116 (0.0032) +[2024-11-08 06:09:58,922][41694] Fps is (10 sec: 5590.2, 60 sec: 6581.5, 300 sec: 6628.5). Total num frames: 188895232. Throughput: 0: 1516.8. Samples: 42217100. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:09:58,925][41694] Avg episode reward: [(0, '4.362')] +[2024-11-08 06:10:02,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 188923904. Throughput: 0: 1599.5. Samples: 42224232. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:02,935][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 06:10:03,669][42004] Updated weights for policy 0, policy_version 46126 (0.0040) +[2024-11-08 06:10:07,932][41694] Fps is (10 sec: 7273.8, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 188960768. Throughput: 0: 1647.9. Samples: 42234666. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:07,934][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 06:10:09,389][42004] Updated weights for policy 0, policy_version 46136 (0.0033) +[2024-11-08 06:10:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.4, 300 sec: 6623.0). Total num frames: 188993536. Throughput: 0: 1663.1. Samples: 42245080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:12,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 06:10:15,650][42004] Updated weights for policy 0, policy_version 46146 (0.0025) +[2024-11-08 06:10:17,933][41694] Fps is (10 sec: 6553.2, 60 sec: 6485.2, 300 sec: 6669.1). Total num frames: 189026304. Throughput: 0: 1662.3. Samples: 42249956. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:17,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 06:10:22,207][42004] Updated weights for policy 0, policy_version 46156 (0.0039) +[2024-11-08 06:10:22,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6662.0, 300 sec: 6650.8). Total num frames: 189059072. Throughput: 0: 1620.9. Samples: 42259220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:22,933][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 06:10:27,777][42004] Updated weights for policy 0, policy_version 46166 (0.0035) +[2024-11-08 06:10:27,931][41694] Fps is (10 sec: 6964.0, 60 sec: 6690.2, 300 sec: 6664.9). Total num frames: 189095936. Throughput: 0: 1628.5. Samples: 42269998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:27,933][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 06:10:33,356][41694] Fps is (10 sec: 5500.9, 60 sec: 6372.0, 300 sec: 6599.6). Total num frames: 189116416. Throughput: 0: 1619.7. Samples: 42275502. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:33,358][41694] Avg episode reward: [(0, '4.463')] +[2024-11-08 06:10:35,581][42004] Updated weights for policy 0, policy_version 46176 (0.0020) +[2024-11-08 06:10:37,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6609.1). Total num frames: 189153280. Throughput: 0: 1572.9. Samples: 42282600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:37,935][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 06:10:41,200][42004] Updated weights for policy 0, policy_version 46186 (0.0023) +[2024-11-08 06:10:42,932][41694] Fps is (10 sec: 7699.6, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 189190144. Throughput: 0: 1740.4. Samples: 42293694. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:42,934][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 06:10:46,706][42004] Updated weights for policy 0, policy_version 46196 (0.0026) +[2024-11-08 06:10:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 189227008. Throughput: 0: 1659.7. Samples: 42298918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:47,934][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 06:10:52,388][42004] Updated weights for policy 0, policy_version 46206 (0.0044) +[2024-11-08 06:10:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6670.6). Total num frames: 189259776. Throughput: 0: 1675.0. Samples: 42310042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:52,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 06:10:57,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6802.5, 300 sec: 6678.6). Total num frames: 189296640. Throughput: 0: 1671.4. Samples: 42320294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:10:57,934][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 06:10:58,292][42004] Updated weights for policy 0, policy_version 46216 (0.0027) +[2024-11-08 06:11:02,932][41694] Fps is (10 sec: 6962.7, 60 sec: 6758.3, 300 sec: 6678.5). Total num frames: 189329408. Throughput: 0: 1680.8. Samples: 42325592. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:11:02,934][41694] Avg episode reward: [(0, '4.190')] +[2024-11-08 06:11:04,104][42004] Updated weights for policy 0, policy_version 46226 (0.0021) +[2024-11-08 06:11:07,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.4, 300 sec: 6636.9). Total num frames: 189349888. Throughput: 0: 1680.3. Samples: 42334832. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:11:07,933][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 06:11:12,738][42004] Updated weights for policy 0, policy_version 46236 (0.0027) +[2024-11-08 06:11:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.2, 300 sec: 6623.0). Total num frames: 189382656. Throughput: 0: 1601.4. Samples: 42342062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:11:12,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 06:11:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.7, 300 sec: 6623.0). Total num frames: 189419520. Throughput: 0: 1603.3. Samples: 42346970. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:11:17,933][41694] Avg episode reward: [(0, '4.583')] +[2024-11-08 06:11:18,316][42004] Updated weights for policy 0, policy_version 46246 (0.0038) +[2024-11-08 06:11:22,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 189456384. Throughput: 0: 1683.1. Samples: 42358338. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:11:22,933][41694] Avg episode reward: [(0, '4.640')] +[2024-11-08 06:11:23,849][42004] Updated weights for policy 0, policy_version 46256 (0.0032) +[2024-11-08 06:11:27,933][41694] Fps is (10 sec: 6962.1, 60 sec: 6553.4, 300 sec: 6678.5). Total num frames: 189489152. Throughput: 0: 1674.8. Samples: 42369062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:11:27,936][41694] Avg episode reward: [(0, '4.192')] +[2024-11-08 06:11:29,972][42004] Updated weights for policy 0, policy_version 46266 (0.0032) +[2024-11-08 06:11:32,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6806.5, 300 sec: 6664.7). Total num frames: 189521920. Throughput: 0: 1666.0. Samples: 42373888. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:11:32,934][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 06:11:35,891][42004] Updated weights for policy 0, policy_version 46276 (0.0031) +[2024-11-08 06:11:37,932][41694] Fps is (10 sec: 6964.3, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 189558784. Throughput: 0: 1652.9. Samples: 42384422. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:11:37,933][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 06:11:37,941][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046279_189558784.pth... +[2024-11-08 06:11:38,083][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000045892_187973632.pth +[2024-11-08 06:11:42,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 189579264. Throughput: 0: 1578.1. Samples: 42391310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:11:42,936][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 06:11:43,923][42004] Updated weights for policy 0, policy_version 46286 (0.0026) +[2024-11-08 06:11:47,934][41694] Fps is (10 sec: 5732.8, 60 sec: 6485.0, 300 sec: 6609.1). Total num frames: 189616128. Throughput: 0: 1570.0. Samples: 42396244. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:11:47,937][41694] Avg episode reward: [(0, '4.295')] +[2024-11-08 06:11:49,681][42004] Updated weights for policy 0, policy_version 46296 (0.0026) +[2024-11-08 06:11:52,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 189652992. Throughput: 0: 1609.3. Samples: 42407252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:11:52,933][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 06:11:55,183][42004] Updated weights for policy 0, policy_version 46306 (0.0032) +[2024-11-08 06:11:57,932][41694] Fps is (10 sec: 6964.9, 60 sec: 6485.3, 300 sec: 6623.0). 
Total num frames: 189685760. Throughput: 0: 1690.3. Samples: 42418124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:11:57,934][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 06:12:00,795][42004] Updated weights for policy 0, policy_version 46316 (0.0025) +[2024-11-08 06:12:02,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6553.7, 300 sec: 6678.6). Total num frames: 189722624. Throughput: 0: 1706.3. Samples: 42423752. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:12:02,934][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 06:12:07,406][42004] Updated weights for policy 0, policy_version 46326 (0.0036) +[2024-11-08 06:12:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 189751296. Throughput: 0: 1656.3. Samples: 42432872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:12:07,934][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 06:12:12,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.5, 300 sec: 6650.8). Total num frames: 189788160. Throughput: 0: 1651.3. Samples: 42443366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:12:12,933][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 06:12:13,123][42004] Updated weights for policy 0, policy_version 46336 (0.0028) +[2024-11-08 06:12:17,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6595.2). Total num frames: 189808640. Throughput: 0: 1642.2. Samples: 42447786. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:12:17,934][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 06:12:20,917][42004] Updated weights for policy 0, policy_version 46346 (0.0026) +[2024-11-08 06:12:22,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 189845504. Throughput: 0: 1590.8. Samples: 42456008. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:12:22,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 06:12:26,584][42004] Updated weights for policy 0, policy_version 46356 (0.0032) +[2024-11-08 06:12:27,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6553.8, 300 sec: 6595.3). Total num frames: 189882368. Throughput: 0: 1678.5. Samples: 42466842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:12:27,934][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 06:12:31,984][42004] Updated weights for policy 0, policy_version 46366 (0.0030) +[2024-11-08 06:12:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6609.2). Total num frames: 189919232. Throughput: 0: 1688.5. Samples: 42472222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:12:32,933][41694] Avg episode reward: [(0, '4.679')] +[2024-11-08 06:12:37,714][42004] Updated weights for policy 0, policy_version 46376 (0.0037) +[2024-11-08 06:12:37,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6621.8, 300 sec: 6664.7). Total num frames: 189956096. Throughput: 0: 1695.1. Samples: 42483530. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:12:37,935][41694] Avg episode reward: [(0, '4.793')] +[2024-11-08 06:12:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 189988864. Throughput: 0: 1672.3. Samples: 42493378. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:12:42,933][41694] Avg episode reward: [(0, '4.455')] +[2024-11-08 06:12:43,771][42004] Updated weights for policy 0, policy_version 46386 (0.0045) +[2024-11-08 06:12:47,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6827.0, 300 sec: 6650.8). Total num frames: 190025728. Throughput: 0: 1668.1. Samples: 42498818. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:12:47,933][41694] Avg episode reward: [(0, '4.245')] +[2024-11-08 06:12:51,728][42004] Updated weights for policy 0, policy_version 46396 (0.0032) +[2024-11-08 06:12:52,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6595.2). Total num frames: 190046208. Throughput: 0: 1621.7. Samples: 42505846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:12:52,934][41694] Avg episode reward: [(0, '4.326')] +[2024-11-08 06:12:57,341][42004] Updated weights for policy 0, policy_version 46406 (0.0024) +[2024-11-08 06:12:57,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 190083072. Throughput: 0: 1628.5. Samples: 42516648. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:12:57,934][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 06:13:02,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6553.6, 300 sec: 6609.1). Total num frames: 190115840. Throughput: 0: 1653.0. Samples: 42522172. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:02,934][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 06:13:03,099][42004] Updated weights for policy 0, policy_version 46416 (0.0037) +[2024-11-08 06:13:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.2, 300 sec: 6609.1). Total num frames: 190152704. Throughput: 0: 1706.9. Samples: 42532820. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:07,934][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 06:13:08,622][42004] Updated weights for policy 0, policy_version 46426 (0.0032) +[2024-11-08 06:13:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 190189568. Throughput: 0: 1704.7. Samples: 42543554. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:12,933][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 06:13:14,751][42004] Updated weights for policy 0, policy_version 46436 (0.0033) +[2024-11-08 06:13:17,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6895.0, 300 sec: 6664.7). Total num frames: 190222336. Throughput: 0: 1692.6. Samples: 42548388. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:17,933][41694] Avg episode reward: [(0, '4.597')] +[2024-11-08 06:13:20,504][42004] Updated weights for policy 0, policy_version 46446 (0.0040) +[2024-11-08 06:13:22,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 190259200. Throughput: 0: 1682.1. Samples: 42559224. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:22,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 06:13:27,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 190279680. Throughput: 0: 1614.3. Samples: 42566022. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:27,935][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 06:13:28,409][42004] Updated weights for policy 0, policy_version 46456 (0.0021) +[2024-11-08 06:13:32,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.9, 300 sec: 6609.2). Total num frames: 190316544. Throughput: 0: 1614.1. Samples: 42571454. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:32,933][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 06:13:33,886][42004] Updated weights for policy 0, policy_version 46466 (0.0038) +[2024-11-08 06:13:37,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 190353408. Throughput: 0: 1702.8. Samples: 42582470. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:37,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 06:13:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046473_190353408.pth... +[2024-11-08 06:13:38,085][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046086_188768256.pth +[2024-11-08 06:13:39,981][42004] Updated weights for policy 0, policy_version 46476 (0.0038) +[2024-11-08 06:13:42,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 190386176. Throughput: 0: 1685.3. Samples: 42592488. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:42,934][41694] Avg episode reward: [(0, '4.280')] +[2024-11-08 06:13:46,017][42004] Updated weights for policy 0, policy_version 46486 (0.0032) +[2024-11-08 06:13:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 190414848. Throughput: 0: 1677.1. Samples: 42597642. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:47,934][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 06:13:51,849][42004] Updated weights for policy 0, policy_version 46496 (0.0025) +[2024-11-08 06:13:52,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 190451712. Throughput: 0: 1664.8. Samples: 42607734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:52,933][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 06:13:57,382][42004] Updated weights for policy 0, policy_version 46506 (0.0034) +[2024-11-08 06:13:59,989][41694] Fps is (10 sec: 6114.7, 60 sec: 6534.3, 300 sec: 6590.9). Total num frames: 190488576. Throughput: 0: 1605.8. Samples: 42619120. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:13:59,992][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 06:14:02,931][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6581.4). 
Total num frames: 190509056. Throughput: 0: 1603.5. Samples: 42620546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:14:02,934][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 06:14:05,510][42004] Updated weights for policy 0, policy_version 46516 (0.0023) +[2024-11-08 06:14:07,932][41694] Fps is (10 sec: 7219.9, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 190545920. Throughput: 0: 1595.2. Samples: 42631006. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 06:14:07,933][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 06:14:11,012][42004] Updated weights for policy 0, policy_version 46526 (0.0031) +[2024-11-08 06:14:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6595.3). Total num frames: 190582784. Throughput: 0: 1691.3. Samples: 42642130. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 06:14:12,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 06:14:16,734][42004] Updated weights for policy 0, policy_version 46536 (0.0025) +[2024-11-08 06:14:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.8, 300 sec: 6645.0). Total num frames: 190619648. Throughput: 0: 1681.8. Samples: 42647134. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 06:14:17,933][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 06:14:22,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6485.3, 300 sec: 6623.0). Total num frames: 190648320. Throughput: 0: 1662.3. Samples: 42657276. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 06:14:22,933][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 06:14:23,038][42004] Updated weights for policy 0, policy_version 46546 (0.0040) +[2024-11-08 06:14:27,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 190685184. Throughput: 0: 1682.1. Samples: 42668182. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 06:14:27,933][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 06:14:28,469][42004] Updated weights for policy 0, policy_version 46556 (0.0029) +[2024-11-08 06:14:34,449][41694] Fps is (10 sec: 6046.0, 60 sec: 6525.1, 300 sec: 6575.3). Total num frames: 190717952. Throughput: 0: 1635.8. Samples: 42673736. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0) +[2024-11-08 06:14:34,452][41694] Avg episode reward: [(0, '4.418')] +[2024-11-08 06:14:36,233][42004] Updated weights for policy 0, policy_version 46566 (0.0037) +[2024-11-08 06:14:37,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6581.4). Total num frames: 190742528. Throughput: 0: 1623.8. Samples: 42680804. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:14:37,933][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 06:14:41,881][42004] Updated weights for policy 0, policy_version 46576 (0.0041) +[2024-11-08 06:14:42,933][41694] Fps is (10 sec: 7241.7, 60 sec: 6553.4, 300 sec: 6595.2). Total num frames: 190779392. Throughput: 0: 1689.2. Samples: 42691660. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:14:42,935][41694] Avg episode reward: [(0, '4.249')] +[2024-11-08 06:14:47,843][42004] Updated weights for policy 0, policy_version 46586 (0.0037) +[2024-11-08 06:14:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.1, 300 sec: 6595.3). Total num frames: 190816256. Throughput: 0: 1687.2. Samples: 42696470. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:14:47,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 06:14:52,932][41694] Fps is (10 sec: 6964.1, 60 sec: 6621.8, 300 sec: 6645.3). Total num frames: 190849024. Throughput: 0: 1696.8. Samples: 42707364. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:14:52,940][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 06:14:53,955][42004] Updated weights for policy 0, policy_version 46596 (0.0050) +[2024-11-08 06:14:57,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6786.3, 300 sec: 6636.9). Total num frames: 190881792. Throughput: 0: 1656.0. Samples: 42716652. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:14:57,933][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 06:14:59,970][42004] Updated weights for policy 0, policy_version 46606 (0.0038) +[2024-11-08 06:15:02,936][41694] Fps is (10 sec: 6550.5, 60 sec: 6757.8, 300 sec: 6622.9). Total num frames: 190914560. Throughput: 0: 1666.6. Samples: 42722140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:15:02,940][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 06:15:05,883][42004] Updated weights for policy 0, policy_version 46616 (0.0056) +[2024-11-08 06:15:08,948][41694] Fps is (10 sec: 5577.3, 60 sec: 6511.6, 300 sec: 6586.4). Total num frames: 190943232. Throughput: 0: 1636.1. Samples: 42732564. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:15:08,951][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 06:15:12,931][41694] Fps is (10 sec: 5737.2, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 190971904. Throughput: 0: 1586.7. Samples: 42739584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:15:12,934][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 06:15:13,611][42004] Updated weights for policy 0, policy_version 46626 (0.0022) +[2024-11-08 06:15:17,932][41694] Fps is (10 sec: 6838.9, 60 sec: 6417.1, 300 sec: 6595.3). Total num frames: 191004672. Throughput: 0: 1631.6. Samples: 42744684. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:15:17,933][41694] Avg episode reward: [(0, '4.330')] +[2024-11-08 06:15:19,871][42004] Updated weights for policy 0, policy_version 46636 (0.0053) +[2024-11-08 06:15:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6595.2). Total num frames: 191041536. Throughput: 0: 1644.8. Samples: 42754820. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:15:22,933][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 06:15:25,393][42004] Updated weights for policy 0, policy_version 46646 (0.0030) +[2024-11-08 06:15:27,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.4, 300 sec: 6646.5). Total num frames: 191074304. Throughput: 0: 1643.3. Samples: 42765606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:15:27,937][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 06:15:31,428][42004] Updated weights for policy 0, policy_version 46656 (0.0042) +[2024-11-08 06:15:32,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6723.6, 300 sec: 6636.9). Total num frames: 191111168. Throughput: 0: 1646.3. Samples: 42770554. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:15:32,934][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 06:15:36,974][42004] Updated weights for policy 0, policy_version 46666 (0.0021) +[2024-11-08 06:15:37,935][41694] Fps is (10 sec: 7371.2, 60 sec: 6758.1, 300 sec: 6636.9). Total num frames: 191148032. Throughput: 0: 1651.7. Samples: 42781692. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:15:37,940][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 06:15:38,005][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046668_191152128.pth... +[2024-11-08 06:15:38,153][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046279_189558784.pth +[2024-11-08 06:15:43,509][41694] Fps is (10 sec: 5808.8, 60 sec: 6491.3, 300 sec: 6582.4). 
Total num frames: 191172608. Throughput: 0: 1552.1. Samples: 42787394. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:15:43,512][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 06:15:44,954][42004] Updated weights for policy 0, policy_version 46676 (0.0033) +[2024-11-08 06:15:47,932][41694] Fps is (10 sec: 5735.6, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 191205376. Throughput: 0: 1595.4. Samples: 42793926. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:15:47,934][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 06:15:50,605][42004] Updated weights for policy 0, policy_version 46686 (0.0021) +[2024-11-08 06:15:52,932][41694] Fps is (10 sec: 7389.9, 60 sec: 6553.6, 300 sec: 6595.3). Total num frames: 191242240. Throughput: 0: 1642.7. Samples: 42804816. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:15:52,933][41694] Avg episode reward: [(0, '4.234')] +[2024-11-08 06:15:56,080][42004] Updated weights for policy 0, policy_version 46696 (0.0028) +[2024-11-08 06:15:57,932][41694] Fps is (10 sec: 7372.3, 60 sec: 6621.8, 300 sec: 6609.1). Total num frames: 191279104. Throughput: 0: 1700.0. Samples: 42816086. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:15:57,934][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 06:16:02,185][42004] Updated weights for policy 0, policy_version 46706 (0.0034) +[2024-11-08 06:16:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6622.4, 300 sec: 6650.8). Total num frames: 191311872. Throughput: 0: 1701.7. Samples: 42821258. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:16:02,933][41694] Avg episode reward: [(0, '4.724')] +[2024-11-08 06:16:07,829][42004] Updated weights for policy 0, policy_version 46716 (0.0027) +[2024-11-08 06:16:07,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6874.8, 300 sec: 6664.7). Total num frames: 191348736. Throughput: 0: 1695.1. Samples: 42831100. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:16:07,935][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 06:16:12,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6826.6, 300 sec: 6650.8). Total num frames: 191381504. Throughput: 0: 1681.0. Samples: 42841250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:16:12,934][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 06:16:14,032][42004] Updated weights for policy 0, policy_version 46726 (0.0031) +[2024-11-08 06:16:18,054][41694] Fps is (10 sec: 4855.6, 60 sec: 6540.2, 300 sec: 6578.6). Total num frames: 191397888. Throughput: 0: 1680.5. Samples: 42846382. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:16:18,056][41694] Avg episode reward: [(0, '4.637')] +[2024-11-08 06:16:22,200][42004] Updated weights for policy 0, policy_version 46736 (0.0022) +[2024-11-08 06:16:22,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6553.6, 300 sec: 6595.3). Total num frames: 191434752. Throughput: 0: 1587.5. Samples: 42853124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:16:22,933][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 06:16:27,471][42004] Updated weights for policy 0, policy_version 46746 (0.0028) +[2024-11-08 06:16:27,931][41694] Fps is (10 sec: 7464.5, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 191471616. Throughput: 0: 1734.9. Samples: 42864464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:16:27,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 06:16:32,493][42004] Updated weights for policy 0, policy_version 46756 (0.0022) +[2024-11-08 06:16:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6690.2, 300 sec: 6623.0). Total num frames: 191512576. Throughput: 0: 1701.4. Samples: 42870490. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:16:32,934][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 06:16:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.4, 300 sec: 6678.6). 
Total num frames: 191549440. Throughput: 0: 1705.7. Samples: 42881574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:16:37,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 06:16:38,306][42004] Updated weights for policy 0, policy_version 46766 (0.0031) +[2024-11-08 06:16:42,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7030.9, 300 sec: 6692.5). Total num frames: 191590400. Throughput: 0: 1715.1. Samples: 42893266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:16:42,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 06:16:43,372][42004] Updated weights for policy 0, policy_version 46776 (0.0024) +[2024-11-08 06:16:47,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7031.4, 300 sec: 6692.4). Total num frames: 191627264. Throughput: 0: 1729.1. Samples: 42899070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:16:47,933][41694] Avg episode reward: [(0, '4.322')] +[2024-11-08 06:16:48,720][42004] Updated weights for policy 0, policy_version 46786 (0.0032) +[2024-11-08 06:16:52,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 191643648. Throughput: 0: 1725.6. Samples: 42908752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:16:52,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 06:16:57,234][42004] Updated weights for policy 0, policy_version 46796 (0.0034) +[2024-11-08 06:16:57,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6690.2, 300 sec: 6636.9). Total num frames: 191680512. Throughput: 0: 1666.4. Samples: 42916240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:16:57,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 06:17:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 191713280. Throughput: 0: 1673.9. Samples: 42921500. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:17:02,933][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 06:17:03,195][42004] Updated weights for policy 0, policy_version 46806 (0.0028) +[2024-11-08 06:17:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 6650.8). Total num frames: 191750144. Throughput: 0: 1757.0. Samples: 42932188. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:17:07,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 06:17:08,942][42004] Updated weights for policy 0, policy_version 46816 (0.0029) +[2024-11-08 06:17:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 6692.5). Total num frames: 191782912. Throughput: 0: 1724.8. Samples: 42942078. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:17:12,932][41694] Avg episode reward: [(0, '4.212')] +[2024-11-08 06:17:14,755][42004] Updated weights for policy 0, policy_version 46826 (0.0028) +[2024-11-08 06:17:17,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7114.3, 300 sec: 6706.3). Total num frames: 191823872. Throughput: 0: 1722.0. Samples: 42947978. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:17:17,933][41694] Avg episode reward: [(0, '4.508')] +[2024-11-08 06:17:20,015][42004] Updated weights for policy 0, policy_version 46836 (0.0026) +[2024-11-08 06:17:22,932][41694] Fps is (10 sec: 7782.0, 60 sec: 7099.7, 300 sec: 6706.3). Total num frames: 191860736. Throughput: 0: 1733.8. Samples: 42959594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:17:22,934][41694] Avg episode reward: [(0, '4.607')] +[2024-11-08 06:17:27,870][42004] Updated weights for policy 0, policy_version 46846 (0.0031) +[2024-11-08 06:17:27,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 191881216. Throughput: 0: 1642.3. Samples: 42967170. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:17:27,933][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 06:17:32,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 191918080. Throughput: 0: 1615.0. Samples: 42971744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:17:32,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 06:17:33,292][42004] Updated weights for policy 0, policy_version 46856 (0.0031) +[2024-11-08 06:17:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 191954944. Throughput: 0: 1643.9. Samples: 42982726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:17:37,944][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 06:17:37,978][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046864_191954944.pth... +[2024-11-08 06:17:38,081][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046473_190353408.pth +[2024-11-08 06:17:38,868][42004] Updated weights for policy 0, policy_version 46866 (0.0025) +[2024-11-08 06:17:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 191991808. Throughput: 0: 1737.7. Samples: 42994438. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:17:42,933][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 06:17:44,004][42004] Updated weights for policy 0, policy_version 46876 (0.0032) +[2024-11-08 06:17:47,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6758.3, 300 sec: 6734.1). Total num frames: 192032768. Throughput: 0: 1754.8. Samples: 43000468. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:17:47,935][41694] Avg episode reward: [(0, '4.589')] +[2024-11-08 06:17:49,321][42004] Updated weights for policy 0, policy_version 46886 (0.0021) +[2024-11-08 06:17:52,931][41694] Fps is (10 sec: 8192.0, 60 sec: 7168.0, 300 sec: 6748.0). 
Total num frames: 192073728. Throughput: 0: 1777.8. Samples: 43012188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:17:52,935][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 06:17:54,531][42004] Updated weights for policy 0, policy_version 46896 (0.0019) +[2024-11-08 06:17:57,931][41694] Fps is (10 sec: 7783.0, 60 sec: 7168.0, 300 sec: 6761.9). Total num frames: 192110592. Throughput: 0: 1814.8. Samples: 43023742. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:17:57,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 06:18:02,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 192122880. Throughput: 0: 1777.9. Samples: 43027984. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:02,934][41694] Avg episode reward: [(0, '4.636')] +[2024-11-08 06:18:03,168][42004] Updated weights for policy 0, policy_version 46906 (0.0025) +[2024-11-08 06:18:07,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 192155648. Throughput: 0: 1640.2. Samples: 43033402. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:07,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 06:18:09,907][42004] Updated weights for policy 0, policy_version 46916 (0.0039) +[2024-11-08 06:18:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 192188416. Throughput: 0: 1689.7. Samples: 43043208. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:12,933][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 06:18:15,332][42004] Updated weights for policy 0, policy_version 46926 (0.0029) +[2024-11-08 06:18:17,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 192225280. Throughput: 0: 1719.9. Samples: 43049138. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:17,935][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 06:18:20,506][42004] Updated weights for policy 0, policy_version 46936 (0.0029) +[2024-11-08 06:18:22,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 192266240. Throughput: 0: 1739.5. Samples: 43061002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:22,934][41694] Avg episode reward: [(0, '4.307')] +[2024-11-08 06:18:25,857][42004] Updated weights for policy 0, policy_version 46946 (0.0028) +[2024-11-08 06:18:27,931][41694] Fps is (10 sec: 8192.4, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 192307200. Throughput: 0: 1741.0. Samples: 43072784. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:27,932][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 06:18:31,108][42004] Updated weights for policy 0, policy_version 46956 (0.0031) +[2024-11-08 06:18:32,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 192344064. Throughput: 0: 1728.9. Samples: 43078268. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:32,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 06:18:37,933][41694] Fps is (10 sec: 5323.8, 60 sec: 6758.2, 300 sec: 6692.4). Total num frames: 192360448. Throughput: 0: 1655.9. Samples: 43086706. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:37,935][41694] Avg episode reward: [(0, '4.287')] +[2024-11-08 06:18:39,743][42004] Updated weights for policy 0, policy_version 46966 (0.0039) +[2024-11-08 06:18:42,932][41694] Fps is (10 sec: 4505.5, 60 sec: 6621.8, 300 sec: 6692.4). Total num frames: 192389120. Throughput: 0: 1564.3. Samples: 43094136. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:42,938][41694] Avg episode reward: [(0, '4.569')] +[2024-11-08 06:18:45,866][42004] Updated weights for policy 0, policy_version 46976 (0.0037) +[2024-11-08 06:18:47,932][41694] Fps is (10 sec: 6554.7, 60 sec: 6553.7, 300 sec: 6692.4). Total num frames: 192425984. Throughput: 0: 1584.8. Samples: 43099298. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:47,934][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 06:18:50,995][42004] Updated weights for policy 0, policy_version 46986 (0.0018) +[2024-11-08 06:18:52,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6553.6, 300 sec: 6753.4). Total num frames: 192466944. Throughput: 0: 1728.4. Samples: 43111182. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:52,934][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 06:18:56,323][42004] Updated weights for policy 0, policy_version 46996 (0.0028) +[2024-11-08 06:18:57,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6553.6, 300 sec: 6761.9). Total num frames: 192503808. Throughput: 0: 1769.7. Samples: 43122844. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:18:57,935][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 06:19:01,901][42004] Updated weights for policy 0, policy_version 47006 (0.0034) +[2024-11-08 06:19:02,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6775.8). Total num frames: 192544768. Throughput: 0: 1759.7. Samples: 43128326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:19:02,935][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 06:19:07,188][42004] Updated weights for policy 0, policy_version 47016 (0.0027) +[2024-11-08 06:19:07,932][41694] Fps is (10 sec: 7782.5, 60 sec: 7099.7, 300 sec: 6775.8). Total num frames: 192581632. Throughput: 0: 1745.8. Samples: 43139564. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:19:07,933][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 06:19:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6826.6, 300 sec: 6706.3). Total num frames: 192598016. Throughput: 0: 1620.1. Samples: 43145690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:19:12,934][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 06:19:16,110][42004] Updated weights for policy 0, policy_version 47026 (0.0035) +[2024-11-08 06:19:17,931][41694] Fps is (10 sec: 4505.7, 60 sec: 6690.2, 300 sec: 6706.3). Total num frames: 192626688. Throughput: 0: 1600.1. Samples: 43150272. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:19:17,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 06:19:21,696][42004] Updated weights for policy 0, policy_version 47036 (0.0028) +[2024-11-08 06:19:22,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 192667648. Throughput: 0: 1649.6. Samples: 43160934. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:19:22,933][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 06:19:27,051][42004] Updated weights for policy 0, policy_version 47046 (0.0036) +[2024-11-08 06:19:27,932][41694] Fps is (10 sec: 7782.2, 60 sec: 6621.8, 300 sec: 6768.9). Total num frames: 192704512. Throughput: 0: 1744.7. Samples: 43172646. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:19:27,934][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 06:19:32,117][42004] Updated weights for policy 0, policy_version 47056 (0.0034) +[2024-11-08 06:19:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 192745472. Throughput: 0: 1759.7. Samples: 43178484. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:19:32,934][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 06:19:37,261][42004] Updated weights for policy 0, policy_version 47066 (0.0039) +[2024-11-08 06:19:37,933][41694] Fps is (10 sec: 8191.2, 60 sec: 7099.8, 300 sec: 6803.5). Total num frames: 192786432. Throughput: 0: 1769.4. Samples: 43190806. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:19:37,934][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 06:19:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047067_192786432.pth... +[2024-11-08 06:19:38,071][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046668_191152128.pth +[2024-11-08 06:19:42,771][42004] Updated weights for policy 0, policy_version 47076 (0.0046) +[2024-11-08 06:19:42,933][41694] Fps is (10 sec: 7781.5, 60 sec: 7236.2, 300 sec: 6803.5). Total num frames: 192823296. Throughput: 0: 1758.1. Samples: 43201958. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:19:42,935][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 06:19:47,931][41694] Fps is (10 sec: 5325.4, 60 sec: 6894.9, 300 sec: 6748.0). Total num frames: 192839680. Throughput: 0: 1713.1. Samples: 43205414. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:19:47,933][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 06:19:51,476][42004] Updated weights for policy 0, policy_version 47086 (0.0028) +[2024-11-08 06:19:52,932][41694] Fps is (10 sec: 4915.7, 60 sec: 6758.4, 300 sec: 6748.0). Total num frames: 192872448. Throughput: 0: 1627.1. Samples: 43212786. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:19:52,934][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 06:19:56,968][42004] Updated weights for policy 0, policy_version 47096 (0.0026) +[2024-11-08 06:19:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6762.0). 
Total num frames: 192909312. Throughput: 0: 1740.0. Samples: 43223992. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:19:57,934][41694] Avg episode reward: [(0, '4.337')] +[2024-11-08 06:20:02,328][42004] Updated weights for policy 0, policy_version 47106 (0.0038) +[2024-11-08 06:20:02,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6758.4, 300 sec: 6827.0). Total num frames: 192950272. Throughput: 0: 1766.5. Samples: 43229766. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:20:02,933][41694] Avg episode reward: [(0, '4.789')] +[2024-11-08 06:20:07,408][42004] Updated weights for policy 0, policy_version 47116 (0.0032) +[2024-11-08 06:20:07,931][41694] Fps is (10 sec: 8192.1, 60 sec: 6826.7, 300 sec: 6845.2). Total num frames: 192991232. Throughput: 0: 1788.4. Samples: 43241414. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:20:07,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 06:20:12,933][41694] Fps is (10 sec: 7371.5, 60 sec: 7099.6, 300 sec: 6845.1). Total num frames: 193024000. Throughput: 0: 1785.9. Samples: 43253016. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:20:12,935][41694] Avg episode reward: [(0, '4.573')] +[2024-11-08 06:20:13,006][42004] Updated weights for policy 0, policy_version 47126 (0.0027) +[2024-11-08 06:20:17,932][41694] Fps is (10 sec: 6553.3, 60 sec: 7167.9, 300 sec: 6831.3). Total num frames: 193056768. Throughput: 0: 1758.9. Samples: 43257636. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:20:17,934][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 06:20:22,300][42004] Updated weights for policy 0, policy_version 47136 (0.0049) +[2024-11-08 06:20:22,932][41694] Fps is (10 sec: 4916.0, 60 sec: 6758.4, 300 sec: 6775.8). Total num frames: 193073152. Throughput: 0: 1615.9. Samples: 43263518. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:20:22,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 06:20:27,931][41694] Fps is (10 sec: 4915.4, 60 sec: 6690.2, 300 sec: 6761.9). Total num frames: 193105920. Throughput: 0: 1575.4. Samples: 43272850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:20:27,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 06:20:28,153][42004] Updated weights for policy 0, policy_version 47146 (0.0028) +[2024-11-08 06:20:32,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 193142784. Throughput: 0: 1623.8. Samples: 43278486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:20:32,933][41694] Avg episode reward: [(0, '4.353')] +[2024-11-08 06:20:33,627][42004] Updated weights for policy 0, policy_version 47156 (0.0029) +[2024-11-08 06:20:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6622.0, 300 sec: 6830.8). Total num frames: 193183744. Throughput: 0: 1705.7. Samples: 43289544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:20:37,933][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 06:20:38,921][42004] Updated weights for policy 0, policy_version 47166 (0.0016) +[2024-11-08 06:20:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.7, 300 sec: 6817.4). Total num frames: 193216512. Throughput: 0: 1702.9. Samples: 43300622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:20:42,933][41694] Avg episode reward: [(0, '4.449')] +[2024-11-08 06:20:44,811][42004] Updated weights for policy 0, policy_version 47176 (0.0026) +[2024-11-08 06:20:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6817.4). Total num frames: 193253376. Throughput: 0: 1699.3. Samples: 43306234. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:20:47,933][41694] Avg episode reward: [(0, '4.749')] +[2024-11-08 06:20:50,361][42004] Updated weights for policy 0, policy_version 47186 (0.0030) +[2024-11-08 06:20:52,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6894.9, 300 sec: 6803.5). Total num frames: 193286144. Throughput: 0: 1679.2. Samples: 43316980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:20:52,934][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 06:20:57,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6553.6, 300 sec: 6748.0). Total num frames: 193302528. Throughput: 0: 1547.8. Samples: 43322664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:20:57,934][41694] Avg episode reward: [(0, '4.672')] +[2024-11-08 06:20:59,067][42004] Updated weights for policy 0, policy_version 47196 (0.0041) +[2024-11-08 06:21:02,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.3, 300 sec: 6748.0). Total num frames: 193339392. Throughput: 0: 1558.3. Samples: 43327758. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:21:02,934][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 06:21:04,985][42004] Updated weights for policy 0, policy_version 47206 (0.0033) +[2024-11-08 06:21:07,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.1, 300 sec: 6761.9). Total num frames: 193376256. Throughput: 0: 1667.0. Samples: 43338532. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:21:07,933][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 06:21:10,684][42004] Updated weights for policy 0, policy_version 47216 (0.0025) +[2024-11-08 06:21:12,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.2, 300 sec: 6820.2). Total num frames: 193409024. Throughput: 0: 1681.4. Samples: 43348514. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:21:12,933][41694] Avg episode reward: [(0, '4.388')] +[2024-11-08 06:21:16,605][42004] Updated weights for policy 0, policy_version 47226 (0.0023) +[2024-11-08 06:21:17,935][41694] Fps is (10 sec: 6960.8, 60 sec: 6485.0, 300 sec: 6817.3). Total num frames: 193445888. Throughput: 0: 1670.5. Samples: 43353664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:21:17,937][41694] Avg episode reward: [(0, '4.302')] +[2024-11-08 06:21:22,015][42004] Updated weights for policy 0, policy_version 47236 (0.0026) +[2024-11-08 06:21:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6817.4). Total num frames: 193482752. Throughput: 0: 1681.9. Samples: 43365230. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:21:22,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 06:21:27,931][41694] Fps is (10 sec: 6965.7, 60 sec: 6826.7, 300 sec: 6789.6). Total num frames: 193515520. Throughput: 0: 1668.5. Samples: 43375706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:21:27,935][41694] Avg episode reward: [(0, '4.672')] +[2024-11-08 06:21:27,947][42004] Updated weights for policy 0, policy_version 47246 (0.0040) +[2024-11-08 06:21:32,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 193536000. Throughput: 0: 1587.2. Samples: 43377660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:21:32,934][41694] Avg episode reward: [(0, '4.645')] +[2024-11-08 06:21:36,187][42004] Updated weights for policy 0, policy_version 47256 (0.0023) +[2024-11-08 06:21:37,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6485.3, 300 sec: 6720.2). Total num frames: 193572864. Throughput: 0: 1563.6. Samples: 43387342. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:21:37,935][41694] Avg episode reward: [(0, '4.621')] +[2024-11-08 06:21:37,949][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047259_193572864.pth... +[2024-11-08 06:21:38,103][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000046864_191954944.pth +[2024-11-08 06:21:41,752][42004] Updated weights for policy 0, policy_version 47266 (0.0025) +[2024-11-08 06:21:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6720.2). Total num frames: 193609728. Throughput: 0: 1684.1. Samples: 43398450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:21:42,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 06:21:47,036][42004] Updated weights for policy 0, policy_version 47276 (0.0028) +[2024-11-08 06:21:47,934][41694] Fps is (10 sec: 7371.3, 60 sec: 6553.3, 300 sec: 6789.6). Total num frames: 193646592. Throughput: 0: 1694.5. Samples: 43404014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:21:47,937][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 06:21:52,317][42004] Updated weights for policy 0, policy_version 47286 (0.0026) +[2024-11-08 06:21:52,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6690.2, 300 sec: 6803.5). Total num frames: 193687552. Throughput: 0: 1719.0. Samples: 43415888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:21:52,934][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 06:21:57,815][42004] Updated weights for policy 0, policy_version 47296 (0.0031) +[2024-11-08 06:21:57,932][41694] Fps is (10 sec: 7784.0, 60 sec: 7031.4, 300 sec: 6817.4). Total num frames: 193724416. Throughput: 0: 1750.5. Samples: 43427286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:21:57,935][41694] Avg episode reward: [(0, '4.658')] +[2024-11-08 06:22:05,216][41694] Fps is (10 sec: 5334.8, 60 sec: 6642.0, 300 sec: 6737.5). 
Total num frames: 193753088. Throughput: 0: 1651.8. Samples: 43431764. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:22:05,218][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 06:22:07,129][42004] Updated weights for policy 0, policy_version 47306 (0.0037) +[2024-11-08 06:22:07,932][41694] Fps is (10 sec: 4505.7, 60 sec: 6553.6, 300 sec: 6734.1). Total num frames: 193769472. Throughput: 0: 1602.4. Samples: 43437340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:22:07,933][41694] Avg episode reward: [(0, '4.247')] +[2024-11-08 06:22:12,931][41694] Fps is (10 sec: 5839.8, 60 sec: 6485.4, 300 sec: 6692.4). Total num frames: 193798144. Throughput: 0: 1570.1. Samples: 43446360. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:22:12,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 06:22:13,592][42004] Updated weights for policy 0, policy_version 47316 (0.0027) +[2024-11-08 06:22:17,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6485.7, 300 sec: 6692.4). Total num frames: 193835008. Throughput: 0: 1645.2. Samples: 43451694. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:22:17,936][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 06:22:19,620][42004] Updated weights for policy 0, policy_version 47326 (0.0027) +[2024-11-08 06:22:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6417.1, 300 sec: 6734.1). Total num frames: 193867776. Throughput: 0: 1651.3. Samples: 43461650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:22:22,933][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 06:22:25,284][42004] Updated weights for policy 0, policy_version 47336 (0.0027) +[2024-11-08 06:22:27,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6485.3, 300 sec: 6734.1). Total num frames: 193904640. Throughput: 0: 1656.6. Samples: 43472998. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:22:27,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 06:22:30,874][42004] Updated weights for policy 0, policy_version 47346 (0.0028) +[2024-11-08 06:22:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 193937408. Throughput: 0: 1649.2. Samples: 43478226. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:22:32,935][41694] Avg episode reward: [(0, '4.715')] +[2024-11-08 06:22:37,558][42004] Updated weights for policy 0, policy_version 47356 (0.0041) +[2024-11-08 06:22:40,153][41694] Fps is (10 sec: 5362.3, 60 sec: 6385.5, 300 sec: 6656.2). Total num frames: 193970176. Throughput: 0: 1518.5. Samples: 43487594. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:22:40,155][41694] Avg episode reward: [(0, '4.704')] +[2024-11-08 06:22:42,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6348.8, 300 sec: 6636.9). Total num frames: 193990656. Throughput: 0: 1471.1. Samples: 43493486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:22:42,934][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 06:22:45,699][42004] Updated weights for policy 0, policy_version 47366 (0.0032) +[2024-11-08 06:22:47,931][41694] Fps is (10 sec: 6845.8, 60 sec: 6280.8, 300 sec: 6609.1). Total num frames: 194023424. Throughput: 0: 1575.1. Samples: 43499044. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:22:47,933][41694] Avg episode reward: [(0, '4.664')] +[2024-11-08 06:22:51,433][42004] Updated weights for policy 0, policy_version 47376 (0.0030) +[2024-11-08 06:22:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6212.2, 300 sec: 6609.1). Total num frames: 194060288. Throughput: 0: 1612.0. Samples: 43509878. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:22:52,933][41694] Avg episode reward: [(0, '4.651')] +[2024-11-08 06:22:56,606][42004] Updated weights for policy 0, policy_version 47386 (0.0030) +[2024-11-08 06:22:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6280.6, 300 sec: 6706.3). Total num frames: 194101248. Throughput: 0: 1672.7. Samples: 43521632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:22:57,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 06:23:02,248][42004] Updated weights for policy 0, policy_version 47396 (0.0035) +[2024-11-08 06:23:02,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6671.1, 300 sec: 6720.2). Total num frames: 194138112. Throughput: 0: 1677.7. Samples: 43527192. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:02,933][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 06:23:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 194170880. Throughput: 0: 1687.2. Samples: 43537574. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:07,936][41694] Avg episode reward: [(0, '4.803')] +[2024-11-08 06:23:08,216][42004] Updated weights for policy 0, policy_version 47406 (0.0029) +[2024-11-08 06:23:14,912][41694] Fps is (10 sec: 5128.3, 60 sec: 6476.3, 300 sec: 6647.8). Total num frames: 194199552. Throughput: 0: 1586.9. Samples: 43547550. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:14,914][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 06:23:16,899][42004] Updated weights for policy 0, policy_version 47416 (0.0023) +[2024-11-08 06:23:17,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 194220032. Throughput: 0: 1561.2. Samples: 43548478. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:17,933][41694] Avg episode reward: [(0, '4.225')] +[2024-11-08 06:23:22,328][42004] Updated weights for policy 0, policy_version 47426 (0.0023) +[2024-11-08 06:23:22,933][41694] Fps is (10 sec: 7659.8, 60 sec: 6553.4, 300 sec: 6623.0). Total num frames: 194260992. Throughput: 0: 1675.4. Samples: 43559268. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:22,936][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 06:23:27,692][42004] Updated weights for policy 0, policy_version 47436 (0.0029) +[2024-11-08 06:23:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 194297856. Throughput: 0: 1716.0. Samples: 43570704. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:27,934][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 06:23:32,932][41694] Fps is (10 sec: 7374.0, 60 sec: 6621.9, 300 sec: 6692.5). Total num frames: 194334720. Throughput: 0: 1721.4. Samples: 43576506. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:32,940][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 06:23:32,996][42004] Updated weights for policy 0, policy_version 47446 (0.0035) +[2024-11-08 06:23:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 7018.3, 300 sec: 6734.1). Total num frames: 194375680. Throughput: 0: 1740.9. Samples: 43588220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:37,932][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 06:23:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047455_194375680.pth... +[2024-11-08 06:23:38,156][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047067_192786432.pth +[2024-11-08 06:23:38,301][42004] Updated weights for policy 0, policy_version 47456 (0.0025) +[2024-11-08 06:23:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6963.2, 300 sec: 6720.2). 
Total num frames: 194408448. Throughput: 0: 1708.4. Samples: 43598512. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:42,935][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 06:23:44,511][42004] Updated weights for policy 0, policy_version 47466 (0.0037) +[2024-11-08 06:23:49,816][41694] Fps is (10 sec: 5169.8, 60 sec: 6685.0, 300 sec: 6636.2). Total num frames: 194437120. Throughput: 0: 1632.1. Samples: 43603710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:49,819][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 06:23:52,774][42004] Updated weights for policy 0, policy_version 47476 (0.0026) +[2024-11-08 06:23:52,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 194461696. Throughput: 0: 1605.3. Samples: 43609812. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:52,933][41694] Avg episode reward: [(0, '4.380')] +[2024-11-08 06:23:57,931][41694] Fps is (10 sec: 7570.5, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 194498560. Throughput: 0: 1708.3. Samples: 43621040. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:23:57,933][41694] Avg episode reward: [(0, '4.214')] +[2024-11-08 06:23:58,017][42004] Updated weights for policy 0, policy_version 47486 (0.0027) +[2024-11-08 06:24:02,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 194539520. Throughput: 0: 1743.7. Samples: 43626944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:24:02,935][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 06:24:03,422][42004] Updated weights for policy 0, policy_version 47496 (0.0041) +[2024-11-08 06:24:07,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6706.3). Total num frames: 194576384. Throughput: 0: 1761.4. Samples: 43638530. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:24:07,934][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 06:24:08,498][42004] Updated weights for policy 0, policy_version 47506 (0.0027) +[2024-11-08 06:24:12,933][41694] Fps is (10 sec: 7782.5, 60 sec: 7200.9, 300 sec: 6748.0). Total num frames: 194617344. Throughput: 0: 1776.7. Samples: 43650654. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:24:12,935][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 06:24:13,948][42004] Updated weights for policy 0, policy_version 47516 (0.0040) +[2024-11-08 06:24:17,932][41694] Fps is (10 sec: 6962.9, 60 sec: 7099.7, 300 sec: 6706.3). Total num frames: 194646016. Throughput: 0: 1748.3. Samples: 43655180. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:24:17,936][41694] Avg episode reward: [(0, '4.515')] +[2024-11-08 06:24:20,292][42004] Updated weights for policy 0, policy_version 47526 (0.0027) +[2024-11-08 06:24:24,505][41694] Fps is (10 sec: 4954.7, 60 sec: 6718.9, 300 sec: 6643.1). Total num frames: 194674688. Throughput: 0: 1660.6. Samples: 43665558. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:24:24,507][41694] Avg episode reward: [(0, '4.671')] +[2024-11-08 06:24:27,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 194699264. Throughput: 0: 1617.5. Samples: 43671298. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:24:27,935][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 06:24:28,677][42004] Updated weights for policy 0, policy_version 47536 (0.0032) +[2024-11-08 06:24:32,932][41694] Fps is (10 sec: 7777.5, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 194740224. Throughput: 0: 1696.3. Samples: 43676848. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:24:32,934][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 06:24:33,904][42004] Updated weights for policy 0, policy_version 47546 (0.0023) +[2024-11-08 06:24:37,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 194777088. Throughput: 0: 1747.8. Samples: 43688464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:24:37,933][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 06:24:39,304][42004] Updated weights for policy 0, policy_version 47556 (0.0029) +[2024-11-08 06:24:42,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6692.5). Total num frames: 194813952. Throughput: 0: 1751.7. Samples: 43699866. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:24:42,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 06:24:44,778][42004] Updated weights for policy 0, policy_version 47566 (0.0035) +[2024-11-08 06:24:47,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7118.5, 300 sec: 6706.3). Total num frames: 194850816. Throughput: 0: 1747.6. Samples: 43705584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:24:47,934][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 06:24:50,897][42004] Updated weights for policy 0, policy_version 47576 (0.0042) +[2024-11-08 06:24:52,932][41694] Fps is (10 sec: 6963.1, 60 sec: 7031.5, 300 sec: 6692.4). Total num frames: 194883584. Throughput: 0: 1709.8. Samples: 43715470. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:24:52,934][41694] Avg episode reward: [(0, '4.674')] +[2024-11-08 06:24:59,075][41694] Fps is (10 sec: 5145.9, 60 sec: 6699.0, 300 sec: 6611.3). Total num frames: 194908160. Throughput: 0: 1513.7. Samples: 43720500. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:24:59,078][41694] Avg episode reward: [(0, '4.316')] +[2024-11-08 06:24:59,441][42004] Updated weights for policy 0, policy_version 47586 (0.0023) +[2024-11-08 06:25:02,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 194932736. Throughput: 0: 1590.5. Samples: 43726752. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:02,933][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 06:25:05,439][42004] Updated weights for policy 0, policy_version 47596 (0.0029) +[2024-11-08 06:25:07,932][41694] Fps is (10 sec: 6937.5, 60 sec: 6553.6, 300 sec: 6595.3). Total num frames: 194969600. Throughput: 0: 1645.2. Samples: 43737004. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:07,934][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 06:25:10,838][42004] Updated weights for policy 0, policy_version 47606 (0.0021) +[2024-11-08 06:25:12,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 195006464. Throughput: 0: 1711.7. Samples: 43748322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:12,933][41694] Avg episode reward: [(0, '4.604')] +[2024-11-08 06:25:16,675][42004] Updated weights for policy 0, policy_version 47616 (0.0023) +[2024-11-08 06:25:17,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 195043328. Throughput: 0: 1696.7. Samples: 43753200. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:17,933][41694] Avg episode reward: [(0, '4.542')] +[2024-11-08 06:25:22,796][42004] Updated weights for policy 0, policy_version 47626 (0.0034) +[2024-11-08 06:25:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6870.3, 300 sec: 6678.6). Total num frames: 195076096. Throughput: 0: 1669.3. Samples: 43763582. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:22,934][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 06:25:27,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 195108864. Throughput: 0: 1644.6. Samples: 43773872. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:27,933][41694] Avg episode reward: [(0, '4.291')] +[2024-11-08 06:25:28,488][42004] Updated weights for policy 0, policy_version 47636 (0.0033) +[2024-11-08 06:25:33,711][41694] Fps is (10 sec: 5319.7, 60 sec: 6469.5, 300 sec: 6591.7). Total num frames: 195133440. Throughput: 0: 1608.3. Samples: 43779210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:33,716][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 06:25:36,941][42004] Updated weights for policy 0, policy_version 47646 (0.0029) +[2024-11-08 06:25:37,932][41694] Fps is (10 sec: 5324.6, 60 sec: 6417.0, 300 sec: 6595.2). Total num frames: 195162112. Throughput: 0: 1550.7. Samples: 43785250. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:37,937][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 06:25:38,091][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047648_195166208.pth... +[2024-11-08 06:25:38,189][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047259_193572864.pth +[2024-11-08 06:25:42,358][42004] Updated weights for policy 0, policy_version 47656 (0.0032) +[2024-11-08 06:25:42,931][41694] Fps is (10 sec: 7552.1, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 195203072. Throughput: 0: 1735.0. Samples: 43796590. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:42,933][41694] Avg episode reward: [(0, '4.643')] +[2024-11-08 06:25:47,671][42004] Updated weights for policy 0, policy_version 47666 (0.0030) +[2024-11-08 06:25:47,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6485.3, 300 sec: 6623.0). 
Total num frames: 195239936. Throughput: 0: 1675.1. Samples: 43802130. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:47,933][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 06:25:52,838][42004] Updated weights for policy 0, policy_version 47676 (0.0031) +[2024-11-08 06:25:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6706.3). Total num frames: 195280896. Throughput: 0: 1713.2. Samples: 43814096. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:52,934][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 06:25:57,933][41694] Fps is (10 sec: 7371.4, 60 sec: 6889.5, 300 sec: 6692.4). Total num frames: 195313664. Throughput: 0: 1703.7. Samples: 43824990. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:25:57,935][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 06:25:58,797][42004] Updated weights for policy 0, policy_version 47686 (0.0021) +[2024-11-08 06:26:02,933][41694] Fps is (10 sec: 6552.3, 60 sec: 6894.7, 300 sec: 6678.5). Total num frames: 195346432. Throughput: 0: 1703.9. Samples: 43829878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:26:02,936][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 06:26:05,074][42004] Updated weights for policy 0, policy_version 47696 (0.0033) +[2024-11-08 06:26:08,282][41694] Fps is (10 sec: 5145.5, 60 sec: 6583.4, 300 sec: 6629.0). Total num frames: 195366912. Throughput: 0: 1575.2. Samples: 43835018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:26:08,290][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 06:26:12,932][41694] Fps is (10 sec: 4915.8, 60 sec: 6485.2, 300 sec: 6609.2). Total num frames: 195395584. Throughput: 0: 1593.1. Samples: 43845564. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:26:12,935][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 06:26:13,647][42004] Updated weights for policy 0, policy_version 47706 (0.0039) +[2024-11-08 06:26:17,931][41694] Fps is (10 sec: 6791.6, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 195432448. Throughput: 0: 1617.3. Samples: 43850728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:26:17,934][41694] Avg episode reward: [(0, '4.498')] +[2024-11-08 06:26:19,195][42004] Updated weights for policy 0, policy_version 47716 (0.0035) +[2024-11-08 06:26:22,931][41694] Fps is (10 sec: 7783.1, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 195473408. Throughput: 0: 1707.6. Samples: 43862090. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:26:22,934][41694] Avg episode reward: [(0, '4.361')] +[2024-11-08 06:26:24,571][42004] Updated weights for policy 0, policy_version 47726 (0.0028) +[2024-11-08 06:26:27,932][41694] Fps is (10 sec: 7782.1, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 195510272. Throughput: 0: 1713.8. Samples: 43873714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:26:27,935][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 06:26:30,560][42004] Updated weights for policy 0, policy_version 47736 (0.0052) +[2024-11-08 06:26:32,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6847.4, 300 sec: 6664.7). Total num frames: 195538944. Throughput: 0: 1692.4. Samples: 43878288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:26:32,934][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 06:26:36,221][42004] Updated weights for policy 0, policy_version 47746 (0.0023) +[2024-11-08 06:26:37,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 195575808. Throughput: 0: 1664.7. Samples: 43889008. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:26:37,933][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 06:26:42,944][41694] Fps is (10 sec: 5318.1, 60 sec: 6484.0, 300 sec: 6595.0). Total num frames: 195592192. Throughput: 0: 1534.7. Samples: 43894068. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:26:42,946][41694] Avg episode reward: [(0, '4.473')] +[2024-11-08 06:26:44,850][42004] Updated weights for policy 0, policy_version 47756 (0.0032) +[2024-11-08 06:26:47,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6485.3, 300 sec: 6581.4). Total num frames: 195629056. Throughput: 0: 1558.4. Samples: 43900002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:26:47,933][41694] Avg episode reward: [(0, '4.600')] +[2024-11-08 06:26:50,569][42004] Updated weights for policy 0, policy_version 47766 (0.0037) +[2024-11-08 06:26:52,931][41694] Fps is (10 sec: 7382.2, 60 sec: 6417.1, 300 sec: 6581.4). Total num frames: 195665920. Throughput: 0: 1696.6. Samples: 43910770. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:26:52,934][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 06:26:56,079][42004] Updated weights for policy 0, policy_version 47776 (0.0029) +[2024-11-08 06:26:57,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.6, 300 sec: 6660.7). Total num frames: 195702784. Throughput: 0: 1697.9. Samples: 43921966. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:26:57,933][41694] Avg episode reward: [(0, '4.558')] +[2024-11-08 06:27:02,076][42004] Updated weights for policy 0, policy_version 47786 (0.0026) +[2024-11-08 06:27:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6485.5, 300 sec: 6664.7). Total num frames: 195735552. Throughput: 0: 1695.8. Samples: 43927040. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:27:02,933][41694] Avg episode reward: [(0, '4.391')] +[2024-11-08 06:27:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6729.4, 300 sec: 6678.6). 
Total num frames: 195768320. Throughput: 0: 1657.1. Samples: 43936660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:27:07,933][41694] Avg episode reward: [(0, '4.678')] +[2024-11-08 06:27:08,359][42004] Updated weights for policy 0, policy_version 47796 (0.0039) +[2024-11-08 06:27:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 195805184. Throughput: 0: 1643.6. Samples: 43947676. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:27:12,935][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 06:27:13,808][42004] Updated weights for policy 0, policy_version 47806 (0.0022) +[2024-11-08 06:27:17,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 195829760. Throughput: 0: 1661.7. Samples: 43953064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:27:17,933][41694] Avg episode reward: [(0, '4.197')] +[2024-11-08 06:27:21,249][42004] Updated weights for policy 0, policy_version 47816 (0.0023) +[2024-11-08 06:27:22,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 195862528. Throughput: 0: 1592.3. Samples: 43960660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:27:22,933][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 06:27:26,685][42004] Updated weights for policy 0, policy_version 47826 (0.0022) +[2024-11-08 06:27:27,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 195903488. Throughput: 0: 1734.3. Samples: 43972092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:27:27,936][41694] Avg episode reward: [(0, '4.410')] +[2024-11-08 06:27:32,040][42004] Updated weights for policy 0, policy_version 47836 (0.0024) +[2024-11-08 06:27:32,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6690.1, 300 sec: 6729.2). Total num frames: 195940352. Throughput: 0: 1724.6. Samples: 43977608. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:27:32,935][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 06:27:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 195973120. Throughput: 0: 1729.8. Samples: 43988610. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:27:37,934][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 06:27:38,048][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047846_195977216.pth... +[2024-11-08 06:27:38,053][42004] Updated weights for policy 0, policy_version 47846 (0.0029) +[2024-11-08 06:27:38,253][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047455_194375680.pth +[2024-11-08 06:27:42,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6964.7, 300 sec: 6734.1). Total num frames: 196009984. Throughput: 0: 1701.1. Samples: 43998514. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:27:42,934][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 06:27:43,936][42004] Updated weights for policy 0, policy_version 47856 (0.0023) +[2024-11-08 06:27:47,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6720.2). Total num frames: 196042752. Throughput: 0: 1709.2. Samples: 44003952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:27:47,933][41694] Avg episode reward: [(0, '4.619')] +[2024-11-08 06:27:51,989][42004] Updated weights for policy 0, policy_version 47866 (0.0032) +[2024-11-08 06:27:52,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 196063232. Throughput: 0: 1659.2. Samples: 44011326. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:27:52,934][41694] Avg episode reward: [(0, '4.585')] +[2024-11-08 06:27:57,388][42004] Updated weights for policy 0, policy_version 47876 (0.0023) +[2024-11-08 06:27:57,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.1, 300 sec: 6664.7). 
Total num frames: 196104192. Throughput: 0: 1650.3. Samples: 44021940. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:27:57,934][41694] Avg episode reward: [(0, '4.541')] +[2024-11-08 06:28:02,852][42004] Updated weights for policy 0, policy_version 47886 (0.0041) +[2024-11-08 06:28:02,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 196141056. Throughput: 0: 1657.1. Samples: 44027634. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:28:02,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 06:28:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6751.7). Total num frames: 196177920. Throughput: 0: 1742.6. Samples: 44039076. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:28:07,933][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 06:28:08,004][42004] Updated weights for policy 0, policy_version 47896 (0.0023) +[2024-11-08 06:28:12,934][41694] Fps is (10 sec: 7370.8, 60 sec: 6826.4, 300 sec: 6761.8). Total num frames: 196214784. Throughput: 0: 1727.8. Samples: 44049846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:28:12,936][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 06:28:13,968][42004] Updated weights for policy 0, policy_version 47906 (0.0043) +[2024-11-08 06:28:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 6748.0). Total num frames: 196251648. Throughput: 0: 1720.8. Samples: 44055044. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:28:17,934][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 06:28:19,642][42004] Updated weights for policy 0, policy_version 47916 (0.0030) +[2024-11-08 06:28:22,931][41694] Fps is (10 sec: 6965.1, 60 sec: 7031.5, 300 sec: 6734.1). Total num frames: 196284416. Throughput: 0: 1718.5. Samples: 44065944. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:28:22,933][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 06:28:27,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6621.9, 300 sec: 6664.7). Total num frames: 196300800. Throughput: 0: 1632.4. Samples: 44071974. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:28:27,935][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 06:28:28,007][42004] Updated weights for policy 0, policy_version 47926 (0.0028) +[2024-11-08 06:28:32,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6690.2, 300 sec: 6664.7). Total num frames: 196341760. Throughput: 0: 1637.6. Samples: 44077646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:28:32,934][41694] Avg episode reward: [(0, '4.456')] +[2024-11-08 06:28:33,171][42004] Updated weights for policy 0, policy_version 47936 (0.0027) +[2024-11-08 06:28:37,932][41694] Fps is (10 sec: 8192.1, 60 sec: 6826.7, 300 sec: 6692.4). Total num frames: 196382720. Throughput: 0: 1735.4. Samples: 44089420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:28:37,937][41694] Avg episode reward: [(0, '4.858')] +[2024-11-08 06:28:38,333][42004] Updated weights for policy 0, policy_version 47946 (0.0021) +[2024-11-08 06:28:42,933][41694] Fps is (10 sec: 7371.5, 60 sec: 6758.2, 300 sec: 6749.4). Total num frames: 196415488. Throughput: 0: 1754.7. Samples: 44100906. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:28:42,936][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 06:28:44,438][42004] Updated weights for policy 0, policy_version 47956 (0.0028) +[2024-11-08 06:28:47,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6734.1). Total num frames: 196448256. Throughput: 0: 1726.2. Samples: 44105314. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:28:47,934][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 06:28:50,676][42004] Updated weights for policy 0, policy_version 47966 (0.0027) +[2024-11-08 06:28:52,931][41694] Fps is (10 sec: 6964.5, 60 sec: 7031.5, 300 sec: 6734.1). Total num frames: 196485120. Throughput: 0: 1693.7. Samples: 44115292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:28:52,933][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 06:28:56,369][42004] Updated weights for policy 0, policy_version 47976 (0.0028) +[2024-11-08 06:28:57,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6706.3). Total num frames: 196517888. Throughput: 0: 1693.5. Samples: 44126050. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:28:57,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 06:29:02,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 196534272. Throughput: 0: 1625.0. Samples: 44128170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:29:02,934][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 06:29:05,017][42004] Updated weights for policy 0, policy_version 47986 (0.0035) +[2024-11-08 06:29:07,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 196571136. Throughput: 0: 1580.2. Samples: 44137052. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:29:07,933][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 06:29:10,699][42004] Updated weights for policy 0, policy_version 47996 (0.0028) +[2024-11-08 06:29:12,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.9, 300 sec: 6650.8). Total num frames: 196608000. Throughput: 0: 1687.0. Samples: 44147890. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:29:12,933][41694] Avg episode reward: [(0, '4.445')] +[2024-11-08 06:29:16,383][42004] Updated weights for policy 0, policy_version 48006 (0.0032) +[2024-11-08 06:29:17,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.4, 300 sec: 6700.4). Total num frames: 196640768. Throughput: 0: 1673.1. Samples: 44152934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:29:17,933][41694] Avg episode reward: [(0, '4.375')] +[2024-11-08 06:29:22,677][42004] Updated weights for policy 0, policy_version 48016 (0.0029) +[2024-11-08 06:29:22,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6692.5). Total num frames: 196673536. Throughput: 0: 1636.4. Samples: 44163060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:29:22,934][41694] Avg episode reward: [(0, '4.232')] +[2024-11-08 06:29:27,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 196710400. Throughput: 0: 1626.2. Samples: 44174082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:29:27,933][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 06:29:28,095][42004] Updated weights for policy 0, policy_version 48026 (0.0031) +[2024-11-08 06:29:32,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6826.6, 300 sec: 6692.4). Total num frames: 196751360. Throughput: 0: 1653.1. Samples: 44179706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:29:32,938][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 06:29:35,418][42004] Updated weights for policy 0, policy_version 48036 (0.0031) +[2024-11-08 06:29:37,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 196771840. Throughput: 0: 1605.5. Samples: 44187540. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:29:37,933][41694] Avg episode reward: [(0, '4.373')] +[2024-11-08 06:29:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048040_196771840.pth... +[2024-11-08 06:29:38,059][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047648_195166208.pth +[2024-11-08 06:29:41,049][42004] Updated weights for policy 0, policy_version 48046 (0.0033) +[2024-11-08 06:29:42,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6553.8, 300 sec: 6636.9). Total num frames: 196808704. Throughput: 0: 1609.4. Samples: 44198472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:29:42,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 06:29:46,724][42004] Updated weights for policy 0, policy_version 48056 (0.0026) +[2024-11-08 06:29:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 196845568. Throughput: 0: 1675.5. Samples: 44203568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:29:47,933][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 06:29:52,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6485.3, 300 sec: 6690.6). Total num frames: 196874240. Throughput: 0: 1716.9. Samples: 44214314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:29:52,934][41694] Avg episode reward: [(0, '4.605')] +[2024-11-08 06:29:53,335][42004] Updated weights for policy 0, policy_version 48066 (0.0036) +[2024-11-08 06:29:57,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6485.3, 300 sec: 6692.4). Total num frames: 196907008. Throughput: 0: 1671.5. Samples: 44223110. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:29:57,935][41694] Avg episode reward: [(0, '4.634')] +[2024-11-08 06:29:59,492][42004] Updated weights for policy 0, policy_version 48076 (0.0030) +[2024-11-08 06:30:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6758.4, 300 sec: 6678.6). 
Total num frames: 196939776. Throughput: 0: 1669.5. Samples: 44228060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:30:02,933][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 06:30:05,564][42004] Updated weights for policy 0, policy_version 48086 (0.0032) +[2024-11-08 06:30:09,459][41694] Fps is (10 sec: 5330.2, 60 sec: 6457.5, 300 sec: 6616.5). Total num frames: 196968448. Throughput: 0: 1620.6. Samples: 44238464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:30:09,461][41694] Avg episode reward: [(0, '4.454')] +[2024-11-08 06:30:12,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.4, 300 sec: 6623.0). Total num frames: 196997120. Throughput: 0: 1577.4. Samples: 44245066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:30:12,933][41694] Avg episode reward: [(0, '4.614')] +[2024-11-08 06:30:13,513][42004] Updated weights for policy 0, policy_version 48096 (0.0035) +[2024-11-08 06:30:17,931][41694] Fps is (10 sec: 6768.1, 60 sec: 6417.1, 300 sec: 6609.1). Total num frames: 197025792. Throughput: 0: 1561.2. Samples: 44249960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:30:17,933][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 06:30:20,198][42004] Updated weights for policy 0, policy_version 48106 (0.0031) +[2024-11-08 06:30:22,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6417.1, 300 sec: 6609.1). Total num frames: 197058560. Throughput: 0: 1603.1. Samples: 44259678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:30:22,933][41694] Avg episode reward: [(0, '4.633')] +[2024-11-08 06:30:26,371][42004] Updated weights for policy 0, policy_version 48116 (0.0033) +[2024-11-08 06:30:27,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6348.8, 300 sec: 6654.5). Total num frames: 197091328. Throughput: 0: 1577.1. Samples: 44269440. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:30:27,934][41694] Avg episode reward: [(0, '4.807')] +[2024-11-08 06:30:32,353][42004] Updated weights for policy 0, policy_version 48126 (0.0034) +[2024-11-08 06:30:32,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6280.6, 300 sec: 6664.7). Total num frames: 197128192. Throughput: 0: 1569.5. Samples: 44274198. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:30:32,934][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 06:30:37,684][42004] Updated weights for policy 0, policy_version 48136 (0.0036) +[2024-11-08 06:30:37,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 197165056. Throughput: 0: 1584.5. Samples: 44285618. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:30:37,939][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 06:30:43,910][41694] Fps is (10 sec: 5969.9, 60 sec: 6314.2, 300 sec: 6601.1). Total num frames: 197193728. Throughput: 0: 1481.5. Samples: 44291228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:30:43,911][41694] Avg episode reward: [(0, '4.525')] +[2024-11-08 06:30:45,481][42004] Updated weights for policy 0, policy_version 48146 (0.0040) +[2024-11-08 06:30:47,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6581.4). Total num frames: 197222400. Throughput: 0: 1563.3. Samples: 44298408. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:30:47,933][41694] Avg episode reward: [(0, '4.451')] +[2024-11-08 06:30:51,067][42004] Updated weights for policy 0, policy_version 48156 (0.0024) +[2024-11-08 06:30:52,931][41694] Fps is (10 sec: 7264.1, 60 sec: 6417.1, 300 sec: 6595.3). Total num frames: 197259264. Throughput: 0: 1627.4. Samples: 44309210. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:30:52,933][41694] Avg episode reward: [(0, '4.255')] +[2024-11-08 06:30:56,691][42004] Updated weights for policy 0, policy_version 48166 (0.0037) +[2024-11-08 06:30:57,935][41694] Fps is (10 sec: 7370.5, 60 sec: 6485.0, 300 sec: 6609.1). Total num frames: 197296128. Throughput: 0: 1668.6. Samples: 44320160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:30:57,937][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 06:31:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6644.8). Total num frames: 197324800. Throughput: 0: 1670.6. Samples: 44325138. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:31:02,933][41694] Avg episode reward: [(0, '4.254')] +[2024-11-08 06:31:02,990][42004] Updated weights for policy 0, policy_version 48176 (0.0040) +[2024-11-08 06:31:07,931][41694] Fps is (10 sec: 6555.7, 60 sec: 6724.8, 300 sec: 6664.7). Total num frames: 197361664. Throughput: 0: 1670.4. Samples: 44334846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:31:07,933][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 06:31:08,711][42004] Updated weights for policy 0, policy_version 48186 (0.0039) +[2024-11-08 06:31:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 197394432. Throughput: 0: 1680.3. Samples: 44345054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:31:12,933][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 06:31:15,182][42004] Updated weights for policy 0, policy_version 48196 (0.0029) +[2024-11-08 06:31:18,386][41694] Fps is (10 sec: 5093.1, 60 sec: 6436.5, 300 sec: 6571.2). Total num frames: 197414912. Throughput: 0: 1667.2. Samples: 44349978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:31:18,389][41694] Avg episode reward: [(0, '4.258')] +[2024-11-08 06:31:22,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6485.3, 300 sec: 6567.5). 
Total num frames: 197447680. Throughput: 0: 1586.2. Samples: 44356996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:31:22,934][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 06:31:22,976][42004] Updated weights for policy 0, policy_version 48206 (0.0040) +[2024-11-08 06:31:27,931][41694] Fps is (10 sec: 7724.1, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 197488640. Throughput: 0: 1749.2. Samples: 44368230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:31:27,933][41694] Avg episode reward: [(0, '4.829')] +[2024-11-08 06:31:28,309][42004] Updated weights for policy 0, policy_version 48216 (0.0029) +[2024-11-08 06:31:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 197525504. Throughput: 0: 1674.2. Samples: 44373748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:31:32,934][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 06:31:33,940][42004] Updated weights for policy 0, policy_version 48226 (0.0023) +[2024-11-08 06:31:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6665.0). Total num frames: 197558272. Throughput: 0: 1663.6. Samples: 44384072. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:31:37,936][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 06:31:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048232_197558272.pth... +[2024-11-08 06:31:38,057][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000047846_195977216.pth +[2024-11-08 06:31:40,093][42004] Updated weights for policy 0, policy_version 48236 (0.0039) +[2024-11-08 06:31:42,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6801.0, 300 sec: 6664.7). Total num frames: 197595136. Throughput: 0: 1660.6. Samples: 44394880. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:31:42,935][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 06:31:45,677][42004] Updated weights for policy 0, policy_version 48246 (0.0033) +[2024-11-08 06:31:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 197632000. Throughput: 0: 1669.0. Samples: 44400244. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:31:47,934][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 06:31:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6609.1). Total num frames: 197652480. Throughput: 0: 1687.1. Samples: 44410764. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:31:52,934][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 06:31:53,318][42004] Updated weights for policy 0, policy_version 48256 (0.0033) +[2024-11-08 06:31:57,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.9, 300 sec: 6623.0). Total num frames: 197689344. Throughput: 0: 1637.3. Samples: 44418734. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:31:57,933][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 06:31:58,665][42004] Updated weights for policy 0, policy_version 48266 (0.0028) +[2024-11-08 06:32:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 197726208. Throughput: 0: 1673.7. Samples: 44424534. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:32:02,935][41694] Avg episode reward: [(0, '4.555')] +[2024-11-08 06:32:04,447][42004] Updated weights for policy 0, policy_version 48276 (0.0034) +[2024-11-08 06:32:07,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.8, 300 sec: 6623.0). Total num frames: 197758976. Throughput: 0: 1731.6. Samples: 44434918. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:07,934][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 06:32:10,687][42004] Updated weights for policy 0, policy_version 48286 (0.0023) +[2024-11-08 06:32:12,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 197791744. Throughput: 0: 1700.6. Samples: 44444758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:12,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 06:32:16,615][42004] Updated weights for policy 0, policy_version 48296 (0.0022) +[2024-11-08 06:32:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6947.6, 300 sec: 6664.7). Total num frames: 197828608. Throughput: 0: 1685.2. Samples: 44449580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:17,935][41694] Avg episode reward: [(0, '4.520')] +[2024-11-08 06:32:22,235][42004] Updated weights for policy 0, policy_version 48306 (0.0025) +[2024-11-08 06:32:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6650.8). Total num frames: 197865472. Throughput: 0: 1699.3. Samples: 44460542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:22,934][41694] Avg episode reward: [(0, '4.746')] +[2024-11-08 06:32:27,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6621.9, 300 sec: 6595.3). Total num frames: 197885952. Throughput: 0: 1625.1. Samples: 44468010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:27,933][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 06:32:30,058][42004] Updated weights for policy 0, policy_version 48316 (0.2258) +[2024-11-08 06:32:32,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 197922816. Throughput: 0: 1618.0. Samples: 44473052. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:32,933][41694] Avg episode reward: [(0, '4.260')] +[2024-11-08 06:32:35,472][42004] Updated weights for policy 0, policy_version 48326 (0.0029) +[2024-11-08 06:32:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6609.1). Total num frames: 197959680. Throughput: 0: 1633.1. Samples: 44484254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:37,934][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 06:32:41,167][42004] Updated weights for policy 0, policy_version 48336 (0.0032) +[2024-11-08 06:32:42,934][41694] Fps is (10 sec: 6961.6, 60 sec: 6621.6, 300 sec: 6609.1). Total num frames: 197992448. Throughput: 0: 1690.4. Samples: 44494806. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:42,935][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 06:32:47,484][42004] Updated weights for policy 0, policy_version 48346 (0.0053) +[2024-11-08 06:32:47,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 198025216. Throughput: 0: 1659.0. Samples: 44499188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:47,934][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 06:32:52,931][41694] Fps is (10 sec: 6964.8, 60 sec: 6826.7, 300 sec: 6636.9). Total num frames: 198062080. Throughput: 0: 1675.8. Samples: 44510330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:52,933][41694] Avg episode reward: [(0, '4.377')] +[2024-11-08 06:32:52,951][42004] Updated weights for policy 0, policy_version 48356 (0.0023) +[2024-11-08 06:32:57,932][41694] Fps is (10 sec: 7781.9, 60 sec: 6894.8, 300 sec: 6650.8). Total num frames: 198103040. Throughput: 0: 1711.4. Samples: 44521770. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:32:57,935][41694] Avg episode reward: [(0, '4.502')] +[2024-11-08 06:32:58,316][42004] Updated weights for policy 0, policy_version 48366 (0.0031) +[2024-11-08 06:33:02,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 198119424. Throughput: 0: 1712.9. Samples: 44526660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:33:02,934][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 06:33:06,488][42004] Updated weights for policy 0, policy_version 48376 (0.0030) +[2024-11-08 06:33:07,944][41694] Fps is (10 sec: 5318.9, 60 sec: 6620.6, 300 sec: 6581.2). Total num frames: 198156288. Throughput: 0: 1625.4. Samples: 44533704. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:07,946][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 06:33:12,084][42004] Updated weights for policy 0, policy_version 48386 (0.0026) +[2024-11-08 06:33:12,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6581.4). Total num frames: 198193152. Throughput: 0: 1700.7. Samples: 44544540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:12,934][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 06:33:17,932][41694] Fps is (10 sec: 6971.1, 60 sec: 6621.8, 300 sec: 6581.4). Total num frames: 198225920. Throughput: 0: 1707.2. Samples: 44549878. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:17,933][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 06:33:18,003][42004] Updated weights for policy 0, policy_version 48396 (0.0024) +[2024-11-08 06:33:22,932][41694] Fps is (10 sec: 6962.5, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 198262784. Throughput: 0: 1688.2. Samples: 44560224. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:22,934][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 06:33:23,676][42004] Updated weights for policy 0, policy_version 48406 (0.0028) +[2024-11-08 06:33:27,931][41694] Fps is (10 sec: 7373.2, 60 sec: 6894.9, 300 sec: 6636.9). Total num frames: 198299648. Throughput: 0: 1705.5. Samples: 44571550. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:27,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 06:33:29,033][42004] Updated weights for policy 0, policy_version 48416 (0.0039) +[2024-11-08 06:33:32,932][41694] Fps is (10 sec: 7373.4, 60 sec: 6894.9, 300 sec: 6623.0). Total num frames: 198336512. Throughput: 0: 1727.2. Samples: 44576910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:32,933][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 06:33:37,133][42004] Updated weights for policy 0, policy_version 48426 (0.0047) +[2024-11-08 06:33:37,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6581.4). Total num frames: 198356992. Throughput: 0: 1642.8. Samples: 44584258. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:37,934][41694] Avg episode reward: [(0, '4.223')] +[2024-11-08 06:33:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048427_198356992.pth... +[2024-11-08 06:33:38,074][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048040_196771840.pth +[2024-11-08 06:33:42,877][42004] Updated weights for policy 0, policy_version 48436 (0.0032) +[2024-11-08 06:33:42,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6690.3, 300 sec: 6595.2). Total num frames: 198393856. Throughput: 0: 1611.8. Samples: 44594300. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:42,933][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 06:33:47,933][41694] Fps is (10 sec: 7371.8, 60 sec: 6758.3, 300 sec: 6595.2). 
Total num frames: 198430720. Throughput: 0: 1625.5. Samples: 44599812. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:47,937][41694] Avg episode reward: [(0, '4.767')] +[2024-11-08 06:33:48,424][42004] Updated weights for policy 0, policy_version 48446 (0.0031) +[2024-11-08 06:33:52,937][41694] Fps is (10 sec: 6959.6, 60 sec: 6689.5, 300 sec: 6595.1). Total num frames: 198463488. Throughput: 0: 1697.8. Samples: 44610092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:52,943][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 06:33:54,633][42004] Updated weights for policy 0, policy_version 48456 (0.0020) +[2024-11-08 06:33:57,931][41694] Fps is (10 sec: 6964.2, 60 sec: 6622.0, 300 sec: 6664.7). Total num frames: 198500352. Throughput: 0: 1697.2. Samples: 44620914. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:33:57,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 06:34:00,234][42004] Updated weights for policy 0, policy_version 48466 (0.0035) +[2024-11-08 06:34:02,932][41694] Fps is (10 sec: 6967.0, 60 sec: 6894.9, 300 sec: 6650.8). Total num frames: 198533120. Throughput: 0: 1699.7. Samples: 44626364. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:34:02,934][41694] Avg episode reward: [(0, '4.496')] +[2024-11-08 06:34:06,147][42004] Updated weights for policy 0, policy_version 48476 (0.0033) +[2024-11-08 06:34:07,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6828.0, 300 sec: 6636.9). Total num frames: 198565888. Throughput: 0: 1697.9. Samples: 44636630. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:07,934][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 06:34:12,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6595.2). Total num frames: 198586368. Throughput: 0: 1591.1. Samples: 44643148. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:12,935][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 06:34:14,162][42004] Updated weights for policy 0, policy_version 48486 (0.0040) +[2024-11-08 06:34:17,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 198623232. Throughput: 0: 1594.2. Samples: 44648648. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:17,939][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 06:34:19,792][42004] Updated weights for policy 0, policy_version 48496 (0.0036) +[2024-11-08 06:34:22,933][41694] Fps is (10 sec: 7372.1, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 198660096. Throughput: 0: 1679.6. Samples: 44659840. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:22,935][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 06:34:25,781][42004] Updated weights for policy 0, policy_version 48506 (0.0028) +[2024-11-08 06:34:27,931][41694] Fps is (10 sec: 6963.6, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 198692864. Throughput: 0: 1673.8. Samples: 44669622. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:27,933][41694] Avg episode reward: [(0, '4.350')] +[2024-11-08 06:34:31,699][42004] Updated weights for policy 0, policy_version 48516 (0.0027) +[2024-11-08 06:34:32,931][41694] Fps is (10 sec: 6964.0, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 198729728. Throughput: 0: 1671.3. Samples: 44675016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:32,933][41694] Avg episode reward: [(0, '4.678')] +[2024-11-08 06:34:37,154][42004] Updated weights for policy 0, policy_version 48526 (0.0030) +[2024-11-08 06:34:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6636.9). Total num frames: 198766592. Throughput: 0: 1685.8. Samples: 44685944. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:37,933][41694] Avg episode reward: [(0, '4.262')] +[2024-11-08 06:34:42,678][42004] Updated weights for policy 0, policy_version 48536 (0.0029) +[2024-11-08 06:34:42,938][41694] Fps is (10 sec: 7367.9, 60 sec: 6826.0, 300 sec: 6636.8). Total num frames: 198803456. Throughput: 0: 1694.7. Samples: 44697188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:42,946][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 06:34:47,932][41694] Fps is (10 sec: 5734.1, 60 sec: 6553.7, 300 sec: 6609.1). Total num frames: 198823936. Throughput: 0: 1615.0. Samples: 44699038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:47,934][41694] Avg episode reward: [(0, '4.587')] +[2024-11-08 06:34:50,524][42004] Updated weights for policy 0, policy_version 48546 (0.0032) +[2024-11-08 06:34:52,932][41694] Fps is (10 sec: 5738.2, 60 sec: 6622.5, 300 sec: 6623.0). Total num frames: 198860800. Throughput: 0: 1618.7. Samples: 44709472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:52,934][41694] Avg episode reward: [(0, '4.339')] +[2024-11-08 06:34:56,392][42004] Updated weights for policy 0, policy_version 48556 (0.0027) +[2024-11-08 06:34:57,933][41694] Fps is (10 sec: 6962.9, 60 sec: 6553.5, 300 sec: 6623.0). Total num frames: 198893568. Throughput: 0: 1706.9. Samples: 44719962. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:34:57,935][41694] Avg episode reward: [(0, '4.394')] +[2024-11-08 06:35:02,852][42004] Updated weights for policy 0, policy_version 48566 (0.0035) +[2024-11-08 06:35:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6671.4). Total num frames: 198926336. Throughput: 0: 1691.1. Samples: 44724748. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:02,933][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 06:35:07,931][41694] Fps is (10 sec: 6964.0, 60 sec: 6621.9, 300 sec: 6664.7). 
Total num frames: 198963200. Throughput: 0: 1671.2. Samples: 44735042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:07,933][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 06:35:08,212][42004] Updated weights for policy 0, policy_version 48576 (0.0038) +[2024-11-08 06:35:12,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6692.4). Total num frames: 199000064. Throughput: 0: 1700.7. Samples: 44746156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:12,938][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 06:35:14,126][42004] Updated weights for policy 0, policy_version 48586 (0.0033) +[2024-11-08 06:35:19,784][41694] Fps is (10 sec: 5529.2, 60 sec: 6556.0, 300 sec: 6636.9). Total num frames: 199028736. Throughput: 0: 1616.9. Samples: 44750774. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:19,790][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 06:35:22,339][42004] Updated weights for policy 0, policy_version 48596 (0.0027) +[2024-11-08 06:35:22,933][41694] Fps is (10 sec: 5324.0, 60 sec: 6553.5, 300 sec: 6650.8). Total num frames: 199053312. Throughput: 0: 1583.7. Samples: 44757214. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:22,936][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 06:35:27,926][42004] Updated weights for policy 0, policy_version 48606 (0.0031) +[2024-11-08 06:35:27,932][41694] Fps is (10 sec: 7540.7, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 199090176. Throughput: 0: 1581.3. Samples: 44768338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:27,935][41694] Avg episode reward: [(0, '4.584')] +[2024-11-08 06:35:32,932][41694] Fps is (10 sec: 6964.3, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 199122944. Throughput: 0: 1661.4. Samples: 44773802. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:32,933][41694] Avg episode reward: [(0, '4.470')] +[2024-11-08 06:35:33,859][42004] Updated weights for policy 0, policy_version 48616 (0.0034) +[2024-11-08 06:35:37,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6553.6, 300 sec: 6686.8). Total num frames: 199159808. Throughput: 0: 1656.4. Samples: 44784008. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:35:37,933][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 06:35:37,946][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048623_199159808.pth... +[2024-11-08 06:35:38,121][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048232_197558272.pth +[2024-11-08 06:35:39,473][42004] Updated weights for policy 0, policy_version 48626 (0.0030) +[2024-11-08 06:35:42,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6486.0, 300 sec: 6678.6). Total num frames: 199192576. Throughput: 0: 1658.8. Samples: 44794608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:35:42,933][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 06:35:45,424][42004] Updated weights for policy 0, policy_version 48636 (0.0026) +[2024-11-08 06:35:47,933][41694] Fps is (10 sec: 6962.4, 60 sec: 6758.4, 300 sec: 6678.5). Total num frames: 199229440. Throughput: 0: 1675.4. Samples: 44800142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:35:47,937][41694] Avg episode reward: [(0, '4.275')] +[2024-11-08 06:35:50,912][42004] Updated weights for policy 0, policy_version 48646 (0.0023) +[2024-11-08 06:35:54,274][41694] Fps is (10 sec: 6139.2, 60 sec: 6543.7, 300 sec: 6634.6). Total num frames: 199262208. Throughput: 0: 1645.1. Samples: 44811278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:35:54,277][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 06:35:57,931][41694] Fps is (10 sec: 6144.7, 60 sec: 6622.0, 300 sec: 6664.7). 
Total num frames: 199290880. Throughput: 0: 1603.4. Samples: 44818310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:35:57,933][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 06:35:58,505][42004] Updated weights for policy 0, policy_version 48656 (0.0025) +[2024-11-08 06:36:02,932][41694] Fps is (10 sec: 7096.5, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 199323648. Throughput: 0: 1696.9. Samples: 44823992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:36:02,934][41694] Avg episode reward: [(0, '4.774')] +[2024-11-08 06:36:04,556][42004] Updated weights for policy 0, policy_version 48666 (0.0033) +[2024-11-08 06:36:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 199356416. Throughput: 0: 1698.5. Samples: 44833644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:36:07,934][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 06:36:11,053][42004] Updated weights for policy 0, policy_version 48676 (0.0031) +[2024-11-08 06:36:12,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6417.0, 300 sec: 6688.9). Total num frames: 199385088. Throughput: 0: 1667.4. Samples: 44843372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:12,934][41694] Avg episode reward: [(0, '4.710')] +[2024-11-08 06:36:17,006][42004] Updated weights for policy 0, policy_version 48686 (0.0035) +[2024-11-08 06:36:17,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6762.4, 300 sec: 6692.4). Total num frames: 199421952. Throughput: 0: 1651.9. Samples: 44848140. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:17,936][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 06:36:22,548][42004] Updated weights for policy 0, policy_version 48696 (0.0024) +[2024-11-08 06:36:22,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6758.6, 300 sec: 6678.6). Total num frames: 199458816. Throughput: 0: 1672.9. Samples: 44859290. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:22,935][41694] Avg episode reward: [(0, '4.294')] +[2024-11-08 06:36:28,776][41694] Fps is (10 sec: 6043.2, 60 sec: 6529.9, 300 sec: 6631.8). Total num frames: 199487488. Throughput: 0: 1535.1. Samples: 44864986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:28,778][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 06:36:30,326][42004] Updated weights for policy 0, policy_version 48706 (0.0033) +[2024-11-08 06:36:32,931][41694] Fps is (10 sec: 5734.6, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 199516160. Throughput: 0: 1594.5. Samples: 44871892. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:32,933][41694] Avg episode reward: [(0, '4.201')] +[2024-11-08 06:36:35,807][42004] Updated weights for policy 0, policy_version 48716 (0.0024) +[2024-11-08 06:36:37,932][41694] Fps is (10 sec: 7158.4, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 199553024. Throughput: 0: 1641.3. Samples: 44882934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:37,936][41694] Avg episode reward: [(0, '4.219')] +[2024-11-08 06:36:42,144][42004] Updated weights for policy 0, policy_version 48726 (0.0039) +[2024-11-08 06:36:42,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 199585792. Throughput: 0: 1654.4. Samples: 44892756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:42,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 06:36:47,663][42004] Updated weights for policy 0, policy_version 48736 (0.0025) +[2024-11-08 06:36:47,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.7, 300 sec: 6678.6). Total num frames: 199622656. Throughput: 0: 1645.7. Samples: 44898048. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:47,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 06:36:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6773.4, 300 sec: 6678.6). 
Total num frames: 199659520. Throughput: 0: 1680.2. Samples: 44909254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:52,933][41694] Avg episode reward: [(0, '4.370')] +[2024-11-08 06:36:53,181][42004] Updated weights for policy 0, policy_version 48746 (0.0029) +[2024-11-08 06:36:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 199696384. Throughput: 0: 1713.2. Samples: 44920464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:36:57,933][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 06:36:58,605][42004] Updated weights for policy 0, policy_version 48756 (0.0025) +[2024-11-08 06:37:03,327][41694] Fps is (10 sec: 5516.3, 60 sec: 6510.7, 300 sec: 6628.0). Total num frames: 199716864. Throughput: 0: 1713.2. Samples: 44925910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:37:03,338][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 06:37:06,886][42004] Updated weights for policy 0, policy_version 48766 (0.0025) +[2024-11-08 06:37:07,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 199749632. Throughput: 0: 1622.1. Samples: 44932284. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:37:07,933][41694] Avg episode reward: [(0, '4.596')] +[2024-11-08 06:37:12,651][42004] Updated weights for policy 0, policy_version 48776 (0.0034) +[2024-11-08 06:37:12,931][41694] Fps is (10 sec: 7249.9, 60 sec: 6690.2, 300 sec: 6636.9). Total num frames: 199786496. Throughput: 0: 1769.4. Samples: 44943112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:37:12,934][41694] Avg episode reward: [(0, '4.643')] +[2024-11-08 06:37:17,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6690.2, 300 sec: 6636.9). Total num frames: 199823360. Throughput: 0: 1692.4. Samples: 44948050. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:37:17,934][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 06:37:18,365][42004] Updated weights for policy 0, policy_version 48786 (0.0030) +[2024-11-08 06:37:22,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 199860224. Throughput: 0: 1697.9. Samples: 44959340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:37:22,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 06:37:23,827][42004] Updated weights for policy 0, policy_version 48796 (0.0029) +[2024-11-08 06:37:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6924.2, 300 sec: 6692.4). Total num frames: 199897088. Throughput: 0: 1723.6. Samples: 44970316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:37:27,933][41694] Avg episode reward: [(0, '4.369')] +[2024-11-08 06:37:29,565][42004] Updated weights for policy 0, policy_version 48806 (0.0026) +[2024-11-08 06:37:32,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6692.4). Total num frames: 199933952. Throughput: 0: 1725.7. Samples: 44975704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:37:32,933][41694] Avg episode reward: [(0, '4.586')] +[2024-11-08 06:37:35,175][42004] Updated weights for policy 0, policy_version 48816 (0.0041) +[2024-11-08 06:37:37,944][41694] Fps is (10 sec: 5318.0, 60 sec: 6620.5, 300 sec: 6636.7). Total num frames: 199950336. Throughput: 0: 1596.5. Samples: 44981116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:37:37,948][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 06:37:37,959][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048816_199950336.pth... +[2024-11-08 06:37:38,073][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048427_198356992.pth +[2024-11-08 06:37:42,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6690.1, 300 sec: 6650.8). 
Total num frames: 199987200. Throughput: 0: 1614.5. Samples: 44993118. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:37:42,934][41694] Avg episode reward: [(0, '4.654')] +[2024-11-08 06:37:43,251][42004] Updated weights for policy 0, policy_version 48826 (0.0038) +[2024-11-08 06:37:47,932][41694] Fps is (10 sec: 6972.1, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 200019968. Throughput: 0: 1626.2. Samples: 44998444. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:37:47,933][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 06:37:49,156][42004] Updated weights for policy 0, policy_version 48836 (0.0045) +[2024-11-08 06:37:52,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 200056832. Throughput: 0: 1695.2. Samples: 45008570. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:37:52,934][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 06:37:55,010][42004] Updated weights for policy 0, policy_version 48846 (0.0038) +[2024-11-08 06:37:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 200093696. Throughput: 0: 1699.4. Samples: 45019586. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:37:57,933][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 06:38:00,600][42004] Updated weights for policy 0, policy_version 48856 (0.0028) +[2024-11-08 06:38:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6940.7, 300 sec: 6692.7). Total num frames: 200130560. Throughput: 0: 1711.6. Samples: 45025070. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:38:02,933][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 06:38:06,289][42004] Updated weights for policy 0, policy_version 48866 (0.0035) +[2024-11-08 06:38:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 6678.6). Total num frames: 200163328. Throughput: 0: 1701.6. Samples: 45035912. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:38:07,934][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 06:38:12,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 200183808. Throughput: 0: 1624.4. Samples: 45043412. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:38:12,933][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 06:38:14,480][42004] Updated weights for policy 0, policy_version 48876 (0.0028) +[2024-11-08 06:38:17,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 200220672. Throughput: 0: 1596.9. Samples: 45047566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:17,935][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 06:38:20,476][42004] Updated weights for policy 0, policy_version 48886 (0.0028) +[2024-11-08 06:38:22,931][41694] Fps is (10 sec: 6553.5, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 200249344. Throughput: 0: 1700.7. Samples: 45057626. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:22,934][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 06:38:26,389][42004] Updated weights for policy 0, policy_version 48896 (0.0025) +[2024-11-08 06:38:27,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 200286208. Throughput: 0: 1670.9. Samples: 45068308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:27,934][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 06:38:31,849][42004] Updated weights for policy 0, policy_version 48906 (0.0032) +[2024-11-08 06:38:32,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 200327168. Throughput: 0: 1666.8. Samples: 45073450. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:32,935][41694] Avg episode reward: [(0, '4.658')] +[2024-11-08 06:38:37,159][42004] Updated weights for policy 0, policy_version 48916 (0.0026) +[2024-11-08 06:38:37,932][41694] Fps is (10 sec: 7782.0, 60 sec: 6896.3, 300 sec: 6678.6). Total num frames: 200364032. Throughput: 0: 1704.9. Samples: 45085292. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:37,935][41694] Avg episode reward: [(0, '4.518')] +[2024-11-08 06:38:42,887][42004] Updated weights for policy 0, policy_version 48926 (0.0037) +[2024-11-08 06:38:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6895.0, 300 sec: 6678.6). Total num frames: 200400896. Throughput: 0: 1697.9. Samples: 45095990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:42,933][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 06:38:47,932][41694] Fps is (10 sec: 5734.6, 60 sec: 6690.1, 300 sec: 6637.0). Total num frames: 200421376. Throughput: 0: 1697.5. Samples: 45101456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:47,933][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 06:38:50,788][42004] Updated weights for policy 0, policy_version 48936 (0.0022) +[2024-11-08 06:38:52,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 200454144. Throughput: 0: 1610.6. Samples: 45108388. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:52,936][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 06:38:57,049][42004] Updated weights for policy 0, policy_version 48946 (0.0024) +[2024-11-08 06:38:57,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 200486912. Throughput: 0: 1662.6. Samples: 45118228. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:38:57,935][41694] Avg episode reward: [(0, '4.444')] +[2024-11-08 06:39:02,661][42004] Updated weights for policy 0, policy_version 48956 (0.0024) +[2024-11-08 06:39:02,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 200523776. Throughput: 0: 1692.1. Samples: 45123712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:39:02,935][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 06:39:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6621.8, 300 sec: 6692.4). Total num frames: 200560640. Throughput: 0: 1716.6. Samples: 45134874. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:39:07,934][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 06:39:07,948][42004] Updated weights for policy 0, policy_version 48966 (0.0020) +[2024-11-08 06:39:12,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6963.2, 300 sec: 6706.3). Total num frames: 200601600. Throughput: 0: 1742.3. Samples: 45146710. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:39:12,935][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 06:39:13,239][42004] Updated weights for policy 0, policy_version 48976 (0.0021) +[2024-11-08 06:39:17,931][41694] Fps is (10 sec: 7782.7, 60 sec: 6963.2, 300 sec: 6706.4). Total num frames: 200638464. Throughput: 0: 1752.9. Samples: 45152330. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:17,934][41694] Avg episode reward: [(0, '4.552')] +[2024-11-08 06:39:18,552][42004] Updated weights for policy 0, policy_version 48986 (0.0029) +[2024-11-08 06:39:22,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 200654848. Throughput: 0: 1680.5. Samples: 45160914. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:22,933][41694] Avg episode reward: [(0, '4.654')] +[2024-11-08 06:39:27,357][42004] Updated weights for policy 0, policy_version 48996 (0.0030) +[2024-11-08 06:39:27,932][41694] Fps is (10 sec: 4914.8, 60 sec: 6690.0, 300 sec: 6636.9). Total num frames: 200687616. Throughput: 0: 1621.7. Samples: 45168970. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:27,935][41694] Avg episode reward: [(0, '4.578')] +[2024-11-08 06:39:32,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6621.8, 300 sec: 6636.9). Total num frames: 200724480. Throughput: 0: 1600.8. Samples: 45173494. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:32,934][41694] Avg episode reward: [(0, '4.535')] +[2024-11-08 06:39:33,486][42004] Updated weights for policy 0, policy_version 49006 (0.0043) +[2024-11-08 06:39:37,932][41694] Fps is (10 sec: 7373.2, 60 sec: 6621.9, 300 sec: 6637.1). Total num frames: 200761344. Throughput: 0: 1691.9. Samples: 45184524. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:37,933][41694] Avg episode reward: [(0, '4.281')] +[2024-11-08 06:39:37,948][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049014_200761344.pth... +[2024-11-08 06:39:38,086][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048623_199159808.pth +[2024-11-08 06:39:38,794][42004] Updated weights for policy 0, policy_version 49016 (0.0030) +[2024-11-08 06:39:42,931][41694] Fps is (10 sec: 7373.1, 60 sec: 6621.9, 300 sec: 6692.5). Total num frames: 200798208. Throughput: 0: 1730.1. Samples: 45196082. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:42,935][41694] Avg episode reward: [(0, '4.494')] +[2024-11-08 06:39:44,282][42004] Updated weights for policy 0, policy_version 49026 (0.0039) +[2024-11-08 06:39:47,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6963.2, 300 sec: 6706.3). 
Total num frames: 200839168. Throughput: 0: 1733.4. Samples: 45201714. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:47,932][41694] Avg episode reward: [(0, '4.588')] +[2024-11-08 06:39:49,593][42004] Updated weights for policy 0, policy_version 49036 (0.0032) +[2024-11-08 06:39:52,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6706.4). Total num frames: 200871936. Throughput: 0: 1732.5. Samples: 45212834. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:52,933][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 06:39:57,879][42004] Updated weights for policy 0, policy_version 49046 (0.0045) +[2024-11-08 06:39:57,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 200892416. Throughput: 0: 1608.1. Samples: 45219074. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:39:57,934][41694] Avg episode reward: [(0, '4.457')] +[2024-11-08 06:40:02,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 200921088. Throughput: 0: 1598.5. Samples: 45224264. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:40:02,935][41694] Avg episode reward: [(0, '4.400')] +[2024-11-08 06:40:04,596][42004] Updated weights for policy 0, policy_version 49056 (0.0043) +[2024-11-08 06:40:07,936][41694] Fps is (10 sec: 6141.5, 60 sec: 6553.2, 300 sec: 6622.9). Total num frames: 200953856. Throughput: 0: 1610.8. Samples: 45233408. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:40:07,938][41694] Avg episode reward: [(0, '4.334')] +[2024-11-08 06:40:10,273][42004] Updated weights for policy 0, policy_version 49066 (0.0026) +[2024-11-08 06:40:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 6692.8). Total num frames: 200990720. Throughput: 0: 1674.5. Samples: 45244320. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:40:12,933][41694] Avg episode reward: [(0, '4.368')] +[2024-11-08 06:40:16,160][42004] Updated weights for policy 0, policy_version 49076 (0.0026) +[2024-11-08 06:40:17,931][41694] Fps is (10 sec: 6966.0, 60 sec: 6417.1, 300 sec: 6678.6). Total num frames: 201023488. Throughput: 0: 1685.8. Samples: 45249356. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:40:17,933][41694] Avg episode reward: [(0, '4.365')] +[2024-11-08 06:40:21,757][42004] Updated weights for policy 0, policy_version 49086 (0.0035) +[2024-11-08 06:40:22,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 201060352. Throughput: 0: 1683.5. Samples: 45260282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:40:22,936][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 06:40:27,462][42004] Updated weights for policy 0, policy_version 49096 (0.0023) +[2024-11-08 06:40:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6826.8, 300 sec: 6692.4). Total num frames: 201097216. Throughput: 0: 1665.5. Samples: 45271030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:40:27,933][41694] Avg episode reward: [(0, '4.656')] +[2024-11-08 06:40:32,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.7, 300 sec: 6636.9). Total num frames: 201117696. Throughput: 0: 1621.6. Samples: 45274686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:40:32,933][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 06:40:36,223][42004] Updated weights for policy 0, policy_version 49106 (0.0038) +[2024-11-08 06:40:37,932][41694] Fps is (10 sec: 4914.9, 60 sec: 6417.0, 300 sec: 6623.0). Total num frames: 201146368. Throughput: 0: 1533.5. Samples: 45281844. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:40:37,943][41694] Avg episode reward: [(0, '4.331')] +[2024-11-08 06:40:42,095][42004] Updated weights for policy 0, policy_version 49116 (0.0051) +[2024-11-08 06:40:42,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6623.1). Total num frames: 201183232. Throughput: 0: 1626.1. Samples: 45292248. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:40:42,933][41694] Avg episode reward: [(0, '4.517')] +[2024-11-08 06:40:47,623][42004] Updated weights for policy 0, policy_version 49126 (0.0019) +[2024-11-08 06:40:47,931][41694] Fps is (10 sec: 7373.4, 60 sec: 6348.8, 300 sec: 6667.2). Total num frames: 201220096. Throughput: 0: 1628.5. Samples: 45297546. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:40:47,933][41694] Avg episode reward: [(0, '4.633')] +[2024-11-08 06:40:52,778][42004] Updated weights for policy 0, policy_version 49136 (0.0025) +[2024-11-08 06:40:52,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6485.3, 300 sec: 6678.6). Total num frames: 201261056. Throughput: 0: 1685.4. Samples: 45309246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:40:52,933][41694] Avg episode reward: [(0, '4.409')] +[2024-11-08 06:40:57,876][42004] Updated weights for policy 0, policy_version 49146 (0.0027) +[2024-11-08 06:40:57,931][41694] Fps is (10 sec: 8191.9, 60 sec: 6826.7, 300 sec: 6706.3). Total num frames: 201302016. Throughput: 0: 1708.3. Samples: 45321192. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:40:57,933][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 06:41:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6706.3). Total num frames: 201334784. Throughput: 0: 1718.5. Samples: 45326690. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:41:02,933][41694] Avg episode reward: [(0, '4.567')] +[2024-11-08 06:41:03,601][42004] Updated weights for policy 0, policy_version 49156 (0.0029) +[2024-11-08 06:41:07,933][41694] Fps is (10 sec: 4914.4, 60 sec: 6622.1, 300 sec: 6664.7). Total num frames: 201351168. Throughput: 0: 1634.0. Samples: 45333816. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:41:07,935][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 06:41:12,932][41694] Fps is (10 sec: 4505.6, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 201379840. Throughput: 0: 1581.6. Samples: 45342200. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:41:12,933][41694] Avg episode reward: [(0, '4.680')] +[2024-11-08 06:41:13,067][42004] Updated weights for policy 0, policy_version 49166 (0.0055) +[2024-11-08 06:41:17,932][41694] Fps is (10 sec: 6554.2, 60 sec: 6553.5, 300 sec: 6636.9). Total num frames: 201416704. Throughput: 0: 1600.1. Samples: 45346690. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:41:17,933][41694] Avg episode reward: [(0, '4.605')] +[2024-11-08 06:41:18,726][42004] Updated weights for policy 0, policy_version 49176 (0.0027) +[2024-11-08 06:41:22,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6697.7). Total num frames: 201457664. Throughput: 0: 1694.4. Samples: 45358092. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:41:22,933][41694] Avg episode reward: [(0, '4.622')] +[2024-11-08 06:41:23,961][42004] Updated weights for policy 0, policy_version 49186 (0.0022) +[2024-11-08 06:41:27,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 201490432. Throughput: 0: 1710.5. Samples: 45369220. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:41:27,935][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 06:41:29,730][42004] Updated weights for policy 0, policy_version 49196 (0.0028) +[2024-11-08 06:41:32,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6894.9, 300 sec: 6706.3). Total num frames: 201531392. Throughput: 0: 1717.8. Samples: 45374846. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:41:32,935][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 06:41:35,137][42004] Updated weights for policy 0, policy_version 49206 (0.0027) +[2024-11-08 06:41:37,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6963.3, 300 sec: 6706.3). Total num frames: 201564160. Throughput: 0: 1707.3. Samples: 45386074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:41:37,933][41694] Avg episode reward: [(0, '4.509')] +[2024-11-08 06:41:38,031][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049211_201568256.pth... +[2024-11-08 06:41:38,148][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000048816_199950336.pth +[2024-11-08 06:41:42,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6621.8, 300 sec: 6636.9). Total num frames: 201580544. Throughput: 0: 1568.2. Samples: 45391762. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:41:42,935][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 06:41:44,178][42004] Updated weights for policy 0, policy_version 49216 (0.0027) +[2024-11-08 06:41:47,931][41694] Fps is (10 sec: 4505.7, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 201609216. Throughput: 0: 1537.7. Samples: 45395886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:41:47,934][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 06:41:50,403][42004] Updated weights for policy 0, policy_version 49226 (0.0026) +[2024-11-08 06:41:52,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6609.1). 
Total num frames: 201646080. Throughput: 0: 1601.3. Samples: 45405870. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 06:41:52,933][41694] Avg episode reward: [(0, '4.298')] +[2024-11-08 06:41:56,162][42004] Updated weights for policy 0, policy_version 49236 (0.0029) +[2024-11-08 06:41:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6348.8, 300 sec: 6673.6). Total num frames: 201682944. Throughput: 0: 1661.3. Samples: 45416956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 06:41:57,934][41694] Avg episode reward: [(0, '4.329')] +[2024-11-08 06:42:01,748][42004] Updated weights for policy 0, policy_version 49246 (0.0027) +[2024-11-08 06:42:02,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6417.1, 300 sec: 6678.6). Total num frames: 201719808. Throughput: 0: 1679.2. Samples: 45422254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 06:42:02,934][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 06:42:07,512][42004] Updated weights for policy 0, policy_version 49256 (0.0030) +[2024-11-08 06:42:07,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.3, 300 sec: 6664.7). Total num frames: 201752576. Throughput: 0: 1660.7. Samples: 45432824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 06:42:07,933][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 06:42:12,764][42004] Updated weights for policy 0, policy_version 49266 (0.0021) +[2024-11-08 06:42:12,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6894.9, 300 sec: 6678.6). Total num frames: 201793536. Throughput: 0: 1669.0. Samples: 45444326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 06:42:12,934][41694] Avg episode reward: [(0, '4.314')] +[2024-11-08 06:42:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 201814016. Throughput: 0: 1600.3. Samples: 45446858. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2024-11-08 06:42:17,934][41694] Avg episode reward: [(0, '4.435')] +[2024-11-08 06:42:20,778][42004] Updated weights for policy 0, policy_version 49276 (0.0043) +[2024-11-08 06:42:22,933][41694] Fps is (10 sec: 5324.4, 60 sec: 6485.2, 300 sec: 6609.1). Total num frames: 201846784. Throughput: 0: 1566.7. Samples: 45456576. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:42:22,935][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 06:42:26,256][42004] Updated weights for policy 0, policy_version 49286 (0.0030) +[2024-11-08 06:42:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 201887744. Throughput: 0: 1692.8. Samples: 45467938. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:42:27,933][41694] Avg episode reward: [(0, '4.278')] +[2024-11-08 06:42:31,739][42004] Updated weights for policy 0, policy_version 49296 (0.0027) +[2024-11-08 06:42:32,931][41694] Fps is (10 sec: 7783.4, 60 sec: 6553.6, 300 sec: 6692.7). Total num frames: 201924608. Throughput: 0: 1715.6. Samples: 45473086. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:42:32,934][41694] Avg episode reward: [(0, '4.602')] +[2024-11-08 06:42:37,160][42004] Updated weights for policy 0, policy_version 49306 (0.0030) +[2024-11-08 06:42:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6692.5). Total num frames: 201961472. Throughput: 0: 1750.9. Samples: 45484662. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:42:37,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 06:42:42,463][42004] Updated weights for policy 0, policy_version 49316 (0.0025) +[2024-11-08 06:42:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6706.3). Total num frames: 201998336. Throughput: 0: 1761.3. Samples: 45496214. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:42:42,933][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 06:42:49,743][41694] Fps is (10 sec: 5895.3, 60 sec: 6825.4, 300 sec: 6651.6). Total num frames: 202031104. Throughput: 0: 1696.7. Samples: 45501680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:42:49,745][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 06:42:50,488][42004] Updated weights for policy 0, policy_version 49326 (0.0031) +[2024-11-08 06:42:52,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 202051584. Throughput: 0: 1673.2. Samples: 45508116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:42:52,933][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 06:42:56,624][42004] Updated weights for policy 0, policy_version 49336 (0.0027) +[2024-11-08 06:42:57,932][41694] Fps is (10 sec: 7003.0, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 202088448. Throughput: 0: 1646.6. Samples: 45518422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:42:57,934][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 06:43:02,125][42004] Updated weights for policy 0, policy_version 49346 (0.0024) +[2024-11-08 06:43:02,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 202125312. Throughput: 0: 1711.1. Samples: 45523858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:43:02,934][41694] Avg episode reward: [(0, '4.729')] +[2024-11-08 06:43:07,410][42004] Updated weights for policy 0, policy_version 49356 (0.0026) +[2024-11-08 06:43:07,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6826.6, 300 sec: 6706.3). Total num frames: 202162176. Throughput: 0: 1749.7. Samples: 45535310. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:43:07,934][41694] Avg episode reward: [(0, '4.769')] +[2024-11-08 06:43:12,659][42004] Updated weights for policy 0, policy_version 49366 (0.0029) +[2024-11-08 06:43:12,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 202203136. Throughput: 0: 1759.3. Samples: 45547106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:43:12,934][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 06:43:17,932][41694] Fps is (10 sec: 7782.6, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 202240000. Throughput: 0: 1769.9. Samples: 45552732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:43:17,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 06:43:18,075][42004] Updated weights for policy 0, policy_version 49376 (0.0027) +[2024-11-08 06:43:24,527][41694] Fps is (10 sec: 5651.7, 60 sec: 6849.4, 300 sec: 6684.1). Total num frames: 202268672. Throughput: 0: 1698.8. Samples: 45563820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:43:24,530][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 06:43:26,686][42004] Updated weights for policy 0, policy_version 49386 (0.0034) +[2024-11-08 06:43:27,935][41694] Fps is (10 sec: 4914.9, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 202289152. Throughput: 0: 1623.2. Samples: 45569258. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:43:27,942][41694] Avg episode reward: [(0, '4.422')] +[2024-11-08 06:43:32,708][42004] Updated weights for policy 0, policy_version 49396 (0.0028) +[2024-11-08 06:43:32,932][41694] Fps is (10 sec: 6823.2, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 202326016. Throughput: 0: 1673.4. Samples: 45573952. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:43:32,933][41694] Avg episode reward: [(0, '4.665')] +[2024-11-08 06:43:37,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6690.1, 300 sec: 6650.8). 
Total num frames: 202362880. Throughput: 0: 1717.5. Samples: 45585406. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:43:37,933][41694] Avg episode reward: [(0, '4.604')] +[2024-11-08 06:43:38,023][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049406_202366976.pth... +[2024-11-08 06:43:38,037][42004] Updated weights for policy 0, policy_version 49406 (0.0026) +[2024-11-08 06:43:38,146][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049014_200761344.pth +[2024-11-08 06:43:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6720.2). Total num frames: 202403840. Throughput: 0: 1745.6. Samples: 45596972. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:43:42,934][41694] Avg episode reward: [(0, '4.534')] +[2024-11-08 06:43:43,336][42004] Updated weights for policy 0, policy_version 49416 (0.0022) +[2024-11-08 06:43:47,931][41694] Fps is (10 sec: 8192.2, 60 sec: 7109.6, 300 sec: 6748.0). Total num frames: 202444800. Throughput: 0: 1754.7. Samples: 45602818. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:43:47,933][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 06:43:48,511][42004] Updated weights for policy 0, policy_version 49426 (0.0027) +[2024-11-08 06:43:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 7099.7, 300 sec: 6748.0). Total num frames: 202477568. Throughput: 0: 1745.0. Samples: 45613836. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:43:52,936][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 06:43:54,469][42004] Updated weights for policy 0, policy_version 49436 (0.0026) +[2024-11-08 06:43:59,067][41694] Fps is (10 sec: 5149.4, 60 sec: 6766.8, 300 sec: 6680.6). Total num frames: 202502144. Throughput: 0: 1564.9. Samples: 45619306. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:43:59,069][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 06:44:02,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.2, 300 sec: 6664.7). Total num frames: 202526720. Throughput: 0: 1604.4. Samples: 45624932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:02,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 06:44:03,364][42004] Updated weights for policy 0, policy_version 49446 (0.0035) +[2024-11-08 06:44:07,931][41694] Fps is (10 sec: 6931.4, 60 sec: 6690.2, 300 sec: 6650.8). Total num frames: 202563584. Throughput: 0: 1641.6. Samples: 45635070. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:07,933][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 06:44:08,636][42004] Updated weights for policy 0, policy_version 49456 (0.0029) +[2024-11-08 06:44:12,932][41694] Fps is (10 sec: 7372.1, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 202600448. Throughput: 0: 1716.0. Samples: 45646480. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:12,934][41694] Avg episode reward: [(0, '4.332')] +[2024-11-08 06:44:14,134][42004] Updated weights for policy 0, policy_version 49466 (0.0018) +[2024-11-08 06:44:17,931][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6720.2). Total num frames: 202637312. Throughput: 0: 1741.4. Samples: 45652316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:17,933][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 06:44:19,655][42004] Updated weights for policy 0, policy_version 49476 (0.0035) +[2024-11-08 06:44:22,932][41694] Fps is (10 sec: 7783.0, 60 sec: 7013.2, 300 sec: 6748.0). Total num frames: 202678272. Throughput: 0: 1738.9. Samples: 45663656. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:22,933][41694] Avg episode reward: [(0, '4.413')] +[2024-11-08 06:44:25,103][42004] Updated weights for policy 0, policy_version 49486 (0.0025) +[2024-11-08 06:44:27,931][41694] Fps is (10 sec: 7782.4, 60 sec: 7099.8, 300 sec: 6748.0). Total num frames: 202715136. Throughput: 0: 1726.9. Samples: 45674684. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:27,933][41694] Avg episode reward: [(0, '4.649')] +[2024-11-08 06:44:31,236][42004] Updated weights for policy 0, policy_version 49496 (0.0029) +[2024-11-08 06:44:34,040][41694] Fps is (10 sec: 5162.3, 60 sec: 6702.9, 300 sec: 6667.4). Total num frames: 202735616. Throughput: 0: 1659.0. Samples: 45679312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:34,044][41694] Avg episode reward: [(0, '4.795')] +[2024-11-08 06:44:37,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 202764288. Throughput: 0: 1579.1. Samples: 45684894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:37,934][41694] Avg episode reward: [(0, '4.507')] +[2024-11-08 06:44:39,531][42004] Updated weights for policy 0, policy_version 49506 (0.0038) +[2024-11-08 06:44:42,931][41694] Fps is (10 sec: 7370.5, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 202801152. Throughput: 0: 1747.9. Samples: 45695974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:42,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 06:44:45,350][42004] Updated weights for policy 0, policy_version 49516 (0.0020) +[2024-11-08 06:44:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 202833920. Throughput: 0: 1699.2. Samples: 45701396. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:47,937][41694] Avg episode reward: [(0, '4.450')] +[2024-11-08 06:44:50,662][42004] Updated weights for policy 0, policy_version 49526 (0.0026) +[2024-11-08 06:44:52,933][41694] Fps is (10 sec: 6962.3, 60 sec: 6553.5, 300 sec: 6706.3). Total num frames: 202870784. Throughput: 0: 1727.1. Samples: 45712790. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:52,936][41694] Avg episode reward: [(0, '4.527')] +[2024-11-08 06:44:56,548][42004] Updated weights for policy 0, policy_version 49536 (0.0036) +[2024-11-08 06:44:57,942][41694] Fps is (10 sec: 7364.9, 60 sec: 6887.5, 300 sec: 6733.9). Total num frames: 202907648. Throughput: 0: 1706.5. Samples: 45723288. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:44:57,944][41694] Avg episode reward: [(0, '4.743')] +[2024-11-08 06:45:02,614][42004] Updated weights for policy 0, policy_version 49546 (0.0035) +[2024-11-08 06:45:02,931][41694] Fps is (10 sec: 6964.1, 60 sec: 6894.9, 300 sec: 6734.2). Total num frames: 202940416. Throughput: 0: 1685.4. Samples: 45728158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:45:02,933][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 06:45:08,646][41694] Fps is (10 sec: 4974.8, 60 sec: 6543.9, 300 sec: 6662.4). Total num frames: 202960896. Throughput: 0: 1627.5. Samples: 45738054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:45:08,648][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 06:45:11,288][42004] Updated weights for policy 0, policy_version 49556 (0.0032) +[2024-11-08 06:45:12,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6485.4, 300 sec: 6664.7). Total num frames: 202989568. Throughput: 0: 1540.8. Samples: 45744020. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:45:12,934][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 06:45:17,702][42004] Updated weights for policy 0, policy_version 49566 (0.0027) +[2024-11-08 06:45:17,932][41694] Fps is (10 sec: 6616.5, 60 sec: 6417.1, 300 sec: 6650.8). Total num frames: 203022336. Throughput: 0: 1578.0. Samples: 45748574. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:45:17,934][41694] Avg episode reward: [(0, '4.674')] +[2024-11-08 06:45:22,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6348.8, 300 sec: 6650.8). Total num frames: 203059200. Throughput: 0: 1642.2. Samples: 45758794. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:45:22,933][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 06:45:23,336][42004] Updated weights for policy 0, policy_version 49576 (0.0026) +[2024-11-08 06:45:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6348.8, 300 sec: 6706.3). Total num frames: 203096064. Throughput: 0: 1648.9. Samples: 45770176. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:45:27,933][41694] Avg episode reward: [(0, '4.284')] +[2024-11-08 06:45:28,931][42004] Updated weights for policy 0, policy_version 49586 (0.0029) +[2024-11-08 06:45:32,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6746.4, 300 sec: 6734.1). Total num frames: 203132928. Throughput: 0: 1650.5. Samples: 45775668. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:45:32,934][41694] Avg episode reward: [(0, '4.635')] +[2024-11-08 06:45:34,664][42004] Updated weights for policy 0, policy_version 49596 (0.0032) +[2024-11-08 06:45:37,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6690.1, 300 sec: 6720.2). Total num frames: 203165696. Throughput: 0: 1627.3. Samples: 45786018. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:45:37,936][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 06:45:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049601_203165696.pth... +[2024-11-08 06:45:38,310][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049211_201568256.pth +[2024-11-08 06:45:43,374][41694] Fps is (10 sec: 4707.2, 60 sec: 6302.3, 300 sec: 6640.8). Total num frames: 203182080. Throughput: 0: 1483.2. Samples: 45790670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:45:43,377][41694] Avg episode reward: [(0, '4.536')] +[2024-11-08 06:45:43,754][42004] Updated weights for policy 0, policy_version 49606 (0.0032) +[2024-11-08 06:45:47,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6348.8, 300 sec: 6623.0). Total num frames: 203214848. Throughput: 0: 1519.0. Samples: 45796512. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:45:47,933][41694] Avg episode reward: [(0, '4.468')] +[2024-11-08 06:45:49,285][42004] Updated weights for policy 0, policy_version 49616 (0.0027) +[2024-11-08 06:45:52,932][41694] Fps is (10 sec: 7285.4, 60 sec: 6348.9, 300 sec: 6609.1). Total num frames: 203251712. Throughput: 0: 1566.6. Samples: 45807432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:45:52,936][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 06:45:54,707][42004] Updated weights for policy 0, policy_version 49626 (0.0024) +[2024-11-08 06:45:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6418.2, 300 sec: 6636.9). Total num frames: 203292672. Throughput: 0: 1667.4. Samples: 45819054. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:45:57,935][41694] Avg episode reward: [(0, '4.571')] +[2024-11-08 06:46:00,220][42004] Updated weights for policy 0, policy_version 49636 (0.0033) +[2024-11-08 06:46:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.1, 300 sec: 6692.5). 
Total num frames: 203325440. Throughput: 0: 1687.0. Samples: 45824488. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:46:02,935][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 06:46:05,963][42004] Updated weights for policy 0, policy_version 49646 (0.0025) +[2024-11-08 06:46:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6770.8, 300 sec: 6720.2). Total num frames: 203362304. Throughput: 0: 1697.0. Samples: 45835158. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:46:07,934][41694] Avg episode reward: [(0, '4.630')] +[2024-11-08 06:46:12,663][42004] Updated weights for policy 0, policy_version 49656 (0.0039) +[2024-11-08 06:46:12,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 203390976. Throughput: 0: 1648.2. Samples: 45844346. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:46:12,942][41694] Avg episode reward: [(0, '4.678')] +[2024-11-08 06:46:18,189][41694] Fps is (10 sec: 4392.3, 60 sec: 6389.6, 300 sec: 6603.4). Total num frames: 203407360. Throughput: 0: 1619.1. Samples: 45848944. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:46:18,192][41694] Avg episode reward: [(0, '4.702')] +[2024-11-08 06:46:21,234][42004] Updated weights for policy 0, policy_version 49666 (0.0030) +[2024-11-08 06:46:22,932][41694] Fps is (10 sec: 4915.4, 60 sec: 6348.8, 300 sec: 6609.1). Total num frames: 203440128. Throughput: 0: 1538.0. Samples: 45855226. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:46:22,935][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 06:46:27,025][42004] Updated weights for policy 0, policy_version 49676 (0.0033) +[2024-11-08 06:46:27,932][41694] Fps is (10 sec: 7147.5, 60 sec: 6348.8, 300 sec: 6595.3). Total num frames: 203476992. Throughput: 0: 1685.0. Samples: 45865750. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:46:27,934][41694] Avg episode reward: [(0, '4.312')] +[2024-11-08 06:46:32,357][42004] Updated weights for policy 0, policy_version 49686 (0.0031) +[2024-11-08 06:46:32,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 203517952. Throughput: 0: 1662.6. Samples: 45871328. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:46:32,933][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 06:46:37,510][42004] Updated weights for policy 0, policy_version 49696 (0.0027) +[2024-11-08 06:46:37,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6485.4, 300 sec: 6692.5). Total num frames: 203554816. Throughput: 0: 1683.9. Samples: 45883208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:46:37,934][41694] Avg episode reward: [(0, '4.305')] +[2024-11-08 06:46:42,848][42004] Updated weights for policy 0, policy_version 49706 (0.0024) +[2024-11-08 06:46:42,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6946.2, 300 sec: 6734.1). Total num frames: 203595776. Throughput: 0: 1684.8. Samples: 45894868. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:46:42,933][41694] Avg episode reward: [(0, '4.667')] +[2024-11-08 06:46:47,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6826.6, 300 sec: 6706.3). Total num frames: 203624448. Throughput: 0: 1672.6. Samples: 45899756. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:46:47,936][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 06:46:49,322][42004] Updated weights for policy 0, policy_version 49716 (0.0029) +[2024-11-08 06:46:52,931][41694] Fps is (10 sec: 4505.6, 60 sec: 6485.4, 300 sec: 6636.9). Total num frames: 203640832. Throughput: 0: 1632.6. Samples: 45908624. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:46:52,933][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 06:46:57,333][42004] Updated weights for policy 0, policy_version 49726 (0.0024) +[2024-11-08 06:46:57,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 203681792. Throughput: 0: 1599.7. Samples: 45916332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2024-11-08 06:46:57,934][41694] Avg episode reward: [(0, '4.606')] +[2024-11-08 06:47:02,890][42004] Updated weights for policy 0, policy_version 49736 (0.0039) +[2024-11-08 06:47:02,931][41694] Fps is (10 sec: 7782.3, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 203718656. Throughput: 0: 1630.6. Samples: 45921900. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:47:02,933][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 06:47:07,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 203755520. Throughput: 0: 1731.2. Samples: 45933132. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:47:07,935][41694] Avg episode reward: [(0, '4.677')] +[2024-11-08 06:47:08,276][42004] Updated weights for policy 0, policy_version 49746 (0.0034) +[2024-11-08 06:47:12,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6758.5, 300 sec: 6720.2). Total num frames: 203796480. Throughput: 0: 1756.3. Samples: 45944784. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:47:12,933][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 06:47:13,447][42004] Updated weights for policy 0, policy_version 49756 (0.0035) +[2024-11-08 06:47:17,932][41694] Fps is (10 sec: 7372.9, 60 sec: 7061.8, 300 sec: 6720.2). Total num frames: 203829248. Throughput: 0: 1758.1. Samples: 45950444. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:47:17,933][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 06:47:19,375][42004] Updated weights for policy 0, policy_version 49766 (0.0031) +[2024-11-08 06:47:22,932][41694] Fps is (10 sec: 6553.5, 60 sec: 7031.5, 300 sec: 6692.4). Total num frames: 203862016. Throughput: 0: 1719.3. Samples: 45960578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:47:22,933][41694] Avg episode reward: [(0, '4.577')] +[2024-11-08 06:47:27,777][42004] Updated weights for policy 0, policy_version 49776 (0.0027) +[2024-11-08 06:47:27,931][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 203882496. Throughput: 0: 1621.2. Samples: 45967822. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:47:27,935][41694] Avg episode reward: [(0, '4.672')] +[2024-11-08 06:47:32,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 203919360. Throughput: 0: 1607.4. Samples: 45972088. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:47:32,935][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 06:47:33,166][42004] Updated weights for policy 0, policy_version 49786 (0.0034) +[2024-11-08 06:47:37,932][41694] Fps is (10 sec: 7372.4, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 203956224. Throughput: 0: 1662.9. Samples: 45983456. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:47:37,934][41694] Avg episode reward: [(0, '4.691')] +[2024-11-08 06:47:37,990][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049795_203960320.pth... +[2024-11-08 06:47:38,120][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049406_202366976.pth +[2024-11-08 06:47:38,588][42004] Updated weights for policy 0, policy_version 49796 (0.0024) +[2024-11-08 06:47:42,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6690.1, 300 sec: 6705.9). 
Total num frames: 203997184. Throughput: 0: 1753.0. Samples: 45995216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:47:42,934][41694] Avg episode reward: [(0, '4.427')] +[2024-11-08 06:47:43,848][42004] Updated weights for policy 0, policy_version 49806 (0.0025) +[2024-11-08 06:47:47,931][41694] Fps is (10 sec: 7782.8, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 204034048. Throughput: 0: 1758.8. Samples: 46001048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:47:47,933][41694] Avg episode reward: [(0, '4.643')] +[2024-11-08 06:47:49,258][42004] Updated weights for policy 0, policy_version 49816 (0.0024) +[2024-11-08 06:47:52,932][41694] Fps is (10 sec: 7372.8, 60 sec: 7168.0, 300 sec: 6720.2). Total num frames: 204070912. Throughput: 0: 1756.8. Samples: 46012186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:47:52,937][41694] Avg episode reward: [(0, '4.292')] +[2024-11-08 06:47:54,859][42004] Updated weights for policy 0, policy_version 49826 (0.0029) +[2024-11-08 06:47:57,931][41694] Fps is (10 sec: 7372.8, 60 sec: 7099.8, 300 sec: 6720.2). Total num frames: 204107776. Throughput: 0: 1744.2. Samples: 46023274. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:47:57,934][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 06:48:02,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 204124160. Throughput: 0: 1722.9. Samples: 46027976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:48:02,935][41694] Avg episode reward: [(0, '4.765')] +[2024-11-08 06:48:03,570][42004] Updated weights for policy 0, policy_version 49836 (0.0035) +[2024-11-08 06:48:07,931][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.2, 300 sec: 6623.0). Total num frames: 204156928. Throughput: 0: 1623.9. Samples: 46033652. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:48:07,933][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 06:48:09,789][42004] Updated weights for policy 0, policy_version 49846 (0.0032) +[2024-11-08 06:48:12,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6609.1). Total num frames: 204189696. Throughput: 0: 1688.9. Samples: 46043824. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:48:12,933][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 06:48:15,516][42004] Updated weights for policy 0, policy_version 49856 (0.0027) +[2024-11-08 06:48:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6621.9, 300 sec: 6673.0). Total num frames: 204226560. Throughput: 0: 1717.4. Samples: 46049372. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:48:17,933][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 06:48:20,883][42004] Updated weights for policy 0, policy_version 49866 (0.0028) +[2024-11-08 06:48:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.1, 300 sec: 6692.5). Total num frames: 204263424. Throughput: 0: 1713.4. Samples: 46060556. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:48:22,933][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 06:48:26,309][42004] Updated weights for policy 0, policy_version 49876 (0.0024) +[2024-11-08 06:48:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6692.4). Total num frames: 204300288. Throughput: 0: 1703.4. Samples: 46071870. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:48:27,933][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 06:48:31,920][42004] Updated weights for policy 0, policy_version 49886 (0.0025) +[2024-11-08 06:48:32,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6963.2, 300 sec: 6692.4). Total num frames: 204337152. Throughput: 0: 1687.9. Samples: 46077006. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:48:32,935][41694] Avg episode reward: [(0, '4.600')] +[2024-11-08 06:48:37,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 204353536. Throughput: 0: 1626.7. Samples: 46085386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:48:37,935][41694] Avg episode reward: [(0, '4.510')] +[2024-11-08 06:48:40,816][42004] Updated weights for policy 0, policy_version 49896 (0.0033) +[2024-11-08 06:48:42,932][41694] Fps is (10 sec: 4915.0, 60 sec: 6485.3, 300 sec: 6581.4). Total num frames: 204386304. Throughput: 0: 1547.1. Samples: 46092894. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:48:42,935][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 06:48:46,670][42004] Updated weights for policy 0, policy_version 49906 (0.0034) +[2024-11-08 06:48:47,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 204423168. Throughput: 0: 1554.6. Samples: 46097934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:48:47,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 06:48:52,131][42004] Updated weights for policy 0, policy_version 49916 (0.0028) +[2024-11-08 06:48:52,931][41694] Fps is (10 sec: 7373.4, 60 sec: 6485.4, 300 sec: 6662.6). Total num frames: 204460032. Throughput: 0: 1684.2. Samples: 46109442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:48:52,933][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 06:48:57,320][42004] Updated weights for policy 0, policy_version 49926 (0.0023) +[2024-11-08 06:48:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6692.4). Total num frames: 204500992. Throughput: 0: 1718.0. Samples: 46121136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:48:57,933][41694] Avg episode reward: [(0, '4.566')] +[2024-11-08 06:49:02,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6826.6, 300 sec: 6678.6). 
Total num frames: 204533760. Throughput: 0: 1715.6. Samples: 46126574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:02,933][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 06:49:02,990][42004] Updated weights for policy 0, policy_version 49936 (0.0041) +[2024-11-08 06:49:07,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6678.6). Total num frames: 204570624. Throughput: 0: 1706.3. Samples: 46137342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:07,933][41694] Avg episode reward: [(0, '4.269')] +[2024-11-08 06:49:08,710][42004] Updated weights for policy 0, policy_version 49946 (0.0044) +[2024-11-08 06:49:12,934][41694] Fps is (10 sec: 5323.4, 60 sec: 6621.6, 300 sec: 6609.1). Total num frames: 204587008. Throughput: 0: 1590.2. Samples: 46143434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:12,941][41694] Avg episode reward: [(0, '4.136')] +[2024-11-08 06:49:17,668][42004] Updated weights for policy 0, policy_version 49956 (0.0032) +[2024-11-08 06:49:17,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 204619776. Throughput: 0: 1572.8. Samples: 46147782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:17,934][41694] Avg episode reward: [(0, '4.467')] +[2024-11-08 06:49:22,857][42004] Updated weights for policy 0, policy_version 49966 (0.0023) +[2024-11-08 06:49:22,931][41694] Fps is (10 sec: 7375.0, 60 sec: 6621.9, 300 sec: 6595.3). Total num frames: 204660736. Throughput: 0: 1637.1. Samples: 46159054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:22,933][41694] Avg episode reward: [(0, '4.304')] +[2024-11-08 06:49:27,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6621.9, 300 sec: 6675.9). Total num frames: 204697600. Throughput: 0: 1731.8. Samples: 46170826. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:27,934][41694] Avg episode reward: [(0, '4.539')] +[2024-11-08 06:49:28,162][42004] Updated weights for policy 0, policy_version 49976 (0.0025) +[2024-11-08 06:49:32,932][41694] Fps is (10 sec: 7781.6, 60 sec: 6690.1, 300 sec: 6692.4). Total num frames: 204738560. Throughput: 0: 1747.5. Samples: 46176574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:32,934][41694] Avg episode reward: [(0, '4.300')] +[2024-11-08 06:49:33,219][42004] Updated weights for policy 0, policy_version 49986 (0.0027) +[2024-11-08 06:49:37,932][41694] Fps is (10 sec: 7782.4, 60 sec: 7031.5, 300 sec: 6692.4). Total num frames: 204775424. Throughput: 0: 1752.2. Samples: 46188292. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:49:37,933][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 06:49:37,963][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049994_204775424.pth... +[2024-11-08 06:49:38,089][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049601_203165696.pth +[2024-11-08 06:49:38,671][42004] Updated weights for policy 0, policy_version 49996 (0.0026) +[2024-11-08 06:49:42,932][41694] Fps is (10 sec: 7373.3, 60 sec: 7099.8, 300 sec: 6706.3). Total num frames: 204812288. Throughput: 0: 1733.0. Samples: 46199122. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:49:42,934][41694] Avg episode reward: [(0, '4.604')] +[2024-11-08 06:49:47,597][42004] Updated weights for policy 0, policy_version 50006 (0.0033) +[2024-11-08 06:49:47,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 204824576. Throughput: 0: 1688.1. Samples: 46202538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:49:47,934][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 06:49:52,932][41694] Fps is (10 sec: 4505.6, 60 sec: 6621.8, 300 sec: 6609.4). 
Total num frames: 204857344. Throughput: 0: 1588.0. Samples: 46208800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:49:52,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 06:49:53,521][42004] Updated weights for policy 0, policy_version 50016 (0.0027) +[2024-11-08 06:49:57,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 204898304. Throughput: 0: 1718.4. Samples: 46220756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:49:57,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 06:49:58,852][42004] Updated weights for policy 0, policy_version 50026 (0.0031) +[2024-11-08 06:50:02,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6690.2, 300 sec: 6708.7). Total num frames: 204935168. Throughput: 0: 1747.6. Samples: 46226422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:50:02,934][41694] Avg episode reward: [(0, '4.430')] +[2024-11-08 06:50:04,519][42004] Updated weights for policy 0, policy_version 50036 (0.0036) +[2024-11-08 06:50:07,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.2, 300 sec: 6720.2). Total num frames: 204972032. Throughput: 0: 1740.0. Samples: 46237356. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:50:07,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 06:50:09,925][42004] Updated weights for policy 0, policy_version 50046 (0.0033) +[2024-11-08 06:50:12,934][41694] Fps is (10 sec: 6961.4, 60 sec: 6963.2, 300 sec: 6720.2). Total num frames: 205004800. Throughput: 0: 1717.8. Samples: 46248132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:50:12,937][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 06:50:16,746][42004] Updated weights for policy 0, policy_version 50056 (0.0035) +[2024-11-08 06:50:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6894.9, 300 sec: 6692.4). Total num frames: 205033472. Throughput: 0: 1680.7. Samples: 46252204. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:50:17,933][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 06:50:22,932][41694] Fps is (10 sec: 4097.1, 60 sec: 6417.0, 300 sec: 6609.1). Total num frames: 205045760. Throughput: 0: 1530.5. Samples: 46257164. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:50:22,934][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 06:50:26,411][42004] Updated weights for policy 0, policy_version 50066 (0.0033) +[2024-11-08 06:50:27,932][41694] Fps is (10 sec: 4505.5, 60 sec: 6348.8, 300 sec: 6595.3). Total num frames: 205078528. Throughput: 0: 1490.8. Samples: 46266208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:50:27,935][41694] Avg episode reward: [(0, '4.333')] +[2024-11-08 06:50:32,023][42004] Updated weights for policy 0, policy_version 50076 (0.0027) +[2024-11-08 06:50:32,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6280.6, 300 sec: 6609.1). Total num frames: 205115392. Throughput: 0: 1533.8. Samples: 46271560. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:50:32,934][41694] Avg episode reward: [(0, '4.364')] +[2024-11-08 06:50:37,370][42004] Updated weights for policy 0, policy_version 50086 (0.0028) +[2024-11-08 06:50:37,932][41694] Fps is (10 sec: 7782.6, 60 sec: 6348.8, 300 sec: 6702.5). Total num frames: 205156352. Throughput: 0: 1648.4. Samples: 46282976. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:50:37,934][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 06:50:42,861][42004] Updated weights for policy 0, policy_version 50096 (0.0035) +[2024-11-08 06:50:42,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6348.8, 300 sec: 6706.3). Total num frames: 205193216. Throughput: 0: 1634.8. Samples: 46294324. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:50:42,935][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 06:50:47,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6758.4, 300 sec: 6706.3). 
Total num frames: 205230080. Throughput: 0: 1629.6. Samples: 46299754. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:50:47,933][41694] Avg episode reward: [(0, '4.466')] +[2024-11-08 06:50:48,456][42004] Updated weights for policy 0, policy_version 50106 (0.0040) +[2024-11-08 06:50:52,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 205258752. Throughput: 0: 1622.7. Samples: 46310378. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:50:52,939][41694] Avg episode reward: [(0, '4.469')] +[2024-11-08 06:50:57,529][42004] Updated weights for policy 0, policy_version 50116 (0.0035) +[2024-11-08 06:50:57,932][41694] Fps is (10 sec: 4505.5, 60 sec: 6280.5, 300 sec: 6609.1). Total num frames: 205275136. Throughput: 0: 1499.2. Samples: 46315594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:50:57,933][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 06:51:02,931][41694] Fps is (10 sec: 4915.4, 60 sec: 6212.3, 300 sec: 6595.3). Total num frames: 205307904. Throughput: 0: 1510.4. Samples: 46320174. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:51:02,933][41694] Avg episode reward: [(0, '4.636')] +[2024-11-08 06:51:03,736][42004] Updated weights for policy 0, policy_version 50126 (0.0030) +[2024-11-08 06:51:07,933][41694] Fps is (10 sec: 6962.1, 60 sec: 6212.1, 300 sec: 6623.0). Total num frames: 205344768. Throughput: 0: 1628.2. Samples: 46330436. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:51:07,936][41694] Avg episode reward: [(0, '4.492')] +[2024-11-08 06:51:09,685][42004] Updated weights for policy 0, policy_version 50136 (0.0034) +[2024-11-08 06:51:12,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6212.5, 300 sec: 6684.4). Total num frames: 205377536. Throughput: 0: 1649.9. Samples: 46340454. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:51:12,934][41694] Avg episode reward: [(0, '4.601')] +[2024-11-08 06:51:16,152][42004] Updated weights for policy 0, policy_version 50146 (0.0028) +[2024-11-08 06:51:17,931][41694] Fps is (10 sec: 6554.7, 60 sec: 6280.6, 300 sec: 6678.6). Total num frames: 205410304. Throughput: 0: 1633.7. Samples: 46345076. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:51:17,933][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 06:51:21,760][42004] Updated weights for policy 0, policy_version 50156 (0.0034) +[2024-11-08 06:51:22,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6678.6). Total num frames: 205447168. Throughput: 0: 1621.9. Samples: 46355960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:51:22,933][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 06:51:27,865][42004] Updated weights for policy 0, policy_version 50166 (0.0035) +[2024-11-08 06:51:27,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.2, 300 sec: 6650.8). Total num frames: 205479936. Throughput: 0: 1595.5. Samples: 46366120. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:51:27,933][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 06:51:32,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6417.1, 300 sec: 6595.2). Total num frames: 205500416. Throughput: 0: 1521.9. Samples: 46368240. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:51:32,935][41694] Avg episode reward: [(0, '4.611')] +[2024-11-08 06:51:35,285][42004] Updated weights for policy 0, policy_version 50176 (0.0031) +[2024-11-08 06:51:37,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6417.1, 300 sec: 6595.3). Total num frames: 205541376. Throughput: 0: 1527.9. Samples: 46379134. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:51:37,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 06:51:37,944][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050181_205541376.pth... +[2024-11-08 06:51:38,052][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049795_203960320.pth +[2024-11-08 06:51:40,704][42004] Updated weights for policy 0, policy_version 50186 (0.0034) +[2024-11-08 06:51:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 205578240. Throughput: 0: 1664.9. Samples: 46390514. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:51:42,933][41694] Avg episode reward: [(0, '4.565')] +[2024-11-08 06:51:46,210][42004] Updated weights for policy 0, policy_version 50196 (0.0048) +[2024-11-08 06:51:47,933][41694] Fps is (10 sec: 7372.2, 60 sec: 6417.0, 300 sec: 6692.4). Total num frames: 205615104. Throughput: 0: 1687.6. Samples: 46396118. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:51:47,935][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 06:51:51,720][42004] Updated weights for policy 0, policy_version 50206 (0.0025) +[2024-11-08 06:51:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 205651968. Throughput: 0: 1704.9. Samples: 46407152. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:51:52,934][41694] Avg episode reward: [(0, '4.233')] +[2024-11-08 06:51:57,462][42004] Updated weights for policy 0, policy_version 50216 (0.0028) +[2024-11-08 06:51:57,937][41694] Fps is (10 sec: 6962.7, 60 sec: 6826.5, 300 sec: 6664.6). Total num frames: 205684736. Throughput: 0: 1722.2. Samples: 46417954. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:51:57,949][41694] Avg episode reward: [(0, '4.426')] +[2024-11-08 06:52:04,380][41694] Fps is (10 sec: 5008.7, 60 sec: 6532.4, 300 sec: 6590.7). 
Total num frames: 205709312. Throughput: 0: 1668.5. Samples: 46422578. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:04,383][41694] Avg episode reward: [(0, '4.578')] +[2024-11-08 06:52:06,665][42004] Updated weights for policy 0, policy_version 50226 (0.0027) +[2024-11-08 06:52:07,934][41694] Fps is (10 sec: 4914.8, 60 sec: 6485.3, 300 sec: 6567.4). Total num frames: 205733888. Throughput: 0: 1599.8. Samples: 46427954. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:07,940][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 06:52:12,339][42004] Updated weights for policy 0, policy_version 50236 (0.0031) +[2024-11-08 06:52:12,932][41694] Fps is (10 sec: 7185.1, 60 sec: 6553.6, 300 sec: 6581.4). Total num frames: 205770752. Throughput: 0: 1613.2. Samples: 46438716. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:12,934][41694] Avg episode reward: [(0, '4.204')] +[2024-11-08 06:52:17,741][42004] Updated weights for policy 0, policy_version 50246 (0.0033) +[2024-11-08 06:52:17,932][41694] Fps is (10 sec: 7374.5, 60 sec: 6621.9, 300 sec: 6595.3). Total num frames: 205807616. Throughput: 0: 1687.4. Samples: 46444172. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:17,933][41694] Avg episode reward: [(0, '4.513')] +[2024-11-08 06:52:22,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 205844480. Throughput: 0: 1703.3. Samples: 46455784. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:22,933][41694] Avg episode reward: [(0, '4.557')] +[2024-11-08 06:52:23,005][42004] Updated weights for policy 0, policy_version 50256 (0.0027) +[2024-11-08 06:52:27,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 205885440. Throughput: 0: 1709.7. Samples: 46467452. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:27,934][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 06:52:28,296][42004] Updated weights for policy 0, policy_version 50266 (0.0023) +[2024-11-08 06:52:32,932][41694] Fps is (10 sec: 7782.2, 60 sec: 7031.5, 300 sec: 6664.7). Total num frames: 205922304. Throughput: 0: 1703.6. Samples: 46472778. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:32,935][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 06:52:34,244][42004] Updated weights for policy 0, policy_version 50276 (0.0031) +[2024-11-08 06:52:39,071][41694] Fps is (10 sec: 5147.8, 60 sec: 6565.4, 300 sec: 6569.9). Total num frames: 205942784. Throughput: 0: 1638.9. Samples: 46482768. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:39,073][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 06:52:42,932][41694] Fps is (10 sec: 4505.7, 60 sec: 6485.3, 300 sec: 6553.6). Total num frames: 205967360. Throughput: 0: 1566.0. Samples: 46488420. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:42,933][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 06:52:43,053][42004] Updated weights for policy 0, policy_version 50286 (0.0037) +[2024-11-08 06:52:47,931][41694] Fps is (10 sec: 6934.2, 60 sec: 6485.4, 300 sec: 6553.6). Total num frames: 206004224. Throughput: 0: 1638.0. Samples: 46493916. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:52:47,933][41694] Avg episode reward: [(0, '4.490')] +[2024-11-08 06:52:48,521][42004] Updated weights for policy 0, policy_version 50296 (0.0043) +[2024-11-08 06:52:52,931][41694] Fps is (10 sec: 7782.5, 60 sec: 6553.6, 300 sec: 6567.5). Total num frames: 206045184. Throughput: 0: 1715.1. Samples: 46505130. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:52:52,934][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 06:52:53,996][42004] Updated weights for policy 0, policy_version 50306 (0.0029) +[2024-11-08 06:52:57,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6622.0, 300 sec: 6636.9). Total num frames: 206082048. Throughput: 0: 1728.9. Samples: 46516518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:52:57,934][41694] Avg episode reward: [(0, '4.264')] +[2024-11-08 06:52:59,462][42004] Updated weights for policy 0, policy_version 50316 (0.0034) +[2024-11-08 06:53:02,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6925.7, 300 sec: 6636.9). Total num frames: 206114816. Throughput: 0: 1731.3. Samples: 46522080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:53:02,933][41694] Avg episode reward: [(0, '4.641')] +[2024-11-08 06:53:05,486][42004] Updated weights for policy 0, policy_version 50326 (0.0028) +[2024-11-08 06:53:07,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6895.2, 300 sec: 6636.9). Total num frames: 206147584. Throughput: 0: 1696.5. Samples: 46532126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:53:07,936][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 06:53:13,787][41694] Fps is (10 sec: 4905.4, 60 sec: 6528.8, 300 sec: 6562.4). Total num frames: 206168064. Throughput: 0: 1504.7. Samples: 46536452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:53:13,788][41694] Avg episode reward: [(0, '4.406')] +[2024-11-08 06:53:14,631][42004] Updated weights for policy 0, policy_version 50336 (0.0050) +[2024-11-08 06:53:17,932][41694] Fps is (10 sec: 4915.3, 60 sec: 6485.3, 300 sec: 6553.6). Total num frames: 206196736. Throughput: 0: 1544.0. Samples: 46542260. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:53:17,934][41694] Avg episode reward: [(0, '4.390')] +[2024-11-08 06:53:20,277][42004] Updated weights for policy 0, policy_version 50346 (0.0033) +[2024-11-08 06:53:22,932][41694] Fps is (10 sec: 7166.2, 60 sec: 6485.3, 300 sec: 6553.6). Total num frames: 206233600. Throughput: 0: 1608.8. Samples: 46553330. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:53:22,934][41694] Avg episode reward: [(0, '4.581')] +[2024-11-08 06:53:25,828][42004] Updated weights for policy 0, policy_version 50356 (0.0021) +[2024-11-08 06:53:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6417.1, 300 sec: 6553.6). Total num frames: 206270464. Throughput: 0: 1690.9. Samples: 46564512. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:53:27,935][41694] Avg episode reward: [(0, '4.384')] +[2024-11-08 06:53:31,183][42004] Updated weights for policy 0, policy_version 50366 (0.0036) +[2024-11-08 06:53:32,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 206307328. Throughput: 0: 1692.2. Samples: 46570064. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:53:32,933][41694] Avg episode reward: [(0, '4.592')] +[2024-11-08 06:53:36,717][42004] Updated weights for policy 0, policy_version 50376 (0.0038) +[2024-11-08 06:53:37,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6889.2, 300 sec: 6650.8). Total num frames: 206348288. Throughput: 0: 1694.2. Samples: 46581368. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:53:37,933][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 06:53:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050378_206348288.pth... +[2024-11-08 06:53:38,252][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000049994_204775424.pth +[2024-11-08 06:53:42,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6826.6, 300 sec: 6623.0). 
Total num frames: 206376960. Throughput: 0: 1659.8. Samples: 46591210. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:53:42,933][41694] Avg episode reward: [(0, '4.538')] +[2024-11-08 06:53:42,964][42004] Updated weights for policy 0, policy_version 50386 (0.0036) +[2024-11-08 06:53:48,481][41694] Fps is (10 sec: 4659.0, 60 sec: 6494.1, 300 sec: 6555.3). Total num frames: 206397440. Throughput: 0: 1620.1. Samples: 46595874. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:53:48,483][41694] Avg episode reward: [(0, '4.495')] +[2024-11-08 06:53:51,414][42004] Updated weights for policy 0, policy_version 50396 (0.0029) +[2024-11-08 06:53:52,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6417.0, 300 sec: 6539.7). Total num frames: 206430208. Throughput: 0: 1560.3. Samples: 46602338. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:53:52,934][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 06:53:57,130][42004] Updated weights for policy 0, policy_version 50406 (0.0032) +[2024-11-08 06:53:57,932][41694] Fps is (10 sec: 7368.3, 60 sec: 6417.0, 300 sec: 6553.6). Total num frames: 206467072. Throughput: 0: 1735.3. Samples: 46613056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:53:57,933][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 06:54:02,647][42004] Updated weights for policy 0, policy_version 50416 (0.0028) +[2024-11-08 06:54:02,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6553.6). Total num frames: 206503936. Throughput: 0: 1697.9. Samples: 46618666. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:02,933][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 06:54:07,828][42004] Updated weights for policy 0, policy_version 50426 (0.0034) +[2024-11-08 06:54:07,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6621.9, 300 sec: 6637.0). Total num frames: 206544896. Throughput: 0: 1706.0. Samples: 46630098. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:07,933][41694] Avg episode reward: [(0, '4.274')] +[2024-11-08 06:54:12,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6994.6, 300 sec: 6650.8). Total num frames: 206581760. Throughput: 0: 1721.4. Samples: 46641974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:12,933][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 06:54:13,064][42004] Updated weights for policy 0, policy_version 50436 (0.0030) +[2024-11-08 06:54:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6963.2, 300 sec: 6623.0). Total num frames: 206614528. Throughput: 0: 1712.3. Samples: 46647120. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:17,934][41694] Avg episode reward: [(0, '4.372')] +[2024-11-08 06:54:19,459][42004] Updated weights for policy 0, policy_version 50446 (0.0048) +[2024-11-08 06:54:23,206][41694] Fps is (10 sec: 5182.6, 60 sec: 6659.7, 300 sec: 6561.4). Total num frames: 206635008. Throughput: 0: 1560.8. Samples: 46652030. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:23,209][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 06:54:27,576][42004] Updated weights for policy 0, policy_version 50456 (0.0023) +[2024-11-08 06:54:27,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6539.7). Total num frames: 206667776. Throughput: 0: 1602.0. Samples: 46663298. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:27,933][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 06:54:32,903][42004] Updated weights for policy 0, policy_version 50466 (0.0028) +[2024-11-08 06:54:32,931][41694] Fps is (10 sec: 7580.9, 60 sec: 6690.1, 300 sec: 6553.6). Total num frames: 206708736. Throughput: 0: 1644.2. Samples: 46668958. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:32,933][41694] Avg episode reward: [(0, '4.682')] +[2024-11-08 06:54:37,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6621.9, 300 sec: 6553.6). 
Total num frames: 206745600. Throughput: 0: 1741.5. Samples: 46680706. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:37,933][41694] Avg episode reward: [(0, '4.254')] +[2024-11-08 06:54:38,110][42004] Updated weights for policy 0, policy_version 50476 (0.0033) +[2024-11-08 06:54:42,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 206782464. Throughput: 0: 1760.2. Samples: 46692264. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:42,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 06:54:43,515][42004] Updated weights for policy 0, policy_version 50486 (0.0027) +[2024-11-08 06:54:47,933][41694] Fps is (10 sec: 7372.0, 60 sec: 7096.4, 300 sec: 6650.8). Total num frames: 206819328. Throughput: 0: 1758.4. Samples: 46697796. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:47,936][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 06:54:49,613][42004] Updated weights for policy 0, policy_version 50496 (0.0040) +[2024-11-08 06:54:52,931][41694] Fps is (10 sec: 6963.3, 60 sec: 7031.5, 300 sec: 6623.0). Total num frames: 206852096. Throughput: 0: 1727.4. Samples: 46707832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:52,933][41694] Avg episode reward: [(0, '4.808')] +[2024-11-08 06:54:57,973][41694] Fps is (10 sec: 4895.3, 60 sec: 6685.5, 300 sec: 6552.7). Total num frames: 206868480. Throughput: 0: 1576.3. Samples: 46712974. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:54:57,974][41694] Avg episode reward: [(0, '4.944')] +[2024-11-08 06:54:57,999][41991] Saving new best policy, reward=4.944! +[2024-11-08 06:54:58,446][42004] Updated weights for policy 0, policy_version 50506 (0.0034) +[2024-11-08 06:55:02,932][41694] Fps is (10 sec: 4915.1, 60 sec: 6621.9, 300 sec: 6539.7). Total num frames: 206901248. Throughput: 0: 1587.3. Samples: 46718546. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:02,934][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 06:55:04,108][42004] Updated weights for policy 0, policy_version 50516 (0.0031) +[2024-11-08 06:55:07,932][41694] Fps is (10 sec: 7403.6, 60 sec: 6621.9, 300 sec: 6567.5). Total num frames: 206942208. Throughput: 0: 1733.4. Samples: 46729556. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:07,934][41694] Avg episode reward: [(0, '4.642')] +[2024-11-08 06:55:09,420][42004] Updated weights for policy 0, policy_version 50526 (0.0029) +[2024-11-08 06:55:12,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6595.3). Total num frames: 206979072. Throughput: 0: 1727.9. Samples: 46741054. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:12,935][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 06:55:15,345][42004] Updated weights for policy 0, policy_version 50536 (0.0031) +[2024-11-08 06:55:17,933][41694] Fps is (10 sec: 6962.5, 60 sec: 6621.8, 300 sec: 6664.7). Total num frames: 207011840. Throughput: 0: 1710.5. Samples: 46745934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:17,935][41694] Avg episode reward: [(0, '4.613')] +[2024-11-08 06:55:21,683][42004] Updated weights for policy 0, policy_version 50546 (0.0038) +[2024-11-08 06:55:22,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6789.5, 300 sec: 6650.8). Total num frames: 207040512. Throughput: 0: 1664.5. Samples: 46755608. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:22,933][41694] Avg episode reward: [(0, '4.892')] +[2024-11-08 06:55:27,932][41694] Fps is (10 sec: 6144.5, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 207073280. Throughput: 0: 1620.9. Samples: 46765204. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:27,933][41694] Avg episode reward: [(0, '4.838')] +[2024-11-08 06:55:27,957][42004] Updated weights for policy 0, policy_version 50556 (0.0026) +[2024-11-08 06:55:32,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6417.1, 300 sec: 6567.5). Total num frames: 207093760. Throughput: 0: 1614.9. Samples: 46770464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:32,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 06:55:36,270][42004] Updated weights for policy 0, policy_version 50566 (0.0030) +[2024-11-08 06:55:37,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 6567.5). Total num frames: 207130624. Throughput: 0: 1533.6. Samples: 46776842. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:37,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 06:55:37,945][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050569_207130624.pth... +[2024-11-08 06:55:38,085][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050181_205541376.pth +[2024-11-08 06:55:41,638][42004] Updated weights for policy 0, policy_version 50576 (0.0044) +[2024-11-08 06:55:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.1, 300 sec: 6567.5). Total num frames: 207167488. Throughput: 0: 1672.2. Samples: 46788152. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:42,935][41694] Avg episode reward: [(0, '4.553')] +[2024-11-08 06:55:46,971][42004] Updated weights for policy 0, policy_version 50586 (0.0030) +[2024-11-08 06:55:47,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6417.2, 300 sec: 6595.3). Total num frames: 207204352. Throughput: 0: 1665.8. Samples: 46793506. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:47,935][41694] Avg episode reward: [(0, '4.644')] +[2024-11-08 06:55:52,220][42004] Updated weights for policy 0, policy_version 50596 (0.0033) +[2024-11-08 06:55:52,932][41694] Fps is (10 sec: 7781.9, 60 sec: 6553.5, 300 sec: 6678.5). Total num frames: 207245312. Throughput: 0: 1689.4. Samples: 46805580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:52,934][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 06:55:57,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6831.4, 300 sec: 6678.6). Total num frames: 207278080. Throughput: 0: 1666.0. Samples: 46816022. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:55:57,935][41694] Avg episode reward: [(0, '4.458')] +[2024-11-08 06:55:58,135][42004] Updated weights for policy 0, policy_version 50606 (0.0031) +[2024-11-08 06:56:02,933][41694] Fps is (10 sec: 6553.0, 60 sec: 6826.5, 300 sec: 6664.7). Total num frames: 207310848. Throughput: 0: 1663.0. Samples: 46820772. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:56:02,937][41694] Avg episode reward: [(0, '4.421')] +[2024-11-08 06:56:04,633][42004] Updated weights for policy 0, policy_version 50616 (0.0030) +[2024-11-08 06:56:07,932][41694] Fps is (10 sec: 4915.2, 60 sec: 6417.1, 300 sec: 6609.1). Total num frames: 207327232. Throughput: 0: 1628.3. Samples: 46828884. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:07,933][41694] Avg episode reward: [(0, '4.253')] +[2024-11-08 06:56:12,931][41694] Fps is (10 sec: 4916.0, 60 sec: 6348.8, 300 sec: 6609.1). Total num frames: 207360000. Throughput: 0: 1590.5. Samples: 46836774. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:12,933][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 06:56:12,977][42004] Updated weights for policy 0, policy_version 50626 (0.0030) +[2024-11-08 06:56:17,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6485.5, 300 sec: 6623.0). 
Total num frames: 207400960. Throughput: 0: 1586.0. Samples: 46841832. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:17,932][41694] Avg episode reward: [(0, '4.532')] +[2024-11-08 06:56:18,498][42004] Updated weights for policy 0, policy_version 50636 (0.0027) +[2024-11-08 06:56:22,931][41694] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 207437824. Throughput: 0: 1702.4. Samples: 46853448. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:22,933][41694] Avg episode reward: [(0, '4.441')] +[2024-11-08 06:56:23,948][42004] Updated weights for policy 0, policy_version 50646 (0.0020) +[2024-11-08 06:56:27,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.2, 300 sec: 6692.5). Total num frames: 207474688. Throughput: 0: 1699.1. Samples: 46864610. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:27,933][41694] Avg episode reward: [(0, '4.591')] +[2024-11-08 06:56:29,776][42004] Updated weights for policy 0, policy_version 50656 (0.0031) +[2024-11-08 06:56:32,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 207503360. Throughput: 0: 1690.9. Samples: 46869594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:32,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 06:56:35,830][42004] Updated weights for policy 0, policy_version 50666 (0.0030) +[2024-11-08 06:56:37,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 207540224. Throughput: 0: 1652.4. Samples: 46879936. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:37,935][41694] Avg episode reward: [(0, '4.343')] +[2024-11-08 06:56:42,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6485.3, 300 sec: 6581.4). Total num frames: 207556608. Throughput: 0: 1546.9. Samples: 46885632. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:42,934][41694] Avg episode reward: [(0, '4.574')] +[2024-11-08 06:56:44,368][42004] Updated weights for policy 0, policy_version 50676 (0.0038) +[2024-11-08 06:56:47,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6485.4, 300 sec: 6581.4). Total num frames: 207593472. Throughput: 0: 1561.9. Samples: 46891054. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:47,933][41694] Avg episode reward: [(0, '4.673')] +[2024-11-08 06:56:49,627][42004] Updated weights for policy 0, policy_version 50686 (0.0031) +[2024-11-08 06:56:52,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6485.4, 300 sec: 6609.2). Total num frames: 207634432. Throughput: 0: 1634.2. Samples: 46902424. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:52,935][41694] Avg episode reward: [(0, '4.260')] +[2024-11-08 06:56:55,028][42004] Updated weights for policy 0, policy_version 50696 (0.0037) +[2024-11-08 06:56:57,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6683.6). Total num frames: 207671296. Throughput: 0: 1716.0. Samples: 46913994. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:56:57,933][41694] Avg episode reward: [(0, '4.437')] +[2024-11-08 06:57:00,642][42004] Updated weights for policy 0, policy_version 50706 (0.0028) +[2024-11-08 06:57:02,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6553.8, 300 sec: 6678.6). Total num frames: 207704064. Throughput: 0: 1723.7. Samples: 46919400. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:57:02,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 06:57:06,947][42004] Updated weights for policy 0, policy_version 50716 (0.0042) +[2024-11-08 06:57:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 6664.7). Total num frames: 207736832. Throughput: 0: 1682.2. Samples: 46929148. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:57:07,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 06:57:12,604][42004] Updated weights for policy 0, policy_version 50726 (0.0029) +[2024-11-08 06:57:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 207773696. Throughput: 0: 1671.3. Samples: 46939818. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:57:12,933][41694] Avg episode reward: [(0, '4.263')] +[2024-11-08 06:57:17,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 207790080. Throughput: 0: 1649.2. Samples: 46943810. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:57:17,933][41694] Avg episode reward: [(0, '4.462')] +[2024-11-08 06:57:20,943][42004] Updated weights for policy 0, policy_version 50736 (0.0027) +[2024-11-08 06:57:22,932][41694] Fps is (10 sec: 5324.5, 60 sec: 6485.3, 300 sec: 6581.4). Total num frames: 207826944. Throughput: 0: 1587.6. Samples: 46951380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:57:22,934][41694] Avg episode reward: [(0, '4.346')] +[2024-11-08 06:57:26,447][42004] Updated weights for policy 0, policy_version 50746 (0.0030) +[2024-11-08 06:57:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6485.3, 300 sec: 6581.4). Total num frames: 207863808. Throughput: 0: 1707.4. Samples: 46962464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:57:27,933][41694] Avg episode reward: [(0, '4.502')] +[2024-11-08 06:57:31,955][42004] Updated weights for policy 0, policy_version 50756 (0.0029) +[2024-11-08 06:57:32,931][41694] Fps is (10 sec: 7373.3, 60 sec: 6621.9, 300 sec: 6662.6). Total num frames: 207900672. Throughput: 0: 1705.5. Samples: 46967802. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:57:32,933][41694] Avg episode reward: [(0, '4.526')] +[2024-11-08 06:57:37,574][42004] Updated weights for policy 0, policy_version 50766 (0.0049) +[2024-11-08 06:57:37,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 207937536. Throughput: 0: 1711.4. Samples: 46979438. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:57:37,935][41694] Avg episode reward: [(0, '4.708')] +[2024-11-08 06:57:37,952][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050766_207937536.pth... +[2024-11-08 06:57:38,137][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050378_206348288.pth +[2024-11-08 06:57:42,931][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 207970304. Throughput: 0: 1668.3. Samples: 46989066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:57:42,934][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 06:57:43,574][42004] Updated weights for policy 0, policy_version 50776 (0.0034) +[2024-11-08 06:57:47,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6894.9, 300 sec: 6650.8). Total num frames: 208007168. Throughput: 0: 1672.4. Samples: 46994658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:57:47,934][41694] Avg episode reward: [(0, '4.540')] +[2024-11-08 06:57:49,159][42004] Updated weights for policy 0, policy_version 50786 (0.0030) +[2024-11-08 06:57:52,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6553.6, 300 sec: 6595.2). Total num frames: 208027648. Throughput: 0: 1625.5. Samples: 47002296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:57:52,934][41694] Avg episode reward: [(0, '4.432')] +[2024-11-08 06:57:57,296][42004] Updated weights for policy 0, policy_version 50796 (0.0029) +[2024-11-08 06:57:57,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6553.6, 300 sec: 6609.1). 
Total num frames: 208064512. Throughput: 0: 1607.3. Samples: 47012148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:57:57,934][41694] Avg episode reward: [(0, '4.382')] +[2024-11-08 06:58:02,770][42004] Updated weights for policy 0, policy_version 50806 (0.0023) +[2024-11-08 06:58:02,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 208101376. Throughput: 0: 1644.8. Samples: 47017826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:58:02,933][41694] Avg episode reward: [(0, '4.352')] +[2024-11-08 06:58:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6698.0). Total num frames: 208138240. Throughput: 0: 1722.0. Samples: 47028870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 06:58:07,934][41694] Avg episode reward: [(0, '4.403')] +[2024-11-08 06:58:08,171][42004] Updated weights for policy 0, policy_version 50816 (0.0027) +[2024-11-08 06:58:12,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 208171008. Throughput: 0: 1718.6. Samples: 47039802. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:58:12,933][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 06:58:14,142][42004] Updated weights for policy 0, policy_version 50826 (0.0037) +[2024-11-08 06:58:17,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6963.2, 300 sec: 6692.4). Total num frames: 208207872. Throughput: 0: 1709.9. Samples: 47044746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:58:17,934][41694] Avg episode reward: [(0, '4.504')] +[2024-11-08 06:58:19,713][42004] Updated weights for policy 0, policy_version 50836 (0.0028) +[2024-11-08 06:58:22,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6963.2, 300 sec: 6692.4). Total num frames: 208244736. Throughput: 0: 1701.7. Samples: 47056016. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:58:22,935][41694] Avg episode reward: [(0, '4.415')] +[2024-11-08 06:58:27,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 208261120. Throughput: 0: 1614.4. Samples: 47061712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:58:27,933][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 06:58:28,234][42004] Updated weights for policy 0, policy_version 50846 (0.0028) +[2024-11-08 06:58:32,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6621.8, 300 sec: 6609.1). Total num frames: 208297984. Throughput: 0: 1615.9. Samples: 47067372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:58:32,934][41694] Avg episode reward: [(0, '4.345')] +[2024-11-08 06:58:33,635][42004] Updated weights for policy 0, policy_version 50856 (0.0037) +[2024-11-08 06:58:37,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 208334848. Throughput: 0: 1691.6. Samples: 47078418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:58:37,935][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 06:58:39,042][42004] Updated weights for policy 0, policy_version 50866 (0.0023) +[2024-11-08 06:58:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.4, 300 sec: 6718.9). Total num frames: 208375808. Throughput: 0: 1728.2. Samples: 47089918. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 06:58:42,935][41694] Avg episode reward: [(0, '4.374')] +[2024-11-08 06:58:44,976][42004] Updated weights for policy 0, policy_version 50876 (0.0035) +[2024-11-08 06:58:47,932][41694] Fps is (10 sec: 6963.0, 60 sec: 6621.9, 300 sec: 6692.4). Total num frames: 208404480. Throughput: 0: 1708.0. Samples: 47094688. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:58:47,934][41694] Avg episode reward: [(0, '4.446')] +[2024-11-08 06:58:51,090][42004] Updated weights for policy 0, policy_version 50886 (0.0028) +[2024-11-08 06:58:52,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6894.9, 300 sec: 6692.4). Total num frames: 208441344. Throughput: 0: 1685.7. Samples: 47104726. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:58:52,934][41694] Avg episode reward: [(0, '4.378')] +[2024-11-08 06:58:57,044][42004] Updated weights for policy 0, policy_version 50896 (0.0029) +[2024-11-08 06:58:57,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6826.7, 300 sec: 6678.6). Total num frames: 208474112. Throughput: 0: 1670.2. Samples: 47114962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:58:57,934][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 06:59:02,931][41694] Fps is (10 sec: 4915.5, 60 sec: 6485.4, 300 sec: 6595.3). Total num frames: 208490496. Throughput: 0: 1637.4. Samples: 47118430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:59:02,933][41694] Avg episode reward: [(0, '4.529')] +[2024-11-08 06:59:05,519][42004] Updated weights for policy 0, policy_version 50906 (0.0027) +[2024-11-08 06:59:07,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 208527360. Throughput: 0: 1562.0. Samples: 47126308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:59:07,933][41694] Avg episode reward: [(0, '4.771')] +[2024-11-08 06:59:10,969][42004] Updated weights for policy 0, policy_version 50916 (0.0023) +[2024-11-08 06:59:12,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6609.1). Total num frames: 208564224. Throughput: 0: 1686.8. Samples: 47137620. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:59:12,934][41694] Avg episode reward: [(0, '4.195')] +[2024-11-08 06:59:16,226][42004] Updated weights for policy 0, policy_version 50926 (0.0027) +[2024-11-08 06:59:17,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6670.9). Total num frames: 208601088. Throughput: 0: 1684.8. Samples: 47143188. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:59:17,937][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 06:59:22,301][42004] Updated weights for policy 0, policy_version 50936 (0.0027) +[2024-11-08 06:59:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6553.6, 300 sec: 6678.6). Total num frames: 208637952. Throughput: 0: 1675.9. Samples: 47153834. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:59:22,933][41694] Avg episode reward: [(0, '4.533')] +[2024-11-08 06:59:27,551][42004] Updated weights for policy 0, policy_version 50946 (0.0029) +[2024-11-08 06:59:27,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 208674816. Throughput: 0: 1672.4. Samples: 47165176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:59:27,935][41694] Avg episode reward: [(0, '4.404')] +[2024-11-08 06:59:32,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6894.9, 300 sec: 6664.7). Total num frames: 208711680. Throughput: 0: 1686.0. Samples: 47170560. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:59:32,936][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 06:59:33,156][42004] Updated weights for policy 0, policy_version 50956 (0.0027) +[2024-11-08 06:59:37,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6553.6, 300 sec: 6595.2). Total num frames: 208728064. Throughput: 0: 1617.5. Samples: 47177514. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:59:37,934][41694] Avg episode reward: [(0, '4.629')] +[2024-11-08 06:59:37,980][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050960_208732160.pth... +[2024-11-08 06:59:38,090][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050569_207130624.pth +[2024-11-08 06:59:41,298][42004] Updated weights for policy 0, policy_version 50966 (0.0032) +[2024-11-08 06:59:42,933][41694] Fps is (10 sec: 5324.1, 60 sec: 6485.2, 300 sec: 6595.2). Total num frames: 208764928. Throughput: 0: 1625.3. Samples: 47188102. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 06:59:42,935][41694] Avg episode reward: [(0, '4.288')] +[2024-11-08 06:59:46,746][42004] Updated weights for policy 0, policy_version 50976 (0.0034) +[2024-11-08 06:59:47,932][41694] Fps is (10 sec: 7782.5, 60 sec: 6690.1, 300 sec: 6623.0). Total num frames: 208805888. Throughput: 0: 1663.5. Samples: 47193288. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:59:47,943][41694] Avg episode reward: [(0, '4.424')] +[2024-11-08 06:59:52,557][42004] Updated weights for policy 0, policy_version 50986 (0.0032) +[2024-11-08 06:59:52,932][41694] Fps is (10 sec: 7374.0, 60 sec: 6621.9, 300 sec: 6679.5). Total num frames: 208838656. Throughput: 0: 1746.0. Samples: 47204876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:59:52,934][41694] Avg episode reward: [(0, '4.236')] +[2024-11-08 06:59:57,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6621.9, 300 sec: 6678.6). Total num frames: 208871424. Throughput: 0: 1710.3. Samples: 47214586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 06:59:57,933][41694] Avg episode reward: [(0, '4.320')] +[2024-11-08 06:59:58,543][42004] Updated weights for policy 0, policy_version 50996 (0.0045) +[2024-11-08 07:00:02,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6963.1, 300 sec: 6664.7). 
Total num frames: 208908288. Throughput: 0: 1709.5. Samples: 47220116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:00:02,935][41694] Avg episode reward: [(0, '4.485')] +[2024-11-08 07:00:04,236][42004] Updated weights for policy 0, policy_version 51006 (0.0031) +[2024-11-08 07:00:07,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6963.2, 300 sec: 6664.7). Total num frames: 208945152. Throughput: 0: 1707.5. Samples: 47230674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:00:07,933][41694] Avg episode reward: [(0, '4.488')] +[2024-11-08 07:00:12,771][42004] Updated weights for policy 0, policy_version 51016 (0.0038) +[2024-11-08 07:00:12,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6621.8, 300 sec: 6609.2). Total num frames: 208961536. Throughput: 0: 1589.6. Samples: 47236708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:00:12,934][41694] Avg episode reward: [(0, '4.472')] +[2024-11-08 07:00:17,931][41694] Fps is (10 sec: 4915.3, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 208994304. Throughput: 0: 1580.6. Samples: 47241684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:00:17,933][41694] Avg episode reward: [(0, '4.428')] +[2024-11-08 07:00:18,699][42004] Updated weights for policy 0, policy_version 51026 (0.0026) +[2024-11-08 07:00:22,934][41694] Fps is (10 sec: 6552.0, 60 sec: 6485.0, 300 sec: 6623.0). Total num frames: 209027072. Throughput: 0: 1646.0. Samples: 47251588. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:00:22,938][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 07:00:24,900][42004] Updated weights for policy 0, policy_version 51036 (0.0059) +[2024-11-08 07:00:27,933][41694] Fps is (10 sec: 6552.3, 60 sec: 6416.9, 300 sec: 6664.6). Total num frames: 209059840. Throughput: 0: 1635.3. Samples: 47261692. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:00:27,936][41694] Avg episode reward: [(0, '4.256')] +[2024-11-08 07:00:31,325][42004] Updated weights for policy 0, policy_version 51046 (0.0027) +[2024-11-08 07:00:32,932][41694] Fps is (10 sec: 6555.3, 60 sec: 6348.8, 300 sec: 6650.8). Total num frames: 209092608. Throughput: 0: 1627.6. Samples: 47266528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:00:32,933][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 07:00:36,734][42004] Updated weights for policy 0, policy_version 51056 (0.0026) +[2024-11-08 07:00:37,932][41694] Fps is (10 sec: 7374.1, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 209133568. Throughput: 0: 1616.2. Samples: 47277604. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:00:37,934][41694] Avg episode reward: [(0, '4.436')] +[2024-11-08 07:00:42,153][42004] Updated weights for policy 0, policy_version 51066 (0.0031) +[2024-11-08 07:00:42,932][41694] Fps is (10 sec: 7782.4, 60 sec: 6758.6, 300 sec: 6664.7). Total num frames: 209170432. Throughput: 0: 1650.2. Samples: 47288844. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:00:42,933][41694] Avg episode reward: [(0, '4.318')] +[2024-11-08 07:00:47,931][41694] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 6595.3). Total num frames: 209190912. Throughput: 0: 1579.0. Samples: 47291170. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:00:47,933][41694] Avg episode reward: [(0, '4.290')] +[2024-11-08 07:00:50,079][42004] Updated weights for policy 0, policy_version 51076 (0.0024) +[2024-11-08 07:00:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6609.1). Total num frames: 209227776. Throughput: 0: 1563.9. Samples: 47301050. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:00:52,934][41694] Avg episode reward: [(0, '4.561')] +[2024-11-08 07:00:55,583][42004] Updated weights for policy 0, policy_version 51086 (0.0029) +[2024-11-08 07:00:57,932][41694] Fps is (10 sec: 7372.2, 60 sec: 6553.5, 300 sec: 6623.0). Total num frames: 209264640. Throughput: 0: 1675.6. Samples: 47312112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:00:57,935][41694] Avg episode reward: [(0, '4.325')] +[2024-11-08 07:01:01,480][42004] Updated weights for policy 0, policy_version 51096 (0.0043) +[2024-11-08 07:01:02,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6664.7). Total num frames: 209293312. Throughput: 0: 1685.4. Samples: 47317528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:02,934][41694] Avg episode reward: [(0, '4.405')] +[2024-11-08 07:01:07,724][42004] Updated weights for policy 0, policy_version 51106 (0.0024) +[2024-11-08 07:01:07,932][41694] Fps is (10 sec: 6554.0, 60 sec: 6417.1, 300 sec: 6678.6). Total num frames: 209330176. Throughput: 0: 1668.1. Samples: 47326648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:07,934][41694] Avg episode reward: [(0, '4.662')] +[2024-11-08 07:01:12,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.2, 300 sec: 6650.8). Total num frames: 209362944. Throughput: 0: 1675.8. Samples: 47337098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:12,935][41694] Avg episode reward: [(0, '4.363')] +[2024-11-08 07:01:14,029][42004] Updated weights for policy 0, policy_version 51116 (0.0027) +[2024-11-08 07:01:17,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 209399808. Throughput: 0: 1683.2. Samples: 47342270. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:17,933][41694] Avg episode reward: [(0, '4.381')] +[2024-11-08 07:01:21,779][42004] Updated weights for policy 0, policy_version 51126 (0.0033) +[2024-11-08 07:01:22,932][41694] Fps is (10 sec: 5324.4, 60 sec: 6485.6, 300 sec: 6581.4). Total num frames: 209416192. Throughput: 0: 1592.9. Samples: 47349286. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:22,934][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 07:01:27,565][42004] Updated weights for policy 0, policy_version 51136 (0.0025) +[2024-11-08 07:01:27,931][41694] Fps is (10 sec: 5324.9, 60 sec: 6553.8, 300 sec: 6609.1). Total num frames: 209453056. Throughput: 0: 1579.2. Samples: 47359906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:27,934][41694] Avg episode reward: [(0, '4.537')] +[2024-11-08 07:01:32,877][42004] Updated weights for policy 0, policy_version 51146 (0.0028) +[2024-11-08 07:01:32,931][41694] Fps is (10 sec: 7783.0, 60 sec: 6690.2, 300 sec: 6623.0). Total num frames: 209494016. Throughput: 0: 1652.8. Samples: 47365544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:32,933][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 07:01:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6485.3, 300 sec: 6664.7). Total num frames: 209522688. Throughput: 0: 1664.1. Samples: 47375936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:37,933][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 07:01:37,956][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051154_209526784.pth... +[2024-11-08 07:01:38,106][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050766_207937536.pth +[2024-11-08 07:01:39,444][42004] Updated weights for policy 0, policy_version 51156 (0.0039) +[2024-11-08 07:01:42,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6485.3, 300 sec: 6664.7). 
Total num frames: 209559552. Throughput: 0: 1638.6. Samples: 47385846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:42,933][41694] Avg episode reward: [(0, '4.477')] +[2024-11-08 07:01:45,031][42004] Updated weights for policy 0, policy_version 51166 (0.0034) +[2024-11-08 07:01:47,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 209596416. Throughput: 0: 1641.5. Samples: 47391396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:47,933][41694] Avg episode reward: [(0, '4.626')] +[2024-11-08 07:01:50,727][42004] Updated weights for policy 0, policy_version 51176 (0.0034) +[2024-11-08 07:01:54,959][41694] Fps is (10 sec: 5789.3, 60 sec: 6471.4, 300 sec: 6591.6). Total num frames: 209629184. Throughput: 0: 1608.0. Samples: 47402270. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:01:54,961][41694] Avg episode reward: [(0, '4.236')] +[2024-11-08 07:01:57,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6417.1, 300 sec: 6595.3). Total num frames: 209649664. Throughput: 0: 1597.7. Samples: 47408996. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:01:57,934][41694] Avg episode reward: [(0, '4.184')] +[2024-11-08 07:01:58,834][42004] Updated weights for policy 0, policy_version 51186 (0.0022) +[2024-11-08 07:02:02,931][41694] Fps is (10 sec: 6679.2, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 209682432. Throughput: 0: 1597.0. Samples: 47414136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:02,933][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 07:02:04,653][42004] Updated weights for policy 0, policy_version 51196 (0.0034) +[2024-11-08 07:02:07,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 209719296. Throughput: 0: 1674.0. Samples: 47424616. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:07,933][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 07:02:10,780][42004] Updated weights for policy 0, policy_version 51206 (0.0028) +[2024-11-08 07:02:12,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 209752064. Throughput: 0: 1658.2. Samples: 47434526. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:12,934][41694] Avg episode reward: [(0, '4.594')] +[2024-11-08 07:02:16,725][42004] Updated weights for policy 0, policy_version 51216 (0.0029) +[2024-11-08 07:02:17,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 209788928. Throughput: 0: 1637.2. Samples: 47439220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:17,935][41694] Avg episode reward: [(0, '4.420')] +[2024-11-08 07:02:22,222][42004] Updated weights for policy 0, policy_version 51226 (0.0028) +[2024-11-08 07:02:22,932][41694] Fps is (10 sec: 7373.0, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 209825792. Throughput: 0: 1657.1. Samples: 47450504. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:22,933][41694] Avg episode reward: [(0, '4.671')] +[2024-11-08 07:02:29,614][41694] Fps is (10 sec: 5960.6, 60 sec: 6574.1, 300 sec: 6599.3). Total num frames: 209858560. Throughput: 0: 1629.1. Samples: 47461898. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:29,616][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 07:02:30,173][42004] Updated weights for policy 0, policy_version 51236 (0.0026) +[2024-11-08 07:02:32,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6485.3, 300 sec: 6595.3). Total num frames: 209883136. Throughput: 0: 1590.7. Samples: 47462976. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:32,933][41694] Avg episode reward: [(0, '4.386')] +[2024-11-08 07:02:35,600][42004] Updated weights for policy 0, policy_version 51246 (0.0028) +[2024-11-08 07:02:37,932][41694] Fps is (10 sec: 7386.4, 60 sec: 6621.9, 300 sec: 6609.1). Total num frames: 209920000. Throughput: 0: 1675.3. Samples: 47474262. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:37,934][41694] Avg episode reward: [(0, '4.479')] +[2024-11-08 07:02:41,275][42004] Updated weights for policy 0, policy_version 51256 (0.0038) +[2024-11-08 07:02:42,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6553.6, 300 sec: 6595.3). Total num frames: 209952768. Throughput: 0: 1688.2. Samples: 47484966. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:42,935][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 07:02:47,588][42004] Updated weights for policy 0, policy_version 51266 (0.0027) +[2024-11-08 07:02:47,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6485.3, 300 sec: 6636.9). Total num frames: 209985536. Throughput: 0: 1673.7. Samples: 47489452. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:47,934][41694] Avg episode reward: [(0, '4.392')] +[2024-11-08 07:02:52,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6782.8, 300 sec: 6636.9). Total num frames: 210022400. Throughput: 0: 1687.2. Samples: 47500540. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:52,933][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 07:02:53,031][42004] Updated weights for policy 0, policy_version 51276 (0.0025) +[2024-11-08 07:02:57,931][41694] Fps is (10 sec: 7782.6, 60 sec: 6895.0, 300 sec: 6650.8). Total num frames: 210063360. Throughput: 0: 1719.0. Samples: 47511880. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:02:57,933][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 07:02:58,323][42004] Updated weights for policy 0, policy_version 51286 (0.0036) +[2024-11-08 07:03:04,189][41694] Fps is (10 sec: 5821.6, 60 sec: 6619.7, 300 sec: 6581.1). Total num frames: 210087936. Throughput: 0: 1680.0. Samples: 47516932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:04,192][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 07:03:06,629][42004] Updated weights for policy 0, policy_version 51296 (0.0035) +[2024-11-08 07:03:07,932][41694] Fps is (10 sec: 5324.4, 60 sec: 6621.8, 300 sec: 6595.2). Total num frames: 210116608. Throughput: 0: 1621.2. Samples: 47523458. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:07,937][41694] Avg episode reward: [(0, '4.545')] +[2024-11-08 07:03:12,098][42004] Updated weights for policy 0, policy_version 51306 (0.0033) +[2024-11-08 07:03:12,931][41694] Fps is (10 sec: 7496.1, 60 sec: 6690.2, 300 sec: 6595.3). Total num frames: 210153472. Throughput: 0: 1680.3. Samples: 47534684. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:12,933][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 07:03:17,845][42004] Updated weights for policy 0, policy_version 51316 (0.0028) +[2024-11-08 07:03:17,932][41694] Fps is (10 sec: 7373.3, 60 sec: 6690.1, 300 sec: 6595.3). Total num frames: 210190336. Throughput: 0: 1713.8. Samples: 47540096. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:17,935][41694] Avg episode reward: [(0, '4.482')] +[2024-11-08 07:03:22,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6650.8). Total num frames: 210223104. Throughput: 0: 1688.3. Samples: 47550234. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:22,936][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 07:03:23,671][42004] Updated weights for policy 0, policy_version 51326 (0.0034) +[2024-11-08 07:03:27,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6953.3, 300 sec: 6664.7). Total num frames: 210264064. Throughput: 0: 1710.6. Samples: 47561942. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:27,933][41694] Avg episode reward: [(0, '4.563')] +[2024-11-08 07:03:28,878][42004] Updated weights for policy 0, policy_version 51336 (0.0028) +[2024-11-08 07:03:32,932][41694] Fps is (10 sec: 7782.3, 60 sec: 6963.2, 300 sec: 6664.7). Total num frames: 210300928. Throughput: 0: 1738.2. Samples: 47567672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:32,933][41694] Avg episode reward: [(0, '4.412')] +[2024-11-08 07:03:34,417][42004] Updated weights for policy 0, policy_version 51346 (0.0026) +[2024-11-08 07:03:38,780][41694] Fps is (10 sec: 5663.6, 60 sec: 6664.2, 300 sec: 6590.2). Total num frames: 210325504. Throughput: 0: 1708.0. Samples: 47578848. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:38,782][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 07:03:38,796][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051349_210325504.pth... +[2024-11-08 07:03:38,940][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000050960_208732160.pth +[2024-11-08 07:03:42,391][42004] Updated weights for policy 0, policy_version 51356 (0.0023) +[2024-11-08 07:03:42,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6758.3, 300 sec: 6623.0). Total num frames: 210358272. Throughput: 0: 1634.5. Samples: 47585434. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:42,937][41694] Avg episode reward: [(0, '4.376')] +[2024-11-08 07:03:47,931][41694] Fps is (10 sec: 7161.0, 60 sec: 6758.4, 300 sec: 6609.2). 
Total num frames: 210391040. Throughput: 0: 1685.5. Samples: 47590660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:47,933][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 07:03:47,943][42004] Updated weights for policy 0, policy_version 51366 (0.0042) +[2024-11-08 07:03:52,931][41694] Fps is (10 sec: 6144.5, 60 sec: 6621.9, 300 sec: 6595.3). Total num frames: 210419712. Throughput: 0: 1720.6. Samples: 47600886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:52,935][41694] Avg episode reward: [(0, '4.548')] +[2024-11-08 07:03:57,786][42004] Updated weights for policy 0, policy_version 51376 (0.0043) +[2024-11-08 07:03:57,932][41694] Fps is (10 sec: 4505.6, 60 sec: 6212.3, 300 sec: 6595.3). Total num frames: 210436096. Throughput: 0: 1576.9. Samples: 47605646. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:03:57,937][41694] Avg episode reward: [(0, '4.644')] +[2024-11-08 07:04:02,933][41694] Fps is (10 sec: 4095.6, 60 sec: 6345.1, 300 sec: 6553.6). Total num frames: 210460672. Throughput: 0: 1527.3. Samples: 47608824. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:04:02,943][41694] Avg episode reward: [(0, '4.590')] +[2024-11-08 07:04:05,143][42004] Updated weights for policy 0, policy_version 51386 (0.0036) +[2024-11-08 07:04:07,932][41694] Fps is (10 sec: 5734.2, 60 sec: 6280.6, 300 sec: 6539.7). Total num frames: 210493440. Throughput: 0: 1508.3. Samples: 47618108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:04:07,934][41694] Avg episode reward: [(0, '4.407')] +[2024-11-08 07:04:13,017][41694] Fps is (10 sec: 4874.3, 60 sec: 5930.8, 300 sec: 6468.4). Total num frames: 210509824. Throughput: 0: 1354.5. Samples: 47623010. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:04:13,018][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 07:04:13,694][42004] Updated weights for policy 0, policy_version 51396 (0.0039) +[2024-11-08 07:04:17,932][41694] Fps is (10 sec: 4915.4, 60 sec: 5870.9, 300 sec: 6456.4). Total num frames: 210542592. Throughput: 0: 1371.4. Samples: 47629386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:04:17,933][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 07:04:20,093][42004] Updated weights for policy 0, policy_version 51406 (0.0031) +[2024-11-08 07:04:22,931][41694] Fps is (10 sec: 6609.9, 60 sec: 5870.9, 300 sec: 6442.5). Total num frames: 210575360. Throughput: 0: 1357.9. Samples: 47638802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:04:22,933][41694] Avg episode reward: [(0, '4.512')] +[2024-11-08 07:04:26,384][42004] Updated weights for policy 0, policy_version 51416 (0.0044) +[2024-11-08 07:04:27,931][41694] Fps is (10 sec: 6553.7, 60 sec: 5734.4, 300 sec: 6428.6). Total num frames: 210608128. Throughput: 0: 1402.4. Samples: 47648540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:04:27,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 07:04:32,216][42004] Updated weights for policy 0, policy_version 51426 (0.0026) +[2024-11-08 07:04:32,933][41694] Fps is (10 sec: 6962.3, 60 sec: 5734.3, 300 sec: 6498.0). Total num frames: 210644992. Throughput: 0: 1398.2. Samples: 47653580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:04:32,935][41694] Avg episode reward: [(0, '4.385')] +[2024-11-08 07:04:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 5955.1, 300 sec: 6484.2). Total num frames: 210677760. Throughput: 0: 1402.8. Samples: 47664012. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:04:37,934][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 07:04:38,357][42004] Updated weights for policy 0, policy_version 51436 (0.0061) +[2024-11-08 07:04:42,932][41694] Fps is (10 sec: 6963.9, 60 sec: 5939.3, 300 sec: 6470.3). Total num frames: 210714624. Throughput: 0: 1534.4. Samples: 47674696. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:04:42,933][41694] Avg episode reward: [(0, '4.442')] +[2024-11-08 07:04:44,083][42004] Updated weights for policy 0, policy_version 51446 (0.0023) +[2024-11-08 07:04:47,932][41694] Fps is (10 sec: 6143.8, 60 sec: 5802.6, 300 sec: 6442.5). Total num frames: 210739200. Throughput: 0: 1576.9. Samples: 47679784. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:04:47,934][41694] Avg episode reward: [(0, '4.266')] +[2024-11-08 07:04:51,146][42004] Updated weights for policy 0, policy_version 51456 (0.0028) +[2024-11-08 07:04:52,932][41694] Fps is (10 sec: 5734.4, 60 sec: 5870.9, 300 sec: 6442.5). Total num frames: 210771968. Throughput: 0: 1556.0. Samples: 47688126. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:04:52,937][41694] Avg episode reward: [(0, '4.234')] +[2024-11-08 07:04:56,840][42004] Updated weights for policy 0, policy_version 51466 (0.0036) +[2024-11-08 07:04:57,931][41694] Fps is (10 sec: 6963.5, 60 sec: 6212.3, 300 sec: 6442.5). Total num frames: 210808832. Throughput: 0: 1691.9. Samples: 47699002. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:04:57,936][41694] Avg episode reward: [(0, '4.293')] +[2024-11-08 07:05:02,750][42004] Updated weights for policy 0, policy_version 51476 (0.0027) +[2024-11-08 07:05:02,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6417.2, 300 sec: 6442.5). Total num frames: 210845696. Throughput: 0: 1659.9. Samples: 47704080. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:05:02,935][41694] Avg episode reward: [(0, '4.440')] +[2024-11-08 07:05:07,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 6498.1). Total num frames: 210878464. Throughput: 0: 1683.7. Samples: 47714570. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:05:07,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 07:05:08,894][42004] Updated weights for policy 0, policy_version 51486 (0.0035) +[2024-11-08 07:05:12,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6631.3, 300 sec: 6484.2). Total num frames: 210907136. Throughput: 0: 1672.0. Samples: 47723782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 07:05:12,933][41694] Avg episode reward: [(0, '4.707')] +[2024-11-08 07:05:15,602][42004] Updated weights for policy 0, policy_version 51496 (0.0038) +[2024-11-08 07:05:17,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6621.9, 300 sec: 6484.2). Total num frames: 210939904. Throughput: 0: 1662.0. Samples: 47728370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 07:05:17,933][41694] Avg episode reward: [(0, '4.324')] +[2024-11-08 07:05:22,902][42004] Updated weights for policy 0, policy_version 51506 (0.0026) +[2024-11-08 07:05:22,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6553.6, 300 sec: 6470.3). Total num frames: 210968576. Throughput: 0: 1603.6. Samples: 47736176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 07:05:22,933][41694] Avg episode reward: [(0, '4.313')] +[2024-11-08 07:05:27,931][41694] Fps is (10 sec: 5734.4, 60 sec: 6485.3, 300 sec: 6456.4). Total num frames: 210997248. Throughput: 0: 1595.1. Samples: 47746476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 07:05:27,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 07:05:28,988][42004] Updated weights for policy 0, policy_version 51516 (0.0026) +[2024-11-08 07:05:32,931][41694] Fps is (10 sec: 6553.8, 60 sec: 6485.5, 300 sec: 6442.5). 
Total num frames: 211034112. Throughput: 0: 1593.9. Samples: 47751510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 07:05:32,934][41694] Avg episode reward: [(0, '4.354')] +[2024-11-08 07:05:34,796][42004] Updated weights for policy 0, policy_version 51526 (0.0038) +[2024-11-08 07:05:37,932][41694] Fps is (10 sec: 7372.5, 60 sec: 6553.6, 300 sec: 6442.5). Total num frames: 211070976. Throughput: 0: 1644.7. Samples: 47762140. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2024-11-08 07:05:37,934][41694] Avg episode reward: [(0, '4.679')] +[2024-11-08 07:05:37,943][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051531_211070976.pth... +[2024-11-08 07:05:38,086][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051154_209526784.pth +[2024-11-08 07:05:40,935][42004] Updated weights for policy 0, policy_version 51536 (0.0029) +[2024-11-08 07:05:42,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6470.3). Total num frames: 211099648. Throughput: 0: 1620.8. Samples: 47771938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:05:42,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 07:05:47,673][42004] Updated weights for policy 0, policy_version 51546 (0.0036) +[2024-11-08 07:05:47,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6553.6, 300 sec: 6456.4). Total num frames: 211132416. Throughput: 0: 1597.9. Samples: 47775988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:05:47,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 07:05:53,348][41694] Fps is (10 sec: 6291.6, 60 sec: 6508.4, 300 sec: 6433.5). Total num frames: 211165184. Throughput: 0: 1586.3. Samples: 47786616. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:05:53,354][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 07:05:54,242][42004] Updated weights for policy 0, policy_version 51556 (0.0032) +[2024-11-08 07:05:57,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6485.3, 300 sec: 6456.4). Total num frames: 211197952. Throughput: 0: 1592.3. Samples: 47795436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:05:57,934][41694] Avg episode reward: [(0, '4.757')] +[2024-11-08 07:06:00,279][42004] Updated weights for policy 0, policy_version 51566 (0.0036) +[2024-11-08 07:06:02,932][41694] Fps is (10 sec: 6838.4, 60 sec: 6417.1, 300 sec: 6442.5). Total num frames: 211230720. Throughput: 0: 1609.6. Samples: 47800804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:06:02,934][41694] Avg episode reward: [(0, '4.565')] +[2024-11-08 07:06:06,195][42004] Updated weights for policy 0, policy_version 51576 (0.0024) +[2024-11-08 07:06:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6442.5). Total num frames: 211263488. Throughput: 0: 1665.5. Samples: 47811124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:06:07,933][41694] Avg episode reward: [(0, '4.459')] +[2024-11-08 07:06:12,804][42004] Updated weights for policy 0, policy_version 51586 (0.0041) +[2024-11-08 07:06:12,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6485.3, 300 sec: 6428.6). Total num frames: 211296256. Throughput: 0: 1639.4. Samples: 47820250. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:06:12,933][41694] Avg episode reward: [(0, '4.618')] +[2024-11-08 07:06:17,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6417.0, 300 sec: 6470.3). Total num frames: 211324928. Throughput: 0: 1623.9. Samples: 47824586. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:06:17,934][41694] Avg episode reward: [(0, '4.727')] +[2024-11-08 07:06:19,682][42004] Updated weights for policy 0, policy_version 51596 (0.0039) +[2024-11-08 07:06:22,931][41694] Fps is (10 sec: 6144.0, 60 sec: 6485.4, 300 sec: 6456.4). Total num frames: 211357696. Throughput: 0: 1603.2. Samples: 47834282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:06:22,935][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 07:06:25,429][42004] Updated weights for policy 0, policy_version 51606 (0.0030) +[2024-11-08 07:06:27,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6485.3, 300 sec: 6414.7). Total num frames: 211386368. Throughput: 0: 1591.8. Samples: 47843570. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:06:27,934][41694] Avg episode reward: [(0, '4.543')] +[2024-11-08 07:06:31,946][42004] Updated weights for policy 0, policy_version 51616 (0.0031) +[2024-11-08 07:06:32,932][41694] Fps is (10 sec: 6553.4, 60 sec: 6485.3, 300 sec: 6442.5). Total num frames: 211423232. Throughput: 0: 1610.3. Samples: 47848452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:06:32,936][41694] Avg episode reward: [(0, '4.619')] +[2024-11-08 07:06:37,722][42004] Updated weights for policy 0, policy_version 51626 (0.0037) +[2024-11-08 07:06:37,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6485.4, 300 sec: 6442.5). Total num frames: 211460096. Throughput: 0: 1631.5. Samples: 47859354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:06:37,935][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 07:06:42,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6621.8, 300 sec: 6442.5). Total num frames: 211496960. Throughput: 0: 1663.3. Samples: 47870284. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:06:42,935][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 07:06:43,400][42004] Updated weights for policy 0, policy_version 51636 (0.0033) +[2024-11-08 07:06:47,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6621.9, 300 sec: 6487.1). Total num frames: 211529728. Throughput: 0: 1658.0. Samples: 47875416. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:06:47,933][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 07:06:49,675][42004] Updated weights for policy 0, policy_version 51646 (0.0036) +[2024-11-08 07:06:52,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6599.4, 300 sec: 6470.3). Total num frames: 211558400. Throughput: 0: 1639.4. Samples: 47884896. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:06:52,935][41694] Avg episode reward: [(0, '4.587')] +[2024-11-08 07:06:56,008][42004] Updated weights for policy 0, policy_version 51656 (0.0027) +[2024-11-08 07:06:57,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6621.9, 300 sec: 6484.2). Total num frames: 211595264. Throughput: 0: 1663.5. Samples: 47895106. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:06:57,933][41694] Avg episode reward: [(0, '4.261')] +[2024-11-08 07:07:02,921][42004] Updated weights for policy 0, policy_version 51666 (0.0042) +[2024-11-08 07:07:02,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6456.4). Total num frames: 211623936. Throughput: 0: 1648.8. Samples: 47898780. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:02,933][41694] Avg episode reward: [(0, '4.371')] +[2024-11-08 07:07:07,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6553.6, 300 sec: 6456.4). Total num frames: 211656704. Throughput: 0: 1657.1. Samples: 47908852. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:07,934][41694] Avg episode reward: [(0, '4.579')] +[2024-11-08 07:07:09,048][42004] Updated weights for policy 0, policy_version 51676 (0.0043) +[2024-11-08 07:07:12,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6442.5). Total num frames: 211689472. Throughput: 0: 1666.7. Samples: 47918570. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:12,934][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 07:07:15,026][42004] Updated weights for policy 0, policy_version 51686 (0.0032) +[2024-11-08 07:07:17,933][41694] Fps is (10 sec: 6553.0, 60 sec: 6621.8, 300 sec: 6428.6). Total num frames: 211722240. Throughput: 0: 1679.4. Samples: 47924026. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:17,935][41694] Avg episode reward: [(0, '4.357')] +[2024-11-08 07:07:20,885][42004] Updated weights for policy 0, policy_version 51696 (0.0032) +[2024-11-08 07:07:22,935][41694] Fps is (10 sec: 6551.1, 60 sec: 6621.4, 300 sec: 6465.4). Total num frames: 211755008. Throughput: 0: 1672.4. Samples: 47934616. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:22,938][41694] Avg episode reward: [(0, '4.367')] +[2024-11-08 07:07:27,708][42004] Updated weights for policy 0, policy_version 51706 (0.0044) +[2024-11-08 07:07:27,932][41694] Fps is (10 sec: 6554.4, 60 sec: 6690.1, 300 sec: 6456.4). Total num frames: 211787776. Throughput: 0: 1622.6. Samples: 47943300. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:27,935][41694] Avg episode reward: [(0, '4.549')] +[2024-11-08 07:07:32,932][41694] Fps is (10 sec: 6556.0, 60 sec: 6621.9, 300 sec: 6442.5). Total num frames: 211820544. Throughput: 0: 1630.2. Samples: 47948776. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:32,934][41694] Avg episode reward: [(0, '4.344')] +[2024-11-08 07:07:33,831][42004] Updated weights for policy 0, policy_version 51716 (0.0033) +[2024-11-08 07:07:37,934][41694] Fps is (10 sec: 6552.2, 60 sec: 6553.4, 300 sec: 6442.5). Total num frames: 211853312. Throughput: 0: 1625.6. Samples: 47958050. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:37,936][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 07:07:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051723_211857408.pth... +[2024-11-08 07:07:38,106][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051349_210325504.pth +[2024-11-08 07:07:39,981][42004] Updated weights for policy 0, policy_version 51726 (0.0037) +[2024-11-08 07:07:42,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6485.4, 300 sec: 6442.5). Total num frames: 211886080. Throughput: 0: 1626.1. Samples: 47968282. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:42,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 07:07:45,713][42004] Updated weights for policy 0, policy_version 51736 (0.0030) +[2024-11-08 07:07:47,932][41694] Fps is (10 sec: 6964.5, 60 sec: 6553.6, 300 sec: 6442.5). Total num frames: 211922944. Throughput: 0: 1667.6. Samples: 47973822. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:47,934][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 07:07:51,650][42004] Updated weights for policy 0, policy_version 51746 (0.0027) +[2024-11-08 07:07:52,931][41694] Fps is (10 sec: 7372.9, 60 sec: 6690.2, 300 sec: 6428.6). Total num frames: 211959808. Throughput: 0: 1677.1. Samples: 47984320. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:52,933][41694] Avg episode reward: [(0, '4.603')] +[2024-11-08 07:07:57,935][41694] Fps is (10 sec: 6551.8, 60 sec: 6553.3, 300 sec: 6470.0). 
Total num frames: 211988480. Throughput: 0: 1679.5. Samples: 47994154. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:07:57,941][41694] Avg episode reward: [(0, '4.595')] +[2024-11-08 07:07:58,092][42004] Updated weights for policy 0, policy_version 51756 (0.0028) +[2024-11-08 07:08:02,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6621.9, 300 sec: 6456.4). Total num frames: 212021248. Throughput: 0: 1655.5. Samples: 47998522. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:08:02,934][41694] Avg episode reward: [(0, '4.662')] +[2024-11-08 07:08:04,994][42004] Updated weights for policy 0, policy_version 51766 (0.0042) +[2024-11-08 07:08:07,932][41694] Fps is (10 sec: 6145.7, 60 sec: 6553.6, 300 sec: 6428.6). Total num frames: 212049920. Throughput: 0: 1621.9. Samples: 48007594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:08:07,937][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 07:08:10,980][42004] Updated weights for policy 0, policy_version 51776 (0.0026) +[2024-11-08 07:08:12,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6621.8, 300 sec: 6428.6). Total num frames: 212086784. Throughput: 0: 1661.3. Samples: 48018060. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:08:12,935][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 07:08:16,895][42004] Updated weights for policy 0, policy_version 51786 (0.0028) +[2024-11-08 07:08:17,931][41694] Fps is (10 sec: 7373.0, 60 sec: 6690.3, 300 sec: 6442.5). Total num frames: 212123648. Throughput: 0: 1647.4. Samples: 48022908. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:08:17,934][41694] Avg episode reward: [(0, '4.487')] +[2024-11-08 07:08:22,530][42004] Updated weights for policy 0, policy_version 51796 (0.0026) +[2024-11-08 07:08:22,932][41694] Fps is (10 sec: 6963.6, 60 sec: 6690.5, 300 sec: 6414.7). Total num frames: 212156416. Throughput: 0: 1688.5. Samples: 48034030. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:08:22,933][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 07:08:27,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 6414.8). Total num frames: 212193280. Throughput: 0: 1693.6. Samples: 48044494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:08:27,933][41694] Avg episode reward: [(0, '4.484')] +[2024-11-08 07:08:28,329][42004] Updated weights for policy 0, policy_version 51806 (0.0028) +[2024-11-08 07:08:32,931][41694] Fps is (10 sec: 6553.7, 60 sec: 6690.2, 300 sec: 6447.2). Total num frames: 212221952. Throughput: 0: 1675.9. Samples: 48049236. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:08:32,936][41694] Avg episode reward: [(0, '4.695')] +[2024-11-08 07:08:34,846][42004] Updated weights for policy 0, policy_version 51816 (0.0043) +[2024-11-08 07:08:37,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6690.4, 300 sec: 6428.7). Total num frames: 212254720. Throughput: 0: 1669.1. Samples: 48059428. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:08:37,933][41694] Avg episode reward: [(0, '4.493')] +[2024-11-08 07:08:41,355][42004] Updated weights for policy 0, policy_version 51826 (0.0030) +[2024-11-08 07:08:42,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6690.1, 300 sec: 6428.6). Total num frames: 212287488. Throughput: 0: 1654.4. Samples: 48068600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:08:42,939][41694] Avg episode reward: [(0, '4.768')] +[2024-11-08 07:08:47,236][42004] Updated weights for policy 0, policy_version 51836 (0.0033) +[2024-11-08 07:08:47,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6456.4). Total num frames: 212324352. Throughput: 0: 1670.9. Samples: 48073712. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:08:47,933][41694] Avg episode reward: [(0, '4.655')] +[2024-11-08 07:08:52,703][42004] Updated weights for policy 0, policy_version 51846 (0.0026) +[2024-11-08 07:08:52,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6690.1, 300 sec: 6525.8). Total num frames: 212361216. Throughput: 0: 1712.1. Samples: 48084638. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:08:52,934][41694] Avg episode reward: [(0, '4.356')] +[2024-11-08 07:08:57,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6827.0, 300 sec: 6567.5). Total num frames: 212398080. Throughput: 0: 1715.5. Samples: 48095256. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:08:57,933][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 07:08:58,474][42004] Updated weights for policy 0, policy_version 51856 (0.0032) +[2024-11-08 07:09:02,931][41694] Fps is (10 sec: 6963.3, 60 sec: 6826.7, 300 sec: 6567.5). Total num frames: 212430848. Throughput: 0: 1731.1. Samples: 48100806. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:09:02,933][41694] Avg episode reward: [(0, '4.483')] +[2024-11-08 07:09:04,893][42004] Updated weights for policy 0, policy_version 51866 (0.0034) +[2024-11-08 07:09:07,932][41694] Fps is (10 sec: 6143.8, 60 sec: 6826.7, 300 sec: 6611.0). Total num frames: 212459520. Throughput: 0: 1692.1. Samples: 48110174. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:09:07,935][41694] Avg episode reward: [(0, '4.401')] +[2024-11-08 07:09:11,398][42004] Updated weights for policy 0, policy_version 51876 (0.0027) +[2024-11-08 07:09:12,935][41694] Fps is (10 sec: 6141.9, 60 sec: 6758.1, 300 sec: 6609.1). Total num frames: 212492288. Throughput: 0: 1668.1. Samples: 48119564. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:09:12,942][41694] Avg episode reward: [(0, '4.272')] +[2024-11-08 07:09:17,294][42004] Updated weights for policy 0, policy_version 51886 (0.0032) +[2024-11-08 07:09:17,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6758.4, 300 sec: 6623.0). Total num frames: 212529152. Throughput: 0: 1670.8. Samples: 48124420. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:09:17,935][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 07:09:22,837][42004] Updated weights for policy 0, policy_version 51896 (0.0029) +[2024-11-08 07:09:22,932][41694] Fps is (10 sec: 7374.9, 60 sec: 6826.6, 300 sec: 6636.9). Total num frames: 212566016. Throughput: 0: 1689.3. Samples: 48135446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:09:22,933][41694] Avg episode reward: [(0, '4.729')] +[2024-11-08 07:09:27,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6826.7, 300 sec: 6636.9). Total num frames: 212602880. Throughput: 0: 1735.2. Samples: 48146682. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:09:27,934][41694] Avg episode reward: [(0, '4.439')] +[2024-11-08 07:09:28,400][42004] Updated weights for policy 0, policy_version 51906 (0.0027) +[2024-11-08 07:09:32,932][41694] Fps is (10 sec: 7373.1, 60 sec: 6963.2, 300 sec: 6650.8). Total num frames: 212639744. Throughput: 0: 1739.8. Samples: 48152002. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:09:32,933][41694] Avg episode reward: [(0, '4.476')] +[2024-11-08 07:09:33,865][42004] Updated weights for policy 0, policy_version 51916 (0.0025) +[2024-11-08 07:09:37,932][41694] Fps is (10 sec: 6553.3, 60 sec: 6894.9, 300 sec: 6623.0). Total num frames: 212668416. Throughput: 0: 1733.5. Samples: 48162644. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:09:37,936][41694] Avg episode reward: [(0, '4.737')] +[2024-11-08 07:09:37,986][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051922_212672512.pth... +[2024-11-08 07:09:38,133][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051531_211070976.pth +[2024-11-08 07:09:40,610][42004] Updated weights for policy 0, policy_version 51926 (0.0034) +[2024-11-08 07:09:43,313][41694] Fps is (10 sec: 5918.2, 60 sec: 6851.4, 300 sec: 6642.2). Total num frames: 212701184. Throughput: 0: 1691.2. Samples: 48172006. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:09:43,318][41694] Avg episode reward: [(0, '4.379')] +[2024-11-08 07:09:47,199][42004] Updated weights for policy 0, policy_version 51936 (0.0025) +[2024-11-08 07:09:47,931][41694] Fps is (10 sec: 6554.0, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 212733952. Throughput: 0: 1672.8. Samples: 48176084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:09:47,933][41694] Avg episode reward: [(0, '4.397')] +[2024-11-08 07:09:52,932][41694] Fps is (10 sec: 6813.4, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 212766720. Throughput: 0: 1700.3. Samples: 48186688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:09:52,934][41694] Avg episode reward: [(0, '4.408')] +[2024-11-08 07:09:53,069][42004] Updated weights for policy 0, policy_version 51946 (0.0021) +[2024-11-08 07:09:57,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6758.4, 300 sec: 6636.9). Total num frames: 212803584. Throughput: 0: 1731.0. Samples: 48197452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:09:57,933][41694] Avg episode reward: [(0, '4.411')] +[2024-11-08 07:09:58,826][42004] Updated weights for policy 0, policy_version 51956 (0.0037) +[2024-11-08 07:10:02,932][41694] Fps is (10 sec: 6962.9, 60 sec: 6758.3, 300 sec: 6636.9). 
Total num frames: 212836352. Throughput: 0: 1733.5. Samples: 48202428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:10:02,934][41694] Avg episode reward: [(0, '4.447')] +[2024-11-08 07:10:05,084][42004] Updated weights for policy 0, policy_version 51966 (0.0038) +[2024-11-08 07:10:07,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6826.7, 300 sec: 6650.8). Total num frames: 212869120. Throughput: 0: 1710.7. Samples: 48212428. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:10:07,933][41694] Avg episode reward: [(0, '4.478')] +[2024-11-08 07:10:11,700][42004] Updated weights for policy 0, policy_version 51976 (0.0035) +[2024-11-08 07:10:12,932][41694] Fps is (10 sec: 6144.5, 60 sec: 6758.8, 300 sec: 6636.9). Total num frames: 212897792. Throughput: 0: 1659.8. Samples: 48221374. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:10:12,933][41694] Avg episode reward: [(0, '4.358')] +[2024-11-08 07:10:17,932][41694] Fps is (10 sec: 5324.8, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 212922368. Throughput: 0: 1620.2. Samples: 48224910. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:10:17,935][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 07:10:19,699][42004] Updated weights for policy 0, policy_version 51986 (0.0034) +[2024-11-08 07:10:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 6485.4, 300 sec: 6636.9). Total num frames: 212955136. Throughput: 0: 1577.3. Samples: 48233622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:10:22,934][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 07:10:25,669][42004] Updated weights for policy 0, policy_version 51996 (0.0032) +[2024-11-08 07:10:27,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 212987904. Throughput: 0: 1609.4. Samples: 48243814. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:10:27,934][41694] Avg episode reward: [(0, '4.754')] +[2024-11-08 07:10:31,666][42004] Updated weights for policy 0, policy_version 52006 (0.0036) +[2024-11-08 07:10:32,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 213024768. Throughput: 0: 1622.0. Samples: 48249074. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:10:32,935][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 07:10:37,842][42004] Updated weights for policy 0, policy_version 52016 (0.0025) +[2024-11-08 07:10:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6485.4, 300 sec: 6636.9). Total num frames: 213057536. Throughput: 0: 1597.3. Samples: 48258566. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:10:37,934][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 07:10:42,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6526.8, 300 sec: 6636.9). Total num frames: 213090304. Throughput: 0: 1592.7. Samples: 48269124. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:10:42,934][41694] Avg episode reward: [(0, '4.514')] +[2024-11-08 07:10:43,743][42004] Updated weights for policy 0, policy_version 52026 (0.0028) +[2024-11-08 07:10:47,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6485.3, 300 sec: 6646.3). Total num frames: 213123072. Throughput: 0: 1588.3. Samples: 48273900. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:10:47,934][41694] Avg episode reward: [(0, '4.625')] +[2024-11-08 07:10:51,007][42004] Updated weights for policy 0, policy_version 52036 (0.0057) +[2024-11-08 07:10:52,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6417.1, 300 sec: 6623.0). Total num frames: 213151744. Throughput: 0: 1555.6. Samples: 48282428. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:10:52,932][41694] Avg episode reward: [(0, '4.500')] +[2024-11-08 07:10:56,984][42004] Updated weights for policy 0, policy_version 52046 (0.0043) +[2024-11-08 07:10:57,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6348.8, 300 sec: 6623.0). Total num frames: 213184512. Throughput: 0: 1589.3. Samples: 48292894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:10:57,935][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 07:11:02,690][42004] Updated weights for policy 0, policy_version 52056 (0.0033) +[2024-11-08 07:11:02,935][41694] Fps is (10 sec: 6961.0, 60 sec: 6416.8, 300 sec: 6636.8). Total num frames: 213221376. Throughput: 0: 1628.7. Samples: 48298208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:02,937][41694] Avg episode reward: [(0, '4.366')] +[2024-11-08 07:11:07,932][41694] Fps is (10 sec: 6962.8, 60 sec: 6417.0, 300 sec: 6636.9). Total num frames: 213254144. Throughput: 0: 1667.6. Samples: 48308664. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:07,937][41694] Avg episode reward: [(0, '4.576')] +[2024-11-08 07:11:08,548][42004] Updated weights for policy 0, policy_version 52066 (0.0026) +[2024-11-08 07:11:12,932][41694] Fps is (10 sec: 6555.2, 60 sec: 6485.3, 300 sec: 6650.8). Total num frames: 213286912. Throughput: 0: 1655.5. Samples: 48318314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:12,934][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 07:11:15,271][42004] Updated weights for policy 0, policy_version 52076 (0.0035) +[2024-11-08 07:11:17,932][41694] Fps is (10 sec: 6553.8, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 213319680. Throughput: 0: 1642.5. Samples: 48322986. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:17,935][41694] Avg episode reward: [(0, '4.273')] +[2024-11-08 07:11:21,553][42004] Updated weights for policy 0, policy_version 52086 (0.0026) +[2024-11-08 07:11:22,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6553.6, 300 sec: 6650.8). Total num frames: 213348352. Throughput: 0: 1646.0. Samples: 48332636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:22,936][41694] Avg episode reward: [(0, '4.360')] +[2024-11-08 07:11:27,932][41694] Fps is (10 sec: 6144.2, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 213381120. Throughput: 0: 1620.6. Samples: 48342052. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:27,933][41694] Avg episode reward: [(0, '4.453')] +[2024-11-08 07:11:28,052][42004] Updated weights for policy 0, policy_version 52096 (0.0027) +[2024-11-08 07:11:32,931][41694] Fps is (10 sec: 6963.4, 60 sec: 6553.6, 300 sec: 6636.9). Total num frames: 213417984. Throughput: 0: 1625.4. Samples: 48347044. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:32,933][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 07:11:33,757][42004] Updated weights for policy 0, policy_version 52106 (0.0029) +[2024-11-08 07:11:37,932][41694] Fps is (10 sec: 6963.3, 60 sec: 6553.6, 300 sec: 6623.0). Total num frames: 213450752. Throughput: 0: 1669.1. Samples: 48357538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:11:37,933][41694] Avg episode reward: [(0, '4.659')] +[2024-11-08 07:11:38,049][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052113_213454848.pth... +[2024-11-08 07:11:38,161][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051723_211857408.pth +[2024-11-08 07:11:39,900][42004] Updated weights for policy 0, policy_version 52116 (0.0028) +[2024-11-08 07:11:42,931][41694] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 6636.9). 
Total num frames: 213487616. Throughput: 0: 1669.4. Samples: 48368016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:11:42,933][41694] Avg episode reward: [(0, '4.434')] +[2024-11-08 07:11:45,723][42004] Updated weights for policy 0, policy_version 52126 (0.0026) +[2024-11-08 07:11:47,933][41694] Fps is (10 sec: 6962.4, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 213520384. Throughput: 0: 1669.2. Samples: 48373320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:11:47,935][41694] Avg episode reward: [(0, '4.642')] +[2024-11-08 07:11:51,781][42004] Updated weights for policy 0, policy_version 52136 (0.0043) +[2024-11-08 07:11:52,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 213553152. Throughput: 0: 1664.9. Samples: 48383584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:11:52,935][41694] Avg episode reward: [(0, '4.547')] +[2024-11-08 07:11:57,932][41694] Fps is (10 sec: 6144.7, 60 sec: 6621.9, 300 sec: 6636.9). Total num frames: 213581824. Throughput: 0: 1637.0. Samples: 48391976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:11:57,934][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 07:11:58,715][42004] Updated weights for policy 0, policy_version 52146 (0.0035) +[2024-11-08 07:12:02,932][41694] Fps is (10 sec: 6143.6, 60 sec: 6553.9, 300 sec: 6636.9). Total num frames: 213614592. Throughput: 0: 1648.5. Samples: 48397170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:02,934][41694] Avg episode reward: [(0, '4.351')] +[2024-11-08 07:12:05,330][42004] Updated weights for policy 0, policy_version 52156 (0.0035) +[2024-11-08 07:12:07,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6553.7, 300 sec: 6636.9). Total num frames: 213647360. Throughput: 0: 1637.6. Samples: 48406328. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:07,937][41694] Avg episode reward: [(0, '4.419')] +[2024-11-08 07:12:11,284][42004] Updated weights for policy 0, policy_version 52166 (0.0024) +[2024-11-08 07:12:12,932][41694] Fps is (10 sec: 6554.0, 60 sec: 6553.7, 300 sec: 6636.9). Total num frames: 213680128. Throughput: 0: 1660.6. Samples: 48416780. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:12,933][41694] Avg episode reward: [(0, '4.556')] +[2024-11-08 07:12:16,995][42004] Updated weights for policy 0, policy_version 52176 (0.0031) +[2024-11-08 07:12:17,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6621.9, 300 sec: 6650.9). Total num frames: 213716992. Throughput: 0: 1669.4. Samples: 48422166. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:17,933][41694] Avg episode reward: [(0, '4.724')] +[2024-11-08 07:12:22,666][42004] Updated weights for policy 0, policy_version 52186 (0.0025) +[2024-11-08 07:12:22,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 213753856. Throughput: 0: 1672.0. Samples: 48432776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:22,933][41694] Avg episode reward: [(0, '4.431')] +[2024-11-08 07:12:27,931][41694] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 6650.8). Total num frames: 213782528. Throughput: 0: 1665.8. Samples: 48442976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:27,933][41694] Avg episode reward: [(0, '4.559')] +[2024-11-08 07:12:29,609][42004] Updated weights for policy 0, policy_version 52196 (0.0038) +[2024-11-08 07:12:32,934][41694] Fps is (10 sec: 6142.4, 60 sec: 6621.6, 300 sec: 6650.8). Total num frames: 213815296. Throughput: 0: 1633.2. Samples: 48446818. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:32,936][41694] Avg episode reward: [(0, '4.417')] +[2024-11-08 07:12:35,617][42004] Updated weights for policy 0, policy_version 52206 (0.0024) +[2024-11-08 07:12:37,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 213852160. Throughput: 0: 1637.4. Samples: 48457266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:37,933][41694] Avg episode reward: [(0, '4.491')] +[2024-11-08 07:12:41,278][42004] Updated weights for policy 0, policy_version 52216 (0.0035) +[2024-11-08 07:12:42,932][41694] Fps is (10 sec: 6964.8, 60 sec: 6621.8, 300 sec: 6650.8). Total num frames: 213884928. Throughput: 0: 1691.3. Samples: 48468086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:12:42,934][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 07:12:46,902][42004] Updated weights for policy 0, policy_version 52226 (0.0024) +[2024-11-08 07:12:47,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.5, 300 sec: 6664.7). Total num frames: 213925888. Throughput: 0: 1684.3. Samples: 48472964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:12:47,933][41694] Avg episode reward: [(0, '4.481')] +[2024-11-08 07:12:52,649][42004] Updated weights for policy 0, policy_version 52236 (0.0031) +[2024-11-08 07:12:52,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 213958656. Throughput: 0: 1732.5. Samples: 48484290. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:12:52,937][41694] Avg episode reward: [(0, '4.519')] +[2024-11-08 07:12:57,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6894.9, 300 sec: 6692.4). Total num frames: 213995520. Throughput: 0: 1742.8. Samples: 48495206. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:12:57,934][41694] Avg episode reward: [(0, '4.452')] +[2024-11-08 07:12:58,341][42004] Updated weights for policy 0, policy_version 52246 (0.0033) +[2024-11-08 07:13:02,932][41694] Fps is (10 sec: 6144.3, 60 sec: 6758.5, 300 sec: 6678.6). Total num frames: 214020096. Throughput: 0: 1718.0. Samples: 48499474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:13:02,935][41694] Avg episode reward: [(0, '4.396')] +[2024-11-08 07:13:06,235][42004] Updated weights for policy 0, policy_version 52256 (0.0035) +[2024-11-08 07:13:07,932][41694] Fps is (10 sec: 5324.9, 60 sec: 6690.2, 300 sec: 6650.8). Total num frames: 214048768. Throughput: 0: 1657.5. Samples: 48507364. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:13:07,934][41694] Avg episode reward: [(0, '4.503')] +[2024-11-08 07:13:11,933][42004] Updated weights for policy 0, policy_version 52266 (0.0025) +[2024-11-08 07:13:12,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6758.4, 300 sec: 6650.8). Total num frames: 214085632. Throughput: 0: 1670.7. Samples: 48518156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:13:12,933][41694] Avg episode reward: [(0, '4.528')] +[2024-11-08 07:13:17,679][42004] Updated weights for policy 0, policy_version 52276 (0.0024) +[2024-11-08 07:13:17,932][41694] Fps is (10 sec: 7372.6, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 214122496. Throughput: 0: 1699.1. Samples: 48523274. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:13:17,935][41694] Avg episode reward: [(0, '4.688')] +[2024-11-08 07:13:22,932][41694] Fps is (10 sec: 7372.8, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 214159360. Throughput: 0: 1715.6. Samples: 48534470. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:13:22,933][41694] Avg episode reward: [(0, '4.546')] +[2024-11-08 07:13:23,215][42004] Updated weights for policy 0, policy_version 52286 (0.0038) +[2024-11-08 07:13:27,932][41694] Fps is (10 sec: 7372.9, 60 sec: 6894.9, 300 sec: 6692.4). Total num frames: 214196224. Throughput: 0: 1718.0. Samples: 48545396. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:13:27,935][41694] Avg episode reward: [(0, '4.423')] +[2024-11-08 07:13:28,718][42004] Updated weights for policy 0, policy_version 52296 (0.0028) +[2024-11-08 07:13:33,200][41694] Fps is (10 sec: 6780.8, 60 sec: 6864.5, 300 sec: 6686.4). Total num frames: 214228992. Throughput: 0: 1718.2. Samples: 48550746. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:13:33,203][41694] Avg episode reward: [(0, '4.572')] +[2024-11-08 07:13:35,708][42004] Updated weights for policy 0, policy_version 52306 (0.0036) +[2024-11-08 07:13:37,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6678.6). Total num frames: 214257664. Throughput: 0: 1670.0. Samples: 48559438. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:13:37,934][41694] Avg episode reward: [(0, '4.480')] +[2024-11-08 07:13:37,947][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052309_214257664.pth... +[2024-11-08 07:13:38,119][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000051922_212672512.pth +[2024-11-08 07:13:41,970][42004] Updated weights for policy 0, policy_version 52316 (0.0049) +[2024-11-08 07:13:42,931][41694] Fps is (10 sec: 6313.9, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 214290432. Throughput: 0: 1645.8. Samples: 48569266. 
Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:13:42,933][41694] Avg episode reward: [(0, '4.609')] +[2024-11-08 07:13:47,568][42004] Updated weights for policy 0, policy_version 52326 (0.0047) +[2024-11-08 07:13:47,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 214327296. Throughput: 0: 1676.7. Samples: 48574924. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:13:47,933][41694] Avg episode reward: [(0, '4.497')] +[2024-11-08 07:13:52,932][41694] Fps is (10 sec: 7372.7, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 214364160. Throughput: 0: 1724.0. Samples: 48584946. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:13:52,933][41694] Avg episode reward: [(0, '4.550')] +[2024-11-08 07:13:53,472][42004] Updated weights for policy 0, policy_version 52336 (0.0032) +[2024-11-08 07:13:57,932][41694] Fps is (10 sec: 6963.2, 60 sec: 6690.1, 300 sec: 6664.7). Total num frames: 214396928. Throughput: 0: 1723.5. Samples: 48595712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:13:57,937][41694] Avg episode reward: [(0, '4.763')] +[2024-11-08 07:13:59,335][42004] Updated weights for policy 0, policy_version 52346 (0.0028) +[2024-11-08 07:14:02,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 6664.7). Total num frames: 214425600. Throughput: 0: 1720.0. Samples: 48600672. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:14:02,934][41694] Avg episode reward: [(0, '4.486')] +[2024-11-08 07:14:06,814][42004] Updated weights for policy 0, policy_version 52356 (0.0035) +[2024-11-08 07:14:07,932][41694] Fps is (10 sec: 5734.5, 60 sec: 6758.4, 300 sec: 6650.9). Total num frames: 214454272. Throughput: 0: 1650.8. Samples: 48608758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:14:07,934][41694] Avg episode reward: [(0, '4.387')] +[2024-11-08 07:14:12,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6690.2, 300 sec: 6636.9). 
Total num frames: 214487040. Throughput: 0: 1622.4. Samples: 48618402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:14:12,934][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 07:14:13,071][42004] Updated weights for policy 0, policy_version 52366 (0.0040) +[2024-11-08 07:14:17,932][41694] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 214523904. Throughput: 0: 1625.6. Samples: 48623460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:14:17,934][41694] Avg episode reward: [(0, '4.438')] +[2024-11-08 07:14:18,671][42004] Updated weights for policy 0, policy_version 52376 (0.0033) +[2024-11-08 07:14:22,931][41694] Fps is (10 sec: 7372.8, 60 sec: 6690.1, 300 sec: 6636.9). Total num frames: 214560768. Throughput: 0: 1667.6. Samples: 48634478. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:14:22,933][41694] Avg episode reward: [(0, '4.425')] +[2024-11-08 07:14:24,556][42004] Updated weights for policy 0, policy_version 52386 (0.0040) +[2024-11-08 07:14:27,932][41694] Fps is (10 sec: 6963.4, 60 sec: 6621.9, 300 sec: 6623.0). Total num frames: 214593536. Throughput: 0: 1684.3. Samples: 48645060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:14:27,934][41694] Avg episode reward: [(0, '4.443')] +[2024-11-08 07:14:30,459][42004] Updated weights for policy 0, policy_version 52396 (0.0024) +[2024-11-08 07:14:32,932][41694] Fps is (10 sec: 6553.5, 60 sec: 6651.7, 300 sec: 6636.9). Total num frames: 214626304. Throughput: 0: 1673.1. Samples: 48650212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:14:32,945][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 07:14:37,934][41694] Fps is (10 sec: 5323.5, 60 sec: 6485.1, 300 sec: 6603.7). Total num frames: 214646784. Throughput: 0: 1608.8. Samples: 48657346. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:14:37,938][41694] Avg episode reward: [(0, '4.738')] +[2024-11-08 07:14:40,144][42004] Updated weights for policy 0, policy_version 52406 (0.0048) +[2024-11-08 07:14:42,932][41694] Fps is (10 sec: 3686.3, 60 sec: 6212.2, 300 sec: 6539.7). Total num frames: 214663168. Throughput: 0: 1497.1. Samples: 48663080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:14:42,936][41694] Avg episode reward: [(0, '4.804')] +[2024-11-08 07:14:47,931][41694] Fps is (10 sec: 3687.3, 60 sec: 5939.2, 300 sec: 6498.1). Total num frames: 214683648. Throughput: 0: 1437.1. Samples: 48665340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:14:47,933][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 07:14:49,523][42004] Updated weights for policy 0, policy_version 52416 (0.0049) +[2024-11-08 07:14:52,932][41694] Fps is (10 sec: 5324.9, 60 sec: 5870.9, 300 sec: 6484.2). Total num frames: 214716416. Throughput: 0: 1448.6. Samples: 48673946. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:14:52,934][41694] Avg episode reward: [(0, '4.560')] +[2024-11-08 07:14:55,822][42004] Updated weights for policy 0, policy_version 52426 (0.0032) +[2024-11-08 07:14:57,932][41694] Fps is (10 sec: 6553.0, 60 sec: 5870.9, 300 sec: 6484.2). Total num frames: 214749184. Throughput: 0: 1451.2. Samples: 48683708. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:14:57,936][41694] Avg episode reward: [(0, '4.524')] +[2024-11-08 07:15:02,517][42004] Updated weights for policy 0, policy_version 52436 (0.0028) +[2024-11-08 07:15:02,932][41694] Fps is (10 sec: 6143.9, 60 sec: 5870.9, 300 sec: 6470.3). Total num frames: 214777856. Throughput: 0: 1445.6. Samples: 48688510. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:15:02,937][41694] Avg episode reward: [(0, '4.531')] +[2024-11-08 07:15:07,934][41694] Fps is (10 sec: 6552.6, 60 sec: 6007.2, 300 sec: 6498.0). 
Total num frames: 214814720. Throughput: 0: 1412.1. Samples: 48698026. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:15:07,935][41694] Avg episode reward: [(0, '4.721')] +[2024-11-08 07:15:08,302][42004] Updated weights for policy 0, policy_version 52446 (0.0034) +[2024-11-08 07:15:12,932][41694] Fps is (10 sec: 6553.6, 60 sec: 5939.2, 300 sec: 6511.9). Total num frames: 214843392. Throughput: 0: 1379.5. Samples: 48707136. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:15:12,935][41694] Avg episode reward: [(0, '4.506')] +[2024-11-08 07:15:15,616][42004] Updated weights for policy 0, policy_version 52456 (0.0039) +[2024-11-08 07:15:17,932][41694] Fps is (10 sec: 5735.6, 60 sec: 5802.7, 300 sec: 6498.1). Total num frames: 214872064. Throughput: 0: 1362.3. Samples: 48711518. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:15:17,934][41694] Avg episode reward: [(0, '4.464')] +[2024-11-08 07:15:22,548][42004] Updated weights for policy 0, policy_version 52466 (0.0046) +[2024-11-08 07:15:22,932][41694] Fps is (10 sec: 5734.3, 60 sec: 5666.1, 300 sec: 6484.2). Total num frames: 214900736. Throughput: 0: 1388.8. Samples: 48719840. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:15:22,937][41694] Avg episode reward: [(0, '4.393')] +[2024-11-08 07:15:27,933][41694] Fps is (10 sec: 6143.2, 60 sec: 5666.0, 300 sec: 6470.3). Total num frames: 214933504. Throughput: 0: 1479.7. Samples: 48729670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2024-11-08 07:15:27,936][41694] Avg episode reward: [(0, '4.474')] +[2024-11-08 07:15:28,797][42004] Updated weights for policy 0, policy_version 52476 (0.0032) +[2024-11-08 07:15:32,932][41694] Fps is (10 sec: 6553.7, 60 sec: 5666.1, 300 sec: 6470.3). Total num frames: 214966272. Throughput: 0: 1544.3. Samples: 48734836. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:15:32,934][41694] Avg episode reward: [(0, '4.568')] +[2024-11-08 07:15:35,288][42004] Updated weights for policy 0, policy_version 52486 (0.0033) +[2024-11-08 07:15:37,938][41694] Fps is (10 sec: 6552.6, 60 sec: 5870.9, 300 sec: 6470.2). Total num frames: 214999040. Throughput: 0: 1569.1. Samples: 48744562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:15:37,941][41694] Avg episode reward: [(0, '4.704')] +[2024-11-08 07:15:37,964][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052490_214999040.pth... +[2024-11-08 07:15:38,137][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052113_213454848.pth +[2024-11-08 07:15:41,527][42004] Updated weights for policy 0, policy_version 52496 (0.0032) +[2024-11-08 07:15:42,932][41694] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 6470.3). Total num frames: 215031808. Throughput: 0: 1568.4. Samples: 48754284. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:15:42,934][41694] Avg episode reward: [(0, '4.696')] +[2024-11-08 07:15:47,882][42004] Updated weights for policy 0, policy_version 52506 (0.0025) +[2024-11-08 07:15:47,932][41694] Fps is (10 sec: 6555.6, 60 sec: 6348.8, 300 sec: 6484.2). Total num frames: 215064576. Throughput: 0: 1543.6. Samples: 48757972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:15:47,933][41694] Avg episode reward: [(0, '4.303')] +[2024-11-08 07:15:52,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6280.5, 300 sec: 6470.3). Total num frames: 215093248. Throughput: 0: 1572.0. Samples: 48768762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:15:52,933][41694] Avg episode reward: [(0, '4.489')] +[2024-11-08 07:15:54,682][42004] Updated weights for policy 0, policy_version 52516 (0.0029) +[2024-11-08 07:15:57,932][41694] Fps is (10 sec: 5734.0, 60 sec: 6212.3, 300 sec: 6442.6). 
Total num frames: 215121920. Throughput: 0: 1565.4. Samples: 48777580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2024-11-08 07:15:57,937][41694] Avg episode reward: [(0, '4.460')] +[2024-11-08 07:16:01,563][42004] Updated weights for policy 0, policy_version 52526 (0.0048) +[2024-11-08 07:16:02,931][41694] Fps is (10 sec: 6144.1, 60 sec: 6280.6, 300 sec: 6442.5). Total num frames: 215154688. Throughput: 0: 1561.1. Samples: 48781766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:16:02,933][41694] Avg episode reward: [(0, '4.544')] +[2024-11-08 07:16:07,873][42004] Updated weights for policy 0, policy_version 52536 (0.0026) +[2024-11-08 07:16:07,933][41694] Fps is (10 sec: 6553.4, 60 sec: 6212.4, 300 sec: 6442.5). Total num frames: 215187456. Throughput: 0: 1585.1. Samples: 48791170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:16:07,935][41694] Avg episode reward: [(0, '4.522')] +[2024-11-08 07:16:12,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6212.3, 300 sec: 6428.6). Total num frames: 215216128. Throughput: 0: 1578.7. Samples: 48800708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:16:12,933][41694] Avg episode reward: [(0, '4.521')] +[2024-11-08 07:16:14,691][42004] Updated weights for policy 0, policy_version 52546 (0.0039) +[2024-11-08 07:16:17,931][41694] Fps is (10 sec: 5735.1, 60 sec: 6212.3, 300 sec: 6428.6). Total num frames: 215244800. Throughput: 0: 1560.9. Samples: 48805078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:16:17,933][41694] Avg episode reward: [(0, '4.638')] +[2024-11-08 07:16:21,154][42004] Updated weights for policy 0, policy_version 52556 (0.0505) +[2024-11-08 07:16:22,932][41694] Fps is (10 sec: 6143.9, 60 sec: 6280.5, 300 sec: 6428.6). Total num frames: 215277568. Throughput: 0: 1557.4. Samples: 48814642. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:16:22,935][41694] Avg episode reward: [(0, '4.745')] +[2024-11-08 07:16:27,612][42004] Updated weights for policy 0, policy_version 52566 (0.0026) +[2024-11-08 07:16:27,934][41694] Fps is (10 sec: 6552.0, 60 sec: 6280.5, 300 sec: 6414.7). Total num frames: 215310336. Throughput: 0: 1552.8. Samples: 48824164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:16:27,937][41694] Avg episode reward: [(0, '4.398')] +[2024-11-08 07:16:32,931][41694] Fps is (10 sec: 6144.2, 60 sec: 6212.3, 300 sec: 6400.9). Total num frames: 215339008. Throughput: 0: 1567.7. Samples: 48828518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2024-11-08 07:16:32,934][41694] Avg episode reward: [(0, '4.608')] +[2024-11-08 07:16:34,354][42004] Updated weights for policy 0, policy_version 52576 (0.0029) +[2024-11-08 07:16:37,932][41694] Fps is (10 sec: 6145.3, 60 sec: 6212.6, 300 sec: 6387.0). Total num frames: 215371776. Throughput: 0: 1545.3. Samples: 48838300. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:16:37,934][41694] Avg episode reward: [(0, '4.688')] +[2024-11-08 07:16:40,911][42004] Updated weights for policy 0, policy_version 52586 (0.0034) +[2024-11-08 07:16:42,933][41694] Fps is (10 sec: 6143.2, 60 sec: 6143.9, 300 sec: 6373.1). Total num frames: 215400448. Throughput: 0: 1546.7. Samples: 48847182. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:16:42,938][41694] Avg episode reward: [(0, '4.511')] +[2024-11-08 07:16:47,447][42004] Updated weights for policy 0, policy_version 52596 (0.0032) +[2024-11-08 07:16:47,932][41694] Fps is (10 sec: 6144.0, 60 sec: 6144.0, 300 sec: 6373.1). Total num frames: 215433216. Throughput: 0: 1547.4. Samples: 48851398. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:16:47,936][41694] Avg episode reward: [(0, '4.578')] +[2024-11-08 07:16:52,932][41694] Fps is (10 sec: 6554.3, 60 sec: 6212.3, 300 sec: 6387.0). 
Total num frames: 215465984. Throughput: 0: 1556.3. Samples: 48861204. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:16:52,934][41694] Avg episode reward: [(0, '4.389')] +[2024-11-08 07:16:53,675][42004] Updated weights for policy 0, policy_version 52606 (0.0033) +[2024-11-08 07:16:57,932][41694] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6387.0). Total num frames: 215498752. Throughput: 0: 1555.8. Samples: 48870720. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:16:57,933][41694] Avg episode reward: [(0, '4.680')] +[2024-11-08 07:17:00,172][42004] Updated weights for policy 0, policy_version 52616 (0.0041) +[2024-11-08 07:17:02,932][41694] Fps is (10 sec: 6144.1, 60 sec: 6212.3, 300 sec: 6373.1). Total num frames: 215527424. Throughput: 0: 1577.6. Samples: 48876068. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:17:02,934][41694] Avg episode reward: [(0, '4.774')] +[2024-11-08 07:17:07,932][41694] Fps is (10 sec: 5324.7, 60 sec: 6075.8, 300 sec: 6345.3). Total num frames: 215552000. Throughput: 0: 1527.0. Samples: 48883358. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:17:07,938][41694] Avg episode reward: [(0, '4.564')] +[2024-11-08 07:17:08,353][42004] Updated weights for policy 0, policy_version 52626 (0.0031) +[2024-11-08 07:17:12,932][41694] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6331.4). Total num frames: 215584768. Throughput: 0: 1529.0. Samples: 48892964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:17:12,935][41694] Avg episode reward: [(0, '4.614')] +[2024-11-08 07:17:14,904][42004] Updated weights for policy 0, policy_version 52636 (0.0048) +[2024-11-08 07:17:17,932][41694] Fps is (10 sec: 6143.7, 60 sec: 6143.9, 300 sec: 6303.7). Total num frames: 215613440. Throughput: 0: 1522.8. Samples: 48897044. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:17:17,938][41694] Avg episode reward: [(0, '4.461')] +[2024-11-08 07:17:21,101][42004] Updated weights for policy 0, policy_version 52646 (0.0040) +[2024-11-08 07:17:22,932][41694] Fps is (10 sec: 6143.5, 60 sec: 6143.9, 300 sec: 6317.5). Total num frames: 215646208. Throughput: 0: 1523.9. Samples: 48906876. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:17:22,934][41694] Avg episode reward: [(0, '4.471')] +[2024-11-08 07:17:27,881][42004] Updated weights for policy 0, policy_version 52656 (0.0041) +[2024-11-08 07:17:27,932][41694] Fps is (10 sec: 6553.9, 60 sec: 6144.2, 300 sec: 6317.6). Total num frames: 215678976. Throughput: 0: 1523.1. Samples: 48915718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2024-11-08 07:17:27,933][41694] Avg episode reward: [(0, '4.580')] +[2024-11-08 07:17:29,276][41694] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 41694], exiting... +[2024-11-08 07:17:29,285][41694] Runner profile tree view: +main_loop: 29019.2188 +[2024-11-08 07:17:29,294][41694] Collected {0: 215687168}, FPS: 6742.9 +[2024-11-08 07:17:29,328][41991] Stopping Batcher_0... +[2024-11-08 07:17:29,333][41991] Loop batcher_evt_loop terminating... +[2024-11-08 07:17:29,341][41991] Saving /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052658_215687168.pth... +[2024-11-08 07:17:29,520][41991] Removing /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052309_214257664.pth +[2024-11-08 07:17:29,533][41991] Stopping LearnerWorker_p0... +[2024-11-08 07:17:29,533][41991] Loop learner_proc0_evt_loop terminating... +[2024-11-08 07:17:29,843][42009] Stopping RolloutWorker_w5... +[2024-11-08 07:17:29,844][42006] Stopping RolloutWorker_w1... +[2024-11-08 07:17:29,850][42006] Loop rollout_proc1_evt_loop terminating... 
+[2024-11-08 07:17:29,850][42009] Loop rollout_proc5_evt_loop terminating... +[2024-11-08 07:17:29,847][42008] Stopping RolloutWorker_w3... +[2024-11-08 07:17:29,860][42008] Loop rollout_proc3_evt_loop terminating... +[2024-11-08 07:17:29,867][42010] Stopping RolloutWorker_w4... +[2024-11-08 07:17:29,881][42010] Loop rollout_proc4_evt_loop terminating... +[2024-11-08 07:17:29,858][42005] Stopping RolloutWorker_w0... +[2024-11-08 07:17:29,889][42005] Loop rollout_proc0_evt_loop terminating... +[2024-11-08 07:17:29,955][42004] Weights refcount: 2 0 +[2024-11-08 07:17:29,964][42004] Stopping InferenceWorker_p0-w0... +[2024-11-08 07:17:29,964][42004] Loop inference_proc0-0_evt_loop terminating... +[2024-11-08 07:17:29,934][42018] Stopping RolloutWorker_w7... +[2024-11-08 07:17:29,968][42018] Loop rollout_proc7_evt_loop terminating... +[2024-11-08 07:17:30,233][42007] Stopping RolloutWorker_w2... +[2024-11-08 07:17:30,253][42007] Loop rollout_proc2_evt_loop terminating... +[2024-11-08 07:17:31,452][42017] Stopping RolloutWorker_w6... +[2024-11-08 07:17:31,680][42017] Loop rollout_proc6_evt_loop terminating... +[2024-11-08 07:18:22,720][41694] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-08 07:18:22,722][41694] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-08 07:18:22,724][41694] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-08 07:18:22,726][41694] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-08 07:18:22,728][41694] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-08 07:18:22,730][41694] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-08 07:18:22,732][41694] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! 
+[2024-11-08 07:18:22,734][41694] Adding new argument 'max_num_episodes'=100 that is not in the saved config file! +[2024-11-08 07:18:22,737][41694] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2024-11-08 07:18:22,738][41694] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2024-11-08 07:18:22,740][41694] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-08 07:18:22,742][41694] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-08 07:18:22,745][41694] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-08 07:18:22,746][41694] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2024-11-08 07:18:22,748][41694] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-08 07:18:22,824][41694] Doom resolution: 160x120, resize resolution: (128, 72) +[2024-11-08 07:18:22,842][41694] RunningMeanStd input shape: (3, 72, 128) +[2024-11-08 07:18:22,861][41694] RunningMeanStd input shape: (1,) +[2024-11-08 07:18:22,952][41694] ConvEncoder: input_channels=3 +[2024-11-08 07:18:23,210][41694] Conv encoder output size: 512 +[2024-11-08 07:18:23,216][41694] Policy head output size: 512 +[2024-11-08 07:18:24,814][41694] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052658_215687168.pth... +[2024-11-08 07:18:27,067][41694] Num frames 100... +[2024-11-08 07:18:27,242][41694] Num frames 200... +[2024-11-08 07:18:27,445][41694] Num frames 300... +[2024-11-08 07:18:28,200][41694] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-08 07:18:28,206][41694] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-08 07:18:28,262][41694] Num frames 400... +[2024-11-08 07:18:28,436][41694] Num frames 500... +[2024-11-08 07:18:28,660][41694] Num frames 600... 
+[2024-11-08 07:18:28,872][41694] Num frames 700... +[2024-11-08 07:18:29,088][41694] Num frames 800... +[2024-11-08 07:18:29,284][41694] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320 +[2024-11-08 07:18:29,289][41694] Avg episode reward: 5.320, avg true_objective: 4.320 +[2024-11-08 07:18:29,366][41694] Num frames 900... +[2024-11-08 07:18:29,532][41694] Num frames 1000... +[2024-11-08 07:18:29,700][41694] Num frames 1100... +[2024-11-08 07:18:29,942][41694] Num frames 1200... +[2024-11-08 07:18:30,174][41694] Avg episode rewards: #0: 5.647, true rewards: #0: 4.313 +[2024-11-08 07:18:30,178][41694] Avg episode reward: 5.647, avg true_objective: 4.313 +[2024-11-08 07:18:30,204][41694] Num frames 1300... +[2024-11-08 07:18:30,587][41694] Num frames 1400... +[2024-11-08 07:18:31,071][41694] Num frames 1500... +[2024-11-08 07:18:31,256][41694] Num frames 1600... +[2024-11-08 07:18:31,449][41694] Num frames 1700... +[2024-11-08 07:18:31,555][41694] Avg episode rewards: #0: 5.525, true rewards: #0: 4.275 +[2024-11-08 07:18:31,562][41694] Avg episode reward: 5.525, avg true_objective: 4.275 +[2024-11-08 07:18:31,752][41694] Num frames 1800... +[2024-11-08 07:18:31,938][41694] Num frames 1900... +[2024-11-08 07:18:32,185][41694] Avg episode rewards: #0: 5.196, true rewards: #0: 3.996 +[2024-11-08 07:18:32,187][41694] Avg episode reward: 5.196, avg true_objective: 3.996 +[2024-11-08 07:18:32,191][41694] Num frames 2000... +[2024-11-08 07:18:32,440][41694] Num frames 2100... +[2024-11-08 07:18:32,630][41694] Num frames 2200... +[2024-11-08 07:18:32,887][41694] Num frames 2300... +[2024-11-08 07:18:33,110][41694] Avg episode rewards: #0: 4.970, true rewards: #0: 3.970 +[2024-11-08 07:18:33,111][41694] Avg episode reward: 4.970, avg true_objective: 3.970 +[2024-11-08 07:18:33,150][41694] Num frames 2400... +[2024-11-08 07:18:33,359][41694] Num frames 2500... +[2024-11-08 07:18:33,545][41694] Num frames 2600... +[2024-11-08 07:18:33,738][41694] Num frames 2700... 
+[2024-11-08 07:18:33,924][41694] Avg episode rewards: #0: 4.809, true rewards: #0: 3.951 +[2024-11-08 07:18:33,926][41694] Avg episode reward: 4.809, avg true_objective: 3.951 +[2024-11-08 07:18:33,998][41694] Num frames 2800... +[2024-11-08 07:18:34,215][41694] Num frames 2900... +[2024-11-08 07:18:34,414][41694] Num frames 3000... +[2024-11-08 07:18:34,660][41694] Num frames 3100... +[2024-11-08 07:18:34,895][41694] Avg episode rewards: #0: 4.688, true rewards: #0: 3.937 +[2024-11-08 07:18:34,899][41694] Avg episode reward: 4.688, avg true_objective: 3.937 +[2024-11-08 07:18:35,060][41694] Num frames 3200... +[2024-11-08 07:18:35,233][41694] Num frames 3300... +[2024-11-08 07:18:35,403][41694] Num frames 3400... +[2024-11-08 07:18:35,580][41694] Num frames 3500... +[2024-11-08 07:18:35,694][41694] Avg episode rewards: #0: 4.593, true rewards: #0: 3.927 +[2024-11-08 07:18:35,696][41694] Avg episode reward: 4.593, avg true_objective: 3.927 +[2024-11-08 07:18:35,822][41694] Num frames 3600... +[2024-11-08 07:18:35,993][41694] Num frames 3700... +[2024-11-08 07:18:36,161][41694] Num frames 3800... +[2024-11-08 07:18:36,351][41694] Num frames 3900... +[2024-11-08 07:18:36,523][41694] Num frames 4000... +[2024-11-08 07:18:36,716][41694] Avg episode rewards: #0: 4.878, true rewards: #0: 4.078 +[2024-11-08 07:18:36,721][41694] Avg episode reward: 4.878, avg true_objective: 4.078 +[2024-11-08 07:18:36,786][41694] Num frames 4100... +[2024-11-08 07:18:36,953][41694] Num frames 4200... +[2024-11-08 07:18:37,115][41694] Num frames 4300... +[2024-11-08 07:18:37,291][41694] Num frames 4400... +[2024-11-08 07:18:37,464][41694] Num frames 4500... +[2024-11-08 07:18:37,566][41694] Avg episode rewards: #0: 4.933, true rewards: #0: 4.115 +[2024-11-08 07:18:37,569][41694] Avg episode reward: 4.933, avg true_objective: 4.115 +[2024-11-08 07:18:37,708][41694] Num frames 4600... +[2024-11-08 07:18:37,879][41694] Num frames 4700... +[2024-11-08 07:18:38,059][41694] Num frames 4800... 
+[2024-11-08 07:18:38,235][41694] Num frames 4900... +[2024-11-08 07:18:38,310][41694] Avg episode rewards: #0: 4.842, true rewards: #0: 4.092 +[2024-11-08 07:18:38,311][41694] Avg episode reward: 4.842, avg true_objective: 4.092 +[2024-11-08 07:18:38,489][41694] Num frames 5000... +[2024-11-08 07:18:38,672][41694] Num frames 5100... +[2024-11-08 07:18:38,873][41694] Num frames 5200... +[2024-11-08 07:18:39,114][41694] Num frames 5300... +[2024-11-08 07:18:39,329][41694] Avg episode rewards: #0: 4.891, true rewards: #0: 4.122 +[2024-11-08 07:18:39,332][41694] Avg episode reward: 4.891, avg true_objective: 4.122 +[2024-11-08 07:18:39,430][41694] Num frames 5400... +[2024-11-08 07:18:39,638][41694] Num frames 5500... +[2024-11-08 07:18:39,838][41694] Num frames 5600... +[2024-11-08 07:18:40,057][41694] Num frames 5700... +[2024-11-08 07:18:40,265][41694] Avg episode rewards: #0: 4.910, true rewards: #0: 4.124 +[2024-11-08 07:18:40,271][41694] Avg episode reward: 4.910, avg true_objective: 4.124 +[2024-11-08 07:18:40,357][41694] Num frames 5800... +[2024-11-08 07:18:40,574][41694] Num frames 5900... +[2024-11-08 07:18:40,785][41694] Num frames 6000... +[2024-11-08 07:18:40,985][41694] Num frames 6100... +[2024-11-08 07:18:41,174][41694] Avg episode rewards: #0: 4.839, true rewards: #0: 4.105 +[2024-11-08 07:18:41,176][41694] Avg episode reward: 4.839, avg true_objective: 4.105 +[2024-11-08 07:18:41,262][41694] Num frames 6200... +[2024-11-08 07:18:41,471][41694] Num frames 6300... +[2024-11-08 07:18:41,683][41694] Num frames 6400... +[2024-11-08 07:18:41,879][41694] Num frames 6500... +[2024-11-08 07:18:42,023][41694] Avg episode rewards: #0: 4.776, true rewards: #0: 4.089 +[2024-11-08 07:18:42,030][41694] Avg episode reward: 4.776, avg true_objective: 4.089 +[2024-11-08 07:18:42,158][41694] Num frames 6600... +[2024-11-08 07:18:42,360][41694] Num frames 6700... +[2024-11-08 07:18:42,586][41694] Num frames 6800... +[2024-11-08 07:18:42,828][41694] Num frames 6900... 
+[2024-11-08 07:18:43,108][41694] Avg episode rewards: #0: 4.818, true rewards: #0: 4.112 +[2024-11-08 07:18:43,110][41694] Avg episode reward: 4.818, avg true_objective: 4.112 +[2024-11-08 07:18:43,139][41694] Num frames 7000... +[2024-11-08 07:18:43,373][41694] Num frames 7100... +[2024-11-08 07:18:43,615][41694] Num frames 7200... +[2024-11-08 07:18:43,846][41694] Num frames 7300... +[2024-11-08 07:18:44,054][41694] Avg episode rewards: #0: 4.763, true rewards: #0: 4.097 +[2024-11-08 07:18:44,056][41694] Avg episode reward: 4.763, avg true_objective: 4.097 +[2024-11-08 07:18:44,111][41694] Num frames 7400... +[2024-11-08 07:18:44,344][41694] Num frames 7500... +[2024-11-08 07:18:44,535][41694] Num frames 7600... +[2024-11-08 07:18:44,742][41694] Num frames 7700... +[2024-11-08 07:18:44,927][41694] Num frames 7800... +[2024-11-08 07:18:45,082][41694] Avg episode rewards: #0: 4.818, true rewards: #0: 4.134 +[2024-11-08 07:18:45,083][41694] Avg episode reward: 4.818, avg true_objective: 4.134 +[2024-11-08 07:18:45,171][41694] Num frames 7900... +[2024-11-08 07:18:45,385][41694] Num frames 8000... +[2024-11-08 07:18:45,669][41694] Num frames 8100... +[2024-11-08 07:18:45,930][41694] Num frames 8200... +[2024-11-08 07:18:46,067][41694] Avg episode rewards: #0: 4.769, true rewards: #0: 4.119 +[2024-11-08 07:18:46,068][41694] Avg episode reward: 4.769, avg true_objective: 4.119 +[2024-11-08 07:18:46,209][41694] Num frames 8300... +[2024-11-08 07:18:46,405][41694] Num frames 8400... +[2024-11-08 07:18:46,594][41694] Num frames 8500... +[2024-11-08 07:18:46,799][41694] Num frames 8600... +[2024-11-08 07:18:46,887][41694] Avg episode rewards: #0: 4.720, true rewards: #0: 4.101 +[2024-11-08 07:18:46,888][41694] Avg episode reward: 4.720, avg true_objective: 4.101 +[2024-11-08 07:18:47,076][41694] Num frames 8700... +[2024-11-08 07:18:47,301][41694] Num frames 8800... +[2024-11-08 07:18:47,504][41694] Num frames 8900... 
+[2024-11-08 07:18:47,757][41694] Avg episode rewards: #0: 4.680, true rewards: #0: 4.090 +[2024-11-08 07:18:47,762][41694] Avg episode reward: 4.680, avg true_objective: 4.090 +[2024-11-08 07:18:47,781][41694] Num frames 9000... +[2024-11-08 07:18:48,003][41694] Num frames 9100... +[2024-11-08 07:18:48,218][41694] Num frames 9200... +[2024-11-08 07:18:48,428][41694] Num frames 9300... +[2024-11-08 07:18:48,654][41694] Avg episode rewards: #0: 4.644, true rewards: #0: 4.079 +[2024-11-08 07:18:48,657][41694] Avg episode reward: 4.644, avg true_objective: 4.079 +[2024-11-08 07:18:48,713][41694] Num frames 9400... +[2024-11-08 07:18:48,922][41694] Num frames 9500... +[2024-11-08 07:18:49,136][41694] Num frames 9600... +[2024-11-08 07:18:49,347][41694] Num frames 9700... +[2024-11-08 07:18:49,538][41694] Avg episode rewards: #0: 4.610, true rewards: #0: 4.069 +[2024-11-08 07:18:49,541][41694] Avg episode reward: 4.610, avg true_objective: 4.069 +[2024-11-08 07:18:49,631][41694] Num frames 9800... +[2024-11-08 07:18:49,846][41694] Num frames 9900... +[2024-11-08 07:18:50,060][41694] Num frames 10000... +[2024-11-08 07:18:50,276][41694] Num frames 10100... +[2024-11-08 07:18:50,443][41694] Avg episode rewards: #0: 4.580, true rewards: #0: 4.060 +[2024-11-08 07:18:50,448][41694] Avg episode reward: 4.580, avg true_objective: 4.060 +[2024-11-08 07:18:50,566][41694] Num frames 10200... +[2024-11-08 07:18:50,768][41694] Num frames 10300... +[2024-11-08 07:18:51,005][41694] Num frames 10400... +[2024-11-08 07:18:51,210][41694] Num frames 10500... +[2024-11-08 07:18:51,480][41694] Avg episode rewards: #0: 4.614, true rewards: #0: 4.076 +[2024-11-08 07:18:51,482][41694] Avg episode reward: 4.614, avg true_objective: 4.076 +[2024-11-08 07:18:51,491][41694] Num frames 10600... +[2024-11-08 07:18:51,703][41694] Num frames 10700... +[2024-11-08 07:18:51,897][41694] Num frames 10800... +[2024-11-08 07:18:52,102][41694] Num frames 10900... 
+[2024-11-08 07:18:52,317][41694] Avg episode rewards: #0: 4.586, true rewards: #0: 4.067 +[2024-11-08 07:18:52,321][41694] Avg episode reward: 4.586, avg true_objective: 4.067 +[2024-11-08 07:18:52,375][41694] Num frames 11000... +[2024-11-08 07:18:52,563][41694] Num frames 11100... +[2024-11-08 07:18:52,751][41694] Num frames 11200... +[2024-11-08 07:18:52,942][41694] Num frames 11300... +[2024-11-08 07:18:53,120][41694] Avg episode rewards: #0: 4.559, true rewards: #0: 4.059 +[2024-11-08 07:18:53,127][41694] Avg episode reward: 4.559, avg true_objective: 4.059 +[2024-11-08 07:18:53,208][41694] Num frames 11400... +[2024-11-08 07:18:53,389][41694] Num frames 11500... +[2024-11-08 07:18:53,567][41694] Num frames 11600... +[2024-11-08 07:18:53,749][41694] Num frames 11700... +[2024-11-08 07:18:53,897][41694] Avg episode rewards: #0: 4.534, true rewards: #0: 4.051 +[2024-11-08 07:18:53,901][41694] Avg episode reward: 4.534, avg true_objective: 4.051 +[2024-11-08 07:18:54,012][41694] Num frames 11800... +[2024-11-08 07:18:54,196][41694] Num frames 11900... +[2024-11-08 07:18:54,375][41694] Num frames 12000... +[2024-11-08 07:18:54,555][41694] Num frames 12100... +[2024-11-08 07:18:54,796][41694] Avg episode rewards: #0: 4.566, true rewards: #0: 4.066 +[2024-11-08 07:18:54,798][41694] Avg episode reward: 4.566, avg true_objective: 4.066 +[2024-11-08 07:18:54,806][41694] Num frames 12200... +[2024-11-08 07:18:55,133][41694] Num frames 12300... +[2024-11-08 07:18:55,418][41694] Num frames 12400... +[2024-11-08 07:18:55,680][41694] Num frames 12500... +[2024-11-08 07:18:55,959][41694] Avg episode rewards: #0: 4.542, true rewards: #0: 4.058 +[2024-11-08 07:18:55,963][41694] Avg episode reward: 4.542, avg true_objective: 4.058 +[2024-11-08 07:18:56,019][41694] Num frames 12600... +[2024-11-08 07:18:56,300][41694] Num frames 12700... +[2024-11-08 07:18:56,582][41694] Num frames 12800... +[2024-11-08 07:18:56,862][41694] Num frames 12900... 
+[2024-11-08 07:18:57,097][41694] Avg episode rewards: #0: 4.520, true rewards: #0: 4.052 +[2024-11-08 07:18:57,099][41694] Avg episode reward: 4.520, avg true_objective: 4.052 +[2024-11-08 07:18:57,197][41694] Num frames 13000... +[2024-11-08 07:18:57,453][41694] Num frames 13100... +[2024-11-08 07:18:57,683][41694] Num frames 13200... +[2024-11-08 07:18:57,915][41694] Num frames 13300... +[2024-11-08 07:18:58,104][41694] Avg episode rewards: #0: 4.500, true rewards: #0: 4.045 +[2024-11-08 07:18:58,106][41694] Avg episode reward: 4.500, avg true_objective: 4.045 +[2024-11-08 07:18:58,221][41694] Num frames 13400... +[2024-11-08 07:18:58,418][41694] Num frames 13500... +[2024-11-08 07:18:58,611][41694] Num frames 13600... +[2024-11-08 07:18:58,799][41694] Num frames 13700... +[2024-11-08 07:18:58,925][41694] Avg episode rewards: #0: 4.480, true rewards: #0: 4.039 +[2024-11-08 07:18:58,930][41694] Avg episode reward: 4.480, avg true_objective: 4.039 +[2024-11-08 07:18:59,082][41694] Num frames 13800... +[2024-11-08 07:18:59,280][41694] Num frames 13900... +[2024-11-08 07:18:59,479][41694] Num frames 14000... +[2024-11-08 07:18:59,683][41694] Num frames 14100... +[2024-11-08 07:18:59,785][41694] Avg episode rewards: #0: 4.462, true rewards: #0: 4.033 +[2024-11-08 07:18:59,786][41694] Avg episode reward: 4.462, avg true_objective: 4.033 +[2024-11-08 07:18:59,962][41694] Num frames 14200... +[2024-11-08 07:19:00,269][41694] Num frames 14300... +[2024-11-08 07:19:00,973][41694] Num frames 14400... +[2024-11-08 07:19:01,201][41694] Num frames 14500... +[2024-11-08 07:19:01,262][41694] Avg episode rewards: #0: 4.445, true rewards: #0: 4.028 +[2024-11-08 07:19:01,264][41694] Avg episode reward: 4.445, avg true_objective: 4.028 +[2024-11-08 07:19:01,514][41694] Num frames 14600... +[2024-11-08 07:19:01,693][41694] Num frames 14700... +[2024-11-08 07:19:01,881][41694] Num frames 14800... 
+[2024-11-08 07:19:02,089][41694] Avg episode rewards: #0: 4.428, true rewards: #0: 4.023 +[2024-11-08 07:19:02,091][41694] Avg episode reward: 4.428, avg true_objective: 4.023 +[2024-11-08 07:19:02,140][41694] Num frames 14900... +[2024-11-08 07:19:02,330][41694] Num frames 15000... +[2024-11-08 07:19:02,503][41694] Num frames 15100... +[2024-11-08 07:19:02,676][41694] Num frames 15200... +[2024-11-08 07:19:02,854][41694] Avg episode rewards: #0: 4.413, true rewards: #0: 4.018 +[2024-11-08 07:19:02,858][41694] Avg episode reward: 4.413, avg true_objective: 4.018 +[2024-11-08 07:19:02,936][41694] Num frames 15300... +[2024-11-08 07:19:03,115][41694] Num frames 15400... +[2024-11-08 07:19:03,304][41694] Num frames 15500... +[2024-11-08 07:19:03,504][41694] Num frames 15600... +[2024-11-08 07:19:03,671][41694] Avg episode rewards: #0: 4.398, true rewards: #0: 4.014 +[2024-11-08 07:19:03,677][41694] Avg episode reward: 4.398, avg true_objective: 4.014 +[2024-11-08 07:19:03,791][41694] Num frames 15700... +[2024-11-08 07:19:04,007][41694] Num frames 15800... +[2024-11-08 07:19:04,194][41694] Num frames 15900... +[2024-11-08 07:19:04,386][41694] Num frames 16000... +[2024-11-08 07:19:04,521][41694] Avg episode rewards: #0: 4.384, true rewards: #0: 4.009 +[2024-11-08 07:19:04,528][41694] Avg episode reward: 4.384, avg true_objective: 4.009 +[2024-11-08 07:19:04,670][41694] Num frames 16100... +[2024-11-08 07:19:04,857][41694] Num frames 16200... +[2024-11-08 07:19:05,084][41694] Num frames 16300... +[2024-11-08 07:19:05,287][41694] Num frames 16400... +[2024-11-08 07:19:05,489][41694] Num frames 16500... +[2024-11-08 07:19:05,707][41694] Avg episode rewards: #0: 4.459, true rewards: #0: 4.044 +[2024-11-08 07:19:05,712][41694] Avg episode reward: 4.459, avg true_objective: 4.044 +[2024-11-08 07:19:05,769][41694] Num frames 16600... +[2024-11-08 07:19:05,964][41694] Num frames 16700... +[2024-11-08 07:19:06,164][41694] Num frames 16800... 
+[2024-11-08 07:19:06,355][41694] Num frames 16900... +[2024-11-08 07:19:06,539][41694] Avg episode rewards: #0: 4.444, true rewards: #0: 4.039 +[2024-11-08 07:19:06,542][41694] Avg episode reward: 4.444, avg true_objective: 4.039 +[2024-11-08 07:19:06,638][41694] Num frames 17000... +[2024-11-08 07:19:06,836][41694] Num frames 17100... +[2024-11-08 07:19:07,040][41694] Num frames 17200... +[2024-11-08 07:19:07,228][41694] Num frames 17300... +[2024-11-08 07:19:07,411][41694] Num frames 17400... +[2024-11-08 07:19:07,498][41694] Avg episode rewards: #0: 4.468, true rewards: #0: 4.050 +[2024-11-08 07:19:07,504][41694] Avg episode reward: 4.468, avg true_objective: 4.050 +[2024-11-08 07:19:07,686][41694] Num frames 17500... +[2024-11-08 07:19:07,860][41694] Num frames 17600... +[2024-11-08 07:19:08,042][41694] Num frames 17700... +[2024-11-08 07:19:08,278][41694] Avg episode rewards: #0: 4.454, true rewards: #0: 4.045 +[2024-11-08 07:19:08,281][41694] Avg episode reward: 4.454, avg true_objective: 4.045 +[2024-11-08 07:19:08,290][41694] Num frames 17800... +[2024-11-08 07:19:08,483][41694] Num frames 17900... +[2024-11-08 07:19:08,671][41694] Num frames 18000... +[2024-11-08 07:19:08,853][41694] Num frames 18100... +[2024-11-08 07:19:09,027][41694] Num frames 18200... +[2024-11-08 07:19:09,116][41694] Avg episode rewards: #0: 4.470, true rewards: #0: 4.047 +[2024-11-08 07:19:09,120][41694] Avg episode reward: 4.470, avg true_objective: 4.047 +[2024-11-08 07:19:09,289][41694] Num frames 18300... +[2024-11-08 07:19:09,506][41694] Num frames 18400... +[2024-11-08 07:19:09,684][41694] Num frames 18500... +[2024-11-08 07:19:09,930][41694] Avg episode rewards: #0: 4.456, true rewards: #0: 4.043 +[2024-11-08 07:19:09,934][41694] Avg episode reward: 4.456, avg true_objective: 4.043 +[2024-11-08 07:19:09,947][41694] Num frames 18600... +[2024-11-08 07:19:10,139][41694] Num frames 18700... +[2024-11-08 07:19:10,325][41694] Num frames 18800... 
+[2024-11-08 07:19:10,508][41694] Num frames 18900... +[2024-11-08 07:19:10,685][41694] Num frames 19000... +[2024-11-08 07:19:10,827][41694] Avg episode rewards: #0: 4.478, true rewards: #0: 4.052 +[2024-11-08 07:19:10,829][41694] Avg episode reward: 4.478, avg true_objective: 4.052 +[2024-11-08 07:19:10,932][41694] Num frames 19100... +[2024-11-08 07:19:11,119][41694] Num frames 19200... +[2024-11-08 07:19:11,328][41694] Num frames 19300... +[2024-11-08 07:19:11,564][41694] Num frames 19400... +[2024-11-08 07:19:11,688][41694] Avg episode rewards: #0: 4.464, true rewards: #0: 4.048 +[2024-11-08 07:19:11,690][41694] Avg episode reward: 4.464, avg true_objective: 4.048 +[2024-11-08 07:19:11,841][41694] Num frames 19500... +[2024-11-08 07:19:12,049][41694] Num frames 19600... +[2024-11-08 07:19:12,264][41694] Num frames 19700... +[2024-11-08 07:19:12,454][41694] Num frames 19800... +[2024-11-08 07:19:12,541][41694] Avg episode rewards: #0: 4.452, true rewards: #0: 4.043 +[2024-11-08 07:19:12,543][41694] Avg episode reward: 4.452, avg true_objective: 4.043 +[2024-11-08 07:19:12,729][41694] Num frames 19900... +[2024-11-08 07:19:12,918][41694] Num frames 20000... +[2024-11-08 07:19:13,115][41694] Num frames 20100... +[2024-11-08 07:19:13,336][41694] Num frames 20200... +[2024-11-08 07:19:13,514][41694] Avg episode rewards: #0: 4.472, true rewards: #0: 4.052 +[2024-11-08 07:19:13,517][41694] Avg episode reward: 4.472, avg true_objective: 4.052 +[2024-11-08 07:19:13,627][41694] Num frames 20300... +[2024-11-08 07:19:13,825][41694] Num frames 20400... +[2024-11-08 07:19:14,026][41694] Num frames 20500... +[2024-11-08 07:19:14,217][41694] Num frames 20600... +[2024-11-08 07:19:14,374][41694] Avg episode rewards: #0: 4.460, true rewards: #0: 4.048 +[2024-11-08 07:19:14,378][41694] Avg episode reward: 4.460, avg true_objective: 4.048 +[2024-11-08 07:19:14,512][41694] Num frames 20700... +[2024-11-08 07:19:14,735][41694] Num frames 20800... 
+[2024-11-08 07:19:14,961][41694] Num frames 20900... +[2024-11-08 07:19:15,200][41694] Num frames 21000... +[2024-11-08 07:19:15,329][41694] Avg episode rewards: #0: 4.448, true rewards: #0: 4.044 +[2024-11-08 07:19:15,332][41694] Avg episode reward: 4.448, avg true_objective: 4.044 +[2024-11-08 07:19:15,500][41694] Num frames 21100... +[2024-11-08 07:19:15,732][41694] Num frames 21200... +[2024-11-08 07:19:15,985][41694] Num frames 21300... +[2024-11-08 07:19:16,210][41694] Num frames 21400... +[2024-11-08 07:19:16,435][41694] Avg episode rewards: #0: 4.467, true rewards: #0: 4.052 +[2024-11-08 07:19:16,436][41694] Avg episode reward: 4.467, avg true_objective: 4.052 +[2024-11-08 07:19:16,488][41694] Num frames 21500... +[2024-11-08 07:19:16,715][41694] Num frames 21600... +[2024-11-08 07:19:16,900][41694] Num frames 21700... +[2024-11-08 07:19:17,092][41694] Num frames 21800... +[2024-11-08 07:19:17,272][41694] Avg episode rewards: #0: 4.456, true rewards: #0: 4.048 +[2024-11-08 07:19:17,274][41694] Avg episode reward: 4.456, avg true_objective: 4.048 +[2024-11-08 07:19:17,353][41694] Num frames 21900... +[2024-11-08 07:19:17,558][41694] Num frames 22000... +[2024-11-08 07:19:17,760][41694] Num frames 22100... +[2024-11-08 07:19:17,974][41694] Num frames 22200... +[2024-11-08 07:19:18,171][41694] Num frames 22300... +[2024-11-08 07:19:18,249][41694] Avg episode rewards: #0: 4.474, true rewards: #0: 4.056 +[2024-11-08 07:19:18,251][41694] Avg episode reward: 4.474, avg true_objective: 4.056 +[2024-11-08 07:19:18,439][41694] Num frames 22400... +[2024-11-08 07:19:18,617][41694] Num frames 22500... +[2024-11-08 07:19:18,796][41694] Num frames 22600... +[2024-11-08 07:19:18,973][41694] Num frames 22700... +[2024-11-08 07:19:19,163][41694] Num frames 22800... 
+[2024-11-08 07:19:19,387][41694] Avg episode rewards: #0: 4.551, true rewards: #0: 4.087 +[2024-11-08 07:19:19,393][41694] Avg episode reward: 4.551, avg true_objective: 4.087 +[2024-11-08 07:19:19,442][41694] Num frames 22900... +[2024-11-08 07:19:19,650][41694] Num frames 23000... +[2024-11-08 07:19:19,849][41694] Num frames 23100... +[2024-11-08 07:19:20,048][41694] Num frames 23200... +[2024-11-08 07:19:20,251][41694] Num frames 23300... +[2024-11-08 07:19:20,313][41694] Avg episode rewards: #0: 4.544, true rewards: #0: 4.088 +[2024-11-08 07:19:20,314][41694] Avg episode reward: 4.544, avg true_objective: 4.088 +[2024-11-08 07:19:20,523][41694] Num frames 23400... +[2024-11-08 07:19:20,723][41694] Num frames 23500... +[2024-11-08 07:19:20,914][41694] Num frames 23600... +[2024-11-08 07:19:21,141][41694] Avg episode rewards: #0: 4.532, true rewards: #0: 4.084 +[2024-11-08 07:19:21,146][41694] Avg episode reward: 4.532, avg true_objective: 4.084 +[2024-11-08 07:19:21,194][41694] Num frames 23700... +[2024-11-08 07:19:21,394][41694] Num frames 23800... +[2024-11-08 07:19:21,604][41694] Num frames 23900... +[2024-11-08 07:19:21,807][41694] Num frames 24000... +[2024-11-08 07:19:22,003][41694] Num frames 24100... +[2024-11-08 07:19:22,128][41694] Avg episode rewards: #0: 4.548, true rewards: #0: 4.090 +[2024-11-08 07:19:22,132][41694] Avg episode reward: 4.548, avg true_objective: 4.090 +[2024-11-08 07:19:22,290][41694] Num frames 24200... +[2024-11-08 07:19:22,494][41694] Num frames 24300... +[2024-11-08 07:19:22,701][41694] Num frames 24400... +[2024-11-08 07:19:22,901][41694] Num frames 24500... +[2024-11-08 07:19:22,997][41694] Avg episode rewards: #0: 4.536, true rewards: #0: 4.086 +[2024-11-08 07:19:23,002][41694] Avg episode reward: 4.536, avg true_objective: 4.086 +[2024-11-08 07:19:23,189][41694] Num frames 24600... +[2024-11-08 07:19:23,380][41694] Num frames 24700... +[2024-11-08 07:19:23,575][41694] Num frames 24800... 
+[2024-11-08 07:19:23,778][41694] Num frames 24900... +[2024-11-08 07:19:23,839][41694] Avg episode rewards: #0: 4.525, true rewards: #0: 4.082 +[2024-11-08 07:19:23,841][41694] Avg episode reward: 4.525, avg true_objective: 4.082 +[2024-11-08 07:19:24,038][41694] Num frames 25000... +[2024-11-08 07:19:24,233][41694] Num frames 25100... +[2024-11-08 07:19:24,429][41694] Num frames 25200... +[2024-11-08 07:19:24,635][41694] Num frames 25300... +[2024-11-08 07:19:24,791][41694] Avg episode rewards: #0: 4.540, true rewards: #0: 4.089 +[2024-11-08 07:19:24,797][41694] Avg episode reward: 4.540, avg true_objective: 4.089 +[2024-11-08 07:19:24,911][41694] Num frames 25400... +[2024-11-08 07:19:25,088][41694] Num frames 25500... +[2024-11-08 07:19:25,263][41694] Num frames 25600... +[2024-11-08 07:19:25,449][41694] Num frames 25700... +[2024-11-08 07:19:25,691][41694] Avg episode rewards: #0: 4.555, true rewards: #0: 4.095 +[2024-11-08 07:19:25,694][41694] Avg episode reward: 4.555, avg true_objective: 4.095 +[2024-11-08 07:19:25,710][41694] Num frames 25800... +[2024-11-08 07:19:25,916][41694] Num frames 25900... +[2024-11-08 07:19:26,104][41694] Num frames 26000... +[2024-11-08 07:19:26,289][41694] Num frames 26100... +[2024-11-08 07:19:26,583][41694] Avg episode rewards: #0: 4.544, true rewards: #0: 4.091 +[2024-11-08 07:19:26,587][41694] Avg episode reward: 4.544, avg true_objective: 4.091 +[2024-11-08 07:19:26,652][41694] Num frames 26200... +[2024-11-08 07:19:26,835][41694] Num frames 26300... +[2024-11-08 07:19:27,010][41694] Num frames 26400... +[2024-11-08 07:19:27,195][41694] Num frames 26500... +[2024-11-08 07:19:27,395][41694] Num frames 26600... +[2024-11-08 07:19:27,636][41694] Avg episode rewards: #0: 4.584, true rewards: #0: 4.107 +[2024-11-08 07:19:27,639][41694] Avg episode reward: 4.584, avg true_objective: 4.107 +[2024-11-08 07:19:27,669][41694] Num frames 26700... +[2024-11-08 07:19:27,865][41694] Num frames 26800... 
+[2024-11-08 07:19:28,066][41694] Num frames 26900... +[2024-11-08 07:19:28,260][41694] Num frames 27000... +[2024-11-08 07:19:28,467][41694] Avg episode rewards: #0: 4.572, true rewards: #0: 4.103 +[2024-11-08 07:19:28,471][41694] Avg episode reward: 4.572, avg true_objective: 4.103 +[2024-11-08 07:19:28,540][41694] Num frames 27100... +[2024-11-08 07:19:28,730][41694] Num frames 27200... +[2024-11-08 07:19:28,926][41694] Num frames 27300... +[2024-11-08 07:19:29,146][41694] Num frames 27400... +[2024-11-08 07:19:29,336][41694] Avg episode rewards: #0: 4.561, true rewards: #0: 4.099 +[2024-11-08 07:19:29,342][41694] Avg episode reward: 4.561, avg true_objective: 4.099 +[2024-11-08 07:19:29,444][41694] Num frames 27500... +[2024-11-08 07:19:29,643][41694] Num frames 27600... +[2024-11-08 07:19:29,850][41694] Num frames 27700... +[2024-11-08 07:19:30,071][41694] Num frames 27800... +[2024-11-08 07:19:30,280][41694] Num frames 27900... +[2024-11-08 07:19:30,362][41694] Avg episode rewards: #0: 4.575, true rewards: #0: 4.104 +[2024-11-08 07:19:30,366][41694] Avg episode reward: 4.575, avg true_objective: 4.104 +[2024-11-08 07:19:30,583][41694] Num frames 28000... +[2024-11-08 07:19:30,786][41694] Num frames 28100... +[2024-11-08 07:19:31,000][41694] Num frames 28200... +[2024-11-08 07:19:31,238][41694] Num frames 28300... +[2024-11-08 07:19:31,416][41694] Avg episode rewards: #0: 4.588, true rewards: #0: 4.110 +[2024-11-08 07:19:31,421][41694] Avg episode reward: 4.588, avg true_objective: 4.110 +[2024-11-08 07:19:31,527][41694] Num frames 28400... +[2024-11-08 07:19:31,735][41694] Num frames 28500... +[2024-11-08 07:19:31,945][41694] Num frames 28600... +[2024-11-08 07:19:32,150][41694] Num frames 28700... +[2024-11-08 07:19:32,374][41694] Num frames 28800... 
+[2024-11-08 07:19:32,447][41694] Avg episode rewards: #0: 4.601, true rewards: #0: 4.115 +[2024-11-08 07:19:32,452][41694] Avg episode reward: 4.601, avg true_objective: 4.115 +[2024-11-08 07:19:32,674][41694] Num frames 28900... +[2024-11-08 07:19:32,891][41694] Num frames 29000... +[2024-11-08 07:19:33,569][41694] Num frames 29100... +[2024-11-08 07:19:33,831][41694] Avg episode rewards: #0: 4.590, true rewards: #0: 4.111 +[2024-11-08 07:19:33,835][41694] Avg episode reward: 4.590, avg true_objective: 4.111 +[2024-11-08 07:19:33,879][41694] Num frames 29200... +[2024-11-08 07:19:34,101][41694] Num frames 29300... +[2024-11-08 07:19:34,294][41694] Num frames 29400... +[2024-11-08 07:19:34,489][41694] Num frames 29500... +[2024-11-08 07:19:34,692][41694] Avg episode rewards: #0: 4.580, true rewards: #0: 4.107 +[2024-11-08 07:19:34,696][41694] Avg episode reward: 4.580, avg true_objective: 4.107 +[2024-11-08 07:19:34,765][41694] Num frames 29600... +[2024-11-08 07:19:34,980][41694] Num frames 29700... +[2024-11-08 07:19:35,206][41694] Num frames 29800... +[2024-11-08 07:19:35,421][41694] Num frames 29900... +[2024-11-08 07:19:35,594][41694] Avg episode rewards: #0: 4.569, true rewards: #0: 4.104 +[2024-11-08 07:19:35,598][41694] Avg episode reward: 4.569, avg true_objective: 4.104 +[2024-11-08 07:19:35,794][41694] Num frames 30000... +[2024-11-08 07:19:36,015][41694] Num frames 30100... +[2024-11-08 07:19:36,228][41694] Num frames 30200... +[2024-11-08 07:19:36,438][41694] Num frames 30300... +[2024-11-08 07:19:36,590][41694] Avg episode rewards: #0: 4.560, true rewards: #0: 4.100 +[2024-11-08 07:19:36,595][41694] Avg episode reward: 4.560, avg true_objective: 4.100 +[2024-11-08 07:19:36,741][41694] Num frames 30400... +[2024-11-08 07:19:36,969][41694] Num frames 30500... +[2024-11-08 07:19:37,189][41694] Num frames 30600... +[2024-11-08 07:19:37,394][41694] Num frames 30700... 
+[2024-11-08 07:19:37,563][41694] Avg episode rewards: #0: 4.554, true rewards: #0: 4.101 +[2024-11-08 07:19:37,569][41694] Avg episode reward: 4.554, avg true_objective: 4.101 +[2024-11-08 07:19:37,673][41694] Num frames 30800... +[2024-11-08 07:19:37,890][41694] Num frames 30900... +[2024-11-08 07:19:38,094][41694] Num frames 31000... +[2024-11-08 07:19:38,298][41694] Num frames 31100... +[2024-11-08 07:19:38,442][41694] Avg episode rewards: #0: 4.545, true rewards: #0: 4.097 +[2024-11-08 07:19:38,447][41694] Avg episode reward: 4.545, avg true_objective: 4.097 +[2024-11-08 07:19:38,582][41694] Num frames 31200... +[2024-11-08 07:19:38,786][41694] Num frames 31300... +[2024-11-08 07:19:38,994][41694] Num frames 31400... +[2024-11-08 07:19:39,192][41694] Num frames 31500... +[2024-11-08 07:19:39,302][41694] Avg episode rewards: #0: 4.536, true rewards: #0: 4.094 +[2024-11-08 07:19:39,308][41694] Avg episode reward: 4.536, avg true_objective: 4.094 +[2024-11-08 07:19:39,468][41694] Num frames 31600... +[2024-11-08 07:19:39,645][41694] Num frames 31700... +[2024-11-08 07:19:39,872][41694] Num frames 31800... +[2024-11-08 07:19:40,056][41694] Num frames 31900... +[2024-11-08 07:19:40,133][41694] Avg episode rewards: #0: 4.527, true rewards: #0: 4.091 +[2024-11-08 07:19:40,137][41694] Avg episode reward: 4.527, avg true_objective: 4.091 +[2024-11-08 07:19:40,319][41694] Num frames 32000... +[2024-11-08 07:19:40,524][41694] Num frames 32100... +[2024-11-08 07:19:40,727][41694] Num frames 32200... +[2024-11-08 07:19:40,957][41694] Avg episode rewards: #0: 4.518, true rewards: #0: 4.088 +[2024-11-08 07:19:40,958][41694] Avg episode reward: 4.518, avg true_objective: 4.088 +[2024-11-08 07:19:40,978][41694] Num frames 32300... +[2024-11-08 07:19:41,168][41694] Num frames 32400... +[2024-11-08 07:19:41,347][41694] Num frames 32500... 
+[2024-11-08 07:19:41,546][41694] Avg episode rewards: #0: 4.510, true rewards: #0: 4.073 +[2024-11-08 07:19:41,549][41694] Avg episode reward: 4.510, avg true_objective: 4.073 +[2024-11-08 07:19:41,610][41694] Num frames 32600... +[2024-11-08 07:19:41,838][41694] Num frames 32700... +[2024-11-08 07:19:42,019][41694] Num frames 32800... +[2024-11-08 07:19:42,208][41694] Num frames 32900... +[2024-11-08 07:19:42,385][41694] Avg episode rewards: #0: 4.502, true rewards: #0: 4.070 +[2024-11-08 07:19:42,386][41694] Avg episode reward: 4.502, avg true_objective: 4.070 +[2024-11-08 07:19:42,470][41694] Num frames 33000... +[2024-11-08 07:19:42,656][41694] Num frames 33100... +[2024-11-08 07:19:42,836][41694] Num frames 33200... +[2024-11-08 07:19:43,019][41694] Num frames 33300... +[2024-11-08 07:19:43,165][41694] Avg episode rewards: #0: 4.494, true rewards: #0: 4.067 +[2024-11-08 07:19:43,167][41694] Avg episode reward: 4.494, avg true_objective: 4.067 +[2024-11-08 07:19:43,260][41694] Num frames 33400... +[2024-11-08 07:19:43,446][41694] Num frames 33500... +[2024-11-08 07:19:43,632][41694] Num frames 33600... +[2024-11-08 07:19:43,844][41694] Num frames 33700... +[2024-11-08 07:19:44,035][41694] Avg episode rewards: #0: 4.502, true rewards: #0: 4.068 +[2024-11-08 07:19:44,038][41694] Avg episode reward: 4.502, avg true_objective: 4.068 +[2024-11-08 07:19:44,118][41694] Num frames 33800... +[2024-11-08 07:19:44,323][41694] Num frames 33900... +[2024-11-08 07:19:44,526][41694] Num frames 34000... +[2024-11-08 07:19:44,733][41694] Num frames 34100... +[2024-11-08 07:19:44,943][41694] Num frames 34200... +[2024-11-08 07:19:45,031][41694] Avg episode rewards: #0: 4.513, true rewards: #0: 4.073 +[2024-11-08 07:19:45,035][41694] Avg episode reward: 4.513, avg true_objective: 4.073 +[2024-11-08 07:19:45,217][41694] Num frames 34300... +[2024-11-08 07:19:45,408][41694] Num frames 34400... +[2024-11-08 07:19:45,613][41694] Num frames 34500... 
+[2024-11-08 07:19:45,811][41694] Num frames 34600... +[2024-11-08 07:19:45,930][41694] Avg episode rewards: #0: 4.521, true rewards: #0: 4.074 +[2024-11-08 07:19:45,935][41694] Avg episode reward: 4.521, avg true_objective: 4.074 +[2024-11-08 07:19:46,110][41694] Num frames 34700... +[2024-11-08 07:19:46,332][41694] Num frames 34800... +[2024-11-08 07:19:46,545][41694] Num frames 34900... +[2024-11-08 07:19:46,762][41694] Num frames 35000... +[2024-11-08 07:19:46,855][41694] Avg episode rewards: #0: 4.513, true rewards: #0: 4.071 +[2024-11-08 07:19:46,856][41694] Avg episode reward: 4.513, avg true_objective: 4.071 +[2024-11-08 07:19:47,048][41694] Num frames 35100... +[2024-11-08 07:19:47,288][41694] Num frames 35200... +[2024-11-08 07:19:47,519][41694] Num frames 35300... +[2024-11-08 07:19:47,795][41694] Avg episode rewards: #0: 4.505, true rewards: #0: 4.069 +[2024-11-08 07:19:47,797][41694] Avg episode reward: 4.505, avg true_objective: 4.069 +[2024-11-08 07:19:47,805][41694] Num frames 35400... +[2024-11-08 07:19:48,035][41694] Num frames 35500... +[2024-11-08 07:19:48,258][41694] Num frames 35600... +[2024-11-08 07:19:48,484][41694] Num frames 35700... +[2024-11-08 07:19:48,712][41694] Avg episode rewards: #0: 4.498, true rewards: #0: 4.066 +[2024-11-08 07:19:48,714][41694] Avg episode reward: 4.498, avg true_objective: 4.066 +[2024-11-08 07:19:48,781][41694] Num frames 35800... +[2024-11-08 07:19:48,983][41694] Num frames 35900... +[2024-11-08 07:19:49,162][41694] Num frames 36000... +[2024-11-08 07:19:49,344][41694] Num frames 36100... +[2024-11-08 07:19:49,535][41694] Num frames 36200... +[2024-11-08 07:19:49,716][41694] Num frames 36300... +[2024-11-08 07:19:49,952][41694] Avg episode rewards: #0: 4.549, true rewards: #0: 4.089 +[2024-11-08 07:19:49,954][41694] Avg episode reward: 4.549, avg true_objective: 4.089 +[2024-11-08 07:19:49,980][41694] Num frames 36400... +[2024-11-08 07:19:50,203][41694] Num frames 36500... 
+[2024-11-08 07:19:50,413][41694] Num frames 36600... +[2024-11-08 07:19:50,625][41694] Num frames 36700... +[2024-11-08 07:19:50,817][41694] Num frames 36800... +[2024-11-08 07:19:50,950][41694] Avg episode rewards: #0: 4.560, true rewards: #0: 4.093 +[2024-11-08 07:19:50,956][41694] Avg episode reward: 4.560, avg true_objective: 4.093 +[2024-11-08 07:19:51,079][41694] Num frames 36900... +[2024-11-08 07:19:51,257][41694] Num frames 37000... +[2024-11-08 07:19:51,447][41694] Num frames 37100... +[2024-11-08 07:19:51,625][41694] Num frames 37200... +[2024-11-08 07:19:51,727][41694] Avg episode rewards: #0: 4.552, true rewards: #0: 4.090 +[2024-11-08 07:19:51,731][41694] Avg episode reward: 4.552, avg true_objective: 4.090 +[2024-11-08 07:19:51,902][41694] Num frames 37300... +[2024-11-08 07:19:52,091][41694] Num frames 37400... +[2024-11-08 07:19:52,282][41694] Num frames 37500... +[2024-11-08 07:19:52,478][41694] Num frames 37600... +[2024-11-08 07:19:52,547][41694] Avg episode rewards: #0: 4.544, true rewards: #0: 4.087 +[2024-11-08 07:19:52,550][41694] Avg episode reward: 4.544, avg true_objective: 4.087 +[2024-11-08 07:19:52,752][41694] Num frames 37700... +[2024-11-08 07:19:52,951][41694] Num frames 37800... +[2024-11-08 07:19:53,127][41694] Avg episode rewards: #0: 4.523, true rewards: #0: 4.071 +[2024-11-08 07:19:53,134][41694] Avg episode reward: 4.523, avg true_objective: 4.071 +[2024-11-08 07:19:53,231][41694] Num frames 37900... +[2024-11-08 07:19:53,441][41694] Num frames 38000... +[2024-11-08 07:19:53,633][41694] Num frames 38100... +[2024-11-08 07:19:53,831][41694] Num frames 38200... +[2024-11-08 07:19:53,976][41694] Avg episode rewards: #0: 4.515, true rewards: #0: 4.069 +[2024-11-08 07:19:53,980][41694] Avg episode reward: 4.515, avg true_objective: 4.069 +[2024-11-08 07:19:54,100][41694] Num frames 38300... +[2024-11-08 07:19:54,292][41694] Num frames 38400... +[2024-11-08 07:19:54,489][41694] Num frames 38500... 
+[2024-11-08 07:19:54,689][41694] Num frames 38600... +[2024-11-08 07:19:54,808][41694] Avg episode rewards: #0: 4.508, true rewards: #0: 4.066 +[2024-11-08 07:19:54,814][41694] Avg episode reward: 4.508, avg true_objective: 4.066 +[2024-11-08 07:19:54,967][41694] Num frames 38700... +[2024-11-08 07:19:55,154][41694] Num frames 38800... +[2024-11-08 07:19:55,339][41694] Num frames 38900... +[2024-11-08 07:19:55,528][41694] Num frames 39000... +[2024-11-08 07:19:55,622][41694] Avg episode rewards: #0: 4.501, true rewards: #0: 4.064 +[2024-11-08 07:19:55,625][41694] Avg episode reward: 4.501, avg true_objective: 4.064 +[2024-11-08 07:19:55,814][41694] Num frames 39100... +[2024-11-08 07:19:56,001][41694] Num frames 39200... +[2024-11-08 07:19:56,196][41694] Avg episode rewards: #0: 4.481, true rewards: #0: 4.048 +[2024-11-08 07:19:56,201][41694] Avg episode reward: 4.481, avg true_objective: 4.048 +[2024-11-08 07:19:56,287][41694] Num frames 39300... +[2024-11-08 07:19:56,472][41694] Num frames 39400... +[2024-11-08 07:19:56,676][41694] Num frames 39500... +[2024-11-08 07:19:56,878][41694] Num frames 39600... +[2024-11-08 07:19:57,041][41694] Avg episode rewards: #0: 4.475, true rewards: #0: 4.046 +[2024-11-08 07:19:57,043][41694] Avg episode reward: 4.475, avg true_objective: 4.046 +[2024-11-08 07:19:57,161][41694] Num frames 39700... +[2024-11-08 07:19:57,353][41694] Num frames 39800... +[2024-11-08 07:19:57,543][41694] Num frames 39900... +[2024-11-08 07:19:57,740][41694] Num frames 40000... +[2024-11-08 07:19:57,937][41694] Num frames 40100... +[2024-11-08 07:19:57,997][41694] Avg episode rewards: #0: 4.485, true rewards: #0: 4.051 +[2024-11-08 07:19:58,000][41694] Avg episode reward: 4.485, avg true_objective: 4.051 +[2024-11-08 07:19:58,176][41694] Num frames 40200... +[2024-11-08 07:19:58,351][41694] Num frames 40300... +[2024-11-08 07:19:58,534][41694] Num frames 40400... 
+[2024-11-08 07:19:58,743][41694] Avg episode rewards: #0: 4.479, true rewards: #0: 4.048 +[2024-11-08 07:19:58,745][41694] Avg episode reward: 4.479, avg true_objective: 4.048 +[2024-11-08 07:21:36,776][41694] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4! +[2024-11-08 07:21:37,442][41694] Loading existing experiment configuration from /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/config.json +[2024-11-08 07:21:37,444][41694] Overriding arg 'num_workers' with value 1 passed from command line +[2024-11-08 07:21:37,445][41694] Adding new argument 'no_render'=True that is not in the saved config file! +[2024-11-08 07:21:37,447][41694] Adding new argument 'save_video'=True that is not in the saved config file! +[2024-11-08 07:21:37,449][41694] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2024-11-08 07:21:37,452][41694] Adding new argument 'video_name'=None that is not in the saved config file! +[2024-11-08 07:21:37,455][41694] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2024-11-08 07:21:37,456][41694] Adding new argument 'max_num_episodes'=100 that is not in the saved config file! +[2024-11-08 07:21:37,459][41694] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2024-11-08 07:21:37,461][41694] Adding new argument 'hf_repository'='alidenewade/rl_course_vizdoom_health_gathering_supreme-alid' that is not in the saved config file! +[2024-11-08 07:21:37,462][41694] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2024-11-08 07:21:37,463][41694] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2024-11-08 07:21:37,466][41694] Adding new argument 'train_script'=None that is not in the saved config file! +[2024-11-08 07:21:37,468][41694] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2024-11-08 07:21:37,470][41694] Using frameskip 1 and render_action_repeat=4 for evaluation +[2024-11-08 07:21:37,513][41694] RunningMeanStd input shape: (3, 72, 128) +[2024-11-08 07:21:37,520][41694] RunningMeanStd input shape: (1,) +[2024-11-08 07:21:37,560][41694] ConvEncoder: input_channels=3 +[2024-11-08 07:21:37,628][41694] Conv encoder output size: 512 +[2024-11-08 07:21:37,630][41694] Policy head output size: 512 +[2024-11-08 07:21:37,676][41694] Loading state from checkpoint /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/checkpoint_p0/checkpoint_000052658_215687168.pth... +[2024-11-08 07:21:38,452][41694] Num frames 100... +[2024-11-08 07:21:38,676][41694] Num frames 200... +[2024-11-08 07:21:38,899][41694] Num frames 300... +[2024-11-08 07:21:39,174][41694] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-08 07:21:39,176][41694] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-08 07:21:39,263][41694] Num frames 400... +[2024-11-08 07:21:39,530][41694] Num frames 500... +[2024-11-08 07:21:39,729][41694] Num frames 600... +[2024-11-08 07:21:39,952][41694] Num frames 700... +[2024-11-08 07:21:40,170][41694] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2024-11-08 07:21:40,173][41694] Avg episode reward: 3.840, avg true_objective: 3.840 +[2024-11-08 07:21:40,248][41694] Num frames 800... +[2024-11-08 07:21:40,446][41694] Num frames 900... +[2024-11-08 07:21:40,670][41694] Num frames 1000... +[2024-11-08 07:21:40,911][41694] Num frames 1100... +[2024-11-08 07:21:41,130][41694] Num frames 1200... +[2024-11-08 07:21:41,226][41694] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-08 07:21:41,228][41694] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-08 07:21:41,410][41694] Num frames 1300... +[2024-11-08 07:21:41,624][41694] Num frames 1400... +[2024-11-08 07:21:41,833][41694] Num frames 1500... +[2024-11-08 07:21:42,056][41694] Num frames 1600... 
+[2024-11-08 07:21:42,259][41694] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2024-11-08 07:21:42,262][41694] Avg episode reward: 4.660, avg true_objective: 4.160 +[2024-11-08 07:21:42,347][41694] Num frames 1700... +[2024-11-08 07:21:42,548][41694] Num frames 1800... +[2024-11-08 07:21:42,752][41694] Num frames 1900... +[2024-11-08 07:21:42,949][41694] Num frames 2000... +[2024-11-08 07:21:43,101][41694] Avg episode rewards: #0: 4.496, true rewards: #0: 4.096 +[2024-11-08 07:21:43,102][41694] Avg episode reward: 4.496, avg true_objective: 4.096 +[2024-11-08 07:21:43,203][41694] Num frames 2100... +[2024-11-08 07:21:43,409][41694] Num frames 2200... +[2024-11-08 07:21:43,620][41694] Num frames 2300... +[2024-11-08 07:21:43,816][41694] Num frames 2400... +[2024-11-08 07:21:43,935][41694] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-08 07:21:43,937][41694] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-08 07:21:44,528][41694] Num frames 2500... +[2024-11-08 07:21:44,728][41694] Num frames 2600... +[2024-11-08 07:21:44,932][41694] Num frames 2700... +[2024-11-08 07:21:45,128][41694] Num frames 2800... +[2024-11-08 07:21:45,215][41694] Avg episode rewards: #0: 4.309, true rewards: #0: 4.023 +[2024-11-08 07:21:45,218][41694] Avg episode reward: 4.309, avg true_objective: 4.023 +[2024-11-08 07:21:45,406][41694] Num frames 2900... +[2024-11-08 07:21:45,587][41694] Num frames 3000... +[2024-11-08 07:21:45,796][41694] Num frames 3100... +[2024-11-08 07:21:45,993][41694] Num frames 3200... +[2024-11-08 07:21:46,187][41694] Avg episode rewards: #0: 4.455, true rewards: #0: 4.080 +[2024-11-08 07:21:46,190][41694] Avg episode reward: 4.455, avg true_objective: 4.080 +[2024-11-08 07:21:46,294][41694] Num frames 3300... +[2024-11-08 07:21:46,527][41694] Num frames 3400... +[2024-11-08 07:21:46,764][41694] Num frames 3500... +[2024-11-08 07:21:46,964][41694] Num frames 3600... 
+[2024-11-08 07:21:47,111][41694] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2024-11-08 07:21:47,115][41694] Avg episode reward: 4.387, avg true_objective: 4.053 +[2024-11-08 07:21:47,240][41694] Num frames 3700... +[2024-11-08 07:21:47,433][41694] Num frames 3800... +[2024-11-08 07:21:47,645][41694] Num frames 3900... +[2024-11-08 07:21:47,849][41694] Num frames 4000... +[2024-11-08 07:21:47,971][41694] Avg episode rewards: #0: 4.332, true rewards: #0: 4.032 +[2024-11-08 07:21:47,974][41694] Avg episode reward: 4.332, avg true_objective: 4.032 +[2024-11-08 07:21:48,125][41694] Num frames 4100... +[2024-11-08 07:21:48,320][41694] Num frames 4200... +[2024-11-08 07:21:48,510][41694] Num frames 4300... +[2024-11-08 07:21:48,714][41694] Num frames 4400... +[2024-11-08 07:21:48,806][41694] Avg episode rewards: #0: 4.287, true rewards: #0: 4.015 +[2024-11-08 07:21:48,810][41694] Avg episode reward: 4.287, avg true_objective: 4.015 +[2024-11-08 07:21:49,001][41694] Num frames 4500... +[2024-11-08 07:21:49,252][41694] Num frames 4600... +[2024-11-08 07:21:49,449][41694] Num frames 4700... +[2024-11-08 07:21:49,648][41694] Num frames 4800... +[2024-11-08 07:21:49,789][41694] Avg episode rewards: #0: 4.360, true rewards: #0: 4.027 +[2024-11-08 07:21:49,796][41694] Avg episode reward: 4.360, avg true_objective: 4.027 +[2024-11-08 07:21:49,945][41694] Num frames 4900... +[2024-11-08 07:21:50,138][41694] Num frames 5000... +[2024-11-08 07:21:50,345][41694] Num frames 5100... +[2024-11-08 07:21:50,554][41694] Num frames 5200... +[2024-11-08 07:21:50,643][41694] Avg episode rewards: #0: 4.320, true rewards: #0: 4.012 +[2024-11-08 07:21:50,647][41694] Avg episode reward: 4.320, avg true_objective: 4.012 +[2024-11-08 07:21:50,867][41694] Num frames 5300... +[2024-11-08 07:21:51,061][41694] Num frames 5400... +[2024-11-08 07:21:51,245][41694] Num frames 5500... +[2024-11-08 07:21:51,430][41694] Num frames 5600... 
+[2024-11-08 07:21:51,483][41694] Avg episode rewards: #0: 4.286, true rewards: #0: 4.000 +[2024-11-08 07:21:51,486][41694] Avg episode reward: 4.286, avg true_objective: 4.000 +[2024-11-08 07:21:51,688][41694] Num frames 5700... +[2024-11-08 07:21:51,883][41694] Num frames 5800... +[2024-11-08 07:21:52,089][41694] Num frames 5900... +[2024-11-08 07:21:52,299][41694] Avg episode rewards: #0: 4.256, true rewards: #0: 3.989 +[2024-11-08 07:21:52,303][41694] Avg episode reward: 4.256, avg true_objective: 3.989 +[2024-11-08 07:21:52,342][41694] Num frames 6000... +[2024-11-08 07:21:52,542][41694] Num frames 6100... +[2024-11-08 07:21:52,749][41694] Num frames 6200... +[2024-11-08 07:21:52,941][41694] Num frames 6300... +[2024-11-08 07:21:53,129][41694] Avg episode rewards: #0: 4.230, true rewards: #0: 3.980 +[2024-11-08 07:21:53,133][41694] Avg episode reward: 4.230, avg true_objective: 3.980 +[2024-11-08 07:21:53,210][41694] Num frames 6400... +[2024-11-08 07:21:53,401][41694] Num frames 6500... +[2024-11-08 07:21:53,595][41694] Num frames 6600... +[2024-11-08 07:21:53,762][41694] Avg episode rewards: #0: 4.209, true rewards: #0: 3.915 +[2024-11-08 07:21:53,763][41694] Avg episode reward: 4.209, avg true_objective: 3.915 +[2024-11-08 07:21:53,868][41694] Num frames 6700... +[2024-11-08 07:21:54,073][41694] Num frames 6800... +[2024-11-08 07:21:54,283][41694] Num frames 6900... +[2024-11-08 07:21:54,480][41694] Num frames 7000... +[2024-11-08 07:21:54,625][41694] Avg episode rewards: #0: 4.189, true rewards: #0: 3.911 +[2024-11-08 07:21:54,626][41694] Avg episode reward: 4.189, avg true_objective: 3.911 +[2024-11-08 07:21:54,758][41694] Num frames 7100... +[2024-11-08 07:21:54,970][41694] Num frames 7200... +[2024-11-08 07:21:55,243][41694] Num frames 7300... +[2024-11-08 07:21:55,542][41694] Num frames 7400... 
+[2024-11-08 07:21:55,649][41694] Avg episode rewards: #0: 4.171, true rewards: #0: 3.907 +[2024-11-08 07:21:55,650][41694] Avg episode reward: 4.171, avg true_objective: 3.907 +[2024-11-08 07:21:55,818][41694] Num frames 7500... +[2024-11-08 07:21:56,042][41694] Num frames 7600... +[2024-11-08 07:21:56,247][41694] Num frames 7700... +[2024-11-08 07:21:56,459][41694] Num frames 7800... +[2024-11-08 07:21:56,737][41694] Avg episode rewards: #0: 4.236, true rewards: #0: 3.936 +[2024-11-08 07:21:56,744][41694] Avg episode reward: 4.236, avg true_objective: 3.936 +[2024-11-08 07:21:56,830][41694] Num frames 7900... +[2024-11-08 07:21:57,068][41694] Num frames 8000... +[2024-11-08 07:21:57,324][41694] Num frames 8100... +[2024-11-08 07:21:57,451][41694] Avg episode rewards: #0: 4.156, true rewards: #0: 3.870 +[2024-11-08 07:21:57,454][41694] Avg episode reward: 4.156, avg true_objective: 3.870 +[2024-11-08 07:21:57,631][41694] Num frames 8200... +[2024-11-08 07:21:57,923][41694] Num frames 8300... +[2024-11-08 07:21:58,163][41694] Num frames 8400... +[2024-11-08 07:21:58,432][41694] Num frames 8500... +[2024-11-08 07:21:58,665][41694] Avg episode rewards: #0: 4.216, true rewards: #0: 3.898 +[2024-11-08 07:21:58,667][41694] Avg episode reward: 4.216, avg true_objective: 3.898 +[2024-11-08 07:21:58,754][41694] Num frames 8600... +[2024-11-08 07:21:58,989][41694] Num frames 8700... +[2024-11-08 07:21:59,267][41694] Num frames 8800... +[2024-11-08 07:21:59,490][41694] Num frames 8900... +[2024-11-08 07:21:59,720][41694] Num frames 9000... +[2024-11-08 07:21:59,940][41694] Num frames 9100... +[2024-11-08 07:22:00,173][41694] Avg episode rewards: #0: 4.428, true rewards: #0: 3.993 +[2024-11-08 07:22:00,175][41694] Avg episode reward: 4.428, avg true_objective: 3.993 +[2024-11-08 07:22:00,208][41694] Num frames 9200... +[2024-11-08 07:22:00,488][41694] Num frames 9300... +[2024-11-08 07:22:00,734][41694] Num frames 9400... +[2024-11-08 07:22:01,010][41694] Num frames 9500... 
+[2024-11-08 07:22:01,233][41694] Avg episode rewards: #0: 4.403, true rewards: #0: 3.987 +[2024-11-08 07:22:01,235][41694] Avg episode reward: 4.403, avg true_objective: 3.987 +[2024-11-08 07:22:01,338][41694] Num frames 9600... +[2024-11-08 07:22:01,587][41694] Num frames 9700... +[2024-11-08 07:22:01,782][41694] Num frames 9800... +[2024-11-08 07:22:01,992][41694] Num frames 9900... +[2024-11-08 07:22:02,216][41694] Num frames 10000... +[2024-11-08 07:22:02,307][41694] Avg episode rewards: #0: 4.446, true rewards: #0: 4.006 +[2024-11-08 07:22:02,310][41694] Avg episode reward: 4.446, avg true_objective: 4.006 +[2024-11-08 07:22:02,477][41694] Num frames 10100... +[2024-11-08 07:22:02,676][41694] Num frames 10200... +[2024-11-08 07:22:02,894][41694] Num frames 10300... +[2024-11-08 07:22:03,090][41694] Num frames 10400... +[2024-11-08 07:22:03,287][41694] Avg episode rewards: #0: 4.486, true rewards: #0: 4.025 +[2024-11-08 07:22:03,293][41694] Avg episode reward: 4.486, avg true_objective: 4.025 +[2024-11-08 07:22:03,470][41694] Num frames 10500... +[2024-11-08 07:22:03,679][41694] Num frames 10600... +[2024-11-08 07:22:03,898][41694] Num frames 10700... +[2024-11-08 07:22:04,114][41694] Num frames 10800... +[2024-11-08 07:22:04,332][41694] Num frames 10900... +[2024-11-08 07:22:04,418][41694] Avg episode rewards: #0: 4.523, true rewards: #0: 4.041 +[2024-11-08 07:22:04,424][41694] Avg episode reward: 4.523, avg true_objective: 4.041 +[2024-11-08 07:22:04,610][41694] Num frames 11000... +[2024-11-08 07:22:04,816][41694] Num frames 11100... +[2024-11-08 07:22:05,037][41694] Num frames 11200... +[2024-11-08 07:22:05,248][41694] Num frames 11300... +[2024-11-08 07:22:05,469][41694] Num frames 11400... +[2024-11-08 07:22:05,647][41694] Avg episode rewards: #0: 4.627, true rewards: #0: 4.091 +[2024-11-08 07:22:05,648][41694] Avg episode reward: 4.627, avg true_objective: 4.091 +[2024-11-08 07:22:05,752][41694] Num frames 11500... 
+[2024-11-08 07:22:05,964][41694] Num frames 11600... +[2024-11-08 07:22:06,174][41694] Num frames 11700... +[2024-11-08 07:22:06,381][41694] Num frames 11800... +[2024-11-08 07:22:06,533][41694] Avg episode rewards: #0: 4.600, true rewards: #0: 4.083 +[2024-11-08 07:22:06,537][41694] Avg episode reward: 4.600, avg true_objective: 4.083 +[2024-11-08 07:22:06,681][41694] Num frames 11900... +[2024-11-08 07:22:06,890][41694] Num frames 12000... +[2024-11-08 07:22:07,075][41694] Num frames 12100... +[2024-11-08 07:22:07,266][41694] Num frames 12200... +[2024-11-08 07:22:07,386][41694] Avg episode rewards: #0: 4.575, true rewards: #0: 4.075 +[2024-11-08 07:22:07,387][41694] Avg episode reward: 4.575, avg true_objective: 4.075 +[2024-11-08 07:22:07,589][41694] Num frames 12300... +[2024-11-08 07:22:07,789][41694] Num frames 12400... +[2024-11-08 07:22:07,985][41694] Num frames 12500... +[2024-11-08 07:22:08,178][41694] Num frames 12600... +[2024-11-08 07:22:08,250][41694] Avg episode rewards: #0: 4.551, true rewards: #0: 4.067 +[2024-11-08 07:22:08,254][41694] Avg episode reward: 4.551, avg true_objective: 4.067 +[2024-11-08 07:22:08,483][41694] Num frames 12700... +[2024-11-08 07:22:08,700][41694] Num frames 12800... +[2024-11-08 07:22:08,972][41694] Num frames 12900... +[2024-11-08 07:22:09,247][41694] Avg episode rewards: #0: 4.529, true rewards: #0: 4.060 +[2024-11-08 07:22:09,249][41694] Avg episode reward: 4.529, avg true_objective: 4.060 +[2024-11-08 07:22:09,269][41694] Num frames 13000... +[2024-11-08 07:22:09,485][41694] Num frames 13100... +[2024-11-08 07:22:09,683][41694] Num frames 13200... +[2024-11-08 07:22:09,911][41694] Num frames 13300... +[2024-11-08 07:22:10,122][41694] Num frames 13400... +[2024-11-08 07:22:10,198][41694] Avg episode rewards: #0: 4.548, true rewards: #0: 4.063 +[2024-11-08 07:22:10,202][41694] Avg episode reward: 4.548, avg true_objective: 4.063 +[2024-11-08 07:22:10,404][41694] Num frames 13500... 
+[2024-11-08 07:22:10,612][41694] Num frames 13600... +[2024-11-08 07:22:10,823][41694] Num frames 13700... +[2024-11-08 07:22:11,086][41694] Avg episode rewards: #0: 4.527, true rewards: #0: 4.056 +[2024-11-08 07:22:11,090][41694] Avg episode reward: 4.527, avg true_objective: 4.056 +[2024-11-08 07:22:11,129][41694] Num frames 13800... +[2024-11-08 07:22:11,354][41694] Num frames 13900... +[2024-11-08 07:22:11,666][41694] Num frames 14000... +[2024-11-08 07:22:11,907][41694] Num frames 14100... +[2024-11-08 07:22:12,119][41694] Avg episode rewards: #0: 4.507, true rewards: #0: 4.050 +[2024-11-08 07:22:12,122][41694] Avg episode reward: 4.507, avg true_objective: 4.050 +[2024-11-08 07:22:12,211][41694] Num frames 14200... +[2024-11-08 07:22:12,502][41694] Num frames 14300... +[2024-11-08 07:22:12,843][41694] Num frames 14400... +[2024-11-08 07:22:13,074][41694] Num frames 14500... +[2024-11-08 07:22:13,293][41694] Num frames 14600... +[2024-11-08 07:22:13,504][41694] Num frames 14700... +[2024-11-08 07:22:13,609][41694] Avg episode rewards: #0: 4.589, true rewards: #0: 4.089 +[2024-11-08 07:22:13,613][41694] Avg episode reward: 4.589, avg true_objective: 4.089 +[2024-11-08 07:22:13,817][41694] Num frames 14800... +[2024-11-08 07:22:14,076][41694] Num frames 14900... +[2024-11-08 07:22:14,290][41694] Num frames 15000... +[2024-11-08 07:22:14,514][41694] Num frames 15100... +[2024-11-08 07:22:14,711][41694] Avg episode rewards: #0: 4.613, true rewards: #0: 4.099 +[2024-11-08 07:22:14,715][41694] Avg episode reward: 4.613, avg true_objective: 4.099 +[2024-11-08 07:22:14,798][41694] Num frames 15200... +[2024-11-08 07:22:15,026][41694] Num frames 15300... +[2024-11-08 07:22:15,231][41694] Num frames 15400... +[2024-11-08 07:22:15,414][41694] Num frames 15500... 
+[2024-11-08 07:22:15,564][41694] Avg episode rewards: #0: 4.593, true rewards: #0: 4.093 +[2024-11-08 07:22:15,570][41694] Avg episode reward: 4.593, avg true_objective: 4.093 +[2024-11-08 07:22:15,676][41694] Num frames 15600... +[2024-11-08 07:22:15,862][41694] Num frames 15700... +[2024-11-08 07:22:16,058][41694] Num frames 15800... +[2024-11-08 07:22:16,255][41694] Num frames 15900... +[2024-11-08 07:22:16,390][41694] Avg episode rewards: #0: 4.573, true rewards: #0: 4.086 +[2024-11-08 07:22:16,395][41694] Avg episode reward: 4.573, avg true_objective: 4.086 +[2024-11-08 07:22:16,569][41694] Num frames 16000... +[2024-11-08 07:22:17,223][41694] Num frames 16100... +[2024-11-08 07:22:17,465][41694] Num frames 16200... +[2024-11-08 07:22:17,700][41694] Num frames 16300... +[2024-11-08 07:22:17,799][41694] Avg episode rewards: #0: 4.555, true rewards: #0: 4.080 +[2024-11-08 07:22:17,801][41694] Avg episode reward: 4.555, avg true_objective: 4.080 +[2024-11-08 07:22:17,993][41694] Num frames 16400... +[2024-11-08 07:22:18,195][41694] Num frames 16500... +[2024-11-08 07:22:18,446][41694] Num frames 16600... +[2024-11-08 07:22:18,668][41694] Num frames 16700... +[2024-11-08 07:22:18,897][41694] Avg episode rewards: #0: 4.578, true rewards: #0: 4.090 +[2024-11-08 07:22:18,901][41694] Avg episode reward: 4.578, avg true_objective: 4.090 +[2024-11-08 07:22:19,000][41694] Num frames 16800... +[2024-11-08 07:22:19,242][41694] Num frames 16900... +[2024-11-08 07:22:19,479][41694] Num frames 17000... +[2024-11-08 07:22:19,588][41694] Avg episode rewards: #0: 4.530, true rewards: #0: 4.053 +[2024-11-08 07:22:19,589][41694] Avg episode reward: 4.530, avg true_objective: 4.053 +[2024-11-08 07:22:19,759][41694] Num frames 17100... +[2024-11-08 07:22:19,985][41694] Num frames 17200... +[2024-11-08 07:22:20,201][41694] Num frames 17300... +[2024-11-08 07:22:20,417][41694] Num frames 17400... 
+[2024-11-08 07:22:20,500][41694] Avg episode rewards: #0: 4.513, true rewards: #0: 4.048 +[2024-11-08 07:22:20,505][41694] Avg episode reward: 4.513, avg true_objective: 4.048 +[2024-11-08 07:22:20,723][41694] Num frames 17500... +[2024-11-08 07:22:20,945][41694] Num frames 17600... +[2024-11-08 07:22:21,235][41694] Num frames 17700... +[2024-11-08 07:22:21,449][41694] Num frames 17800... +[2024-11-08 07:22:21,645][41694] Avg episode rewards: #0: 4.535, true rewards: #0: 4.058 +[2024-11-08 07:22:21,650][41694] Avg episode reward: 4.535, avg true_objective: 4.058 +[2024-11-08 07:22:21,787][41694] Num frames 17900... +[2024-11-08 07:22:22,074][41694] Num frames 18000... +[2024-11-08 07:22:22,338][41694] Num frames 18100... +[2024-11-08 07:22:22,581][41694] Num frames 18200... +[2024-11-08 07:22:22,802][41694] Avg episode rewards: #0: 4.549, true rewards: #0: 4.060 +[2024-11-08 07:22:22,806][41694] Avg episode reward: 4.549, avg true_objective: 4.060 +[2024-11-08 07:22:22,888][41694] Num frames 18300... +[2024-11-08 07:22:23,117][41694] Num frames 18400... +[2024-11-08 07:22:23,334][41694] Num frames 18500... +[2024-11-08 07:22:23,563][41694] Num frames 18600... +[2024-11-08 07:22:23,762][41694] Avg episode rewards: #0: 4.534, true rewards: #0: 4.056 +[2024-11-08 07:22:23,766][41694] Avg episode reward: 4.534, avg true_objective: 4.056 +[2024-11-08 07:22:23,877][41694] Num frames 18700... +[2024-11-08 07:22:24,101][41694] Num frames 18800... +[2024-11-08 07:22:24,334][41694] Num frames 18900... +[2024-11-08 07:22:24,542][41694] Num frames 19000... +[2024-11-08 07:22:24,733][41694] Num frames 19100... +[2024-11-08 07:22:24,796][41694] Avg episode rewards: #0: 4.554, true rewards: #0: 4.065 +[2024-11-08 07:22:24,801][41694] Avg episode reward: 4.554, avg true_objective: 4.065 +[2024-11-08 07:22:24,997][41694] Num frames 19200... +[2024-11-08 07:22:25,195][41694] Num frames 19300... +[2024-11-08 07:22:25,385][41694] Num frames 19400... 
+[2024-11-08 07:22:25,597][41694] Avg episode rewards: #0: 4.539, true rewards: #0: 4.060 +[2024-11-08 07:22:25,603][41694] Avg episode reward: 4.539, avg true_objective: 4.060 +[2024-11-08 07:22:25,650][41694] Num frames 19500... +[2024-11-08 07:22:25,865][41694] Num frames 19600... +[2024-11-08 07:22:26,093][41694] Num frames 19700... +[2024-11-08 07:22:26,250][41694] Avg episode rewards: #0: 4.499, true rewards: #0: 4.029 +[2024-11-08 07:22:26,253][41694] Avg episode reward: 4.499, avg true_objective: 4.029 +[2024-11-08 07:22:26,394][41694] Num frames 19800... +[2024-11-08 07:22:26,626][41694] Num frames 19900... +[2024-11-08 07:22:26,892][41694] Num frames 20000... +[2024-11-08 07:22:27,163][41694] Num frames 20100... +[2024-11-08 07:22:27,298][41694] Avg episode rewards: #0: 4.486, true rewards: #0: 4.026 +[2024-11-08 07:22:27,303][41694] Avg episode reward: 4.486, avg true_objective: 4.026 +[2024-11-08 07:22:27,491][41694] Num frames 20200... +[2024-11-08 07:22:27,748][41694] Num frames 20300... +[2024-11-08 07:22:27,954][41694] Num frames 20400... +[2024-11-08 07:22:28,150][41694] Num frames 20500... +[2024-11-08 07:22:28,239][41694] Avg episode rewards: #0: 4.473, true rewards: #0: 4.022 +[2024-11-08 07:22:28,241][41694] Avg episode reward: 4.473, avg true_objective: 4.022 +[2024-11-08 07:22:28,419][41694] Num frames 20600... +[2024-11-08 07:22:28,618][41694] Num frames 20700... +[2024-11-08 07:22:28,835][41694] Num frames 20800... +[2024-11-08 07:22:29,067][41694] Num frames 20900... +[2024-11-08 07:22:29,278][41694] Avg episode rewards: #0: 4.492, true rewards: #0: 4.031 +[2024-11-08 07:22:29,282][41694] Avg episode reward: 4.492, avg true_objective: 4.031 +[2024-11-08 07:22:29,405][41694] Num frames 21000... +[2024-11-08 07:22:29,665][41694] Num frames 21100... +[2024-11-08 07:22:29,917][41694] Num frames 21200... +[2024-11-08 07:22:30,169][41694] Num frames 21300... +[2024-11-08 07:22:30,433][41694] Num frames 21400... 
+[2024-11-08 07:22:30,675][41694] Num frames 21500... +[2024-11-08 07:22:30,742][41694] Avg episode rewards: #0: 4.548, true rewards: #0: 4.057 +[2024-11-08 07:22:30,744][41694] Avg episode reward: 4.548, avg true_objective: 4.057 +[2024-11-08 07:22:31,002][41694] Num frames 21600... +[2024-11-08 07:22:31,292][41694] Num frames 21700... +[2024-11-08 07:22:31,561][41694] Num frames 21800... +[2024-11-08 07:22:31,817][41694] Avg episode rewards: #0: 4.535, true rewards: #0: 4.053 +[2024-11-08 07:22:31,819][41694] Avg episode reward: 4.535, avg true_objective: 4.053 +[2024-11-08 07:22:31,849][41694] Num frames 21900... +[2024-11-08 07:22:32,079][41694] Num frames 22000... +[2024-11-08 07:22:32,304][41694] Num frames 22100... +[2024-11-08 07:22:32,519][41694] Num frames 22200... +[2024-11-08 07:22:32,727][41694] Avg episode rewards: #0: 4.522, true rewards: #0: 4.049 +[2024-11-08 07:22:32,732][41694] Avg episode reward: 4.522, avg true_objective: 4.049 +[2024-11-08 07:22:32,817][41694] Num frames 22300... +[2024-11-08 07:22:33,074][41694] Num frames 22400... +[2024-11-08 07:22:33,314][41694] Num frames 22500... +[2024-11-08 07:22:33,546][41694] Num frames 22600... +[2024-11-08 07:22:33,755][41694] Avg episode rewards: #0: 4.510, true rewards: #0: 4.046 +[2024-11-08 07:22:33,760][41694] Avg episode reward: 4.510, avg true_objective: 4.046 +[2024-11-08 07:22:33,860][41694] Num frames 22700... +[2024-11-08 07:22:34,062][41694] Num frames 22800... +[2024-11-08 07:22:34,274][41694] Num frames 22900... +[2024-11-08 07:22:34,505][41694] Num frames 23000... +[2024-11-08 07:22:34,662][41694] Avg episode rewards: #0: 4.498, true rewards: #0: 4.042 +[2024-11-08 07:22:34,667][41694] Avg episode reward: 4.498, avg true_objective: 4.042 +[2024-11-08 07:22:34,809][41694] Num frames 23100... +[2024-11-08 07:22:35,054][41694] Num frames 23200... +[2024-11-08 07:22:35,283][41694] Num frames 23300... +[2024-11-08 07:22:35,503][41694] Num frames 23400... 
+[2024-11-08 07:22:35,620][41694] Avg episode rewards: #0: 4.487, true rewards: #0: 4.039 +[2024-11-08 07:22:35,624][41694] Avg episode reward: 4.487, avg true_objective: 4.039 +[2024-11-08 07:22:35,811][41694] Num frames 23500... +[2024-11-08 07:22:36,073][41694] Num frames 23600... +[2024-11-08 07:22:36,338][41694] Num frames 23700... +[2024-11-08 07:22:36,606][41694] Num frames 23800... +[2024-11-08 07:22:36,686][41694] Avg episode rewards: #0: 4.476, true rewards: #0: 4.035 +[2024-11-08 07:22:36,690][41694] Avg episode reward: 4.476, avg true_objective: 4.035 +[2024-11-08 07:22:36,941][41694] Num frames 23900... +[2024-11-08 07:22:37,203][41694] Num frames 24000... +[2024-11-08 07:22:37,451][41694] Num frames 24100... +[2024-11-08 07:22:37,731][41694] Avg episode rewards: #0: 4.465, true rewards: #0: 4.032 +[2024-11-08 07:22:37,738][41694] Avg episode reward: 4.465, avg true_objective: 4.032 +[2024-11-08 07:22:37,773][41694] Num frames 24200... +[2024-11-08 07:22:38,027][41694] Num frames 24300... +[2024-11-08 07:22:38,235][41694] Num frames 24400... +[2024-11-08 07:22:38,474][41694] Num frames 24500... +[2024-11-08 07:22:38,732][41694] Num frames 24600... +[2024-11-08 07:22:38,995][41694] Num frames 24700... +[2024-11-08 07:22:39,138][41694] Avg episode rewards: #0: 4.514, true rewards: #0: 4.055 +[2024-11-08 07:22:39,142][41694] Avg episode reward: 4.514, avg true_objective: 4.055 +[2024-11-08 07:22:39,310][41694] Num frames 24800... +[2024-11-08 07:22:39,534][41694] Num frames 24900... +[2024-11-08 07:22:39,767][41694] Num frames 25000... +[2024-11-08 07:22:39,998][41694] Num frames 25100... +[2024-11-08 07:22:40,099][41694] Avg episode rewards: #0: 4.503, true rewards: #0: 4.052 +[2024-11-08 07:22:40,102][41694] Avg episode reward: 4.503, avg true_objective: 4.052 +[2024-11-08 07:22:40,286][41694] Num frames 25200... +[2024-11-08 07:22:40,506][41694] Num frames 25300... +[2024-11-08 07:22:40,735][41694] Num frames 25400... 
+[2024-11-08 07:22:40,957][41694] Num frames 25500... +[2024-11-08 07:22:41,018][41694] Avg episode rewards: #0: 4.493, true rewards: #0: 4.048 +[2024-11-08 07:22:41,025][41694] Avg episode reward: 4.493, avg true_objective: 4.048 +[2024-11-08 07:22:41,247][41694] Num frames 25600... +[2024-11-08 07:22:41,463][41694] Num frames 25700... +[2024-11-08 07:22:41,764][41694] Num frames 25800... +[2024-11-08 07:22:41,992][41694] Num frames 25900... +[2024-11-08 07:22:42,094][41694] Avg episode rewards: #0: 4.488, true rewards: #0: 4.050 +[2024-11-08 07:22:42,095][41694] Avg episode reward: 4.488, avg true_objective: 4.050 +[2024-11-08 07:22:42,273][41694] Num frames 26000... +[2024-11-08 07:22:42,507][41694] Num frames 26100... +[2024-11-08 07:22:42,728][41694] Num frames 26200... +[2024-11-08 07:22:42,935][41694] Num frames 26300... +[2024-11-08 07:22:43,011][41694] Avg episode rewards: #0: 4.478, true rewards: #0: 4.047 +[2024-11-08 07:22:43,013][41694] Avg episode reward: 4.478, avg true_objective: 4.047 +[2024-11-08 07:22:43,255][41694] Num frames 26400... +[2024-11-08 07:22:43,488][41694] Num frames 26500... +[2024-11-08 07:22:43,730][41694] Num frames 26600... +[2024-11-08 07:22:44,013][41694] Avg episode rewards: #0: 4.468, true rewards: #0: 4.044 +[2024-11-08 07:22:44,019][41694] Avg episode reward: 4.468, avg true_objective: 4.044 +[2024-11-08 07:22:44,069][41694] Num frames 26700... +[2024-11-08 07:22:44,325][41694] Num frames 26800... +[2024-11-08 07:22:44,576][41694] Num frames 26900... +[2024-11-08 07:22:44,825][41694] Num frames 27000... +[2024-11-08 07:22:45,127][41694] Avg episode rewards: #0: 4.476, true rewards: #0: 4.043 +[2024-11-08 07:22:45,134][41694] Avg episode reward: 4.476, avg true_objective: 4.043 +[2024-11-08 07:22:45,182][41694] Num frames 27100... +[2024-11-08 07:22:45,424][41694] Num frames 27200... +[2024-11-08 07:22:45,670][41694] Num frames 27300... +[2024-11-08 07:22:45,932][41694] Num frames 27400... 
+[2024-11-08 07:22:46,170][41694] Avg episode rewards: #0: 4.467, true rewards: #0: 4.040 +[2024-11-08 07:22:46,174][41694] Avg episode reward: 4.467, avg true_objective: 4.040 +[2024-11-08 07:22:46,253][41694] Num frames 27500... +[2024-11-08 07:22:46,453][41694] Num frames 27600... +[2024-11-08 07:22:46,672][41694] Num frames 27700... +[2024-11-08 07:22:46,887][41694] Num frames 27800... +[2024-11-08 07:22:47,101][41694] Num frames 27900... +[2024-11-08 07:22:47,211][41694] Avg episode rewards: #0: 4.481, true rewards: #0: 4.047 +[2024-11-08 07:22:47,216][41694] Avg episode reward: 4.481, avg true_objective: 4.047 +[2024-11-08 07:22:47,392][41694] Num frames 28000... +[2024-11-08 07:22:47,587][41694] Num frames 28100... +[2024-11-08 07:22:47,786][41694] Num frames 28200... +[2024-11-08 07:22:47,981][41694] Num frames 28300... +[2024-11-08 07:22:48,185][41694] Num frames 28400... +[2024-11-08 07:22:48,373][41694] Num frames 28500... +[2024-11-08 07:22:48,567][41694] Num frames 28600... +[2024-11-08 07:22:48,680][41694] Avg episode rewards: #0: 4.575, true rewards: #0: 4.089 +[2024-11-08 07:22:48,682][41694] Avg episode reward: 4.575, avg true_objective: 4.089 +[2024-11-08 07:22:48,833][41694] Num frames 28700... +[2024-11-08 07:22:49,031][41694] Num frames 28800... +[2024-11-08 07:22:49,234][41694] Num frames 28900... +[2024-11-08 07:22:49,899][41694] Num frames 29000... +[2024-11-08 07:22:50,129][41694] Avg episode rewards: #0: 4.588, true rewards: #0: 4.095 +[2024-11-08 07:22:50,132][41694] Avg episode reward: 4.588, avg true_objective: 4.095 +[2024-11-08 07:22:50,188][41694] Num frames 29100... +[2024-11-08 07:22:50,402][41694] Num frames 29200... +[2024-11-08 07:22:50,606][41694] Num frames 29300... +[2024-11-08 07:22:50,825][41694] Num frames 29400... 
+[2024-11-08 07:22:51,103][41694] Avg episode rewards: #0: 4.596, true rewards: #0: 4.096 +[2024-11-08 07:22:51,110][41694] Avg episode reward: 4.596, avg true_objective: 4.096 +[2024-11-08 07:22:51,153][41694] Num frames 29500... +[2024-11-08 07:22:51,396][41694] Num frames 29600... +[2024-11-08 07:22:51,621][41694] Num frames 29700... +[2024-11-08 07:22:51,835][41694] Num frames 29800... +[2024-11-08 07:22:52,055][41694] Avg episode rewards: #0: 4.585, true rewards: #0: 4.092 +[2024-11-08 07:22:52,058][41694] Avg episode reward: 4.585, avg true_objective: 4.092 +[2024-11-08 07:22:52,140][41694] Num frames 29900... +[2024-11-08 07:22:52,398][41694] Num frames 30000... +[2024-11-08 07:22:52,637][41694] Num frames 30100... +[2024-11-08 07:22:52,884][41694] Num frames 30200... +[2024-11-08 07:22:53,140][41694] Num frames 30300... +[2024-11-08 07:22:53,333][41694] Avg episode rewards: #0: 4.602, true rewards: #0: 4.102 +[2024-11-08 07:22:53,335][41694] Avg episode reward: 4.602, avg true_objective: 4.102 +[2024-11-08 07:22:53,442][41694] Num frames 30400... +[2024-11-08 07:22:53,671][41694] Num frames 30500... +[2024-11-08 07:22:53,922][41694] Num frames 30600... +[2024-11-08 07:22:54,178][41694] Num frames 30700... +[2024-11-08 07:22:54,422][41694] Num frames 30800... +[2024-11-08 07:22:54,487][41694] Avg episode rewards: #0: 4.613, true rewards: #0: 4.107 +[2024-11-08 07:22:54,491][41694] Avg episode reward: 4.613, avg true_objective: 4.107 +[2024-11-08 07:22:54,710][41694] Num frames 30900... +[2024-11-08 07:22:54,969][41694] Num frames 31000... +[2024-11-08 07:22:55,196][41694] Num frames 31100... +[2024-11-08 07:22:55,397][41694] Num frames 31200... +[2024-11-08 07:22:55,561][41694] Avg episode rewards: #0: 4.625, true rewards: #0: 4.112 +[2024-11-08 07:22:55,565][41694] Avg episode reward: 4.625, avg true_objective: 4.112 +[2024-11-08 07:22:55,685][41694] Num frames 31300... +[2024-11-08 07:22:55,947][41694] Num frames 31400... 
+[2024-11-08 07:22:56,188][41694] Num frames 31500... +[2024-11-08 07:22:56,369][41694] Num frames 31600... +[2024-11-08 07:22:56,616][41694] Avg episode rewards: #0: 4.636, true rewards: #0: 4.116 +[2024-11-08 07:22:56,622][41694] Avg episode reward: 4.636, avg true_objective: 4.116 +[2024-11-08 07:22:56,638][41694] Num frames 31700... +[2024-11-08 07:22:56,851][41694] Num frames 31800... +[2024-11-08 07:22:57,061][41694] Num frames 31900... +[2024-11-08 07:22:57,324][41694] Num frames 32000... +[2024-11-08 07:22:57,531][41694] Num frames 32100... +[2024-11-08 07:22:57,731][41694] Num frames 32200... +[2024-11-08 07:22:57,865][41694] Avg episode rewards: #0: 4.672, true rewards: #0: 4.133 +[2024-11-08 07:22:57,867][41694] Avg episode reward: 4.672, avg true_objective: 4.133 +[2024-11-08 07:22:58,003][41694] Num frames 32300... +[2024-11-08 07:22:58,220][41694] Num frames 32400... +[2024-11-08 07:22:58,424][41694] Num frames 32500... +[2024-11-08 07:22:58,632][41694] Num frames 32600... +[2024-11-08 07:22:58,875][41694] Avg episode rewards: #0: 4.682, true rewards: #0: 4.138 +[2024-11-08 07:22:58,878][41694] Avg episode reward: 4.682, avg true_objective: 4.138 +[2024-11-08 07:22:58,925][41694] Num frames 32700... +[2024-11-08 07:22:59,140][41694] Num frames 32800... +[2024-11-08 07:22:59,361][41694] Num frames 32900... +[2024-11-08 07:22:59,576][41694] Num frames 33000... +[2024-11-08 07:22:59,807][41694] Avg episode rewards: #0: 4.672, true rewards: #0: 4.134 +[2024-11-08 07:22:59,808][41694] Avg episode reward: 4.672, avg true_objective: 4.134 +[2024-11-08 07:22:59,889][41694] Num frames 33100... +[2024-11-08 07:23:00,126][41694] Num frames 33200... +[2024-11-08 07:23:00,380][41694] Num frames 33300... +[2024-11-08 07:23:00,721][41694] Num frames 33400... +[2024-11-08 07:23:00,997][41694] Num frames 33500... 
+[2024-11-08 07:23:01,115][41694] Avg episode rewards: #0: 4.682, true rewards: #0: 4.138 +[2024-11-08 07:23:01,119][41694] Avg episode reward: 4.682, avg true_objective: 4.138 +[2024-11-08 07:23:01,338][41694] Num frames 33600... +[2024-11-08 07:23:01,583][41694] Num frames 33700... +[2024-11-08 07:23:01,795][41694] Num frames 33800... +[2024-11-08 07:23:02,014][41694] Num frames 33900... +[2024-11-08 07:23:02,222][41694] Avg episode rewards: #0: 4.691, true rewards: #0: 4.143 +[2024-11-08 07:23:02,223][41694] Avg episode reward: 4.691, avg true_objective: 4.143 +[2024-11-08 07:23:02,306][41694] Num frames 34000... +[2024-11-08 07:23:02,557][41694] Num frames 34100... +[2024-11-08 07:23:02,832][41694] Num frames 34200... +[2024-11-08 07:23:03,123][41694] Num frames 34300... +[2024-11-08 07:23:03,341][41694] Avg episode rewards: #0: 4.681, true rewards: #0: 4.139 +[2024-11-08 07:23:03,342][41694] Avg episode reward: 4.681, avg true_objective: 4.139 +[2024-11-08 07:23:03,460][41694] Num frames 34400... +[2024-11-08 07:23:03,723][41694] Num frames 34500... +[2024-11-08 07:23:04,011][41694] Num frames 34600... +[2024-11-08 07:23:04,270][41694] Num frames 34700... +[2024-11-08 07:23:04,462][41694] Num frames 34800... +[2024-11-08 07:23:04,522][41694] Avg episode rewards: #0: 4.691, true rewards: #0: 4.143 +[2024-11-08 07:23:04,525][41694] Avg episode reward: 4.691, avg true_objective: 4.143 +[2024-11-08 07:23:04,725][41694] Num frames 34900... +[2024-11-08 07:23:04,920][41694] Num frames 35000... +[2024-11-08 07:23:05,132][41694] Num frames 35100... +[2024-11-08 07:23:05,386][41694] Avg episode rewards: #0: 4.681, true rewards: #0: 4.139 +[2024-11-08 07:23:05,387][41694] Avg episode reward: 4.681, avg true_objective: 4.139 +[2024-11-08 07:23:05,423][41694] Num frames 35200... +[2024-11-08 07:23:05,747][41694] Num frames 35300... +[2024-11-08 07:23:05,973][41694] Num frames 35400... +[2024-11-08 07:23:06,215][41694] Num frames 35500... 
+[2024-11-08 07:23:06,494][41694] Num frames 35600... +[2024-11-08 07:23:06,631][41694] Avg episode rewards: #0: 4.690, true rewards: #0: 4.143 +[2024-11-08 07:23:06,632][41694] Avg episode reward: 4.690, avg true_objective: 4.143 +[2024-11-08 07:23:06,807][41694] Num frames 35700... +[2024-11-08 07:23:07,044][41694] Num frames 35800... +[2024-11-08 07:23:07,240][41694] Num frames 35900... +[2024-11-08 07:23:07,444][41694] Num frames 36000... +[2024-11-08 07:23:07,541][41694] Avg episode rewards: #0: 4.680, true rewards: #0: 4.140 +[2024-11-08 07:23:07,546][41694] Avg episode reward: 4.680, avg true_objective: 4.140 +[2024-11-08 07:23:07,744][41694] Num frames 36100... +[2024-11-08 07:23:07,958][41694] Num frames 36200... +[2024-11-08 07:23:08,188][41694] Num frames 36300... +[2024-11-08 07:23:08,448][41694] Num frames 36400... +[2024-11-08 07:23:08,516][41694] Avg episode rewards: #0: 4.671, true rewards: #0: 4.136 +[2024-11-08 07:23:08,519][41694] Avg episode reward: 4.671, avg true_objective: 4.136 +[2024-11-08 07:23:08,743][41694] Num frames 36500... +[2024-11-08 07:23:08,978][41694] Num frames 36600... +[2024-11-08 07:23:09,256][41694] Num frames 36700... +[2024-11-08 07:23:09,495][41694] Avg episode rewards: #0: 4.661, true rewards: #0: 4.133 +[2024-11-08 07:23:09,497][41694] Avg episode reward: 4.661, avg true_objective: 4.133 +[2024-11-08 07:23:09,531][41694] Num frames 36800... +[2024-11-08 07:23:09,743][41694] Num frames 36900... +[2024-11-08 07:23:10,025][41694] Num frames 37000... +[2024-11-08 07:23:10,257][41694] Num frames 37100... +[2024-11-08 07:23:10,462][41694] Avg episode rewards: #0: 4.652, true rewards: #0: 4.130 +[2024-11-08 07:23:10,464][41694] Avg episode reward: 4.652, avg true_objective: 4.130 +[2024-11-08 07:23:10,531][41694] Num frames 37200... +[2024-11-08 07:23:10,741][41694] Num frames 37300... +[2024-11-08 07:23:10,945][41694] Num frames 37400... +[2024-11-08 07:23:11,153][41694] Num frames 37500... 
+[2024-11-08 07:23:11,325][41694] Avg episode rewards: #0: 4.643, true rewards: #0: 4.127 +[2024-11-08 07:23:11,327][41694] Avg episode reward: 4.643, avg true_objective: 4.127 +[2024-11-08 07:23:11,438][41694] Num frames 37600... +[2024-11-08 07:23:11,680][41694] Num frames 37700... +[2024-11-08 07:23:12,035][41694] Num frames 37800... +[2024-11-08 07:23:12,456][41694] Num frames 37900... +[2024-11-08 07:23:12,652][41694] Avg episode rewards: #0: 4.634, true rewards: #0: 4.124 +[2024-11-08 07:23:12,657][41694] Avg episode reward: 4.634, avg true_objective: 4.124 +[2024-11-08 07:23:12,798][41694] Num frames 38000... +[2024-11-08 07:23:12,997][41694] Num frames 38100... +[2024-11-08 07:23:13,220][41694] Num frames 38200... +[2024-11-08 07:23:13,568][41694] Num frames 38300... +[2024-11-08 07:23:13,700][41694] Avg episode rewards: #0: 4.626, true rewards: #0: 4.121 +[2024-11-08 07:23:13,703][41694] Avg episode reward: 4.626, avg true_objective: 4.121 +[2024-11-08 07:23:13,949][41694] Num frames 38400... +[2024-11-08 07:23:14,201][41694] Num frames 38500... +[2024-11-08 07:23:14,460][41694] Num frames 38600... +[2024-11-08 07:23:14,660][41694] Num frames 38700... +[2024-11-08 07:23:14,973][41694] Avg episode rewards: #0: 4.635, true rewards: #0: 4.124 +[2024-11-08 07:23:14,975][41694] Avg episode reward: 4.635, avg true_objective: 4.124 +[2024-11-08 07:23:15,050][41694] Num frames 38800... +[2024-11-08 07:23:15,483][41694] Num frames 38900... +[2024-11-08 07:23:15,793][41694] Num frames 39000... +[2024-11-08 07:23:15,996][41694] Num frames 39100... +[2024-11-08 07:23:16,194][41694] Avg episode rewards: #0: 4.627, true rewards: #0: 4.121 +[2024-11-08 07:23:16,197][41694] Avg episode reward: 4.627, avg true_objective: 4.121 +[2024-11-08 07:23:16,333][41694] Num frames 39200... +[2024-11-08 07:23:16,635][41694] Num frames 39300... +[2024-11-08 07:23:16,960][41694] Num frames 39400... +[2024-11-08 07:23:17,218][41694] Num frames 39500... 
+[2024-11-08 07:23:17,386][41694] Avg episode rewards: #0: 4.618, true rewards: #0: 4.118 +[2024-11-08 07:23:17,394][41694] Avg episode reward: 4.618, avg true_objective: 4.118 +[2024-11-08 07:23:17,589][41694] Num frames 39600... +[2024-11-08 07:23:17,882][41694] Num frames 39700... +[2024-11-08 07:23:18,142][41694] Num frames 39800... +[2024-11-08 07:23:18,510][41694] Num frames 39900... +[2024-11-08 07:23:18,646][41694] Avg episode rewards: #0: 4.610, true rewards: #0: 4.116 +[2024-11-08 07:23:18,649][41694] Avg episode reward: 4.610, avg true_objective: 4.116 +[2024-11-08 07:23:19,014][41694] Num frames 40000... +[2024-11-08 07:23:19,354][41694] Num frames 40100... +[2024-11-08 07:23:19,591][41694] Num frames 40200... +[2024-11-08 07:23:19,852][41694] Num frames 40300... +[2024-11-08 07:23:19,929][41694] Avg episode rewards: #0: 4.603, true rewards: #0: 4.113 +[2024-11-08 07:23:19,931][41694] Avg episode reward: 4.603, avg true_objective: 4.113 +[2024-11-08 07:23:20,124][41694] Num frames 40400... +[2024-11-08 07:23:20,325][41694] Num frames 40500... +[2024-11-08 07:23:20,532][41694] Num frames 40600... +[2024-11-08 07:23:20,822][41694] Num frames 40700... +[2024-11-08 07:23:21,134][41694] Avg episode rewards: #0: 4.611, true rewards: #0: 4.116 +[2024-11-08 07:23:21,135][41694] Avg episode reward: 4.611, avg true_objective: 4.116 +[2024-11-08 07:23:21,247][41694] Num frames 40800... +[2024-11-08 07:23:21,495][41694] Num frames 40900... +[2024-11-08 07:23:21,858][41694] Num frames 41000... +[2024-11-08 07:23:22,657][41694] Num frames 41100... +[2024-11-08 07:23:22,827][41694] Avg episode rewards: #0: 4.604, true rewards: #0: 4.114 +[2024-11-08 07:23:22,829][41694] Avg episode reward: 4.604, avg true_objective: 4.114 +[2024-11-08 07:25:09,068][41694] Replay video saved to /root/hfRL/ml/LunarLander-v2/train_dir/default_experiment/replay.mp4!