steventrouble committed on
Commit f2ab607 · 1 Parent(s): 41d902a

Update README.md

Files changed (1)
  1. README.md +26 -1
README.md CHANGED
@@ -45,4 +45,29 @@ Note that `env_steps` can differ from `train_steps` because the model can
  continue fine-tuning using its replay buffer. In the paper, the last 20k
  epochs are done in this manner. This isn't necessary outside of benchmarks
  and in theory better performance should be attainable by getting more samples
- from the env.
+ from the env.
+
+ ---
+
+ ## Findings
+
+ Our primary goal in this project was to test out EfficientZero and see its capabilities.
+ We were amazed by the model overall, especially on Breakout, where it far outperformed
+ the human baseline. The overall cost was only about $50 per fully trained model, compared
+ to the hundreds of thousands of dollars needed to train MuZero.
+
+ Though the trained models achieved impressive scores in Atari, they didn't reach the
+ stellar scores demonstrated in the paper. This could be because we used different hardware
+ and dependencies or because ML research papers tend to cherry-pick models and environments
+ to showcase good results.
+
+ Additionally, the models tended to hit a performance wall between 75k and 100k steps. While we
+ don't have enough data to know why or how often this happens, it's not surprising: the model
+ was tuned specifically for data efficiency, so it hasn't been tested at larger scales. A
+ model like MuZero might be more appropriate if you have a large budget.
+
+ Training times seemed longer than those reported in the EfficientZero paper. The paper
+ states that a model can be trained to completion in 7 hours, while in practice we found
+ that training a model to completion takes 1 to 2 days on an A100 with 32 cores. This is
+ likely because the training process uses more CPU than other models and therefore does not
+ perform well on the low-frequency, many-core CPUs found in GPU clusters.
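
As context for the README note above that `env_steps` can differ from `train_steps`, here is a minimal sketch of that bookkeeping. It is not EfficientZero's actual API: `collect_step`, `train_step`, and the 100k/20k budgets are illustrative placeholders (the 20k figure echoes the README's mention of the last 20k epochs being trained from the replay buffer alone).

```python
import random

def collect_step(buffer):
    # Hypothetical stand-in for one environment interaction (adds a transition).
    buffer.append(random.random())

def train_step(buffer):
    # Hypothetical stand-in for one gradient update on a batch sampled from the replay buffer.
    batch = random.choices(buffer, k=min(32, len(buffer)))
    return sum(batch) / len(batch)

buffer, env_steps, train_steps = [], 0, 0
for _ in range(120_000):
    if env_steps < 100_000:   # interact with the env only up to an illustrative 100k-step budget
        collect_step(buffer)
        env_steps += 1
    train_step(buffer)        # the remaining 20k updates fine-tune from the replay buffer only
    train_steps += 1

print(env_steps, train_steps)  # env_steps (100000) < train_steps (120000)
```

Run as-is, this prints `100000 120000`: more gradient updates than environment interactions, which is why the two counters can diverge.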