Deep Reinforcement Learning

Stable Baselines relies on TensorFlow 1.x, whereas Stable Baselines3 relies on PyTorch.


Best-performing CleanRL run used these flags: --exp-name a2c --update-epochs 1 --num-minibatches 1 --norm-adv False --num-envs 64 --clip-vloss False --vf-coef 0.25 --anneal-lr False --num-steps 5 --track
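A minimal sketch of why these flags plausibly yield A2C, assuming they are passed to CleanRL's PPO script (the script is not named here; `--exp-name a2c` only suggests the intent). With one update epoch and one minibatch, each batch receives exactly one gradient step, taken at the same policy that collected the rollout. The probability ratio is therefore 1, clipping is inactive, and the clipped surrogate has the same gradient as A2C's -log pi(a|s) * A loss. The pure-Python check below compares the two gradients by finite differences for one state with a softmax policy. The logits, action, advantage, and `clip_coef` are hypothetical values, and the value and entropy terms are omitted.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def a2c_loss(logits, action, advantage):
    # A2C policy loss for one sample: -log pi(a|s) * A
    return -math.log(softmax(logits)[action]) * advantage

def ppo_clip_loss(logits, old_logits, action, advantage, clip_coef=0.2):
    # PPO clipped surrogate loss with ratio r = pi(a|s) / pi_old(a|s)
    # clip_coef=0.2 is an assumed value; it does not matter when r == 1
    ratio = softmax(logits)[action] / softmax(old_logits)[action]
    clipped = min(max(ratio, 1 - clip_coef), 1 + clip_coef)
    return -min(ratio * advantage, clipped * advantage)

def numerical_grad(f, params, h=1e-6):
    # Central finite differences, perturbing one parameter at a time
    grad = []
    for i in range(len(params)):
        plus = list(params)
        plus[i] += h
        minus = list(params)
        minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad

logits = [0.5, -1.0, 2.0]   # hypothetical policy logits for one state
action, advantage = 1, 1.7  # hypothetical sampled action and advantage estimate

g_a2c = numerical_grad(lambda p: a2c_loss(p, action, advantage), logits)
# One update from the rollout policy: old_logits equals the current logits, so r == 1
g_ppo = numerical_grad(lambda p: ppo_clip_loss(p, logits, action, advantage), logits)

print("A2C gradient:", [round(x, 4) for x in g_a2c])
print("PPO gradient:", [round(x, 4) for x in g_ppo])
```

Once a second epoch or minibatch is allowed, the policy has already moved, the ratio drifts away from 1, and the two objectives stop coinciding. That is why the single-epoch, single-minibatch setting is the important one here.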

Fun fact

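  • Running on my local CPU was faster than on a Colab GPU for the MinAtar environment, possibly because the observations were low-dimensional and the network was shallow, so there was too little computation per step for the GPU to offset its data-transfer and kernel-launch overhead.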