PPO) automatically

Recap of a simple RL algorithm
1. Initialize parameters of policy and of value function
2. While true
   a. Run policy in the episode and collect a trajectory
   b. Update , where

Discovering Reinforcement Learning Algorithms
Junhyuk Oh, Matteo Hessel, et al., DeepMind [NeurIPS paper]

not crashed
future is ok
it’s obvious that the future will be bad
the action avoided forecasted crash, awesome
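The recap loop (initialize policy and value-function parameters, run an episode, update) can be sketched in a few lines. The update formula did not survive in the slide text, so this sketch assumes a common choice, REINFORCE with a learned value baseline. The corridor environment, the names `theta`, `w`, `run_episode`, `update`, `train`, and all hyperparameters are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy task: a corridor of 5 states, start at state 0,
# reward 1.0 on reaching the right end, episodes capped at 20 steps.
N_STATES, N_ACTIONS, MAX_STEPS = 5, 2, 20


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - x.max())
    return z / z.sum()


def run_episode(theta, rng):
    """Step 2a: run the policy for one episode, return [(state, action, reward)]."""
    s, traj = 0, []
    for _ in range(MAX_STEPS):
        a = rng.choice(N_ACTIONS, p=softmax(theta[s]))  # 0 = left, 1 = right
        s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
        done = s_next == N_STATES - 1
        traj.append((s, a, 1.0 if done else 0.0))
        s = s_next
        if done:
            break
    return traj


def update(theta, w, traj, gamma=0.99, lr_pi=0.5, lr_v=0.5):
    """Step 2b (assumed form): REINFORCE with a value baseline, in place."""
    # Snapshot the action probabilities that generated the trajectory.
    pi = np.array([softmax(row) for row in theta])
    G = 0.0
    for s, a, r in reversed(traj):
        G = r + gamma * G            # discounted return from this step
        adv = G - w[s]               # return minus the value prediction
        grad_logp = -pi[s]           # gradient of log softmax w.r.t. theta[s]
        grad_logp[a] += 1.0
        theta[s] += lr_pi * adv * grad_logp  # policy: favor better-than-expected actions
        w[s] += lr_v * adv                   # value: move prediction toward the return


def train(n_episodes=300, seed=0):
    """Step 1 plus a finite stand-in for the 'while true' loop."""
    rng = np.random.default_rng(seed)
    theta = np.zeros((N_STATES, N_ACTIONS))  # policy parameters (tabular logits)
    w = np.zeros(N_STATES)                   # value-function parameters
    for _ in range(n_episodes):
        traj = run_episode(theta, rng)
        update(theta, w, traj)
    return theta, w
```

Calling `train()` returns the learned logits and values. The sign of `adv` mirrors the intuition in the annotations above: an action whose return exceeds the value prediction gets reinforced, and one that falls short gets suppressed.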