environment.
• As they take actions, they receive feedback
• They aim to maximize good feedback and minimize bad feedback
• Computer games are a great way to train reinforcement learning agents. We know that people can learn games from just a sequence of images, so computer agents should be able to do the same thing (given enough computational power, time and the right algorithms).
a set of papers which have the best performance for reinforcement learning on a number of Atari games such as Pong
• They use a deep convolutional architecture to turn a 192x160 image into a much smaller representation of the state of the game
• A technique called Q-learning is then used to learn the best actions from those states
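To make the Q-learning step concrete, here is a minimal sketch of the standard tabular update rule. It is a simplification: the papers above use a deep network to estimate the Q-values, whereas this sketch stores them in a dictionary. The state names, action names, and the `q_update` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    q       -- mapping from (state, action) to estimated long-term value
    alpha   -- learning rate (assumed value, for illustration)
    gamma   -- discount factor for future reward (assumed value)
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Target: immediate reward plus discounted best future value.
    target = reward + gamma * best_next
    # Move the current estimate part of the way towards the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical states and actions; unseen entries default to 0.0.
q = defaultdict(float)
actions = ["up", "down"]

# The agent scores after moving "up" in state "s0".
q_update(q, "s0", "up", reward=1.0, next_state="s1", actions=actions)

# No immediate reward here, but the step leads to the now-valuable "s0".
q_update(q, "s_prev", "up", reward=0.0, next_state="s0", actions=actions)
```

The second update shows how reward propagates backwards: the step into "s0" gains value even though it earned no reward itself.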
make modifications to the game, such as removing a menu screen
• We can grab values from within the game to use for reinforcement
◦ In Pong we can get reinforcement from the scores
◦ In Tetris we can get positive reinforcement by intercepting the removeCompleteLines function
◦ In a platform game we can grab the player's x position to encourage exploration
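The three reward sources above could be turned into numbers along the following lines. This is only a sketch: the function and class names are hypothetical, the way the values are read out of each game is left out, and penalizing the opponent's points in Pong is an assumption rather than something stated above.

```python
def pong_reward(prev_scores, scores):
    """Reward from the change in scores since the last step.

    Each argument is an (our_score, opponent_score) pair.
    Assumption: opponent points count as negative reward.
    """
    (prev_us, prev_them), (us, them) = prev_scores, scores
    return (us - prev_us) - (them - prev_them)


def tetris_reward(lines_removed):
    """Positive reward for each line cleared.

    Assumes the intercepted line-removal call reports how many
    lines it removed.
    """
    return float(lines_removed)


class ProgressReward:
    """Reward a platform-game agent only for reaching a new furthest x."""

    def __init__(self):
        self.best_x = 0.0

    def __call__(self, x):
        # Moving back, or re-covering old ground, earns nothing.
        gain = max(0.0, x - self.best_x)
        self.best_x = max(self.best_x, x)
        return gain


progress = ProgressReward()
rewards = [progress(x) for x in (5.0, 3.0, 8.0)]
```

Rewarding only new ground in the platform game avoids paying the agent for walking back and forth over the same stretch.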