Slide 1

Slide 1 text

Deep Q Pong, with TensorFlow and PyGame. By Daniel K Slater

Slide 2

Slide 2 text

We will talk about...
● Google DeepMind recently achieved the world's best performance at learning to play a variety of Atari games.
● Here we are going to look at how their approach works and re-implement it in PyGame with TensorFlow.
● Why Pong?
● Pong: a classic, simple, dynamic game.
● We want to train a computer to play it, just from watching it.
● What’s PyGame?

Slide 3

Slide 3 text

PyGame
● http://pygame.org/
● The most popular Python games framework
● Thousands of games, all free, all open source
● All written in Python

Slide 4

Slide 4 text

Why do we care about this?
● It’s fun
● It’s challenging
● If we can develop generalized learning algorithms, they could apply to many other fields
● It will allow us to build our future robot overlords, who will inherit the earth from us

Slide 5

Slide 5 text

3 types of learning
● Supervised
○ I have a set of data and a set of labels, and I want to teach a machine to learn the labels from the data.
○ E.g. mapping pictures of handwritten digits to the actual digits.
● Unsupervised
○ I have some unlabeled data and I want a machine to learn something about it.
○ E.g. finding similar and different DNA sequences.
● And the 3rd...

Slide 6

Slide 6 text

What is reinforcement learning?
● Agents are run within an environment.
● As they take actions they receive feedback.
● They aim to maximize good feedback and minimize bad feedback.
● Computer games are a great way to train reinforcement learning agents. We know we can learn games from just a sequence of images, so computer agents should be able to do the same thing (given enough computational power, time and the right algorithms).

Slide 7

Slide 7 text

What is Q-learning?
● An approach to learning about a state space
● The classic example is a robot navigating a maze
● Q-function (see the update rule below)
● Updated iteratively
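For reference, the standard iterative Q-learning update referred to here is

Q(s, a) ← Q(s, a) + α [ r + γ · max over a' of Q(s', a') − Q(s, a) ]

where s' is the state reached after taking action a in state s, r is the reward received, α is the learning rate and γ is the discount factor. In the tabular maze sketch on the next slide, α is effectively 1.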

Slide 8

Slide 8 text

Q-learning maze example
Images stolen from http://mnemstudio.org/path-finding-q-learning-tutorial.htm
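A minimal tabular Q-learning sketch in Python, in the spirit of the maze example: the room layout, the reward of 100 for reaching the goal and γ = 0.8 are illustrative assumptions, not necessarily the exact numbers on the slide.

import numpy as np

# Illustrative maze: 6 "rooms" (states); R[s, a] is the reward for moving
# from room s to room a, or -1 where there is no door. Reaching room 5
# (the goal) pays 100.
R = np.array([
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
])

GAMMA = 0.8                                      # discount factor
Q = np.zeros_like(R, dtype=float)

for episode in range(1000):
    state = np.random.randint(0, 6)              # start in a random room
    while state != 5:                            # until we reach the goal
        valid = np.where(R[state] >= 0)[0]       # doors we can go through
        action = np.random.choice(valid)         # explore randomly
        # Q-learning update: immediate reward plus discounted best future value
        Q[state, action] = R[state, action] + GAMMA * Q[action].max()
        state = action

print((Q / Q.max() * 100).round())               # normalised Q-table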

Slide 9

Slide 9 text

Q-learning in TensorFlow
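Roughly what Q-learning looks like in TensorFlow, as a sketch in the 1.x-style API of the time. The tiny one-layer network and the helper names below are illustrative, not the talk's actual demo code: the idea is to regress the network's Q-value for the taken action towards the Bellman target r + γ · max Q(s', a').

import numpy as np
import tensorflow as tf   # written against the TensorFlow 1.x API

NUM_STATES = 6
NUM_ACTIONS = 6
GAMMA = 0.8

# One-hot encoded state in, one Q-value per action out.
state_input = tf.placeholder(tf.float32, [None, NUM_STATES])
targets = tf.placeholder(tf.float32, [None, NUM_ACTIONS])

weights = tf.Variable(tf.zeros([NUM_STATES, NUM_ACTIONS]))
q_values = tf.matmul(state_input, weights)

loss = tf.reduce_mean(tf.square(targets - q_values))
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

session = tf.Session()
session.run(tf.global_variables_initializer())

def one_hot(s):
    x = np.zeros(NUM_STATES, dtype=np.float32)
    x[s] = 1.0
    return x

def train_on_transition(state, action, reward, next_state, terminal):
    # Current predictions for this state, then overwrite the taken action's
    # Q-value with the Bellman target and train towards that.
    current = session.run(q_values, {state_input: [one_hot(state)]})[0]
    if terminal:
        current[action] = reward
    else:
        next_q = session.run(q_values, {state_input: [one_hot(next_state)]})[0]
        current[action] = reward + GAMMA * next_q.max()
    session.run(train_step, {state_input: [one_hot(state)], targets: [current]})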

Slide 10

Slide 10 text

PyGamePlayer
● https://github.com/DanielSlater/PyGamePlayer
● Allows running PyGame games without touching the game's code
● Handles intercepting the screen buffer and key presses
● Fixes the game's frame rate
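Roughly how a player hooks in. The class and method names below follow my reading of the PyGamePlayer README and should be checked against the repo; treat them as assumptions rather than the definitive API.

# Sketch of a PyGamePlayer subclass - names are assumptions, check the repo.
from pygame.constants import K_DOWN
from pygame_player import PyGamePlayer

class DoNothingMuchPongPlayer(PyGamePlayer):
    def __init__(self):
        # pin the game to a fixed number of frames per second so learning
        # sees a consistent world regardless of how fast our code runs
        super(DoNothingMuchPongPlayer, self).__init__(force_game_fps=10)

    def get_keys_pressed(self, screen_array, feedback, terminal):
        # screen_array is the intercepted screen buffer; return the keys to press
        return [K_DOWN]

    def get_feedback(self):
        # return (reward, terminal) for the last frame
        return 0.0, False

if __name__ == '__main__':
    player = DoNothingMuchPongPlayer()
    player.start()
    import games.pong   # importing the bundled game starts it with our player attached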

Slide 11

Slide 11 text

Very simple Pong framework in TensorFlow

Slide 12

Slide 12 text

Applying Q-learning to Pong
● What are the states and actions?
● Actions are the key presses.
● The state could be the screen?
● A normal screen is 640x480 pixels = 307,200 data points per state = 2^307200 different states.
● Pong is a dynamic game: a single static shot is not enough, our state needs to capture change. Make it store the last 4 frames (sketched below).
● State is now 2^1228800 = a too f***ing big number.
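A sketch of how the screen can be shrunk and stacked into a state. The 80x80 target size, the crude strided downsampling and the binarisation threshold are illustrative assumptions:

from collections import deque
import numpy as np

STATE_FRAMES = 4
last_frames = deque(maxlen=STATE_FRAMES)

def preprocess(screen_rgb):
    # screen_rgb: a 640x480x3 array from the intercepted screen buffer
    grey = screen_rgb.mean(axis=2)                 # drop colour
    small = grey[::8, ::6]                         # crude downsample to 80x80
    return (small > 1.0).astype(np.float32)        # binarise: paddles/ball vs background

def update_state(screen_rgb):
    last_frames.append(preprocess(screen_rgb))
    while len(last_frames) < STATE_FRAMES:         # pad at the start of an episode
        last_frames.append(last_frames[-1])
    # stack the last 4 frames into an 80x80x4 state tensor
    return np.stack(last_frames, axis=2)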

Slide 13

Slide 13 text

TensorFlow and convolutional nets to the rescue
Convolutional net: use a deep convolutional architecture to turn the huge screen image into a much smaller representation of the state of the game.
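A sketch of that kind of convolutional network in TensorFlow 1.x style. The filter sizes, strides and layer widths below are assumptions in the spirit of the DeepMind-style architecture, not the exact network used in the talk:

import tensorflow as tf   # TensorFlow 1.x-style API

STATE_FRAMES = 4
ACTIONS = 3   # up, down, do nothing

def weight(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.01))

def bias(shape):
    return tf.Variable(tf.constant(0.01, shape=shape))

# Input: the last 4 downsampled 80x80 frames stacked together.
state_input = tf.placeholder(tf.float32, [None, 80, 80, STATE_FRAMES])

# Two convolutional layers squeeze the image down to a small feature map...
conv1 = tf.nn.relu(tf.nn.conv2d(state_input, weight([8, 8, STATE_FRAMES, 32]),
                                strides=[1, 4, 4, 1], padding="SAME") + bias([32]))
conv2 = tf.nn.relu(tf.nn.conv2d(conv1, weight([4, 4, 32, 64]),
                                strides=[1, 2, 2, 1], padding="SAME") + bias([64]))

# ...then fully connected layers map that to one Q-value per action.
flat = tf.reshape(conv2, [-1, 10 * 10 * 64])
hidden = tf.nn.relu(tf.matmul(flat, weight([10 * 10 * 64, 256])) + bias([256]))
q_values = tf.matmul(hidden, weight([256, ACTIONS])) + bias([ACTIONS])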

Slide 14

Slide 14 text

Exploring the space
● Remaining problem
○ In the beginning our Q-function will be terrible.
○ We may never find the better states because of local minima.
● Solution
○ Start off exploring randomly to get a varied set of samples.
○ Slowly replace our random actions with more and more actions chosen by the agent (a sketch follows).
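The usual way to do this is an epsilon-greedy policy whose randomness is annealed over time. A minimal sketch; the start/end probabilities and annealing length are illustrative:

import random
import numpy as np

INITIAL_RANDOM_ACTION_PROB = 1.0     # start fully random
FINAL_RANDOM_ACTION_PROB = 0.05      # keep a little exploration forever
EXPLORE_STEPS = 500000               # how many steps to anneal over

random_action_prob = INITIAL_RANDOM_ACTION_PROB

def choose_action(q_values_for_state, num_actions):
    """Epsilon-greedy: mostly follow the Q-function, sometimes act randomly."""
    global random_action_prob
    if random.random() < random_action_prob:
        action = random.randrange(num_actions)
    else:
        action = int(np.argmax(q_values_for_state))
    # slowly hand control over from random exploration to the agent
    if random_action_prob > FINAL_RANDOM_ACTION_PROB:
        random_action_prob -= (INITIAL_RANDOM_ACTION_PROB - FINAL_RANDOM_ACTION_PROB) / EXPLORE_STEPS
    return action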

Slide 15

Slide 15 text

Deep-Q agent in action

Slide 16

Slide 16 text

Thank you! Contact me at: http://www.danielslater.net/