Deep-Q Pong, with tensorflow and pygame

DanielSlater

March 22, 2016

Transcript

  1. We will talk about...
    • Google DeepMind recently got the world's best performance at learning a variety of Atari games.
    • Here we are going to look at how it works and re-implement their work in PyGame with TensorFlow.
    • Why Pong?
    • Pong: a classic, simple, dynamic game.
    • We want to train a computer to play it, just from watching it.
    • What's PyGame?
  2. PyGame
    • http://pygame.org/
    • The most popular Python games framework
    • 1000s of games, all free, all open source
    • All written in Python
  3. Why do we care about this?
    • It's fun
    • It's challenging
    • If we can develop generalized learning algorithms, they could apply to many other fields
    • It will allow us to build our future robot overlords who will inherit the earth from us
  4. 3 types of learning
    • Supervised
      ◦ I have a set of data and a set of labels, and I want to teach a machine to predict the labels from the data.
      ◦ E.g. mapping pictures of handwritten digits to the actual digits.
    • Unsupervised
      ◦ I have some unlabeled data and I want a machine to learn something about it.
      ◦ E.g. finding similar and different DNA sequences.
    • And the 3rd...
  5. What is reinforcement learning?
    • Agents are run within an environment.
    • As they take actions they receive feedback.
    • They aim to maximize good feedback and minimize bad feedback.
    • Computer games are a great way to train reinforcement learning agents. We know humans can learn games from just a sequence of images, so computer agents should be able to do the same thing (given enough computational power, time and the right algorithms).
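The agent/environment loop described on this slide can be sketched in a few lines. This is only an illustration: `TinyCatch`, `run_episode`, `random_policy` and `follow_ball` are hypothetical names invented here, and the toy game is a stand-in for Pong rather than anything from the talk.

```python
import random


class TinyCatch:
    """Hypothetical 1-D toy game: keep the paddle under a drifting ball."""

    WIDTH = 5
    ACTIONS = (-1, 0, 1)  # move left, stay, move right

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.ball = 2
        self.paddle = 2

    def observe(self):
        # The state the agent sees: where the ball and the paddle are.
        return (self.ball, self.paddle)

    def step(self, action):
        # Apply the agent's action, then let the ball drift randomly.
        self.paddle = min(self.WIDTH - 1, max(0, self.paddle + action))
        drift = self.rng.choice(self.ACTIONS)
        self.ball = min(self.WIDTH - 1, max(0, self.ball + drift))
        # Feedback: good (+1) when the paddle is under the ball, bad (-1) otherwise.
        return 1 if self.paddle == self.ball else -1


def run_episode(env, choose_action, steps=100):
    """Run the observe -> act -> feedback loop and return the total feedback."""
    total = 0
    for _ in range(steps):
        state = env.observe()
        action = choose_action(state)
        total += env.step(action)
    return total


def random_policy(state):
    # An agent that ignores the state entirely.
    return random.choice(TinyCatch.ACTIONS)


def follow_ball(state):
    # A hand-written agent: step towards the ball's last known position.
    ball, paddle = state
    return (ball > paddle) - (ball < paddle)
```

Calling `run_episode(TinyCatch(), follow_ball)` and `run_episode(TinyCatch(), random_policy)` gives a feel for what "maximize good feedback" means: a learning agent should move from the second kind of behaviour towards the first without being told the rule.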
  6. What is Q-learning?
    • An approach to learning about a state space
    • Classic example is a robot navigating a maze
    • Q function: Q(state, action), the expected long-term reward for taking an action in a state
    • Updated iteratively
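The slide's equation is not reproduced in the transcript, so the sketch below assumes the standard textbook Q-learning update, Q(s, a) ← Q(s, a) + α (r + γ max over a' of Q(s', a') − Q(s, a)), applied to a tiny corridor "maze". The corridor, the function names and the values of `alpha` and `gamma` are all illustrative assumptions, not taken from the talk.

```python
from collections import defaultdict

ACTIONS = (-1, 1)  # step left, step right
GOAL = 3           # hypothetical corridor with cells 0..3, goal at the right end


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One iterative Q-learning update (standard textbook form, assumed here)."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def step(state, action):
    # Move along the corridor; reaching the goal cell pays a reward of 1.
    next_state = min(GOAL, max(0, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def train(sweeps=50):
    # Sweep every (state, action) pair repeatedly so values propagate back from the goal.
    q = defaultdict(float)
    for _ in range(sweeps):
        for state in range(GOAL):
            for action in ACTIONS:
                next_state, reward = step(state, action)
                q_update(q, state, action, reward, next_state, ACTIONS)
    return q


def greedy_action(q, state):
    # Pick the action with the highest learned Q value.
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

After `q = train()`, `greedy_action(q, s)` should point right for every non-goal cell, because the reward at the goal has been propagated backwards through the table one update at a time.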
  7. PyGamePlayer
    • https://github.com/DanielSlater/PyGamePlayer
    • Allows running PyGame games with zero touches to the game's own code
    • Handles intercepting the screen buffer and key presses
    • Fixes the game frame rates
  8. Applying Q-learning to Pong
    • What are the states and actions?
    • Actions are the key presses.
    • The state could be the screen?
    • A normal screen is 640x480 pixels = 307200 data points per state = 2^307200 different states, even if each pixel were only on or off.
    • Pong is a dynamic game, so a single static shot is not enough; our state needs to capture change. Make it store the last 4 frames.
    • The state space is now 2^1228800 states = too f***ing big a number.
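The arithmetic on this slide, and the "last 4 frames" state, can be sketched as follows. `FrameStack` and `decimal_digits_of_power_of_two` are hypothetical helpers written for this illustration, and treating each pixel as a single on/off value follows the slide's simplification.

```python
import math
from collections import deque

WIDTH, HEIGHT, FRAMES = 640, 480, 4

pixels_per_frame = WIDTH * HEIGHT             # 307200 data points per frame
values_per_state = pixels_per_frame * FRAMES  # 1228800 data points per 4-frame state


def decimal_digits_of_power_of_two(n):
    """Number of decimal digits in 2**n, without building the huge integer."""
    return math.floor(n * math.log10(2)) + 1


class FrameStack:
    """Keep the most recent frames so the state can capture motion."""

    def __init__(self, size=FRAMES):
        self.frames = deque(maxlen=size)

    def push(self, frame):
        if not self.frames:
            # Assumption: pad with copies of the first frame so the state is always full.
            self.frames.extend([frame] * self.frames.maxlen)
        else:
            # The deque drops the oldest frame automatically.
            self.frames.append(frame)
        return tuple(self.frames)
```

Calling `decimal_digits_of_power_of_two(values_per_state)` shows how long the count of possible states would be if written out in decimal, which is the reason a lookup table over raw screens is not an option.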
  9. TensorFlow and convolutional nets to the rescue
    • Convolutional net: use a deep convolutional architecture to turn the huge screen image into a much smaller representation of the state of the game.
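To see how a stack of convolutions shrinks the screen, here is a shape-only sketch in plain Python (no TensorFlow calls). The slide does not list the layers, so the input size and the kernel/stride/filter values below are assumptions borrowed from DeepMind's published DQN network and may differ from the network used in the talk.

```python
def conv_output_size(size, kernel, stride):
    """Output width of a convolution with no padding."""
    return (size - kernel) // stride + 1


# Assumed layers as (kernel, stride, filters); not taken from the slides.
LAYERS = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]


def trace_shapes(size=84, channels=4, layers=LAYERS):
    """Return the (height, width, channels) shape after each layer."""
    shapes = [(size, size, channels)]
    for kernel, stride, filters in layers:
        size = conv_output_size(size, kernel, stride)
        shapes.append((size, size, filters))
    return shapes


def flattened_size(shape):
    # Number of values handed to the fully connected layers.
    height, width, channels = shape
    return height * width * channels
```

Under these assumptions a downscaled 84x84 stack of 4 frames (28224 values) ends up as a 7x7x64 feature map (3136 values), which is the "much smaller representation" the Q function is then learned on.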
  10. Exploring the space
    • Remaining problem
      ◦ In the beginning our Q-function will be terrible.
      ◦ It may never find the better states because it gets stuck in local minima.
    • Solution
      ◦ Start off exploring randomly to get a varied set of samples.
      ◦ Slowly replace our random actions with more and more actions chosen by the agent.
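One common way to implement this solution is an epsilon-greedy policy whose random-action probability is annealed over time. The sketch below assumes a linear schedule; the start value, end value and number of annealing steps are illustrative choices, not figures from the talk.

```python
import random


def epsilon_at(step, start=1.0, end=0.05, anneal_steps=10000):
    """Probability of acting randomly at a given training step (linear decay)."""
    fraction = min(1.0, step / anneal_steps)
    return start + fraction * (end - start)


def choose_action(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise act greedily on the Q values."""
    if rng.random() < epsilon:
        # Random action: gathers varied samples early in training.
        return rng.randrange(len(q_values))
    # Greedy action: the index with the highest estimated Q value.
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Early in training `epsilon_at(step)` is close to 1, so nearly every action is random; as `step` grows, more and more actions come from the agent's own Q values, with a small amount of exploration kept at the end.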