
Playing Atari with Deep Reinforcement Learning

Liang Gong
April 01, 2017

An awesome paper on how deep learning models are used to play Atari video games. Presented by Liang Gong at a group meeting at Berkeley.


Transcript

  1. Playing Atari with Deep Reinforcement Learning
     An explanatory tutorial assembled by Liang Gong, Electrical Engineering & Computer Science, University of California, Berkeley.
     Paper authors: Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller
  2. What is Deep Reinforcement Learning?
     • Simply speaking: Q-learning + Neural Networks
     • What is Q-learning?
     • Q-Learning (a simple example)
     • Q-Learning on Atari Games
     • Why use it with Neural Networks?
  3. What is Deep Reinforcement Learning?
     • Simply speaking: Q-learning + Neural Networks
     • What is Q-learning?
     • Q-Learning (a simple example)
     • Q-Learning on Atari Games
     • Why use it with Neural Networks?
  4. A Simple Tutorial on Q-learning
     • Suppose we have a house with 5 rooms (plus the outside, labeled 5)
     • Starting from any room
     • Destination: 5
     Source: http://mnemstudio.org/path-finding-q-learning-tutorial.htm
  5. A Simple Tutorial on Q-learning
     • Suppose we have a house with 5 rooms
     • Starting from any room
     • Destination: 5
     (Diagram: rooms 0, 1, 2, 3, 4 and the outside area 5)
  6. A Simple Tutorial on Q-learning
     • Suppose we have a house with 5 rooms
     • Starting from any room
     • Destination: 5
  7. A Simple Tutorial on Q-learning
     • Destination: 5
     • One-step reward matrix R (fixed; reconstructed below)
     • Q-Matrix: Q (changing)
     State × Action → Reward
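     The reward matrix itself appeared only as a figure on the slide. Assuming the room layout from the linked tutorial (doors 0-4, 1-3, 1-5, 2-3, 3-4, 4-5, and staying at 5 once outside), R would look like the table below: rows are the current state, columns are the action "go to room", -1 marks an impossible move, and 100 marks a move that reaches the goal.

              to 0   to 1   to 2   to 3   to 4   to 5
     from 0     -1     -1     -1     -1      0     -1
     from 1     -1     -1     -1      0     -1    100
     from 2     -1     -1     -1      0     -1     -1
     from 3     -1      0      0     -1      0     -1
     from 4      0     -1     -1      0     -1    100
     from 5     -1      0     -1     -1      0    100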
  8. A Simple Tutorial on Q-learning
     • Destination: 5
     • The goal is a Q-graph (or matrix)
     • At any state, the Q-graph tells us the optimal next state (in order to reach the goal state quickly)
  9. A Simple Tutorial on Q-learning
     • Calculate the Q-matrix (the update rule is written out below):
       Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
     • Initialize the Q-matrix to all zeros
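     In symbols, the update being applied is the following, where s' is the state reached by taking action a, and the discount factor γ is 0.8 in this tutorial:

     Q(s, a) \leftarrow R(s, a) + \gamma \max_{a'} Q(s', a')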
  10. A Simple Tutorial on Q-learning
     • Calculate the Q-matrix:
       Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
     • Randomly pick an initial state: 1
     • Possible next states: 3, 5; pick 5 as the next state
  11. A Simple Tutorial on Q-learning
     • Calculate the Q-matrix:
       Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
     • Q(1, 5) = R(1, 5) + 0.8 * Max[Q(5, 1), Q(5, 4), Q(5, 5)] = 100 + 0.8 * 0 = 100
  12. A Simple Tutorial on Q-learning
     • Calculate the Q-matrix:
       Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
     • Randomly pick a state: 3
     • Possible next states: 1, 2, 4; randomly pick 1 as the next state
  13. A Simple Tutorial on Q-learning
     • Calculate the Q-matrix:
       Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
     • Q(3, 1) = R(3, 1) + 0.8 * Max[Q(1, 3), Q(1, 5)] = 0 + 0.8 * Max(0, 100) = 80
  14. A Simple Tutorial on Q-learning
     • Calculate the Q-matrix:
       Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
     • Repeat the process over and over -> convergence
  15. A Simple Tutorial on Q-learning
     • Calculate the Q-matrix:
       Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
     • Repeat the process over and over -> convergence
     • Normalize
  16. A Simple Tutorial on Q-learning
     • Calculate the Q-matrix:
       Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
     • Repeat the process over and over -> convergence (a runnable sketch of this loop follows below)
     • Normalize
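     The whole tutorial above fits in a few lines of Python. This is only a sketch, not the tutorial's own code: the reward matrix is reconstructed from the linked mnemstudio page (doors 0-4, 1-3, 1-5, 2-3, 3-4, 4-5, plus staying at the goal 5), and the episode count of 1000 is arbitrary.

     import numpy as np

     R = np.array([
         # to:  0    1    2    3    4    5
             [ -1,  -1,  -1,  -1,   0,  -1],   # from room 0
             [ -1,  -1,  -1,   0,  -1, 100],   # from room 1
             [ -1,  -1,  -1,   0,  -1,  -1],   # from room 2
             [ -1,   0,   0,  -1,   0,  -1],   # from room 3
             [  0,  -1,  -1,   0,  -1, 100],   # from room 4
             [ -1,   0,  -1,  -1,   0, 100],   # from room 5 (goal)
     ])
     GAMMA = 0.8
     GOAL = 5

     Q = np.zeros_like(R, dtype=float)        # init Q-matrix to all zeros
     rng = np.random.default_rng(0)

     for _ in range(1000):                    # repeat over and over -> convergence
         state = rng.integers(0, 6)           # randomly pick an initial state
         while state != GOAL:
             actions = np.flatnonzero(R[state] >= 0)    # rooms reachable from here
             action = rng.choice(actions)               # randomly pick a next state
             # Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
             Q[state, action] = R[state, action] + GAMMA * Q[action].max()
             state = action                             # the chosen action is the next room

     print(np.round(100 * Q / Q.max()))       # normalize so the largest entry is 100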
  17. What is Deep Reinforcement Learning?
     • Simply speaking: Q-learning + Neural Networks
     • What is Q-learning?
     • Q-Learning (a simple example)
     • Q-Learning on Atari Games
     • Why use it with Neural Networks?
  18. What is Deep Reinforcement Learning?
     • Simply speaking: Q-learning + Neural Networks
     • What is Q-learning?
     • Q-Learning (a simple example)
     • Q-Learning on Atari Games
     • Why use it with Neural Networks?
  19. Q-Learning on Atari Games
     • Each unique screenshot corresponds to one state
     • At any state, only three actions: left, right, or nop
     • Number of possible states: colors^(width * height) distinct screenshots
  20. What is Deep Reinforcement Learning?
     • Simply speaking: Q-learning + Neural Networks
     • What is Q-learning?
     • Q-Learning (a simple example)
     • Q-Learning on Atari Games
     • Why use it with Neural Networks?
  21. Q-Learning on Atari Games
     • Each unique screenshot corresponds to one state
     • At any state, only three actions: left, right, or nop
     • Number of possible states: colors^(width * height) distinct screenshots
     Problem:
     • Cannot start from a random state
     • Impractical: iterating over all possible states and actions
  22. Q-Learning on Atari Games
     • Each unique screenshot corresponds to one state
     • At any state, only three actions: left, right, or nop
     • Number of possible states: colors^(width * height) distinct screenshots
     Problem:
     • Cannot start from a random state
     • Impractical: iterating over all possible states and actions
     Solution:
     • Use a predictor to estimate (or approximate) the converged Q-matrix
     • A Neural Network seems to work well!
  23. This is called a Deep Q-Network (DQN).
  24. This is called a Deep Q-Network (DQN). Each output of the network is an estimated reward for one action.
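     A sketch of what such a Q-network might look like in PyTorch (the paper's implementation used Lua/Torch; the layer sizes below follow the architecture described in the 2013 paper: four stacked 84x84 grayscale frames in, one Q-value per action out):

     import torch
     import torch.nn as nn

     class DQN(nn.Module):
         def __init__(self, n_actions):
             super().__init__()
             self.net = nn.Sequential(
                 nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 4x84x84 -> 16x20x20
                 nn.ReLU(),
                 nn.Conv2d(16, 32, kernel_size=4, stride=2),  # 16x20x20 -> 32x9x9
                 nn.ReLU(),
                 nn.Flatten(),
                 nn.Linear(32 * 9 * 9, 256),
                 nn.ReLU(),
                 nn.Linear(256, n_actions),                   # one estimated Q-value per action
             )

         def forward(self, screens):                          # screens: (batch, 4, 84, 84)
             return self.net(screens)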
  25. This is called a Deep Q-Network (DQN). During training, with probability ε, pick a random action. Observe the actual reward as y and the estimated reward as y'.
     Loss = (y - y')^2
  26. This is called a Deep Q-Network (DQN).
     Loss = (y - y')^2
     Update the network weights using gradient descent.
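     One training step, sketched in PyTorch under the same assumptions as above (the DQN class comes from the previous sketch, the hyperparameters are illustrative, the target y adds the discounted best next-step estimate as in the paper's formula, and the paper's experience-replay buffer is omitted for brevity):

     import random
     import torch

     q_net = DQN(n_actions=3)                     # left, right, nop
     optimizer = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)
     gamma, epsilon = 0.99, 0.1                   # illustrative values

     def act(state):
         # epsilon-greedy: with probability epsilon pick a random action,
         # otherwise the action with the largest estimated Q-value
         if random.random() < epsilon:
             return random.randrange(3)
         with torch.no_grad():
             return q_net(state.unsqueeze(0)).argmax(dim=1).item()

     def train_step(state, action, reward, next_state, done):
         # state / next_state: (4, 84, 84) float tensors of stacked frames
         # target y: observed reward plus discounted best estimated future reward
         with torch.no_grad():
             best_next = 0.0 if done else q_net(next_state.unsqueeze(0)).max().item()
         y = reward + gamma * best_next
         # y': the network's current estimate for the action actually taken
         y_pred = q_net(state.unsqueeze(0))[0, action]
         loss = (y_pred - y) ** 2                 # Loss = (y - y')^2
         optimizer.zero_grad()
         loss.backward()                          # differentiate the loss w.r.t. the weights
         optimizer.step()                         # gradient descent update
         return loss.item()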
  27. The Math in the Paper
     The original Q-learning formula (impractical to compute directly), annotated on the slide as:
     • the expectation is over the next state
     • the reward for taking action a under state s
     • the future discount
     • after action a, the maximal reward for the next possible action
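     Reconstructed in LaTeX from the formula these annotations refer to (equation (1) of the paper), it reads approximately:

     Q^*(s, a) = \mathbb{E}_{s' \sim \mathcal{E}} \left[ r + \gamma \max_{a'} Q^*(s', a') \,\middle|\, s, a \right]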
  28. The Math in the Paper
     Calculate the loss function, annotated on the slide as:
     • the loss in step i
     • the expectation (over states and actions)
     • y_i: the actual reward from the Atari game
     • the NN-predicted reward under state s, action a, and weights θ_i
     • θ_i: the NN weights in step i
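     Reconstructed in LaTeX (equation (2) of the paper), the loss and its target are approximately:

     L_i(\theta_i) = \mathbb{E}_{s, a \sim \rho(\cdot)} \left[ \left( y_i - Q(s, a; \theta_i) \right)^2 \right],
     \qquad y_i = \mathbb{E}_{s' \sim \mathcal{E}} \left[ r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) \,\middle|\, s, a \right]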
  29. The Math in the Paper
     Calculate the gradient: differentiate L_i with respect to the weights θ_i.
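     Reconstructed in LaTeX (equation (3) of the paper), the gradient is approximately:

     \nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}_{s, a \sim \rho(\cdot);\, s' \sim \mathcal{E}} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) - Q(s, a; \theta_i) \right) \nabla_{\theta_i} Q(s, a; \theta_i) \right]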