Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller
Presented by: Liang Gong, Electrical Engineering & Computer Science, University of California, Berkeley.
• What is Q-Learning?
• Q-Learning (a simple example)
• Q-Learning on Atari Games
• Why use it with Neural Networks?
Q-Learning (a simple example): a house with 5 rooms
• Starting from any room
• Destination: room 5
http://mnemstudio.org/path-finding-q-learning-tutorial.htm
[Figure: the house modeled as a graph with states 0, 1, 2, 3, 4, 5; state 5 is the destination]
• Reward matrix: R (fixed)
• Q-matrix: Q (updated during learning)
Both are state × action tables whose entries are rewards.
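As a concrete sketch (assuming the room layout from the linked tutorial: -1 marks a missing door, 0 a door between ordinary rooms, and 100 a door into the goal room 5), the two matrices could be set up like this:

import numpy as np

# Reward matrix R: rows = current state, columns = action (next room).
# -1 = no door, 0 = door between ordinary rooms, 100 = door into goal room 5.
# Layout assumed from the linked tutorial; adjust if the figure differs.
R = np.array([
    [-1, -1, -1, -1,  0,  -1],   # state 0
    [-1, -1, -1,  0, -1, 100],   # state 1
    [-1, -1, -1,  0, -1,  -1],   # state 2
    [-1,  0,  0, -1,  0,  -1],   # state 3
    [ 0, -1, -1,  0, -1, 100],   # state 4
    [-1,  0, -1, -1,  0, 100],   # state 5 (goal)
])

# Q-matrix starts at zero and is filled in during learning.
Q = np.zeros_like(R, dtype=float)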
The result of learning is a Q-graph (or Q-matrix).
• At any state, the Q-matrix tells us the optimal next state (so as to reach the goal state as early as possible).
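Once Q has converged, reading off the optimal next state is just an argmax over the current state's row (a minimal helper, reusing the arrays above):

def best_next_state(Q, state):
    # The column with the highest Q-value is the best action / next room.
    return int(np.argmax(Q[state]))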
Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
• Randomly pick an initial state: 1
• Possible next states: 3, 5; pick 5 as the next state
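For example, with Gamma = 0.8 (the value used in the linked tutorial) and Q initialized to zero, this step gives, using the assumed R above: Q(1, 5) = R(1, 5) + 0.8 * Max[Q(5, all actions)] = 100 + 0.8 * 0 = 100.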
Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
• Randomly pick a state: 3
• Possible next states: 1, 2, 4; randomly pick 1 as the next state
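Continuing under the same assumptions: Q(3, 1) = R(3, 1) + 0.8 * Max[Q(1, all actions)] = 0 + 0.8 * 100 = 80, since Q(1, 5) was just set to 100.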
Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
• Repeat the process over and over -> convergence
• Normalize the converged Q-matrix
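A minimal tabular training loop along these lines, assuming the R, Q, and Gamma = 0.8 from the sketches above (not the presenter's code):

import random

GAMMA = 0.8
GOAL = 5

for episode in range(1000):                     # repeat over and over
    state = random.randrange(6)                 # random initial state
    while state != GOAL:
        # Valid actions are those with a non-negative reward.
        actions = [a for a in range(6) if R[state, a] >= 0]
        action = random.choice(actions)
        # Q-learning update: immediate reward + discounted best future value.
        Q[state, action] = R[state, action] + GAMMA * Q[action].max()
        state = action                          # the chosen action is the next room

Q_normalized = Q / Q.max() * 100                # normalize, e.g. to a 0-100 scale

Running enough episodes drives Q toward the converged values described above; the normalization only rescales it.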
Q-Learning on Atari Games: each game screen is one state.
• At any state, only three actions: left, right, or nop
• Number of states: width * height * colors
Problem:
• Cannot start the game from an arbitrary (random) state
• Impractical: iterating over all possible states and actions
Solution:
• Use a predictor to estimate (approximate) the converged Q-matrix
• A neural network seems to work well for this!
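A rough sketch of the idea (the layer sizes and the flattened-screen input are illustrative, not the paper's actual architecture): instead of indexing a giant table, a small network maps a screen state to one estimated Q-value per action.

import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS = 3                                   # left, right, nop
STATE_DIM = 84 * 84                             # flattened, downsampled screen; size is illustrative

# A tiny two-layer network standing in for the Q-matrix.
W1 = rng.normal(scale=0.01, size=(STATE_DIM, 64))
W2 = rng.normal(scale=0.01, size=(64, N_ACTIONS))

def q_values(screen):
    # Return the estimated Q-value for each of the three actions,
    # given a flattened screen vector of length STATE_DIM.
    hidden = np.maximum(screen @ W1, 0.0)       # ReLU
    return hidden @ W2                          # one value per action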
This is called a Deep Q-Network (DQN).
• During training, with probability ε, pick a random action.
• Observe the actual reward as y and the estimated reward as y'.
• Loss = (y - y')^2
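A minimal sketch of the ε-greedy choice and the squared loss, reusing the hypothetical q_values helper above (the value of ε is illustrative):

import random

EPSILON = 0.1

def choose_action(screen):
    if random.random() < EPSILON:               # with probability epsilon: explore
        return random.randrange(N_ACTIONS)
    return int(np.argmax(q_values(screen)))     # otherwise: act greedily

def squared_loss(y, y_pred):
    # y = observed (target) reward, y_pred = the network's estimate
    return (y - y_pred) ** 2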
The original Q-Learning formula (impractical to compute directly):
Q*(s, a) = E[ r + γ * max_a' Q*(s', a') | s, a ]
where r is the reward for action a under state s, γ is the future discount, and max_a' Q*(s', a') is the maximal reward for the next possible action after action a.
Calculate the loss function:
L_i(θ_i) = E[ (y_i - Q(s, a; θ_i))^2 ]
where y_i is the target built from the actual reward from the Atari game, Q(s, a; θ_i) is the NN-predicted reward under state s, action a, and weights θ_i, and θ_i are the NN weights in step i.
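Putting the two formulas together on one transition (s, a, r, s'), a sketch of the target and loss computation (assuming the q_values helper above; q_values_old stands for the network evaluated with the previous weights θ_{i-1}, and the terminal-state case is an added detail not shown on the slide):

DISCOUNT = 0.99                                 # illustrative discount for the Atari setting

def dqn_loss(s, a, r, s_next, done, q_values_old):
    # Target y_i: the actual reward from the game plus the discounted best
    # next-step estimate under the previous weights (theta_{i-1}).
    y = r if done else r + DISCOUNT * np.max(q_values_old(s_next))
    # Prediction under the current weights (theta_i).
    y_pred = q_values(s)[a]
    return (y - y_pred) ** 2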