Slide 14
Deep Q-Learning
1. Do a feedforward pass for the current state s to get predicted Q-values for all actions.
2. Do a feedforward pass for the next state s' and calculate the maximum over all network outputs, max_a' Q(s', a').
3. Set the Q-value target for the taken action to r + γ max_a' Q(s', a') (using the max calculated in step 2). For all other actions, set the Q-value target to the same value originally returned in step 1, making the error 0 for those outputs.
4. Update the weights using backpropagation (see the sketch below).
http://www.nervanasys.com/demystifying-deep-reinforcement-learning/
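A minimal PyTorch sketch of these four steps for a single transition (s, a, r, s'). The network architecture, hyperparameters, and all names (q_net, dqn_update, GAMMA, and so on) are illustrative assumptions, not taken from the slide or the linked post.

import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS, GAMMA = 4, 2, 0.99  # assumed sizes and discount

# Simple feedforward Q-network: state in, one Q-value per action out.
q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, NUM_ACTIONS),
)
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-3)

def dqn_update(s, a, r, s_next, done):
    """One Q-learning update for a single transition (s, a, r, s')."""
    # Step 1: feedforward pass for the current state s -> predicted Q-values.
    q_pred = q_net(s)                       # shape: (NUM_ACTIONS,)

    # Step 2: feedforward pass for the next state s' -> max_a' Q(s', a').
    with torch.no_grad():
        max_q_next = q_net(s_next).max()

    # Step 3: the target equals the prediction for every action except the
    # one taken, so the error is 0 for all other outputs.
    target = q_pred.detach().clone()
    target[a] = r if done else r + GAMMA * max_q_next

    # Step 4: update the weights using backpropagation on the squared error.
    loss = ((q_pred - target) ** 2).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example call with dummy tensors.
loss = dqn_update(torch.randn(STATE_DIM), a=1, r=1.0,
                  s_next=torch.randn(STATE_DIM), done=False)

Because the target matches the prediction on all non-taken actions, only the output for the chosen action contributes gradient, which is exactly what steps 3 and 4 describe.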