
Deep Reinforcement Learning


Leszek Rybicki

May 20, 2016

Transcript

  1. RL-type Problems

     • games of chess, Go, Space Invaders
     • balancing a unicycle
     • investing in the stock market
     • running a business
     • making fast food
     • life…!
  2. Markov Decision Process

     • S - set of states
     • A - set of actions (or actions available per state)
     • P(s, s' | a) - state transition probability
     • R(s, s' | a) - reward
     • γ ∈ [0, 1] - discount factor
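     To make these pieces concrete, here is a minimal sketch (mine, not from the slides): a tiny two-state MDP written out as plain JavaScript data, plus one Bellman backup over it. The states, actions, probabilities and rewards are purely illustrative.

     // Hypothetical 2-state MDP; P[s][a][s2] = P(s, s2 | a), R[s][a][s2] = R(s, s2 | a)
     var mdp = {
       states: ['cold', 'hot'],
       actions: ['wait', 'heat'],
       P: { cold: { wait: { cold: 1.0, hot: 0.0 }, heat: { cold: 0.2, hot: 0.8 } },
            hot:  { wait: { cold: 0.3, hot: 0.7 }, heat: { cold: 0.0, hot: 1.0 } } },
       R: { cold: { wait: { cold: -1, hot: 0 }, heat: { cold: -1, hot: 2 } },
            hot:  { wait: { cold: -1, hot: 1 }, heat: { cold:  0, hot: 1 } } },
       gamma: 0.9   // discount factor in [0, 1]
     };

     // One Bellman backup: V(s) <- max_a sum_s2 P(s, s2 | a) * (R(s, s2 | a) + gamma * V(s2))
     function backup(V) {
       var newV = {};
       mdp.states.forEach(function(s) {
         var best = -Infinity;
         mdp.actions.forEach(function(a) {
           var q = 0;
           mdp.states.forEach(function(s2) {
             q += mdp.P[s][a][s2] * (mdp.R[s][a][s2] + mdp.gamma * V[s2]);
           });
           best = Math.max(best, q);
         });
         newV[s] = best;
       });
       return newV;
     }

     var V = { cold: 0, hot: 0 };
     for (var i = 0; i < 50; i++) V = backup(V);   // converges because gamma < 1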
  3. Reinforce.js http://cs.stanford.edu/people/karpathy/reinforcejs/

     // create an environment object
     var env = {};
     env.getNumStates = function() { return 8; }
     env.getMaxNumActions = function() { return 4; }

     // create the DQN agent
     var spec = { alpha: 0.01 }
     agent = new RL.DQNAgent(env, spec);

     // start the learning loop
     setInterval(function(){
       var action = agent.act(s);   // s is an array of length 8
       // ...execute the action in the environment and observe the reward...
       agent.learn(reward);
     }, 0);
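     The snippet above leaves the state s and the reward undefined. Below is a minimal sketch (mine, not from the slides) of a toy environment that supplies them on each tick; it assumes the reinforce.js script is loaded so RL.DQNAgent is available, and the 1-D "grid world" is purely illustrative.

     var world = { pos: 0, goal: 5 };              // hypothetical 1-D world
     function observe() {
       var s = new Array(8).fill(0);               // 8 features, matching env.getNumStates()
       s[0] = world.pos / 10;                      // normalized position
       s[1] = world.goal / 10;                     // normalized goal position
       return s;
     }
     function step(action) {                       // 4 actions, matching env.getMaxNumActions()
       if (action === 0) world.pos -= 1;
       if (action === 1) world.pos += 1;           // actions 2 and 3 leave pos unchanged
       if (world.pos === world.goal) { world.pos = 0; return 1.0; }  // goal reached: reward and reset
       return -0.01;                               // small step cost otherwise
     }

     var env = { getNumStates: function() { return 8; },
                 getMaxNumActions: function() { return 4; } };
     var agent = new RL.DQNAgent(env, { alpha: 0.01 });

     setInterval(function() {
       var s = observe();                          // current state
       var action = agent.act(s);                  // agent picks one of 4 actions
       var reward = step(action);                  // environment reacts
       agent.learn(reward);                        // DQN update from the observed reward
     }, 0);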
  4. Deep Q-Learning

     1. Do a feedforward pass for the current state s to get predicted Q-values for all actions.
     2. Do a feedforward pass for the next state s' and calculate the maximum over all network outputs, max_a' Q(s', a').
     3. Set the Q-value target for the taken action to r + γ max_a' Q(s', a') (the max calculated in step 2). For all other actions, keep the Q-value target at the value returned in step 1, making the error 0 for those outputs.
     4. Update the weights using backpropagation.

     http://www.nervanasys.com/demystifying-deep-reinforcement-learning/
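     A compact sketch (mine, not from the article) of how steps 1-3 build the training target for a single transition (s, a, r, s'); net.forward is a hypothetical function that returns an array of predicted Q-values, one per action.

     function qTargets(net, s, a, r, s2, gamma, terminal) {
       var q = net.forward(s).slice();         // step 1: predicted Q(s, ·)
       var qNext = net.forward(s2);            // step 2: predicted Q(s', ·)
       var maxNext = Math.max.apply(null, qNext);
       // step 3: only the taken action's target changes; the other outputs keep
       // their predicted values, so their error (and gradient) is zero.
       q[a] = terminal ? r : r + gamma * maxNext;
       return q;                               // step 4: backpropagate toward this target
     }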