
Deep Reinforcement Learning - Introduction

Wonseok Jung
October 29, 2018


Transcript

  1. Wonseok Jung
     - City University of New York, Baruch College, Data Science major
     - Connexion AI Founder; Deep Learning College Reinforcement Learning Researcher; Modulabs CTRL Leader
     - Interests: Reinforcement Learning, Object Detection, Chatbot
     - Github: https://github.com/wonseokjung
     - Facebook: https://www.facebook.com/wsjung
     - Blog: https://wonseokjung.github.io
     - YouTube: https://www.youtube.com/channel/UcmTxWkdhlWvJufrRw

  2. What we cover
     1. From supervised learning to decision making
     2. Model-free algorithms: Q-learning, policy gradients, actor-critic
     3. Advanced model learning and prediction
     4. Exploration
     5. Transfer and multi-task learning, meta-learning
     6. Open problems, research talks, invited lectures
  3. 1. Imitation Learning
     2. Policy gradients
     3. Q-learning and actor-critic algorithms
     4. Model-based reinforcement learning
     5. Advanced model-free RL algorithms
  4. Reinforcement learning provides a formalism for behavior
     1. Deep learning does not provide decision making
     2. To make a decision, we need a mathematical formalism
     3. Reinforcement learning is what gives us the mathematical framework for dealing with decision making
  5. Mathematical framework for dealing with decision making
     1. Models an interaction between an Agent and a World
     2. The Agent makes a decision
     3. The World responds to that decision with consequences: an observation and a reward
     [Figure: agent-environment loop. The Agent sends action A_t to the Environment; the Environment returns reward R_{t+1} and state S_{t+1}]
  6. What does end-to-end learning mean for sequential decision making?
     1. You are walking in the jungle and see a tiger
     2. You need to take some action (you may want to run away)
     3. Tiger -> perception ("oh yeah, it is a tiger") -> control system -> "Run"
  7. Simplified
     1. You don't even know that it is a tiger
     2. You just know that getting eaten is a bad thing and not getting eaten is a good thing
     3. Tiger -> control system -> "Run"
  8. Action, Observation and Rewards
     1. The Agent makes decisions: actions
     2. The World responds with consequences: observations and rewards
  9. [Figure: the agent-environment loop in action. The Agent sends A_t; the Environment returns R_{t+1} and S_{t+1}. Example: tap the ball -> positive reward]
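A minimal Python sketch of this interaction loop; the toy environment and the random agent below are hypothetical stand-ins, not from the talk:

    import random

    class ToyBallEnv:
        """Toy world: tapping the ball (action 1) earns a positive reward."""
        def reset(self):
            return 0.0                            # initial state S_0
        def step(self, action):
            reward = 1.0 if action == 1 else 0.0  # tap the ball -> positive reward
            next_state = random.random()          # next state S_{t+1}
            return next_state, reward

    env = ToyBallEnv()
    state = env.reset()
    for t in range(5):
        action = random.choice([0, 1])    # the Agent makes a decision: A_t
        state, reward = env.step(action)  # the World responds: S_{t+1}, R_{t+1}
        print(t, action, reward)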

  10. Robotics
      1. Actions: motor current or torque
      2. Observations: camera images
      3. Rewards: task success measure
  11. Image classification
      1. Actions: the output label
      2. Observations: image pixels
      3. Rewards: correct or not correct
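The same mapping written as a one-step decision problem in Python; the random "classifier" and the label below are hypothetical placeholders:

    import numpy as np

    def reward(predicted_label, true_label):
        return 1.0 if predicted_label == true_label else 0.0  # correct or not correct

    observation = np.random.rand(28 * 28)  # observation: image pixels
    scores = np.random.rand(10)            # stand-in for a classifier's class scores
    action = int(np.argmax(scores))        # action: the output label
    print(reward(action, true_label=3))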
  12. RL x DL
      1. Reinforcement learning is basically solving the task in its most general form, in the most complex setting
      2. Deep models are what allow reinforcement learning algorithms to solve complex problems end to end
      3. Reinforcement learning provides the formalism (the algorithmic framework)
      4. Deep learning provides the representations that allow us to apply that formalism to very complex problems with high-dimensional observations and complicated action spaces
  13. 1. Dexterous multi-fingered hands are versatile and provide a generic way to perform a multitude of tasks
      2. Controlling them is challenging due to high dimensionality and the large number of potential contacts
      3. Success of DRL in robotics has thus far been limited to simpler manipulators and tasks
      Summary:
      1. Model-free DRL can effectively scale up to complex manipulation tasks (in simulated experiments)
      2. Using a small number of human demonstrations significantly reduces sample complexity
      3. Using demonstrations yields natural movements
      4. Successful policies: object relocation, in-hand manipulation, tool use, door opening
  14. Double DQN: Super Mario with RL
      [Figure: DQN architecture. The Q-network maps state s to action values Q(s, a); acting in the Env produces transitions (S_t, A_t, R_{t+1}, S_{t+1}) that are stored in replay memory and sampled for training. A sketch of the Double DQN target follows below]
  15. Why should we study this now?
      1. Advances in deep learning
      2. Advances in reinforcement learning
      3. Advances in computational capability
      https://pdfs.semanticscholar.org/54c4/cf3a8168c1b70f91cf78a3dc98b671935492.pdf
  16. Learning from reward
      1. Basic reinforcement learning: maximizing rewards
      2. Learning the reward function from examples (inverse RL)
      3. Transferring knowledge between domains (transfer learning, meta-learning)
      4. Learning to predict, and using prediction to act
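For reference, "maximizing rewards" means maximizing the expected discounted return; this is the standard textbook objective rather than a formula from the slide, written in the R_{t+1} notation of the diagrams above:

    J(\pi) = \mathbb{E}_{\pi}\Big[ \sum_{t=0}^{\infty} \gamma^{t} R_{t+1} \Big], \quad 0 \le \gamma \le 1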
  17. Are there any forms of supervision?
      1. Learning from demonstrations
         - Directly copying observed behavior (imitation)
         - Inferring rewards from observed behavior
      2. Learning from observing the world
         - Learning to predict
         - Unsupervised learning
      3. Learning from other tasks
         - Transfer learning
         - Meta-learning: learning to learn
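A minimal sketch of the first option, behavior cloning (directly copying observed behavior); the network architecture, the synthetic demonstration data, and the hyperparameters are hypothetical placeholders:

    import torch
    import torch.nn as nn

    policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))  # obs -> action logits
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    demo_obs = torch.randn(256, 4)              # stand-in for recorded expert observations
    demo_actions = torch.randint(0, 2, (256,))  # stand-in for the expert's discrete actions

    for epoch in range(100):
        opt.zero_grad()
        loss = loss_fn(policy(demo_obs), demo_actions)  # supervised: imitate the expert
        loss.backward()
        opt.step()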
  18. What has proven challenging so far?
      1. Humans can learn incredibly quickly; deep RL methods are usually slow
      2. Humans can reuse past knowledge; transfer learning in deep RL is an open problem
      3. Not clear what the reward function should be
      4. Not clear what the role of prediction should be