Deep Reinforcement Learning - Introduction

RLXDL
Wonseok Jung
Introduction (feat. CS…)

Wonseok Jung
City University of New York, Baruch College, Data Science major
Connexion AI, Founder
Deep Learning College, Reinforcement Learning Researcher
Modulabs CTRL Leader
Reinforcement Learning, Object Detection, Chatbot
Github: https://github.com/wonseokjung
Facebook: https://www.facebook.com/wsjung
Blog: https://wonseokjung.github.io
Youtube: https://www.youtube.com/channel/…

What we cover
1. From supervised learning to decision making
2. Model-free algorithms: Q-learning, policy gradients, actor-critic
3. Advanced model learning and prediction
4. Exploration
5. Transfer and multi-task learning, meta-learning
6. Open problems, research talks, invited lectures

Hands-on practice
1. Imitation learning
2. Policy gradients
3. Q-learning and actor-critic algorithms
4. Model-based reinforcement learning
5. Advanced model-free RL algorithms

Reference
1. If you are not familiar with TensorFlow, see: https://www.tensorflow.org/guide/low_level_intro
2. If you want to practice TensorFlow eager execution, see: https://github.com/wonseokjung/hands_on_tf

What is RL, and why should we care about it now?

How do we build intelligent machines?
If a household-support or humanoid robot had intelligence, it could be used for many different purposes.
Household robot / Rescue robot / Humanoid

Intelligent machines must be able to adapt

Oil tank
Ships carry containers and oil tanks across the ocean. What if a ship could sail without any crew on board?

Deep learning helps us handle unstructured environments
Unstructured environment: one whose events cannot be predicted accurately
e.g., a fire caused by oil spilled on a ship

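With a neural network we can build a model with tens of thousands of parameters.
It can take images as input.
Applications: vision, natural language processing, and more.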

Reinforcement learning provides a formalism for behavior
1. Deep learning by itself does not provide decision making.
2. To make decisions, we need a mathematical formalism.
3. Reinforcement learning gives us the mathematical framework for dealing with decision making.

Mathematical framework for dealing with decision making
1. It models an interaction between an agent and a world.
2. The agent makes a decision.
3. The world responds to that decision with consequences: an observation and a reward.
[Figure: agent-environment loop. The agent observes state S_t and reward R_t and takes action A_t; the environment responds with reward R_{t+1} and next state S_{t+1}.]
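The interaction loop above can be sketched in a few lines of Python. This is a minimal sketch, not code from the deck: `ToyWorld` is a hypothetical one-dimensional environment (reward +1 on reaching the goal, 0 otherwise), and the `reset`/`step`/`run_episode` names are assumptions chosen for illustration. Each step records the tuple (S_t, A_t, R_{t+1}, S_{t+1}).

```python
from typing import Callable, List, Tuple

Transition = Tuple[int, int, float, int]  # (S_t, A_t, R_{t+1}, S_{t+1})


class ToyWorld:
    """Hypothetical 1-D world: the agent starts at position 0, moves left (-1)
    or right (+1), and the episode ends when it reaches `goal`."""

    def __init__(self, goal: int = 3) -> None:
        self.goal = goal
        self.position = 0

    def reset(self) -> int:
        self.position = 0
        return self.position  # initial state S_0

    def step(self, action: int) -> Tuple[int, float, bool]:
        # The world responds to the agent's decision with consequences.
        self.position = max(0, self.position + action)  # cannot move below 0
        done = self.position == self.goal
        reward = 1.0 if done else 0.0
        return self.position, reward, done  # S_{t+1}, R_{t+1}, episode finished?


def run_episode(env: ToyWorld, policy: Callable[[int], int],
                max_steps: int = 100) -> List[Transition]:
    """Run one episode of the agent-environment loop and record every transition."""
    trajectory: List[Transition] = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)                       # agent decides A_t
        next_state, reward, done = env.step(action)  # world returns R_{t+1}, S_{t+1}
        trajectory.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return trajectory


trajectory = run_episode(ToyWorld(goal=3), policy=lambda s: +1)
print(trajectory)  # [(0, 1, 0.0, 1), (1, 1, 0.0, 2), (2, 1, 1.0, 3)]
```

With a policy that always moves right, the episode ends after three steps and only the last transition carries a reward; a policy that never reaches the goal is cut off at `max_steps`.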

First big successes in reinforcement learning
1. They came from combining reinforcement learning with deep learning.
2. Game playing
3. AlphaGo
4. Robotic manipulation

What is deep RL, and why should we care?

What does end-to-end learning mean for sequential decision making?
1. You are walking in the jungle and see a tiger.
2. You need to take some action (you probably want to run away).
3. Tiger -> perception (“oh yeah, it is a tiger”) -> control system -> “Run”

Simplified
1. You don’t even know that it is a tiger.
2. You just know that getting eaten is bad and not getting eaten is good.
3. Tiger -> control system -> “Run”

Actions, observations, and rewards
1. The agent makes decisions: actions.
2. The world responds with consequences: observations and rewards.

How animals learn
1. Actions: muscle contractions
2. Observations: sight, smell
3. Rewards: food

[Figure: agent-environment loop (A_t, S_t, R_t, R_{t+1}, S_{t+1}) for an animal: tapping the ball earns a positive reward.]
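The "tap the ball, receive a positive reward" example can be turned into a tiny learning problem. The deck does not name an algorithm here, so this sketch swaps in a simple epsilon-greedy action-value learner for a one-step (bandit) problem; the action names, reward values, and hyperparameters are all hypothetical.

```python
import random
from typing import Callable, Dict, List


def learn_from_reward(reward_of: Callable[[str], float], actions: List[str],
                      steps: int = 1000, epsilon: float = 0.1,
                      step_size: float = 0.1, seed: int = 0) -> Dict[str, float]:
    """Epsilon-greedy action-value learning on a one-step (bandit) problem."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}  # current estimate of each action's reward
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)          # explore: try a random action
        else:
            action = max(actions, key=value.get)  # exploit: best estimate (ties -> first listed)
        reward = reward_of(action)
        # Move the estimate a fraction `step_size` toward the observed reward.
        value[action] += step_size * (reward - value[action])
    return value


treats = {"ignore_ball": 0.0, "tap_ball": 1.0}  # hypothetical rewards (food = 1)
values = learn_from_reward(treats.get, ["ignore_ball", "tap_ball"])
print(max(values, key=values.get))
```

Both estimates start at zero and ties go to `ignore_ball`, so the learner only finds `tap_ball` through exploration; once that action has been rewarded, its estimate rises above `ignore_ball` and greedy selection keeps choosing it.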

Robotics
1. Actions: motor current or torque
2. Observations: camera images
3. Rewards: task success measure

Inventory management
1. Actions: what to purchase
2. Observations: inventory levels
3. Rewards: profit
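The inventory-management framing (actions: what to purchase; observations: inventory levels; rewards: profit) can be written as an environment step function. This is a hypothetical single-product model: the prices, the holding cost, the shelf capacity, and the `InventoryEnv`/`step` names are assumptions for illustration, and demand is passed in by the caller so the example stays deterministic.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class InventoryEnv:
    """Hypothetical single-product inventory problem; all prices are illustrative.
    Observation: units in stock. Action: units to purchase. Reward: profit."""
    price: float = 5.0         # revenue per unit sold
    unit_cost: float = 3.0     # cost per unit purchased
    holding_cost: float = 0.5  # cost per unit left in stock after the step
    capacity: int = 20         # maximum units the shelf can hold
    stock: int = 0

    def step(self, purchase: int, demand: int) -> Tuple[int, float]:
        """Apply one purchase decision. `demand` is supplied by the caller here;
        a real environment would sample it."""
        purchase = max(0, min(purchase, self.capacity - self.stock))  # clip to shelf space
        self.stock += purchase
        sold = min(self.stock, demand)
        self.stock -= sold
        profit = (self.price * sold
                  - self.unit_cost * purchase
                  - self.holding_cost * self.stock)
        return self.stock, profit  # next observation, reward


env = InventoryEnv()
print(env.step(purchase=10, demand=7))  # (3, 3.5)
print(env.step(purchase=0, demand=5))   # (0, 15.0)
```

The reward trades off sales against purchasing and holding costs, so buying too much is penalized even though it never loses a sale.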

Image classification
1. Actions: output a label
2. Observations: image pixels
3. Rewards: correct or not correct

1. Reinforcement learning is basically about solving tasks in their most general form, in the most complex setting.
2. Deep models are what allow reinforcement learning algorithms to solve complex problems end to end.
3. Reinforcement learning provides the formalism (the algorithmic framework).
4. Deep learning provides the representations that let us apply that formalism to very complex problems with high-dimensional observations and complicated action spaces.

https://arxiv.org/pdf/1709.10087.pdf

1. Dexterous multi-fingered hands are versatile: they provide a generic way to perform a multitude of tasks.
2. Controlling them is challenging due to their high dimensionality and the large number of potential contacts.
3. Success of DRL in robotics has thus far been limited to simpler manipulators and tasks.

Summary
1. Model-free DRL can effectively scale up to complex manipulation tasks (in simulated experiments).
2. Using a small number of human demonstrations can significantly reduce sample complexity.
3. Using demonstrations leads to natural movements.
4. Successful policies: object relocation, in-hand manipulation, tool use, door opening.

https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

APPROXIMATE ACTION-VALUE: SUPER MARIO WITH RL
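Approximating the action-value function means replacing a table of Q(s, a) entries with a parameterized function. The DQN paper linked above uses a deep network; as a much smaller stand-in, this sketch assumes a linear model Q(s, a) = w_a · φ(s) and applies the standard semi-gradient Q-learning update w_a ← w_a + α (r + γ max_a' Q(s', a') - Q(s, a)) φ(s). The feature vectors, action names, and hyperparameters are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # one weight vector per action


def q_value(weights: Weights, features: List[float], action: str) -> float:
    """Linear approximation Q(s, a) = w_a . phi(s)."""
    return sum(w * x for w, x in zip(weights[action], features))


def q_learning_update(weights: Weights, features: List[float], action: str,
                      reward: float, next_features: List[float], done: bool,
                      alpha: float = 0.1, gamma: float = 0.99) -> float:
    """One semi-gradient Q-learning step; updates `weights` in place and returns the TD error."""
    if done:
        target = reward  # no bootstrapping past a terminal state
    else:
        target = reward + gamma * max(q_value(weights, next_features, a) for a in weights)
    td_error = target - q_value(weights, features, action)
    # The gradient of w_a . phi(s) with respect to w_a is phi(s).
    weights[action] = [w + alpha * td_error * x
                       for w, x in zip(weights[action], features)]
    return td_error


weights = {"left": [0.0, 0.0], "right": [0.0, 0.0]}
q_learning_update(weights, [1.0, 0.5], "right", reward=1.0,
                  next_features=[0.0, 1.0], done=False)
print(weights["right"])  # [0.1, 0.05]
```

Only the weights of the action actually taken change, scaled by the TD error and by how active each feature was in φ(s).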

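DOUBLE DQN: SUPER MARIO WITH RL
[Figure: DQN pipeline. The environment's state s is the input to a Q-network, which outputs action values Q(s, a); the agent acts with a, receives r and s', and stores transitions (S_t, A_t, R_{t+1}, S_{t+1}) in a replay memory.]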
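The diagram's two ingredients, a replay memory of (S_t, A_t, R_{t+1}, S_{t+1}) transitions and a Double DQN target, can be sketched without a deep-learning library. This is a minimal sketch under stated assumptions: `ReplayMemory`, `double_dqn_target`, and the lookup tables standing in for the online and target Q-networks are hypothetical, not the deck's Super Mario code. The target follows the usual Double DQN form y = r + γ Q_target(s', argmax_a Q_online(s', a)).

```python
import random
from collections import deque
from typing import Callable, Hashable, List, NamedTuple, Sequence


class Transition(NamedTuple):
    state: Hashable
    action: Hashable
    reward: float        # R_{t+1}
    next_state: Hashable
    done: bool


class ReplayMemory:
    """Fixed-capacity buffer of (S_t, A_t, R_{t+1}, S_{t+1}) transitions."""

    def __init__(self, capacity: int) -> None:
        self.buffer: deque = deque(maxlen=capacity)  # oldest entries are evicted first

    def push(self, transition: Transition) -> None:
        self.buffer.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Uniform sampling without replacement breaks up correlated consecutive steps.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


QFunction = Callable[[Hashable, Hashable], float]


def double_dqn_target(t: Transition, q_online: QFunction, q_target: QFunction,
                      actions: Sequence[Hashable], gamma: float = 0.99) -> float:
    """The online network selects the next action; the target network evaluates it."""
    if t.done:
        return t.reward
    best_next = max(actions, key=lambda a: q_online(t.next_state, a))
    return t.reward + gamma * q_target(t.next_state, best_next)


memory = ReplayMemory(capacity=3)
for i in range(5):
    memory.push(Transition(i, "right", 0.0, i + 1, False))
print(len(memory))  # 3 (the two oldest transitions were evicted)

# Hypothetical lookup tables standing in for the online and target networks.
online_table = {("s1", "left"): 1.0, ("s1", "right"): 2.0}
target_table = {("s1", "left"): 5.0, ("s1", "right"): 0.5}
t = Transition("s0", "right", 1.0, "s1", False)
y = double_dqn_target(t, lambda s, a: online_table[(s, a)],
                      lambda s, a: target_table[(s, a)], ["left", "right"], gamma=0.9)
print(round(y, 2))  # 1.45
```

With these tables a plain DQN target, r + γ max_a Q_target(s', a), would be 1.0 + 0.9 × 5.0 = 5.5; decoupling action selection from evaluation gives 1.45 instead, which is how Double DQN aims to reduce overestimated action values.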

https://arxiv.org/pdf/1806.10293.pdf

Why should we study this now?
1. Advances in deep learning
2. Advances in reinforcement learning
3. Advances in computational capability
https://pdfs.semanticscholar.org/54c4/cf3a8168c1b70f91cf78a3dc98b671935492.pdf

Why should we study this now?
1. In the past five years, we've seen a few impressive successes:
2. Game playing (left)
3. Robotics (middle)
4. Go (right)

What other problems do we need to solve to enable real-world sequential decision making?

Learning from reward
1. Basic reinforcement learning: maximizing rewards
2. Learning reward functions from examples (inverse RL)
3. Transferring knowledge between domains (transfer learning, meta-learning)
4. Learning to predict and using predictions to act
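"Maximizing rewards" in basic RL is usually formalized as maximizing the expected discounted return G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + ..., with a discount factor 0 ≤ γ ≤ 1. The deck does not define the return, so the discounted form is an assumption here (the common textbook choice, matching the S_t, A_t, R_{t+1} labels used earlier). A minimal sketch that computes G_t for every step of a finished episode:

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return G_t for every step of one episode, where rewards[t] holds R_{t+1}.

    Uses the recursion G_t = R_{t+1} + gamma * G_{t+1}, working backwards
    from the end of the episode (the return after the last step is 0)."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
```

Working backwards makes the computation a single pass; a reward that arrives k steps later is worth γ^k as much at the current step.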

Where do rewards come from?

Are there any other forms of supervision?
1. Learning from demonstrations
   - Directly copying observed behavior (imitation)
   - Inferring rewards from observed behavior
2. Learning from observing the world
   - Learning to predict
   - Unsupervised learning
3. Learning from other tasks
   - Transfer learning
   - Meta-learning: learning to learn
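Directly copying observed behavior is usually called behavioral cloning: the demonstrator's (state, action) pairs are treated as a supervised dataset. In deep RL the policy is typically a neural network trained to predict the demonstrated action; this sketch swaps that network for a lookup table (majority vote per state) so the idea fits in a few lines. The `clone_behavior` name and the tiger-themed states and actions are purely illustrative.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def clone_behavior(demonstrations: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Hashable]:
    """Tabular behavioral cloning: in each demonstrated state, copy the action the
    demonstrator chose most often (ties go to the action seen first)."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


demos = [("tiger", "run"), ("tiger", "run"), ("tiger", "freeze"), ("food", "eat")]
policy = clone_behavior(demos)
print(policy)  # {'tiger': 'run', 'food': 'eat'}
```

A table like this has nothing to say about states the demonstrator never visited, which is one reason practical imitation learning relies on function approximation and, often, on additional data collection.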

What has proven challenging so far?
1. Humans can learn incredibly quickly
   - Deep RL methods are usually slow
2. Humans can reuse past knowledge
   - Transfer learning in deep RL is an open problem
3. It is not clear what the reward function should be
4. It is not clear what the role of prediction should be

NEXT EPISODE: Supervised Learning and Imitation
