Slide 1
Slide 1 text
Faster Reinforcement Learning via Transfer | John Schulman | Wonseok Jung
Slide 2
Slide 2 text
No content
Slide 3
Slide 3 text
FASTER REINFORCEMENT LEARNING VIA TRANSFER
Slide 4
Slide 4 text
No content
Slide 5
Slide 5 text
OVERVIEW 1. Policy Gradients: Success Stories, Limitations 2. Meta Reinforcement Learning 3. Gym Retro
Slide 6
Slide 6 text
TERMINOLOGY Reinforcement Learning: maximizing Reward through trial and error. Deep RL: using a neural network to represent the RL algorithm. Meta Learning: "Learning how to Learn"; mastering the Task involved in a given kind of Learning. REINFORCEMENT LEARNING
Slide 10
Slide 10 text
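MARKOV DECISION PROCESS [Diagram: agent-environment loop. Labels: Agent, Environment, Action A_t, State S_t, Reward R_t, next State S_{t+1}, next Reward R_{t+1}]
Slide 11
Slide 11 text
AGENT [Diagram: agent-environment loop. Labels: Agent, Environment, Action A_t, State S_t, Reward R_t, next State S_{t+1}, next Reward R_{t+1}]
Slide 12
Slide 12 text
ACTION [Diagram: agent-environment loop. Labels: Agent, Environment, Action A_t, State S_t, Reward R_t, next State S_{t+1}, next Reward R_{t+1}]
Slide 13
Slide 13 text
OBSERVATION, REWARD [Diagram: agent-environment loop. Labels: Agent, Environment, Action A_t, State S_t, Reward R_t, next State S_{t+1}, next Reward R_{t+1}]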
Slide 14
Slide 14 text
TRAJECTORY (S_t, A_t, R_{t+1}, S_{t+1}), (S_{t+1}, A_{t+1}, R_{t+2}, S_{t+2}), (S_{t+2}, A_{t+2}, R_{t+3}, S_{t+3})
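A trajectory of this form can be collected with a simple interaction loop. The sketch below is only an illustration: the two-state `ToyEnv`, its reward rule, and the random policy are hypothetical stand-ins and do not come from the talk.

```python
import random


class ToyEnv:
    """Hypothetical 2-state environment, used only for illustration."""

    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Assumed reward rule: 1.0 when the action matches the current state.
        reward = 1.0 if action == self.state else 0.0
        # Assumed dynamics: the next state is simply the action taken.
        self.state = action
        return self.state, reward


def rollout(env, policy, num_steps):
    """Collect a list of (S_t, A_t, R_{t+1}, S_{t+1}) transitions."""
    trajectory = []
    state = env.reset()
    for _ in range(num_steps):
        action = policy(state)
        next_state, reward = env.step(action)
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory


random.seed(0)
trajectory = rollout(ToyEnv(), lambda s: random.choice([0, 1]), num_steps=3)
for transition in trajectory:
    print(transition)
```

Each tuple's last element is the first element of the next tuple, which is exactly the chaining shown on the slide.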
Slide 15
Slide 15 text
RETURN, State value, State-Action value
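The formulas on this slide did not survive in the transcript. For reference, the standard textbook definitions are sketched below; the discount factor \gamma and the policy symbol \pi are assumptions about notation, not taken from the slide.

```latex
% Return: discounted sum of future rewards, with discount factor 0 <= \gamma <= 1
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}

% State value: expected return when starting in state s and following policy \pi
v_\pi(s) = \mathbb{E}_\pi \left[ G_t \mid S_t = s \right]

% State-action value: expected return after taking action a in state s, then following \pi
q_\pi(s, a) = \mathbb{E}_\pi \left[ G_t \mid S_t = s, A_t = a \right]
```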
Slide 16
Slide 16 text
1. POLICY GRADIENTS
Slide 17
Slide 17 text
POLICY Policy: a function that selects an Action given an Observation
Slide 18
Slide 18 text
POLICY GRADIENTS Policy Gradients method: a reinforcement learning algorithm that optimizes the Policy in order to find a better Policy. For more detail on Policy gradients, see https://wonseokjung.github.io//reinforcementlearning/update/RL-PG_RE/
Slide 19
Slide 19 text
PSEUDO CODE
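The pseudocode on this slide is not preserved in the transcript. As a stand-in, here is a minimal sketch of REINFORCE, the simplest policy gradient method, on a hypothetical two-armed bandit with one-step episodes. The function names, rewards, and hyperparameters are all illustrative assumptions.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_rewards, episodes=2000, lr=0.1, seed=0):
    """REINFORCE with a softmax policy on a bandit (hypothetical setup)."""
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_rewards)  # policy parameters theta
    for _ in range(episodes):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        ret = arm_rewards[action]  # return G of this one-step episode
        for i in range(len(prefs)):
            # d/d theta_i of log pi(action) for a softmax policy
            grad_log = (1.0 if i == action else 0.0) - probs[i]
            # Gradient ascent step: theta += lr * G * grad log pi
            prefs[i] += lr * ret * grad_log
    return softmax(prefs)


probs = reinforce_bandit([0.0, 1.0])
print(probs)
```

The update raises the probability of actions in proportion to the return they produced, so the policy shifts toward the arm with the higher reward.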
Slide 20
Slide 20 text
POLICY GRADIENTS - HISTORY
Slide 21
Slide 21 text
POLICY GRADIENTS - HISTORY Paper review of Policy Gradients, REINFORCE, and Actor-Critic: https://wonseokjung.github.io//reinforcementlearning/update/RL-PG_RE_AC/
Slide 22
Slide 22 text
POLICY GRADIENTS - PPO
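The slide content for PPO is not in the transcript. The sketch below shows the clipped surrogate objective from the PPO paper, which is the most widely used variant; whether the slide showed this variant is an assumption. The function name and the plain-list inputs are illustrative.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratios:     pi_new(a|s) / pi_old(a|s) for each sample
    advantages: advantage estimate for each sample
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped_r = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        # Taking the minimum makes the objective a pessimistic bound, so the
        # policy gains nothing by moving the ratio far outside the clip range.
        total += min(r * adv, clipped_r * adv)
    return total / len(ratios)


print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))
```

With a positive advantage the ratio is capped at 1 + eps, and with a negative advantage it is floored at 1 - eps, which limits how far a single update can move the policy.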
Slide 23
Slide 23 text
ALPHAGO, DOTA
Slide 24
Slide 24 text
ROBOTIC MANIPULATION
Slide 25
Slide 25 text
RL REQUIRES A LOT OF TRAINING TIME
Slide 26
Slide 26 text
RL REQUIRES A LOT OF TRAINING TIME
Slide 27
Slide 27 text
PRIOR KNOWLEDGE
Slide 28
Slide 28 text
HOW CAN WE ALLOW OUR A.I. SYSTEM TO MAKE USE OF PRIOR KNOWLEDGE? https://ubisafe.org/explore/demeanure-clipart-prior-knowledge/
Slide 29
Slide 29 text
META REINFORCEMENT LEARNING https://ubisafe.org/explore/demeanure-clipart-prior-knowledge/
Slide 30
Slide 30 text
META REINFORCEMENT LEARNING Single Task https://ubisafe.org/explore/demeanure-clipart-prior-knowledge/
Slide 31
Slide 31 text
SINGLE R.L : MAZE NAVIGATION Single Task: Learn to navigate from start to goal as fast as possible in a single maze
Slide 32
Slide 32 text
META R.L : MAZE NAVIGATION
Slide 33
Slide 33 text
META RL IS A SPECIAL CASE OF NORMAL RL
Slide 34
Slide 34 text
RL²
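The slide content for RL² is not in the transcript. In the RL² formulation, a recurrent policy receives the previous action, reward, and termination flag alongside the current observation, which is what lets ordinary RL training produce a learner that adapts across episodes. The sketch below only shows how such an input vector could be assembled; the function name and the flat-list encoding are illustrative assumptions.

```python
def rl2_input(observation, prev_action, prev_reward, prev_done, num_actions):
    """Build the per-step input for an RL²-style recurrent policy.

    The previous action is one-hot encoded; pass prev_action=None on the
    very first step of a trial, when no action has been taken yet.
    """
    one_hot = [0.0] * num_actions
    if prev_action is not None:
        one_hot[prev_action] = 1.0
    return list(observation) + one_hot + [float(prev_reward), float(prev_done)]


step_input = rl2_input([0.5, -1.0], prev_action=1, prev_reward=1.0,
                       prev_done=False, num_actions=3)
print(step_input)
```

Because the reward is part of the input and the recurrent state is kept across episodes of the same task, the network can infer which task it is in from its own experience.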
Slide 35
Slide 35 text
META-RL IN MAZE
Slide 36
Slide 36 text
LEARNING DEXTEROUS IN-HAND MANIPULATION
Slide 37
Slide 37 text
META-RL IN ROBOT
Slide 38
Slide 38 text
META RL : LIMITATION OF APPROACHES DESCRIBED PREVIOUSLY
Slide 39
Slide 39 text
META RL : CHANGES TO PROBLEM FORMULATION Finite set of training tasks and a test set of tasks
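This formulation mirrors the train/test split of supervised learning, applied to whole tasks instead of data points. A minimal sketch follows, assuming tasks are identified by name; the `level_i` names, the split fraction, and the helper itself are hypothetical.

```python
import random


def split_tasks(tasks, test_fraction=0.25, seed=0):
    """Split a finite list of tasks into disjoint training and test sets."""
    rng = random.Random(seed)
    shuffled = list(tasks)
    rng.shuffle(shuffled)
    num_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[num_test:], shuffled[:num_test]


tasks = [f"level_{i}" for i in range(8)]  # hypothetical task names
train_tasks, test_tasks = split_tasks(tasks)
print("train:", train_tasks)
print("test:", test_tasks)
```

The agent is trained only on the training tasks and then evaluated on how quickly it learns the held-out test tasks.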
Slide 40
Slide 40 text
GYM RETRO
Slide 41
Slide 41 text
ALGORITHMS
Slide 42
Slide 42 text
PPO(JOINT) + FINE TUNING
Slide 43
Slide 43 text
RETRO CONTEST
Slide 44
Slide 44 text
IMPROVEMENT
Slide 45
Slide 45 text
MODULABS CTRL
Slide 46
Slide 46 text
Github: https://github.com/wonseokjung Facebook: https://www.facebook.com/wsjung Blog: https://wonseokjung.github.io Thank you