Slide 1

Slide 1 text

Faster Reinforcement Learning via Transfer
Wonseok Jung
John Schulman

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

FASTER REINFORCEMENT LEARNING VIA TRANSFER

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

OVERVIEW
1. Policy Gradients
 Success Stories
 Limitations
2. Meta Reinforcement Learning
3. Gym Retro

Slide 6

Slide 6 text

TERMINOLOGY
Reinforcement Learning
 Maximize reward through trial and error
Deep RL
 Use a neural network to represent the RL algorithm
Meta Learning
 "Learning how to learn": mastering a task that itself involves some form of learning
REINFORCEMENT LEARNING

Slide 10

Slide 10 text

MARKOV DECISION PROCESS
[Diagram: agent-environment loop. Labels: Agent, Environment, Action A_t, State S_t, Reward R_t, next State S_{t+1}, next Reward R_{t+1}.]

Slide 11

Slide 11 text

AGENT
[Diagram: agent-environment loop. Labels: Agent, Environment, Action A_t, State S_t, Reward R_t, next State S_{t+1}, next Reward R_{t+1}.]

Slide 12

Slide 12 text

ACTION
[Diagram: agent-environment loop. Labels: Agent, Environment, Action A_t, State S_t, Reward R_t, next State S_{t+1}, next Reward R_{t+1}.]

Slide 13

Slide 13 text

OBSERVATION, REWARD
[Diagram: agent-environment loop. Labels: Agent, Environment, Action A_t, State S_t, Reward R_t, next State S_{t+1}, next Reward R_{t+1}.]

Slide 14

Slide 14 text

TRAJECTORY
(S_t, A_t, R_{t+1}, S_{t+1})
(S_{t+1}, A_{t+1}, R_{t+2}, S_{t+2})
(S_{t+2}, A_{t+2}, R_{t+3}, S_{t+3})
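
The agent-environment loop and the trajectory tuples can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the `CorridorEnv` class, its `reset`/`step` methods, the reward values, and the random policy are all hypothetical and chosen to keep the example small.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to reach the goal."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = move left, 1 = move right
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length)
        done = self.state == self.length
        reward = 1.0 if done else -0.1  # small step cost, bonus at the goal
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and collect (S_t, A_t, R_{t+1}, S_{t+1}) tuples."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)                       # agent picks A_t given S_t
        next_state, reward, done = env.step(action)  # environment returns S_{t+1}, R_{t+1}
        trajectory.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return trajectory


def random_policy(state):
    """Ignore the state and pick an action uniformly at random."""
    return random.choice([0, 1])


random.seed(0)
trajectory = run_episode(CorridorEnv(), random_policy)
print(len(trajectory), trajectory[0])
```

Each element of `trajectory` has the same shape as the tuples listed on the slide, so consecutive elements share a state: the `S_{t+1}` of one tuple is the `S_t` of the next.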

Slide 15

Slide 15 text

RETURN
State value
State-Action value
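
The formulas on this slide are not preserved in the transcript. Assuming the usual discounted definition, where the return is G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ..., a minimal sketch of computing returns from one episode's rewards could look like the following. The function name and the discount factor `gamma` are assumptions made for the example.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return [G_0, G_1, ...] using the recursion G_t = R_{t+1} + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each return reuses the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


rewards = [0.0, 0.0, 1.0]
print(discounted_returns(rewards, gamma=0.5))  # [0.25, 0.5, 1.0]
```

Under the same assumption, the state value is the expected return when starting from a state, and the state-action value is the expected return after taking a given action in that state; both can be estimated by averaging returns like these over many sampled episodes.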

Slide 16

Slide 16 text

1. POLICY GRADIENTS

Slide 17

Slide 17 text

POLICY
Policy
 A function that selects an action based on the observation

Slide 18

Slide 18 text

POLICY GRADIENTS
Policy Gradients method
 A reinforcement learning algorithm that optimizes the policy itself in order to find a better policy
To learn more about policy gradients: https://wonseokjung.github.io//reinforcementlearning/update/RL-PG_RE/

Slide 19

Slide 19 text

PSEUDO CODE
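
The pseudocode shown on this slide is not preserved in the transcript. As a stand-in, here is a minimal sketch of one basic policy gradient method, REINFORCE with a running-average baseline, on a hypothetical multi-armed bandit. The bandit task, the softmax parameterization, and all names and hyperparameters are assumptions made for illustration, not the slide's content.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a hypothetical bandit whose arms pay Gaussian rewards."""
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_means)  # policy parameters: one preference per action
    baseline = 0.0                  # running average reward, used to reduce variance
    for step in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], 1.0)
        advantage = reward - baseline
        # For a softmax policy, d log pi(action) / d prefs[a] = 1[a == action] - probs[a].
        for a in range(len(prefs)):
            grad_log_pi = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += lr * advantage * grad_log_pi
        baseline += (reward - baseline) / step
    return softmax(prefs)


probs = reinforce_bandit([0.0, 1.0])
print(probs)
```

The update raises the probability of actions that did better than the baseline and lowers it for actions that did worse, which is the core idea of optimizing the policy directly.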

Slide 20

Slide 20 text

POLICY GRADIENTS - HISTORY

Slide 21

Slide 21 text

POLICY GRADIENTS - HISTORY
Policy Gradients, REINFORCE, Actor-Critic paper review: https://wonseokjung.github.io//reinforcementlearning/update/RL-PG_RE_AC/

Slide 22

Slide 22 text

POLICY GRADIENTS - PPO
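
The slide body is not preserved, so the following is only a sketch of the clipped surrogate objective commonly associated with PPO, written in plain Python. The function name, the argument layout (per-sample log-probabilities and advantages), and the clip range of 0.2 are assumptions for the example.

```python
import math


def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch of samples."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Ratio of 2 with a positive advantage: the gain is capped at 1 + clip_eps.
print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
```

Clipping removes the incentive to move the new policy far from the old one in a single update, which is why the objective can be optimized for several epochs on the same batch.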

Slide 23

Slide 23 text

ALPHAGO, DOTA

Slide 24

Slide 24 text

ROBOTIC MANIPULATION

Slide 25

Slide 25 text

RL REQUIRES A LOT OF TRAINING TIME

Slide 26

Slide 26 text

RL REQUIRES A LOT OF TRAINING TIME

Slide 27

Slide 27 text

PRIOR KNOWLEDGE

Slide 28

Slide 28 text

HOW CAN WE ALLOW OUR A.I. SYSTEMS TO MAKE USE OF PRIOR KNOWLEDGE?
https://ubisafe.org/explore/demeanure-clipart-prior-knowledge/

Slide 29

Slide 29 text

META REINFORCEMENT LEARNING
https://ubisafe.org/explore/demeanure-clipart-prior-knowledge/
Meta Reinforcement Learning

Slide 30

Slide 30 text

META REINFORCEMENT LEARNING
https://ubisafe.org/explore/demeanure-clipart-prior-knowledge/
Single Task

Slide 31

Slide 31 text

SINGLE R.L : MAZE NAVIGATION
Single Task
 Learn to navigate from start to goal as fast as possible in a single maze

Slide 32

Slide 32 text

META R.L : MAZE NAVIGATION

Slide 33

Slide 33 text

META RL IS A SPECIAL CASE OF NORMAL RL

Slide 34

Slide 34 text

RL²
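
Assuming this slide refers to the RL² approach, in which a recurrent policy receives the previous action, reward, and termination flag alongside the observation and keeps its hidden state across the episodes of one trial, the data flow can be sketched as below. Everything here is hypothetical: the bandit task, the function names, and in particular `rnn_step`, which is a fixed placeholder standing in for a learned recurrent network.

```python
import math
import random


def rl2_input(obs, prev_action, prev_reward, prev_done, num_actions):
    """Concatenate observation, one-hot previous action, previous reward, done flag."""
    one_hot = [1.0 if a == prev_action else 0.0 for a in range(num_actions)]
    return list(obs) + one_hot + [float(prev_reward), 1.0 if prev_done else 0.0]


def rnn_step(hidden, x, w_h=0.5, w_x=0.1):
    """Placeholder recurrent update; a real agent would use a learned GRU or LSTM."""
    drive = w_x * sum(x)
    return [math.tanh(w_h * h + drive) for h in hidden]


def run_trial(arm_probs, episodes=2, steps_per_episode=3, hidden_size=4, seed=0):
    """One trial on a single sampled bandit task, spanning several episodes."""
    rng = random.Random(seed)
    num_actions = len(arm_probs)
    hidden = [0.0] * hidden_size  # reset only at the start of a trial
    prev_action, prev_reward, prev_done = None, 0.0, False
    inputs = []
    for _ in range(episodes):
        for step in range(steps_per_episode):
            obs = [1.0]  # a bandit has a single dummy observation
            x = rl2_input(obs, prev_action, prev_reward, prev_done, num_actions)
            hidden = rnn_step(hidden, x)  # hidden state is NOT reset between episodes
            inputs.append(x)
            action = rng.randrange(num_actions)  # stand-in for sampling from the policy
            reward = 1.0 if rng.random() < arm_probs[action] else 0.0
            done = step == steps_per_episode - 1
            prev_action, prev_reward, prev_done = action, reward, done
    return inputs, hidden


inputs, hidden = run_trial([0.2, 0.8])
print(len(inputs), inputs[0])
```

Because the hidden state survives episode boundaries, what the agent learns about the task in early episodes can shape its behavior in later ones; the outer RL algorithm trains the recurrent weights across many such trials.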

Slide 35

Slide 35 text

META-RL IN MAZE

Slide 36

Slide 36 text

LEARNING DEXTEROUS IN-HAND MANIPULATION

Slide 37

Slide 37 text

META-RL IN ROBOT

Slide 38

Slide 38 text

META RL : LIMITATIONS OF THE APPROACHES DESCRIBED PREVIOUSLY

Slide 39

Slide 39 text

META RL : CHANGES TO PROBLEM FORMULATION
Finite set of training tasks and a test set of tasks
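
The reformulated setting, a finite set of training tasks plus a held-out test set of tasks, can be illustrated with a simple split. The task names, the `split_tasks` helper, and the split sizes are hypothetical and only meant to show the idea.

```python
import random


def split_tasks(task_ids, num_test, seed=0):
    """Shuffle a finite pool of tasks and hold out `num_test` of them for evaluation."""
    rng = random.Random(seed)
    shuffled = list(task_ids)
    rng.shuffle(shuffled)
    return shuffled[num_test:], shuffled[:num_test]


levels = [f"level_{i}" for i in range(10)]  # hypothetical task names
train_tasks, test_tasks = split_tasks(levels, num_test=3)
print(len(train_tasks), len(test_tasks))  # 7 3
```

The agent is trained only on `train_tasks`; its ability to learn quickly is then measured on `test_tasks`, which it has never seen.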

Slide 40

Slide 40 text

GYM RETRO

Slide 41

Slide 41 text

ALGORITHMS

Slide 42

Slide 42 text

PPO(JOINT) + FINE TUNING

Slide 43

Slide 43 text

RETRO CONTEST

Slide 44

Slide 44 text

IMPROVEMENT

Slide 45

Slide 45 text

MODULABS CTRL

Slide 46

Slide 46 text

Github: https://github.com/wonseokjung
Facebook: https://www.facebook.com/wsjung
Blog: https://wonseokjung.github.io
Thank you