Introduction Deep Reinforcement Learning
Wonseok Jung
November 20, 2018
Transcript
RL x DL
Wonseok Jung
Introduction to Reinforcement Learning (feat. cs)

Wonseok Jung
City University of New York, Baruch College, Data Science Major
Connexion AI Founder
DeepLearningCollege Reinforcement Learning Researcher
Modulabs CTRL Leader
Reinforcement Learning, Object Detection, Chatbot
GitHub: https://github.com/wonseokjung
Facebook: https://www.facebook.com/wsjung
Blog: https://wonseokjung.github.io
YouTube: https://www.youtube.com/channel/ (channel ID garbled: UCmTxWKdhlWvJUfrRw)

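Today
Markov decision process
The reinforcement learning problem
Three categories of reinforcement learning algorithms
Overview of reinforcement learning algorithm types
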
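Markov Decision Problem
$\pi_\theta(a_t \mid o_t)$: policy
$\pi_\theta(a_t \mid s_t)$: policy (fully observed)
$s_t$: state
$o_t$: observation
$a_t$: action
[Diagram: observations $o_1, o_2, o_3$, states $s_1, s_2, s_3$, and actions $a_1, a_2, a_3$, with transitions $p(s_{t+1} \mid s_t, a_t)$ between consecutive states]

Markov Decision Problem
Reinforcement learning describes the world in terms of state, action, reward, transition, and so on …
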
Reward functions
High reward: arriving safely at the destination
Low reward: a traffic accident

Policy
Learn a policy that drives safely
High reward / Low reward
Learning happens through the reward function

Markov chain
$\mathcal{M} = \{\mathcal{S}, \mathcal{T}\}$
$\mathcal{S}$: state space, with states $s \in \mathcal{S}$
$\mathcal{T}$: transition operator

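Markov chain (graphically)
$\vec{\mu}_{t+1} = \mathcal{T} \vec{\mu}_t$
$\mu_{t,i} = p(s_t = i)$: the probability that the state is $i$ at time step $t$
$\mathcal{T}_{i,j} = p(s_{t+1} = i \mid s_t = j)$: the matrix of probabilities that the state becomes $i$ given that it was $j$
[Diagram: states $s_1 \to s_2 \to s_3$ with transitions $p(s_{t+1} \mid s_t)$]
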
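The propagation rule $\vec{\mu}_{t+1} = \mathcal{T} \vec{\mu}_t$ can be tried out numerically. The sketch below is only an illustration: the three-state transition matrix and the helper name `step` are made up here and are not from the slides.

```python
# Column-stochastic transition matrix: T[i][j] = p(s_{t+1} = i | s_t = j).
# The numbers are invented for illustration; every column sums to 1.
T = [
    [0.9, 0.5, 0.0],
    [0.1, 0.0, 0.2],
    [0.0, 0.5, 0.8],
]


def step(T, mu):
    """Return mu_{t+1} = T mu_t for a state distribution mu_t."""
    n = len(mu)
    return [sum(T[i][j] * mu[j] for j in range(n)) for i in range(n)]


mu = [1.0, 0.0, 0.0]  # start in state 0 with probability 1
for t in range(3):
    mu = step(T, mu)
    print(t + 1, [round(p, 4) for p in mu])
```

Because each column of `T` is a probability distribution, the vector stays normalized after every step.
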
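Markov decision process
$\mathcal{M} = \{\mathcal{S}, \mathcal{A}, \mathcal{T}, r\}$
$\mathcal{S}$: state space, with states $s \in \mathcal{S}$
$\mathcal{A}$: action space, with actions $a \in \mathcal{A}$
$\mathcal{T}$: transition operator
$r$: reward function
[Diagram: states $s_1, s_2, s_3$ and actions $a_1, a_2$, with transitions $p(s_{t+1} \mid s_t, a_t)$]
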
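Markov decision process
$\mu_{t,i} = p(s_t = i)$: the probability that the state is $i$ at time step $t$
$\xi_{t,k} = p(a_t = k)$: the probability that the action is $k$ at time step $t$
$\mathcal{T}_{i,j,k} = p(s_{t+1} = i \mid s_t = j, a_t = k)$: the probability that the state is $i$ at time step $t+1$, given state $j$ and action $k$ at time step $t$
$r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$: reward function
$\mu_{t+1,i} = \sum_{j,k} \mathcal{T}_{i,j,k}\, \mu_{t,j}\, \xi_{t,k}$
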
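The update $\mu_{t+1,i} = \sum_{j,k} \mathcal{T}_{i,j,k}\, \mu_{t,j}\, \xi_{t,k}$ contracts the transition tensor with the state and action distributions. A minimal sketch, assuming a made-up two-state, two-action tensor; as in the formula, the action distribution $\xi_t$ is treated as independent of the state.

```python
# T[i][j][k] = p(s_{t+1} = i | s_t = j, a_t = k); values are illustrative only.
T = [
    [[0.9, 0.2], [0.4, 0.0]],  # next state i = 0
    [[0.1, 0.8], [0.6, 1.0]],  # next state i = 1
]


def step(T, mu, xi):
    """Return mu_{t+1}, where mu_{t+1}[i] = sum_{j,k} T[i][j][k] * mu[j] * xi[k]."""
    n_states, n_actions = len(mu), len(xi)
    return [
        sum(
            T[i][j][k] * mu[j] * xi[k]
            for j in range(n_states)
            for k in range(n_actions)
        )
        for i in range(n_states)
    ]


mu = [1.0, 0.0]  # p(s_t = j)
xi = [0.5, 0.5]  # p(a_t = k)
mu_next = step(T, mu, xi)
print([round(p, 4) for p in mu_next])
```
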
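Partially observed Markov decision process
$\mathcal{M} = \{\mathcal{S}, \mathcal{A}, \mathcal{O}, \mathcal{T}, \mathcal{E}, r\}$
$\mathcal{S}$: state space, with states $s \in \mathcal{S}$
$\mathcal{A}$: action space, with actions $a \in \mathcal{A}$
$\mathcal{O}$: observation space, with observations $o \in \mathcal{O}$
$\mathcal{T}$: transition operator
$\mathcal{E}$: emission probability $p(o_t \mid s_t)$
$r$: reward function
[Diagram: observations $o_1, o_2, o_3$, states $s_1, s_2, s_3$, and actions $a_1, a_2, a_3$, with transitions $p(s_{t+1} \mid s_t, a_t)$]
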
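In the partially observed case the agent sees $o_t$ rather than $s_t$, and the emission probability $p(o_t \mid s_t)$ links the two. As a small sketch, the distribution over observations follows from the state distribution by $p(o_t = o) = \sum_s p(o_t = o \mid s_t = s)\, p(s_t = s)$. The emission table and the name `observation_distribution` are assumptions made for this example.

```python
# E[o][s] = p(o_t = o | s_t = s); two observations, two states, invented values.
E = [
    [0.8, 0.3],
    [0.2, 0.7],
]


def observation_distribution(E, mu):
    """Return p(o_t = o) = sum_s E[o][s] * mu[s] for every observation o."""
    return [sum(E[o][s] * mu[s] for s in range(len(mu))) for o in range(len(E))]


mu = [0.5, 0.5]  # p(s_t = s)
p_obs = observation_distribution(E, mu)
print([round(p, 4) for p in p_obs])
```
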
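The goal of reinforcement learning
$\pi_\theta(a \mid s)$: the policy, parameterized by $\theta$
$\theta$: the weights of a neural network
The neural network takes the state as input and outputs an action.
The environment takes the action as input and outputs a state.
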
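To make $\pi_\theta(a \mid s)$ concrete, here is a minimal sketch in which a linear layer followed by a softmax stands in for the neural network. The feature vector, the weights, and the function names are all hypothetical; a real policy network would have more layers and learned weights.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(theta, state):
    """pi_theta(a | s): theta holds one weight vector per action."""
    logits = [sum(w * x for w, x in zip(weights, state)) for weights in theta]
    return softmax(logits)


def sample_action(theta, state, rng):
    """Draw an action index from pi_theta(. | state)."""
    probs = policy(theta, state)
    return rng.choices(range(len(probs)), weights=probs)[0]


theta = [[0.0, 0.0], [1.0, -1.0]]  # illustrative weights for two actions
state = [1.0, 0.0]                 # illustrative state features
probs = policy(theta, state)
action = sample_action(theta, state, random.Random(0))
print([round(p, 4) for p in probs], action)
```
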
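The goal of reinforcement learning
$p_\theta(\tau) = p_\theta(s_1, a_1, \ldots, s_T, a_T) = p(s_1) \prod_{t=1}^{T} \pi_\theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)$
$\theta^\star = \arg\max_\theta \mathbb{E}_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]$
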
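For a tiny tabular problem the objective $\mathbb{E}_{\tau \sim p_\theta(\tau)}\left[\sum_t r(s_t, a_t)\right]$ can be computed exactly by enumerating every trajectory. The sketch below assumes an invented two-state, two-action MDP and a fixed tabular policy. The transition out of the final step is left out because it sums to 1 when the unobserved next state is marginalized.

```python
from itertools import product

# All numbers below are made up for illustration.
p0 = [1.0, 0.0]                   # p(s_1)
pi = [[0.5, 0.5], [0.1, 0.9]]     # pi[s][a] = pi_theta(a | s)
P = [[[0.9, 0.1], [0.2, 0.8]],    # P[s][a][s2] = p(s2 | s, a)
     [[0.4, 0.6], [0.0, 1.0]]]
r = [[0.0, 1.0], [2.0, 0.0]]      # r[s][a]


def trajectory_prob(states, actions):
    """p_theta(tau) for tau = (s_1, a_1, ..., s_T, a_T)."""
    prob = p0[states[0]]
    for t in range(len(states)):
        prob *= pi[states[t]][actions[t]]
        if t + 1 < len(states):
            prob *= P[states[t]][actions[t]][states[t + 1]]
    return prob


def expected_return(horizon):
    """Exact E_tau[sum_t r(s_t, a_t)] by summing over all trajectories."""
    total = 0.0
    for states in product(range(2), repeat=horizon):
        for actions in product(range(2), repeat=horizon):
            ret = sum(r[s][a] for s, a in zip(states, actions))
            total += trajectory_prob(states, actions) * ret
    return total


print(expected_return(2))
```

Enumeration grows exponentially with the horizon, which is why practical algorithms estimate this expectation from sampled trajectories instead.
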
The anatomy of a reinforcement learning algorithm
Generate samples using the policy
Estimate the return
Update the policy using the return (reward)

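The three stages (generate samples, estimate the return, update the policy) can be seen in a very small loop. The sketch below is not the method from the slides: it uses a REINFORCE-style update with a batch-mean baseline on a hypothetical two-armed bandit, chosen only because it fits in a few lines. The reward probabilities, learning rate, and batch size are arbitrary assumptions.

```python
import math
import random


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run(num_iterations=200, batch_size=20, lr=0.1, seed=0):
    rng = random.Random(seed)
    win_prob = [0.2, 0.8]  # hypothetical environment: arm 1 pays off more often
    theta = [0.0, 0.0]     # policy parameters (one preference per action)
    for _ in range(num_iterations):
        probs = softmax(theta)

        # 1. Generate samples by running the current policy.
        samples = []
        for _ in range(batch_size):
            action = rng.choices([0, 1], weights=probs)[0]
            reward = 1.0 if rng.random() < win_prob[action] else 0.0
            samples.append((action, reward))

        # 2. Estimate the return (here: the average reward, used as a baseline).
        baseline = sum(reward for _, reward in samples) / batch_size

        # 3. Improve the policy with a sampled policy-gradient step.
        grad = [0.0, 0.0]
        for action, reward in samples:
            for k in range(2):
                indicator = 1.0 if k == action else 0.0
                grad[k] += (reward - baseline) * (indicator - probs[k]) / batch_size
        theta = [th + lr * g for th, g in zip(theta, grad)]
    return softmax(theta)


final_probs = run()
print([round(p, 3) for p in final_probs])
```

After training, the policy should put most of its probability on the better arm; the exact numbers depend on the seed.
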
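Q-function and value function
Q-function: $Q^\pi(s_t, a_t) = \sum_{t'=t}^{T} \mathbb{E}_{\pi_\theta}\left[ r(s_{t'}, a_{t'}) \mid s_t, a_t \right]$
The expected total reward collected from taking action $a_t$ in state $s_t$ until the end of the episode at time $T$.
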
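Q-function and value function
Value function: $V^\pi(s_t) = \sum_{t'=t}^{T} \mathbb{E}_{\pi_\theta}\left[ r(s_{t'}, a_{t'}) \mid s_t \right]$
The expected total reward collected from state $s_t$ until the end of the episode at time $T$.
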
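For a small tabular MDP, both definitions can be evaluated exactly by working backward from the final step. The recursion used below follows from the definitions: $Q^\pi(s_t, a_t) = r(s_t, a_t) + \mathbb{E}_{s_{t+1}}\left[V^\pi(s_{t+1})\right]$ and $V^\pi(s_t) = \mathbb{E}_{a_t \sim \pi}\left[Q^\pi(s_t, a_t)\right]$. The MDP, the policy, and the function name `evaluate` are assumptions made for this sketch.

```python
# Invented two-state, two-action MDP and a fixed tabular policy.
pi = [[0.5, 0.5], [0.1, 0.9]]     # pi[s][a] = pi_theta(a | s)
P = [[[0.9, 0.1], [0.2, 0.8]],    # P[s][a][s2] = p(s2 | s, a)
     [[0.4, 0.6], [0.0, 1.0]]]
r = [[0.0, 1.0], [2.0, 0.0]]      # r[s][a]


def evaluate(horizon):
    """Return (Q, V) at the first step of an episode with `horizon` steps."""
    n_states, n_actions = 2, 2
    v_next = [0.0] * n_states  # no reward is collected after the final step
    q, v = None, None
    for _ in range(horizon):   # sweep backward from the last step to the first
        q = [
            [
                r[s][a] + sum(P[s][a][s2] * v_next[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        v = [sum(pi[s][a] * q[s][a] for a in range(n_actions)) for s in range(n_states)]
        v_next = v
    return q, v


Q, V = evaluate(2)
print(Q, V)
```
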
Types of RL algorithms
Policy gradients
Value-based
Actor-critic
Model-based RL

Why there are so many reinforcement learning algorithms
Required samples
Hyperparameters
Stability
Convergence

MDP
Stochastic / Deterministic
Continuous / Discrete
Finite / infinite MDP

Next chapter: Policy gradients