Introduction Deep Reinforcement Learning
Wonseok Jung
November 20, 2018
Transcript
RLXDL
Wonseok Jung
Introduction to Reinforcement Learning (feat. cs)
Wonseok Jung
City University of New York, Baruch College: Data Science major
ConnexionAI: Founder
DeepLearningCollege: Reinforcement Learning Researcher
Modulabs CTRL: Leader
Reinforcement Learning, Object Detection, Chatbot
Github: https://github.com/wonseokjung
Facebook: https://www.facebook.com/wsjung
Blog: https://wonseokjung.github.io
Youtube: https://www.youtube.com/channel/UCmTxWKdhlWvJUfrRw (channel ID incomplete)
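Today
Markov decision process
The reinforcement learning problem
Categories of reinforcement learning algorithms
Overview of reinforcement learning algorithm types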
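Markov Decision Problem
$\pi_\theta(a_t \mid o_t)$: policy
$\pi_\theta(a_t \mid s_t)$: policy (fully observed)
$s_t$: state
$o_t$: observation
$a_t$: action
Diagram: $(o_1, s_1, a_1) \to (o_2, s_2, a_2) \to (o_3, s_3, a_3)$, with transitions $p(s_{t+1} \mid s_t, a_t)$
Markov Decision Problem
State, action, reward, and transition: the terms used to describe the world in reinforcement learning.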
Reward functions
High reward: arriving safely at the destination
Low reward: a traffic accident
Policy
Learn a policy that drives safely (high reward rather than low reward).
The policy is learned through the reward function.
Markov chain
$M = \{S, T\}$
$S$: state space, $s \in S$
$T$: transition operator
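Markov chain (graphically)
Diagram: $s_1 \to s_2 \to s_3$, with transitions $p(s_{t+1} \mid s_t)$
$\mu_{t,i} = p(s_t = i)$: probability that the state is $i$ at timestep $t$
$T_{i,j} = p(s_{t+1} = i \mid s_t = j)$: matrix of probabilities that the state becomes $i$ given that the state is $j$
$\vec{\mu}_{t+1} = T \vec{\mu}_t$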
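The update $\vec{\mu}_{t+1} = T \vec{\mu}_t$ can be tried out on a toy chain. The following is a minimal sketch in plain Python; the three-state matrix and the helper name `chain_step` are hypothetical and not taken from the slides. It assumes the slide's convention that `T[i][j]` is the probability of moving to state `i` from state `j`, so each column sums to 1.

```python
# Hypothetical 3-state chain (not from the slides).
# T[i][j] = p(s_{t+1} = i | s_t = j), so every column sums to 1.
T = [
    [0.9, 0.5, 0.0],
    [0.1, 0.0, 0.3],
    [0.0, 0.5, 0.7],
]


def chain_step(T, mu):
    """Return mu_{t+1} = T mu_t for a column-stochastic matrix T."""
    n = len(mu)
    return [sum(T[i][j] * mu[j] for j in range(n)) for i in range(n)]


mu = [1.0, 0.0, 0.0]  # start in state 0 with certainty
for _ in range(3):
    mu = chain_step(T, mu)  # state distribution after each timestep
```

Because the columns of `T` sum to 1, `mu` remains a probability distribution after every step.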
Markov decision process
$M = \{S, A, T, r\}$
$S$: state space, $s \in S$
$A$: action space, $a \in A$
$T$: transition operator
$r$: reward function
Diagram: $(s_1, a_1) \to (s_2, a_2) \to s_3$, with transitions $p(s_{t+1} \mid s_t, a_t)$
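Markov decision process
$\mu_{t,i} = p(s_t = i)$: probability that the state is $i$ at timestep $t$
$\xi_{t,k} = p(a_t = k)$: probability that the action is $k$ at timestep $t$
$T_{i,j,k} = p(s_{t+1} = i \mid s_t = j, a_t = k)$: probability that the state is $i$ at timestep $t+1$, given state $j$ and action $k$ at timestep $t$
$\mu_{t+1,i} = \sum_{j,k} T_{i,j,k} \, \mu_{t,j} \, \xi_{t,k}$
$r : S \times A \to \mathbb{R}$: reward function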
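The state-marginal update, a sum over $j$ and $k$ of $T_{i,j,k} \, \mu_{t,j} \, \xi_{t,k}$, can be written as a small tensor contraction. This is a rough sketch with a made-up two-state, two-action MDP; the function name `mdp_step` and all numbers are assumptions for illustration. As on the slide, the action distribution $\xi_t$ is treated as independent of the state.

```python
def mdp_step(T, mu, xi):
    """Return the next state distribution: sum over j, k of T[i][j][k] * mu[j] * xi[k]."""
    n_states, n_actions = len(mu), len(xi)
    return [
        sum(
            T[i][j][k] * mu[j] * xi[k]
            for j in range(n_states)
            for k in range(n_actions)
        )
        for i in range(n_states)
    ]


# Hypothetical MDP: T[i][j][k] = p(s_{t+1} = i | s_t = j, a_t = k).
# For each (j, k) pair the entries over i sum to 1.
T = [
    [[1.0, 0.2], [0.5, 0.0]],  # probabilities of landing in state 0
    [[0.0, 0.8], [0.5, 1.0]],  # probabilities of landing in state 1
]
mu = [1.0, 0.0]  # currently in state 0
xi = [0.5, 0.5]  # both actions equally likely
mu_next = mdp_step(T, mu, xi)
```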
Partially Observed Markov decision process
$M = \{S, A, O, T, E, r\}$
$S$: state space, $s \in S$
$A$: action space, $a \in A$
$O$: observation space, $o \in O$
$T$: transition operator
$E$: emission probability $p(o_t \mid s_t)$
$r$: reward function
Diagram: $(o_1, s_1, a_1) \to (o_2, s_2, a_2) \to (o_3, s_3, a_3)$, with transitions $p(s_{t+1} \mid s_t, a_t)$
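To make the emission probability $p(o_t \mid s_t)$ concrete, here is a small sketch that draws observations from a sequence of hidden states. The emission table, the state sequence, and the helper `sample_index` are all hypothetical; the agent in a partially observed problem would see only `observations`, never `hidden_states`.

```python
import random


def sample_index(probs, rng):
    """Draw an index i with probability probs[i]."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off


# Hypothetical emission table: E[s][o] = p(o_t = o | s_t = s).
E = [
    [0.8, 0.2],  # in state 0, observation 0 is emitted 80% of the time
    [0.3, 0.7],  # in state 1, observation 1 is emitted 70% of the time
]

rng = random.Random(0)
hidden_states = [0, 0, 1, 1, 0]  # made-up state sequence
observations = [sample_index(E[s], rng) for s in hidden_states]
```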
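The goal of reinforcement learning
$\pi_\theta(a \mid s)$: the policy, parameterized by $\theta$
$\theta$: the weights of an artificial neural network
The neural network takes the state as input and outputs an action.
The environment takes the action as input and outputs a state.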
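A parameterized policy $\pi_\theta(a \mid s)$ maps a state to a distribution over actions. The sketch below uses a single linear layer followed by a softmax as a stand-in for the neural network; that simplification, the function names, and the example weights are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert real-valued scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(theta, state):
    """pi_theta(a | s): theta[a] is the weight vector for action a."""
    logits = [sum(w * x for w, x in zip(theta_a, state)) for theta_a in theta]
    return softmax(logits)


# Hypothetical weights for 2 actions over a 2-dimensional state feature vector.
theta = [[1.0, 0.0], [0.0, 0.5]]
action_probs = policy(theta, [1.0, 2.0])
```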
The goal of reinforcement learning
$p_\theta(\tau) = p_\theta(s_1, a_1, \ldots, s_T, a_T) = p(s_1) \prod_{t=1}^{T} \pi_\theta(a_t \mid s_t) \, p(s_{t+1} \mid s_t, a_t)$
$\theta^* = \arg\max_\theta E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]$
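The objective is an expectation over trajectories, so one simple way to approximate it is to sample trajectories and average their total reward. The following sketch does this for a made-up tabular MDP; the tables `p0`, `pi`, `P`, `r` and the function names are hypothetical, and plain Monte Carlo averaging is only one of several possible estimators.

```python
import random


def sample_index(probs, rng):
    """Draw an index i with probability probs[i]."""
    u, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if u < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def rollout(p0, pi, P, r, horizon, rng):
    """Sample one trajectory and return its total reward sum_t r(s_t, a_t)."""
    s = sample_index(p0, rng)             # s_1 ~ p(s_1)
    total = 0.0
    for _ in range(horizon):
        a = sample_index(pi[s], rng)      # a_t ~ pi(a_t | s_t)
        total += r[s][a]
        s = sample_index(P[s][a], rng)    # s_{t+1} ~ p(s_{t+1} | s_t, a_t)
    return total


def estimate_objective(p0, pi, P, r, horizon, n, rng):
    """Monte Carlo estimate of E_tau[sum_t r(s_t, a_t)] from n rollouts."""
    return sum(rollout(p0, pi, P, r, horizon, rng) for _ in range(n)) / n


# Hypothetical 2-state, 2-action MDP.
p0 = [1.0, 0.0]                           # initial state distribution
pi = [[0.5, 0.5], [0.5, 0.5]]             # pi[s][a]
P = [[[0.9, 0.1], [0.2, 0.8]],            # P[s][a] = distribution over next state
     [[0.5, 0.5], [0.0, 1.0]]]
r = [[0.0, 1.0], [2.0, 0.0]]              # r[s][a]

estimate = estimate_objective(p0, pi, P, r, horizon=4, n=500, rng=random.Random(0))
```

Searching for $\theta^*$ then amounts to changing `pi` so that this estimate goes up.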
The anatomy of a reinforcement learning algorithm
Generate samples using the policy.
Estimate the return.
Update the policy using the return (reward).
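The three stages (generate samples, estimate the return, update the policy) can be seen in a toy loop. This is only a sketch under strong assumptions: a hypothetical two-action problem with deterministic rewards, a softmax policy over two preference values, and a simple preference-nudging update chosen for illustration. It is not the policy gradient method announced for the next chapter.

```python
import math
import random


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_toy_policy(iterations=50, batch=20, lr=0.5, seed=0):
    rng = random.Random(seed)
    rewards = [0.0, 1.0]  # hypothetical reward for action 0 and action 1
    prefs = [0.0, 0.0]    # policy parameters (theta)
    for _ in range(iterations):
        probs = softmax(prefs)
        # 1) Generate samples by running the current policy.
        actions = [0 if rng.random() < probs[0] else 1 for _ in range(batch)]
        returns = [rewards[a] for a in actions]
        # 2) Estimate the return: here, the batch average serves as a baseline.
        baseline = sum(returns) / batch
        # 3) Update the policy: favor actions whose return beat the baseline.
        for a, g in zip(actions, returns):
            prefs[a] += lr * (g - baseline) / batch
    return softmax(prefs)


final_probs = train_toy_policy()
```

After training, `final_probs` should put more weight on the rewarded action than the initial 50/50 split did.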
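Q-function and Value function
Q-function
$Q^\pi(s_t, a_t) = \sum_{t'=t}^{T} E_{\pi_\theta} \left[ r(s_{t'}, a_{t'}) \mid s_t, a_t \right]$
The expected total reward that can be collected from state $s_t$ and action $a_t$ until the final timestep $T$.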
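Q-function and Value function
Value function
$V^\pi(s_t) = \sum_{t'=t}^{T} E_{\pi_\theta} \left[ r(s_{t'}, a_{t'}) \mid s_t \right]$
The expected total reward that can be collected from state $s_t$ until the final timestep $T$.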
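For a small tabular problem, both quantities can be computed exactly by working backward from the final timestep, using the standard relation that $V^\pi(s)$ is the average of $Q^\pi(s, a)$ over actions drawn from the policy. The sketch below assumes a finite horizon and a made-up MDP; the function name `q_and_v` and the tables are hypothetical.

```python
def q_and_v(pi, P, r, horizon):
    """Backward recursion over a finite horizon.

    Returns (Q, V) where Q[t][s][a] and V[t][s] are the expected total
    reward from step t to the end, for t = 0 .. horizon - 1.
    """
    n_s, n_a = len(r), len(r[0])
    v_next = [0.0] * n_s  # no reward is collected after the final step
    Q, V = [], []
    for _ in range(horizon):
        q_t = [
            [
                r[s][a] + sum(P[s][a][s2] * v_next[s2] for s2 in range(n_s))
                for a in range(n_a)
            ]
            for s in range(n_s)
        ]
        # V(s) = sum_a pi(a | s) * Q(s, a)
        v_t = [sum(pi[s][a] * q_t[s][a] for a in range(n_a)) for s in range(n_s)]
        Q.insert(0, q_t)
        V.insert(0, v_t)
        v_next = v_t
    return Q, V


# Hypothetical 2-state, 2-action MDP.
pi = [[0.5, 0.5], [0.5, 0.5]]             # pi[s][a]
P = [[[0.9, 0.1], [0.2, 0.8]],            # P[s][a] = distribution over next state
     [[0.5, 0.5], [0.0, 1.0]]]
r = [[0.0, 1.0], [2.0, 0.0]]              # r[s][a]

Q, V = q_and_v(pi, P, r, horizon=3)
```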
Types of RL algorithms
Policy gradients
Value-based
Actor-critic
Model-based RL
Why there are so many reinforcement learning algorithms
Number of samples required
Hyperparameters
Stability
Convergence
MDP
Stochastic or deterministic
Continuous or discrete
Finite or infinite MDP
NEXT chapter: Policy gradients