Introduction Deep Reinforcement Learning
Wonseok Jung
November 20, 2018
RLXDL
Wonseok Jung
Introduction to Reinforcement Learning (feat. cs)

Wonseok Jung
City University of New York, Baruch College, Data Science major
ConnexionAI, Founder
Deep Learning College, Reinforcement Learning Researcher
Modulabs CTRL, Leader
Interests: Reinforcement Learning, Object Detection, Chatbot
Github: https://github.com/wonseokjung
Facebook: https://www.facebook.com/wsjung
Blog: https://wonseokjung.github.io
YouTube: https://www.youtube.com/channel/UC…
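
Today
Markov decision process
The reinforcement learning problem
The three components of a reinforcement learning algorithm
Overview of reinforcement learning algorithm types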
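
Markov Decision Problem
$\pi_\theta(a_t \mid o_t)$: policy
$\pi_\theta(a_t \mid s_t)$: policy (fully observed)
$s_t$: state, $o_t$: observation, $a_t$: action
[Figure: graphical model over $o_1, s_1, a_1, \ldots, o_3, s_3, a_3$ with transitions $p(s_{t+1} \mid s_t, a_t)$]

Markov Decision Problem
In reinforcement learning, the world is described in terms of states, actions, rewards, and transitions.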

Reward functions
High reward: arriving at the destination safely
Low reward: a traffic accident
Policy: learning to drive safely
[Figure: high-reward vs. low-reward outcomes]
The policy is learned through the reward function.

Markov chain
$M = \{S, T\}$
$S$: state space, $s \in S$
$T$: transition operator
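
Markov chain (graphically)
[Figure: chain $s_1 \to s_2 \to s_3$ with transition probabilities $p(s_{t+1} \mid s_t)$]
$\mu_{t,i} = p(s_t = i)$: the probability that the state is $i$ at time step $t$
$T_{i,j} = p(s_{t+1} = i \mid s_t = j)$: the probability of moving to state $i$ given that the current state is $j$; collected over all $i, j$, these entries form the transition matrix $T$
$\vec{\mu}_{t+1} = T \vec{\mu}_t$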
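The update $\vec{\mu}_{t+1} = T \vec{\mu}_t$ can be made concrete with a minimal NumPy sketch. The 3-state transition matrix is a made-up example, not taken from the slides. It follows the convention $T_{i,j} = p(s_{t+1} = i \mid s_t = j)$, so each column sums to 1. `propagate` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical 3-state Markov chain (illustrative numbers, not from the slides).
# Column j holds p(s_{t+1} = i | s_t = j), so every column sums to 1.
T = np.array([
    [0.9, 0.2, 0.0],
    [0.1, 0.7, 0.3],
    [0.0, 0.1, 0.7],
])

def propagate(mu0: np.ndarray, T: np.ndarray, steps: int) -> np.ndarray:
    """Return the state distribution after `steps` applications of mu_{t+1} = T mu_t."""
    assert np.allclose(T.sum(axis=0), 1.0), "columns of T must be probability distributions"
    mu = mu0.copy()
    for _ in range(steps):
        mu = T @ mu  # one step of the Markov chain in vector form
    return mu

mu0 = np.array([1.0, 0.0, 0.0])  # start in the first state with certainty
mu1 = propagate(mu0, T, 1)       # equals the first column of T: [0.9, 0.1, 0.0]
print(mu1)
```

Starting from a one-hot $\vec{\mu}_1$, one step returns the matching column of $T$. Repeated steps keep $\vec{\mu}_t$ a valid distribution.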

Markov decision process
[Figure: $s_1 \to a_1 \to s_2 \to a_2 \to s_3$ with transitions $p(s_{t+1} \mid s_t, a_t)$]
$M = \{S, A, T, r\}$
$S$: state space, $s \in S$
$A$: action space, $a \in A$
$T$: transition operator
$r$: reward function
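The tuple $M = \{S, A, T, r\}$ can be held in code as a small container with a sampling step, assuming finite state and action spaces. `TabularMDP` and `step` are hypothetical names chosen for this sketch, not a library API. The array layout follows the index convention $T_{i,j,k} = p(s_{t+1} = i \mid s_t = j, a_t = k)$ from the next slide.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TabularMDP:
    """M = {S, A, T, r} for finite S and A (hypothetical container, not a library API)."""
    T: np.ndarray  # T[i, j, k] = p(s_{t+1} = i | s_t = j, a_t = k), shape (|S|, |S|, |A|)
    r: np.ndarray  # r[j, k] = r(s = j, a = k), shape (|S|, |A|)

    @property
    def n_states(self) -> int:
        return self.r.shape[0]

    def step(self, s: int, a: int, rng: np.random.Generator) -> tuple:
        """Sample s_{t+1} ~ p(. | s_t = s, a_t = a) and return (s_{t+1}, r(s, a))."""
        s_next = int(rng.choice(self.n_states, p=self.T[:, s, a]))
        return s_next, float(self.r[s, a])

# Toy example (assumed): action 0 stays in place, action 1 switches state; reward 1 in state 1.
T = np.zeros((2, 2, 2))
T[0, 0, 0] = T[1, 1, 0] = 1.0
T[1, 0, 1] = T[0, 1, 1] = 1.0
mdp = TabularMDP(T=T, r=np.array([[0.0, 0.0], [1.0, 1.0]]))
rng = np.random.default_rng(0)
print(mdp.step(0, 1, rng))  # (1, 0.0): switch from state 0 to state 1, no reward yet
```

The reward here depends on both the state and the action ($r : S \times A \to \mathbb{R}$), even though the toy example only uses the state.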
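
Markov decision process
$\mu_{t,i} = p(s_t = i)$: the probability that the state is $i$ at time step $t$
$T_{i,j,k} = p(s_{t+1} = i \mid s_t = j, a_t = k)$: the probability that the state is $i$ at time step $t+1$, given that at time step $t$ the state is $j$ and the action is $k$
$\xi_{t,k} = p(a_t = k)$: the probability that the action is $k$ at time step $t$
$r : S \times A \to \mathbb{R}$: reward function
$$\mu_{t+1,i} = \sum_{j,k} T_{i,j,k}\, \mu_{t,j}\, \xi_{t,k}$$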
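The sum $\mu_{t+1,i} = \sum_{j,k} T_{i,j,k}\, \mu_{t,j}\, \xi_{t,k}$ is a single tensor contraction, so `numpy.einsum` expresses it directly. This is a sketch with a hypothetical 2-state, 2-action transition tensor. As on the slide, the action distribution $\xi_t$ is treated as independent of $s_t$.

```python
import numpy as np

# Hypothetical sizes: 2 states, 2 actions. T[i, j, k] = p(s_{t+1} = i | s_t = j, a_t = k).
T = np.zeros((2, 2, 2))
T[0, 0, 0] = T[1, 1, 0] = 1.0  # action 0: stay in the current state
T[1, 0, 1] = T[0, 1, 1] = 1.0  # action 1: switch to the other state

def next_state_distribution(T, mu_t, xi_t):
    """mu_{t+1, i} = sum_{j,k} T[i, j, k] * mu_{t, j} * xi_{t, k}.

    As on the slide, xi_t = p(a_t) does not depend on s_t.
    """
    return np.einsum("ijk,j,k->i", T, mu_t, xi_t)

mu_t = np.array([1.0, 0.0])    # certainly in state 0
xi_t = np.array([0.25, 0.75])  # pick action 1 (switch) with probability 0.75
mu_next = next_state_distribution(T, mu_t, xi_t)
print(mu_next)                 # [0.25 0.75]
```

With a state-dependent policy $\pi(a_t \mid s_t)$, $\xi_{t,k}$ would become a matrix indexed by $(j, k)$ and the subscript string would change to `"ijk,j,jk->i"`.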

Partially observed Markov decision process
[Figure: graphical model over $o_1, s_1, a_1, \ldots, o_3, s_3, a_3$ with transitions $p(s_{t+1} \mid s_t, a_t)$]
$M = \{S, A, O, T, E, r\}$
$S$: state space, $s \in S$
$A$: action space, $a \in A$
$O$: observation space, $o \in O$
$T$: transition operator
$E$: emission probability $p(o_t \mid s_t)$
$r$: reward function
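The extra pieces of the POMDP can be shown with a small rollout sketch. The hidden state evolves through $T$, each observation is drawn from the emission probability $E$, and the policy only ever sees $o_t$. The transition tensor, the emission matrix, `rollout`, and the toy policy are all hypothetical choices for this illustration.

```python
import numpy as np

# Hypothetical POMDP pieces (illustrative numbers, not from the slides).
# T[i, j, k] = p(s_{t+1} = i | s_t = j, a_t = k); E[o, s] = p(o_t = o | s_t = s).
T = np.zeros((2, 2, 2))
T[0, 0, 0] = T[1, 1, 0] = 1.0  # action 0: stay
T[1, 0, 1] = T[0, 1, 1] = 1.0  # action 1: switch
E = np.array([[0.8, 0.3],      # column s is the observation distribution in state s
              [0.2, 0.7]])

def rollout(policy, s0, horizon, rng):
    """Sample (o_t, a_t) pairs; the policy sees only o_t, never the hidden s_t."""
    s, history = s0, []
    for _ in range(horizon):
        o = int(rng.choice(E.shape[0], p=E[:, s]))     # o_t ~ p(o_t | s_t)
        a = policy(o)                                   # a_t chosen from the observation
        history.append((o, a))
        s = int(rng.choice(T.shape[0], p=T[:, s, a]))  # s_{t+1} ~ p(s_{t+1} | s_t, a_t)
    return history

rng = np.random.default_rng(0)
traj = rollout(lambda o: o, s0=0, horizon=5, rng=rng)  # toy policy: action = observation
print(traj)
```

Because the policy conditions on $o_t$ rather than $s_t$, this matches the $\pi_\theta(a_t \mid o_t)$ case from the notation slide.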
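
The goal of reinforcement learning
$\pi_\theta(a \mid s)$, where $\theta$ denotes the neural network weights: the policy is parameterized by $\theta$.
The neural network (with parameters $\theta$) takes a state as input and outputs an action.
The environment takes an action as input and outputs a state.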
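A policy $\pi_\theta(a \mid s)$ with weights $\theta$ can be sketched as a function from a state vector to action probabilities. In this hypothetical sketch a single linear layer followed by a softmax stands in for the neural network on the slide, so $\theta = (W, b)$. The class name, sizes, and state vector are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

class LinearSoftmaxPolicy:
    """pi_theta(a | s) with theta = (W, b): state features in, action probabilities out.

    A single linear layer stands in for the neural network (hypothetical sketch).
    """
    def __init__(self, state_dim, n_actions, rng):
        self.W = 0.1 * rng.standard_normal((n_actions, state_dim))
        self.b = np.zeros(n_actions)

    def probs(self, s):
        return softmax(self.W @ s + self.b)

    def act(self, s, rng):
        p = self.probs(s)
        return int(rng.choice(len(p), p=p))  # sample a_t ~ pi_theta(. | s_t)

rng = np.random.default_rng(0)
policy = LinearSoftmaxPolicy(state_dim=4, n_actions=2, rng=rng)
s = np.array([0.1, -0.2, 0.05, 0.0])  # a made-up state vector
print(policy.probs(s), policy.act(s, rng))
```

Learning means adjusting $W$ and $b$. The environment side of the loop (action in, next state out) is not modeled here.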
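
The goal of reinforcement learning
$$p_\theta(s_1, a_1, \ldots, s_T, a_T) = p(s_1) \prod_{t=1}^{T} \pi_\theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)$$
This trajectory distribution is written $p_\theta(\tau)$.
$$\theta^* = \arg\max_\theta\, \mathbb{E}_{\tau \sim p_\theta(\tau)}\left[\sum_t r(s_t, a_t)\right]$$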
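The objective $\mathbb{E}_{\tau \sim p_\theta(\tau)}[\sum_t r(s_t, a_t)]$ can be estimated by sampling trajectories in the order the factorization prescribes and averaging their total rewards. This sketch uses a hypothetical 2-state, 2-action MDP and two hand-written tabular policies. `estimate_objective` is an illustrative name, not an API.

```python
import numpy as np

# Hypothetical MDP. T[i, j, k] = p(s_{t+1}=i | s_t=j, a_t=k), r[j, k] = r(s=j, a=k).
T = np.zeros((2, 2, 2))
T[0, 0, 0] = T[1, 1, 0] = 1.0  # action 0: stay
T[1, 0, 1] = T[0, 1, 1] = 1.0  # action 1: switch
r = np.array([[0.0, 0.0],
              [1.0, 1.0]])     # reward 1 whenever the agent is in state 1
p_s1 = np.array([1.0, 0.0])    # p(s_1): always start in state 0

def estimate_objective(pi, horizon, n_episodes, rng):
    """Monte Carlo estimate of E_{tau ~ p_theta(tau)}[sum_t r(s_t, a_t)].

    pi[s, a] = pi_theta(a | s). Each trajectory is sampled as
    p(s_1) prod_t pi_theta(a_t | s_t) p(s_{t+1} | s_t, a_t).
    """
    returns = []
    for _ in range(n_episodes):
        s = int(rng.choice(2, p=p_s1))
        total = 0.0
        for _ in range(horizon):
            a = int(rng.choice(2, p=pi[s]))
            total += r[s, a]
            s = int(rng.choice(2, p=T[:, s, a]))
        returns.append(total)
    return float(np.mean(returns))

rng = np.random.default_rng(0)
always_switch = np.array([[0.0, 1.0], [0.0, 1.0]])
go_and_stay = np.array([[0.0, 1.0], [1.0, 0.0]])        # switch from state 0, then stay in 1
print(estimate_objective(always_switch, 4, 100, rng))  # 2.0
print(estimate_objective(go_and_stay, 4, 100, rng))    # 3.0
```

Both toy policies and transitions are deterministic, so the estimates are exact. Among these two candidates, $\arg\max_\theta$ would pick the second.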

The anatomy of a reinforcement learning algorithm
Generate samples by running the policy.
Estimate the return.
Update the policy using the return (reward).
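The three steps can be wired into a loop. The sketch below uses tabular Monte Carlo control with an ε-greedy policy as one simple instance, not the method the later chapters develop. The box names follow the color coding of the usual anatomy diagram and are an assumption, since the slide text does not use them. The toy MDP, the horizon, ε, and all function names are assumptions. Because the horizon is finite, both the policy and the Q estimate are indexed by time step.

```python
import numpy as np

# Hypothetical 2-state, 2-action toy MDP (illustrative, not from the slides).
T = np.zeros((2, 2, 2))                 # T[i, j, k] = p(s_{t+1}=i | s_t=j, a_t=k)
T[0, 0, 0] = T[1, 1, 0] = 1.0           # action 0: stay
T[1, 0, 1] = T[0, 1, 1] = 1.0           # action 1: switch
r = np.array([[0.0, 0.0], [1.0, 1.0]])  # r[s, a]: reward 1 whenever s = 1
H, EPS = 4, 0.1                         # horizon and exploration rate (assumed values)

def generate_samples(pi, n, rng):
    """Step 1: run the policy pi[t, s, a] and record (t, s, a, r) tuples."""
    trajs = []
    for _ in range(n):
        s, traj = 0, []                 # every episode starts in state 0
        for t in range(H):
            a = int(rng.choice(2, p=pi[t, s]))
            traj.append((t, s, a, r[s, a]))
            s = int(rng.choice(2, p=T[:, s, a]))
        trajs.append(traj)
    return trajs

def estimate_returns(trajs):
    """Step 2: Monte Carlo estimate of Q(t, s, a) as the mean reward-to-go."""
    total, count = np.zeros((H, 2, 2)), np.zeros((H, 2, 2))
    for traj in trajs:
        reward_to_go = 0.0
        for t, s, a, rew in reversed(traj):
            reward_to_go += rew
            total[t, s, a] += reward_to_go
            count[t, s, a] += 1
    # unvisited (t, s, a) entries default to 0
    return np.divide(total, count, out=np.zeros_like(total), where=count > 0)

def improve_policy(q):
    """Step 3: epsilon-greedy policy with respect to the estimated Q."""
    pi = np.full((H, 2, 2), EPS / 2)
    best = q.argmax(axis=2)             # best[t, s]
    t_idx, s_idx = np.meshgrid(np.arange(H), np.arange(2), indexing="ij")
    pi[t_idx, s_idx, best] += 1.0 - EPS
    return pi

rng = np.random.default_rng(0)
pi = np.full((H, 2, 2), 0.5)            # start from a uniformly random policy
for _ in range(20):                     # repeat: samples -> returns -> improvement
    q = estimate_returns(generate_samples(pi, 50, rng))
    pi = improve_policy(q)
print(pi.argmax(axis=2))                # greedy action per (time step, state)
```

After a few iterations the greedy policy switches out of state 0 at the first step and stays in state 1 afterwards. Policy-gradient, value-based, and actor-critic methods fill the same three steps in different ways.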
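
Q function and value function
$$Q^\pi(s_t, a_t) = \sum_{t'=t}^{T} \mathbb{E}_{\pi_\theta}\left[r(s_{t'}, a_{t'}) \mid s_t, a_t\right]$$
Q function: the expected total reward that can be collected starting from state $s_t$ and action $a_t$ until the end of the episode ($T$).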
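
Q function and value function
$$V^\pi(s_t) = \sum_{t'=t}^{T} \mathbb{E}_{\pi_\theta}\left[r(s_{t'}, a_{t'}) \mid s_t\right]$$
Value function: the expected total reward that can be collected starting from state $s_t$ until the end of the episode ($T$).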
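With a known tabular model and a finite horizon, both sums can be computed exactly by working backward from $T$. The recursion is $Q_t(s, a) = r(s, a) + \sum_{s'} p(s' \mid s, a) V_{t+1}(s')$ and $V_t(s) = \sum_a \pi(a \mid s) Q_t(s, a)$. It unrolls to the two definitions above. This sketch assumes a hypothetical 2-state, 2-action MDP and a uniform random policy. `q_and_v` is an illustrative name.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers, not from the slides).
T = np.zeros((2, 2, 2))                 # T[i, j, k] = p(s_{t+1}=i | s_t=j, a_t=k)
T[0, 0, 0] = T[1, 1, 0] = 1.0           # action 0: stay
T[1, 0, 1] = T[0, 1, 1] = 1.0           # action 1: switch
r = np.array([[0.0, 0.0], [1.0, 1.0]])  # r[s, a]

def q_and_v(pi, r, T, horizon):
    """Finite-horizon Q^pi and V^pi by backward recursion.

    Q_t(s, a) = r(s, a) + sum_{s'} p(s' | s, a) V_{t+1}(s')
    V_t(s)    = sum_a pi(a | s) Q_t(s, a),  with V after the last step = 0.
    Index 0 corresponds to the first time step.
    """
    n_s, n_a = r.shape
    Q = np.zeros((horizon, n_s, n_a))
    V = np.zeros((horizon + 1, n_s))    # V[horizon] = 0 ends the sum at T
    for t in reversed(range(horizon)):
        # result[j, k] = sum_i T[i, j, k] * V[t + 1][i]: expected next-step value
        Q[t] = r + np.einsum("ijk,i->jk", T, V[t + 1])
        V[t] = (pi * Q[t]).sum(axis=1)
    return Q, V[:horizon]

uniform = np.full((2, 2), 0.5)          # pi(a | s) = 0.5 for both actions
Q, V = q_and_v(uniform, r, T, horizon=3)
print(Q[0])  # [[0.5 1.5] [2.5 1.5]]
print(V[0])  # [1. 2.]
```

The second line of the recursion is the identity $V^\pi(s_t) = \mathbb{E}_{a_t \sim \pi}[Q^\pi(s_t, a_t)]$, which links the two functions.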

Types of RL algorithms
Policy gradients
Value-based
Actor-critic
Model-based RL

Why are there so many RL algorithms?
Trade-offs: number of samples required, hyperparameters, stability, convergence
Assumptions about the MDP: stochastic vs. deterministic, continuous vs. discrete, finite vs. infinite

Next chapter: Policy gradients