Reinforcement Learning from classic to DQN
Wonseok Jung
September 13, 2018
An explanatory deck covering reinforcement learning from the classic methods through DQN.
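Transcript
Reinforcement Learning: Theory and Practice. Wonseok Jung
This talk combines reinforcement learning theory with hands-on practice and covers nearly everything needed to build an AI Super Mario agent yourself.
About the speaker: Wonseok Jung, researcher. City University of New York, Baruch College, Data Science major. Researcher at ConnexionAI. Freelance data scientist. Reinforcement learning researcher at 모두의연구소. Github: https://github.com/wonseokjung Facebook: https://www.facebook.com/ws.jung.798 Blog: https://wonseokjung.github.io/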
Outline
1. Dynamic Programming a. Policy iteration b. Value iteration
2. Monte Carlo method
3. Temporal-Difference Learning a. Sarsa b. Q-learning
4. Introduction to the deep learning framework Keras and setting up the Super Mario environment
5. Building an AI Super Mario with DQN
Slide annotations on the outline: Model-free, Model-based, Deep learning + RL, hands-on practice, Gridworld.
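Before deep learning: tabular. After deep learning: image, text, voice… Choice of environment.
Classic RL + Deep Learning: why classic reinforcement learning matters.
Q-learning + CNN: DQN.
Unsolved problems: because the state differs at each level, it is hard to build a general agent.
Super Mario paper: Curiosity-driven Exploration by Self-supervised Prediction. University of California, Berkeley. ICML 2017.
Practice materials: https://github.com/wonseokjung/KIPS_Reinforcement. Code, Jupyter Notebook, and installation instructions are all provided. If you have questions, please contact me personally.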
Markov Decision Process
Return of an episode: the rewards returned during the episode (total reward).
Discounted return: the rewards with the discount factor applied (total reward with discounting).
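The two returns named on these slides can be written out as follows. This is a sketch in the standard notation of Sutton and Barto, which is assumed here to match the slides: $R_{t+1}$ is the reward received after step $t$, $T$ is the final step of the episode, and $\gamma \in [0, 1]$ is the discount factor.

```latex
\begin{align}
G_t &= R_{t+1} + R_{t+2} + \dots + R_T
  && \text{(return of an episode)} \\
G_t &= R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots
     = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}
  && \text{(discounted return)}
\end{align}
```

With $\gamma < 1$, rewards that arrive later count for less, and the sum stays finite even when the task never terminates.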
MDP in a Gridworld. Grid World Environment.
State: grid coordinates. Action: up, down, left, right. Reward: trap, goal. Transition probability. Discount factor.
[Figure: Grid World Environment with state, action, and reward labels]
State-value function: the state value when following a policy.
Action-value function: the state-action value when following a policy.
Bellman equation: a fundamental property of value functions.
Optimal state-value function: the optimal policy maximizes the state value.
Optimal state-action value function: the optimal policy maximizes the state-action value.
Bellman optimality equation for v.
Bellman optimality equation for q.
Summary: MDP, return of an episode, discounted return, state-value function, action-value function, optimal policy, Bellman equation.
Bellman equation, Bellman optimality equation, optimal policy.
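The value functions and Bellman equations listed on the slides above can be written as follows. The notation is the textbook one from Sutton and Barto and is an assumption about what the slides displayed: $\pi(a \mid s)$ is the policy, $p(s', r \mid s, a)$ the transition probability, and $\gamma$ the discount factor.

```latex
\begin{align}
v_\pi(s)   &= \mathbb{E}_\pi\left[ G_t \mid S_t = s \right] \\
q_\pi(s,a) &= \mathbb{E}_\pi\left[ G_t \mid S_t = s,\ A_t = a \right] \\
v_\pi(s)   &= \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)
              \left[ r + \gamma\, v_\pi(s') \right] \\
v_*(s)     &= \max_a \sum_{s', r} p(s', r \mid s, a)
              \left[ r + \gamma\, v_*(s') \right] \\
q_*(s,a)   &= \sum_{s', r} p(s', r \mid s, a)
              \left[ r + \gamma \max_{a'} q_*(s', a') \right]
\end{align}
```

The third line is the Bellman equation for $v_\pi$; the last two are the Bellman optimality equations for $v$ and $q$.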
Dynamic Programming
Conditions: the states, rewards, and actions are known.
Condition: the transition probabilities must also be known.
Key idea of dynamic programming: use value functions to organize and structure the search for better policies.
Dynamic programming in a Gridworld. Grid World Environment.
State: grid coordinates. Action: up, down, left, right. Reward: trap, goal. Transition probability. Discount factor.
[Figure: Grid World Environment showing the current state, an action, the neighboring states, and rewards]
Update rule: each state's value is updated using the Bellman equation.
Two kinds of optimal value functions, each with its Bellman optimality equation: the state-value function and the state-action value function.
Dynamic Programming: Policy Iteration and Value Iteration.
Policy iteration: two processes for finding the optimal policy.
1. Policy Evaluation: compute the state values under the current policy.
2. Policy Improvement: find a better policy.
Policy iteration, Policy Evaluation: evaluation applies the update rule. The value update combines the policy, the transition probability, the reward, and the estimated value of the next state.
Initialize V(s) for every state. Update V(s) for each state using the update rule. Stop updating when the change in V(s) becomes very small.
Policy iteration, Improvement: the reason for computing the value function under a policy is to find a better policy. Apply the greedy policy.
Policy iteration repeats Policy Evaluation and Policy Improvement until the optimal policy is found.
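The evaluation and improvement loop described above can be sketched in a few lines of Python. Everything numeric here is an assumption made for illustration, not a value taken from the slides: a 4x4 grid, terminal states in two corners, a reward of -1 per move, deterministic transitions, and a discount factor of 0.9.

```python
# Hypothetical gridworld: 4x4, terminal corners, -1 per move, gamma 0.9.
# None of these numbers are taken from the slides.
N = 4
GAMMA = 0.9
TERMINALS = {(0, 0), (N - 1, N - 1)}
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
STATES = [(r, c) for r in range(N) for c in range(N)]


def step(state, action):
    """Deterministic model: next state and reward for one move."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), N - 1)  # walls keep the agent in place
    c = min(max(state[1] + dc, 0), N - 1)
    return (r, c), -1.0


def q_value(V, state, action):
    """One-step lookahead: reward plus discounted value of the next state."""
    next_state, reward = step(state, action)
    return reward + GAMMA * V[next_state]


def policy_evaluation(policy, theta=1e-6):
    """Sweep all states until the largest change in V(s) is below theta."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s in TERMINALS:
                continue  # terminal values stay at 0
            new_v = q_value(V, s, policy[s])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < theta:
            return V


def policy_improvement(V):
    """Greedy policy with respect to the current value function."""
    return {s: max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in STATES}


def policy_iteration():
    policy = {s: "up" for s in STATES}  # arbitrary starting policy
    while True:
        V = policy_evaluation(policy)
        improved = policy_improvement(V)
        if improved == policy:  # greedy policy no longer changes
            return policy, V
        policy = improved


policy, V = policy_iteration()
```

The loop stops when the greedy policy computed from the evaluated values equals the policy that was evaluated, which is the stopping condition the slide describes.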
Grid World Environment: Gridworld.
State: grid coordinates. Action: up, down, left, right. Reward: given on every move. Transition probability. Discount factor.
[Figure: Gridworld with Goal, Action, and per-cell Reward labels]
[Figure series: Grid World Environment, policy iteration. V_k and the greedy policy at successive k, from initialization through k = inf]
Policy iteration demo.
Dynamic Programming: Value Iteration.
[Figure: Grid World Environment, value iteration. V_k and the greedy policy once k has converged]
Value Iteration: does not repeat separate evaluation and improvement steps. State, Action.
Value iteration demo.
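As a minimal sketch of value iteration, the code below folds the greedy step into the value update by taking the maximum over actions in every sweep. The environment is an assumption for illustration (4x4 grid, two terminal corners, -1 per move, discount factor 0.9), not the setup from the slides.

```python
# Hypothetical gridworld: 4x4, terminal corners, -1 per move, gamma 0.9.
N = 4
GAMMA = 0.9
TERMINALS = {(0, 0), (N - 1, N - 1)}
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, move):
    """Deterministic model: next state and reward for one move."""
    r = min(max(state[0] + move[0], 0), N - 1)  # walls keep the agent in place
    c = min(max(state[1] + move[1], 0), N - 1)
    return (r, c), -1.0


def value_iteration(theta=1e-9):
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        delta = 0.0
        for s in V:
            if s in TERMINALS:
                continue  # terminal values stay at 0
            # Bellman optimality backup: best one-step lookahead over actions.
            best = max(
                reward + GAMMA * V[next_state]
                for next_state, reward in (step(s, m) for m in MOVES)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V


V = value_iteration()
```

A greedy policy can be read off the converged values afterwards with a one-step lookahead, so no separate policy is stored during the sweeps.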
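Without a model, the agent has to gain experience by actually interacting with the environment.
Monte Carlo method
Unlike dynamic programming, the Monte Carlo method does not start out knowing everything about the environment; it learns from actual experience by interacting with the environment.
Why learning from actual experience is useful: optimal behavior can be attained through experience even without information about the environment.
Monte Carlo updates episode by episode: it runs to the last state of the episode (the terminal state) and then updates. It averages the sampled returns obtained from experience to update the state-action values.
[Figure: Monte Carlo in Grid World, from Start to Goal; the update happens after reaching the end]
Monte Carlo demo.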
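The episode-by-episode averaging described above can be sketched as a first-visit Monte Carlo update. The function name, the episode format, and the sample rewards are hypothetical choices for illustration.

```python
from collections import defaultdict


def mc_update(Q, counts, episode, gamma=0.9):
    """First-visit Monte Carlo update from one finished episode.

    episode is a list of (state, action, reward) tuples in time order.
    Q holds running averages of returns; counts holds visit counts.
    """
    # Remember the first time step at which each (state, action) occurs.
    first_visit = {}
    for t, (s, a, _) in enumerate(episode):
        first_visit.setdefault((s, a), t)

    G = 0.0
    # Walk backwards so G accumulates the discounted return from step t.
    for t in reversed(range(len(episode))):
        s, a, r = episode[t]
        G = r + gamma * G
        if first_visit[(s, a)] == t:
            counts[(s, a)] += 1
            # Incremental mean of the sampled returns.
            Q[(s, a)] += (G - Q[(s, a)]) / counts[(s, a)]


Q = defaultdict(float)
counts = defaultdict(int)
episode = [("s0", "right", -1.0), ("s1", "right", -1.0), ("s2", "right", 10.0)]
mc_update(Q, counts, episode)
```

Nothing is updated until the whole episode is available, which is the restriction that temporal-difference methods remove.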
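Temporal-Difference Learning
According to Sutton, if one idea had to be singled out as representative of reinforcement learning, it would be TD (temporal-difference) learning.
Temporal-Difference Learning = Monte Carlo + Dynamic Programming. Like Monte Carlo, it estimates values from experience without a model; like DP, it can estimate values along the way without going to the end of the episode.
TD selects an action in the current state, then updates its estimate using the reward received plus the discount factor applied to the next state's value. Monte Carlo can update only once G_t is known.
TD combines Monte Carlo's strength (it works without knowing the environment model) with online learning as in dynamic programming. Because updates happen along the way without waiting for the end, TD suits very long episodes and continuing problems.
The TD idea underlies both Sarsa (on-policy) and Q-learning (off-policy).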
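The one-step TD update described above fits in a single function. This is a minimal sketch; the function name, the dictionary-based value table, and the step size alpha are assumptions for illustration.

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9, terminal=False):
    """One TD(0) update of V[state] from a single observed transition."""
    # Bootstrapped target: reward plus discounted estimate of the next state.
    next_value = 0.0 if terminal else V.get(next_state, 0.0)
    target = reward + gamma * next_value
    old = V.get(state, 0.0)
    V[state] = old + alpha * (target - old)
    return V[state]


V = {"A": 0.0, "B": 1.0}
td0_update(V, "A", -1.0, "B")
```

The update uses only the current transition, so it can run after every step instead of waiting for the return G_t at the end of the episode.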
Temporal-Difference Learning: Sarsa and Q-learning.
Sarsa uses the on-policy method and learns an action-value function instead of a state-value function.
At each time step Sarsa uses both the state and the action to estimate the action value.
Sarsa pseudocode (on-policy).
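The Sarsa pseudocode itself did not come through as text, so here is a hedged sketch of the standard algorithm on a hypothetical five-state corridor. The environment, the reward of -1 per step, and all hyperparameters are assumptions for illustration.

```python
import random

ACTIONS = ["left", "right"]
GOAL = 4  # hypothetical corridor: states 0..4, start at 0, goal at 4


def env_step(state, action):
    """Move one cell; every step costs -1 and the episode ends at GOAL."""
    next_state = max(state - 1, 0) if action == "left" else state + 1
    return next_state, -1.0, next_state == GOAL


def epsilon_greedy(Q, state, eps, rng):
    if rng.random() < eps:
        return rng.choice(ACTIONS)  # explore
    return max(ACTIONS, key=lambda a: Q[(state, a)])  # exploit


def sarsa(episodes=300, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(Q, state, eps, rng)
        done = False
        while not done:
            next_state, reward, done = env_step(state, action)
            # On-policy: the next action comes from the same epsilon-greedy
            # policy, and that exact action is used in the target.
            next_action = epsilon_greedy(Q, next_state, eps, rng)
            target = reward + (0.0 if done else gamma * Q[(next_state, next_action)])
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state, action = next_state, next_action
    return Q


Q = sarsa()
```

The name comes from the five values used in each update: S, A, R, S', A'.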
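[Figure: Sarsa on Gridworld, with Start, Goal, S_t and A_t]
States that have not been experienced carry no information.
The action values are updated by gathering experience many times. The policy is on-policy.
Sarsa demo.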
Temporal-Difference Learning: Q-learning.
The off-policy TD control method known as Q-learning (Watkins, 1989) was a turning point in the development of reinforcement learning. It handles exploration and exploitation together.
Q-learning pseudocode (off-policy).
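The heart of Q-learning is its update rule, sketched below. The function name, the dictionary-based table, and the sample values are assumptions for illustration.

```python
def q_learning_update(Q, state, action, reward, next_state, actions, done,
                      alpha=0.5, gamma=0.9):
    """One Q-learning update from a single observed transition."""
    # Off-policy: the target uses the best next action according to Q,
    # regardless of which action the behavior policy will actually take.
    if done:
        best_next = 0.0
    else:
        best_next = max(Q.get((next_state, b), 0.0) for b in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return Q[(state, action)]


Q = {("s1", "left"): -2.0, ("s1", "right"): -1.0}
q_learning_update(Q, "s0", "right", -1.0, "s1", ["left", "right"], done=False)
```

Compared with Sarsa, only the target changes: a maximum over next actions replaces the value of the action actually selected.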
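[Figure: Q-learning on Gridworld, with Start, Goal, S_t and the argmax action]
States that have not been experienced carry no information.
The action values are updated by gathering experience many times. The policy is off-policy.
Q-learning demo.
Introduction to the deep learning framework Keras and setting up the Super Mario environment
Deep learning (image: https://goo.gl/images/VA89CC)
Deep learning makes it possible to take pictures as input. (source: https://chaosmail.github.io/deeplearning/2016/10/22/intro-to-deep-learning-for-computer-vision/)
Deep Reinforcement Learning = Deep learning + Reinforcement Learning (image: https://goo.gl/images/oNu5Gr)
DeepMind DQN (https://www.youtube.com/watch?v=V1eYniJ0Rnk): applying deep learning to reinforcement learning produced an AI that plays better than humans.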
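DQN, Keras, breakout: code walkthrough. Parts: Main, library, DQN.
Main
Load the environment and create the agent.
Define score, episode, and global_step.
Run training for the configured number of episodes. Five lives are given.
Get the initial observation from the environment. For an initial interval, the paddle's actions are chosen at random.
Preprocess the initial observation and build a history of four states from it.
Loop with a while statement until the game ends. Check whether render is on; training speed depends on whether rendering is enabled. Increment the global step and the step by one. Select an action from the four-state history.
Interact with the environment using the selected action and receive the next state, reward, done, and info. Preprocess the received state again. Combine the last three frames of history with the newly received state to form next_history.
To compute the average q_max, add the max of the Q-values from the current model to agent.avg_q_max.
If the agent is dead, set dead to True and decrease start_life by one.
Limit the reward to a fixed range so the same code can be applied to other Atari games.
Store (s, a, r, s') in the replay memory. Start training once the replay memory holds enough samples.
Update the target model at regular intervals.
If the agent died, set dead back to False; otherwise history takes the value of next_history.
If done, record and print the episode's training information.
Save the model every fixed number of episodes.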
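Two bookkeeping steps from the main loop above, the four-state history and the reward limit, can be sketched without any deep learning library. The class and function names are hypothetical, and the clipping bounds of -1 and 1 are an assumption based on common DQN practice; the slide's own numbers did not survive.

```python
from collections import deque


def clip_reward(reward, low=-1.0, high=1.0):
    """Limit the reward to [low, high]; the bounds are assumed, not sourced."""
    return max(low, min(high, reward))


class FrameHistory:
    """Keeps the most recent preprocessed frames as the agent's state."""

    def __init__(self, first_frame, length=4):
        # At the start of an episode the first frame is repeated.
        self.frames = deque([first_frame] * length, maxlen=length)

    def push(self, frame):
        """Drop the oldest frame, append the new one, return next_history."""
        self.frames.append(frame)
        return tuple(self.frames)


history = FrameHistory("frame0")
next_history = history.push("frame1")
```

Strings stand in for image arrays here; in practice each frame would be the preprocessed grayscale screen.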
Import: load the required libraries.
a. Keras: CNN layer, Dense layer, optimizer, and the deep learning model.
b. Preprocessing: resizing the input image, converting RGB to gray, and building the replay memory.
c. TensorFlow: tensorflow backend, tensorflow.
d. Other: numpy, random, gym, os.
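DQN agent initialization:
render on or off; model load on or off; state size; action size; epsilon value; epsilon start and end (for decay); epsilon decay step.
batch size sampled from the replay memory; threshold for starting training; target model update period; discount factor; maximum replay memory size; setting for taking random actions at the start; deep learning model; target model; update target model.
optimizer; Tensorboard.
A further option is used when loading the weights of a saved model.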
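DQN: building the deep learning model with Keras, using CNN layers and a Dense layer.
DQN: the function that selects an action (the policy), here epsilon-greedy; and the function that copies the current model's weights into the target model.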
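A minimal sketch of epsilon-greedy action selection with a decaying epsilon is shown below. The linear schedule, its start and end values, and the number of decay steps are assumptions for illustration; the deck's actual settings are not recoverable from the transcript.

```python
import random


def linear_epsilon(step, start=1.0, end=0.1, decay_steps=1_000_000):
    """Epsilon decays linearly from start to end, then stays at end."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


greedy_action = select_action([0.1, 0.9, 0.3], epsilon=0.0)
```

Here q_values stands in for the model's predicted Q-values for the current history, one entry per action.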
DQN: the function that stores state, action, reward, and next state in the replay memory.
DQN: the function that trains the model on a batch sampled from the replay memory.
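The replay memory and the training targets computed from a sampled batch can be sketched as follows. The class, the function, and the target_q callable (a stand-in for the target model's prediction) are hypothetical names for illustration, not the deck's code.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of transitions; the oldest entries are dropped."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def append(self, state, action, reward, next_state, dead):
        self.buffer.append((state, action, reward, next_state, dead))

    def sample(self, batch_size, rng=random):
        # Uniform random minibatch, which breaks up correlated frames.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def dqn_targets(batch, target_q, gamma=0.99):
    """Q-learning targets for a batch, bootstrapped from the target model.

    target_q(next_state) must return a list of Q-values, one per action.
    """
    targets = []
    for _, _, reward, next_state, dead in batch:
        if dead:
            targets.append(reward)  # no bootstrapping past the end
        else:
            targets.append(reward + gamma * max(target_q(next_state)))
    return targets


memory = ReplayMemory(capacity=2)
for i in range(3):
    memory.append("s%d" % i, 0, 1.0, "s%d" % (i + 1), False)
```

The model is then fitted so that its Q-value for the stored action moves toward the corresponding target.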
DQN: optimizer, Tensorboard, and the function that records each episode's training information.
DQN: the preprocessing function.
DQN: the optimizer function, which here uses the Huber loss (image: https://goo.gl/images/XGsfYx).
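For reference, the Huber loss on a single error value can be written as below. The threshold delta = 1.0 is an assumed default, not a value confirmed by the slide.

```python
def huber_loss(error, delta=1.0):
    """Quadratic for small errors, linear for large ones."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * error * error
    return delta * (abs_error - 0.5 * delta)


small = huber_loss(0.5)
large = huber_loss(3.0)
```

Because the loss grows only linearly for large errors, a single badly mispredicted Q-value produces a bounded gradient, which is a common reason for choosing it in DQN training.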
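DQN, Keras, breakout: demo.
Setting up the Super Mario environment
Human, AI: Deep learning + Reinforcement Learning can be applied in other domains as well.
It can be used in many environments, but an appropriate reinforcement learning algorithm must be applied to each environment.
Appropriate reinforcement learning: the state size and the actions differ, so even with the same algorithm the deep learning model and hyperparameters must be chosen differently.
Four things needed to set up Super Mario: Emulator, Environment, Algorithm, Programming Language.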
Programming Language: Python
Python: https://www.python.org/downloads/
Anaconda: https://www.anaconda.com/download
TensorFlow: https://www.tensorflow.org/install
Keras: https://keras.io/#installation
Emulator: FCEUX (http://www.fceux.com/web/home.html)
Ubuntu: sudo apt-get update, then sudo apt-get install fceux
Mac: install Homebrew from https://brew.sh, open Terminal, then brew install fceux
Environment: OpenAI Gym (https://github.com/openai/gym)
pip install gym, or from source: git clone https://github.com/openai/gym.git, cd gym, pip install -e .
OpenAI Gym makes reinforcement learning practice easier.
Environment: OpenAI Baselines (https://github.com/openai/baselines)
pip install baselines, or from source: git clone https://github.com/openai/baselines.git, cd baselines, pip install -e .
Environment: Super Mario by Philip Paquette (https://github.com/ppaquette/gym-super-mario)
pip install gym-pull
import gym
import gym_pull
gym_pull.pull('github.com/ppaquette/gym-super-mario')
env = gym.make(...) with the ppaquette SuperMarioBros environment ID
Algorithm: Deep Q-Network (DQN)
If installation problems occur, detailed instructions for today's hands-on session are on GitHub: https://github.com/wonseokjung/KIPS_Reinforcement/tree/master/DQN
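Building an AI Super Mario with DQN
How does it differ from Gridworld? Gridworld: State: grid coordinates. Action: up, down, left, right. Reward: trap, goal. Transition probability. Discount factor.
[Figure: the Gridworld and Super Mario environments, with state, action, and reward labels]
Goal comparison: Super Mario's goal is to grab the flag; Gridworld's goal is to reach the goal state.
The Super Mario environment: State: the screen. Action: up, down, left, right, run, and action combinations. Reward: positive when moving forward, negative when moving backward. Transition probability. Discount factor.
The closer Mario gets to the destination flag, the higher the reward.
Repeated failures…
Mario tends to refuse to move forward. The state is more complex than in breakout and there are more actions.
Reward design: penalties when the goal is not achieved, as time passes, and when moving away from the flag; bonus rewards added when getting closer to the flag and when reaching the goal.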
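The penalty and bonus terms listed on the reward slide could be combined along the following lines. This is only a sketch: the function name, the use of Mario's horizontal position as the progress signal, and every coefficient are hypothetical, since the slide's actual values are not recoverable.

```python
def shaped_reward(prev_x, x, reached_flag, failed,
                  step_penalty=0.01, progress_scale=0.1,
                  fail_penalty=1.0, flag_bonus=1.0):
    """Combine the penalty and bonus terms into one scalar reward.

    prev_x and x are Mario's horizontal positions before and after the step;
    all coefficients are illustrative assumptions.
    """
    reward = -step_penalty  # small penalty for every time step that passes
    # Positive when moving toward the flag, negative when moving away.
    reward += progress_scale * (x - prev_x)
    if failed:
        reward -= fail_penalty  # the goal was not achieved
    if reached_flag:
        reward += flag_bonus  # bonus for reaching the goal
    return reward


forward = shaped_reward(prev_x=10, x=12, reached_flag=False, failed=False)
backward = shaped_reward(prev_x=12, x=10, reached_flag=False, failed=False)
```

The time penalty makes standing still slightly negative, which addresses the reluctance to move forward mentioned on the previous slide.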
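Deep learning model: comparing a VGG model and a regular model; let's stack the network deeper. (images: https://goo.gl/images/eoXooC, https://goo.gl/images/s8XrCK)
Basics of reinforcement learning, and applying reinforcement learning algorithms to environments built with Unity ml-agent. Github: https://github.com/wonseokjung Facebook: https://www.facebook.com/ws.jung.798 Blog: https://wonseokjung.github.io/
Building an AI Super Mario with DQN: demo.
After building a Mario that clears one level, performance drops sharply on other levels. Could this be overfitting? (image: https://goo.gl/images/6uDmqH)
Reinforcement learning research is actively under way: Reward, Exploration, Algorithm.
Thank you. Github: https://github.com/wonseokjung Facebook: https://www.facebook.com/ws.jung.798 Blog: https://wonseokjung.github.io/
References
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. Second Edition, in progress. MIT Press, Cambridge, MA.
https://github.com/rlcode/reinforcement-learning-kr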