これからの強化学習2.6
moyomot
May 19, 2017
Transcript
これからの強化学習 2.6: Risk-sensitive reinforcement learning. GUNOSY Data Mining Study Group #121
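INTRODUCTION: Problems that the reinforcement learning covered so far cannot solve
▸ Reinforcement learning aims to maximize the expected value of the reward (the return)
▸ Some problems cannot be formulated as maximizing (or minimizing) an expected value
▸ Example: an event is unlikely to occur but causes a large loss when it does, and the user cares about avoiding that risk
▸ Expected-value maximization has no mechanism for actively avoiding the risk of a large negative reward
▸ In settings such as stock investment, profit has to be raised while avoiding large losses that occur with small probability
▸ The cause: no information about the return other than its expected value is available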
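INTRODUCTION: Expected value of a lottery
▸ With high probability you win 1 cent
▸ Most people would probably judge the gain to be small and the risk of losing 100 dollars to be large
▸ http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.8264&rep=rep1&type=pdf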
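The lottery above can be put into numbers. The slide gives only the payoffs (1 cent, 100 dollars), so the probabilities below are assumptions chosen for illustration; the sketch shows that the expected value can be positive while the worst case is still a 100 dollar loss.

```python
# Hypothetical gamble: payoffs follow the slide, probabilities are assumed.
outcomes = [
    (0.01, 0.99995),    # win 1 cent with high probability
    (-100.0, 0.00005),  # lose 100 dollars with small probability
]

def expected_value(dist):
    """Probability-weighted mean of the payoffs."""
    return sum(value * prob for value, prob in dist)

def worst_case(dist):
    """Smallest payoff that has non-zero probability."""
    return min(value for value, prob in dist if prob > 0)

ev = expected_value(outcomes)
wc = worst_case(outcomes)
print(f"expected value: {ev:.7f}, worst case: {wc}")
```

An agent that looks only at `ev` accepts this gamble; one that also looks at `wc` may not.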
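INTRODUCTION: Contents
▸ 2.6.1 Review of reinforcement learning (omitted)
▸ 2.6.2 Risk-sensitive reinforcement learning methods
  ▸ A kind of worst-case evaluation
  ▸ Making the utility function or the temporal-difference (TD) error nonlinear
  ▸ Introducing risk measures other than the return
▸ 2.6.3 Return distribution estimation for risk-sensitive reinforcement learning
  ▸ If the probability distribution of the return is known, Value-at-Risk and various other risk measures can be computed, and decisions can be based on them
▸ 2.6.4 Conclusion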
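2.6.2 Risk-sensitive reinforcement learning methods: A kind of worst-case evaluation
▸ A method that extends Q-learning
▸ Q-learning (review)
▸ Bellman equation
▸ TD (temporal-difference) learning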
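As a reminder of the Q-learning review above, here is a minimal tabular sketch of one TD update. The formulas on the slide did not survive in the transcript, so this is the textbook update rather than a transcription; the state and action names and the values of `alpha` and `gamma` are assumptions.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the TD error."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # greedy bootstrap value
    td_error = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error

Q = defaultdict(float)           # all action values start at 0
actions = ["left", "right"]      # hypothetical action set
delta = q_learning_update(Q, s=0, a="right", r=1.0, s_next=1, actions=actions)
```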
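2.6.2 Risk-sensitive reinforcement learning methods: Q-hat learning, an extension based on a maximin policy (Heger)
▸ What is maximin?
▸ A strategy that makes decisions so that the smallest anticipated gain is as large as possible
▸ The problem is formulated accordingly (the formula is not preserved in the transcript)
▸ Keeps the risk of a large loss to a minimum
▸ Advantage: the TD learning of Q-learning can be applied
Payoff table (label partly lost in extraction: 関…vs…本)
              Strategy A   Strategy B
Strategy A       100         -100
Strategy B        10          -10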
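The payoff table above can be used to check the maximin rule by hand. The sketch assumes that rows are our own strategy and columns are the opponent's, which the transcript does not state. The `q_hat_update` function is a reward-form sketch in the spirit of Q-hat learning (keep the most pessimistic target seen so far, starting from an optimistic initial value); it is an assumption here and not a transcription of Heger's formula.

```python
# Rows: own strategy, columns: opponent's strategy (assumed reading of the table).
payoff = {
    "A": {"A": 100, "B": -100},
    "B": {"A": 10, "B": -10},
}

def maximin_choice(table):
    """Pick the strategy whose worst-case payoff is largest."""
    worst = {own: min(row.values()) for own, row in table.items()}
    best = max(worst, key=worst.get)
    return best, worst

def q_hat_update(q_hat, s, a, r, s_next, actions, gamma=0.9):
    """Worst-case style update: only ever lower the estimate."""
    target = r + gamma * max(q_hat[(s_next, a2)] for a2 in actions)
    q_hat[(s, a)] = min(q_hat[(s, a)], target)
    return q_hat[(s, a)]

choice, worst = maximin_choice(payoff)

# Hypothetical one-step task: state 1 is terminal with value 0.
q_hat = {(0, "x"): 1000.0, (1, "x"): 0.0}  # optimistic start for state 0
for reward in (10.0, -100.0, 10.0):
    q_hat_update(q_hat, 0, "x", reward, 1, ["x"])
```

Strategy A has the larger best case (100) but the worse worst case (-100), so maximin selects strategy B. In the update loop the estimate drops to the worst observed outcome and stays there.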
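2.6.2 Risk-sensitive reinforcement learning methods: Approaches that make the utility function or the TD error nonlinear
▸ One approach uses, as the risk measure, the nonlinear utility functions used in finance and control theory
▸ With this, however, a Bellman equation cannot be derived, so TD learning is not possible
▸ Another approach applies a nonlinear transformation to the TD error so that it reflects the user's risk preference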
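The second approach above can be sketched as follows. The slide does not say which transformation is used; the asymmetric piecewise-linear scaling below, with a hypothetical risk parameter `kappa` in (-1, 1), is one common choice and is an assumption here.

```python
def transform_td_error(delta, kappa):
    """Scale positive and negative TD errors differently.

    kappa > 0 weights negative surprises more (risk-averse),
    kappa < 0 weights positive surprises more (risk-seeking),
    kappa = 0 recovers the ordinary TD error.
    """
    if not -1.0 < kappa < 1.0:
        raise ValueError("kappa must lie strictly between -1 and 1")
    if delta > 0:
        return (1.0 - kappa) * delta
    return (1.0 + kappa) * delta

def risk_sensitive_update(value, delta, kappa, alpha=0.1):
    """Value update that uses the transformed TD error."""
    return value + alpha * transform_td_error(delta, kappa)
```

With `kappa = 0.5`, a TD error of +1 is shrunk to 0.5 while a TD error of -1 is amplified to -1.5, so the learned values lean toward pessimistic outcomes.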
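2.6.2 Risk-sensitive reinforcement learning methods: Approaches that introduce risk measures other than the return
▸ Approaches that take into account risk factors unrelated to the reward
▸ Introduce a risk function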
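2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Estimating the return distribution is the key
▸ Risk measures are derived from the return distribution
▸ http://latent-dynamics.net/02/09_Morimura.ppt.pdf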
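Once samples of the return are available, risk measures such as Value-at-Risk follow directly. The sketch below uses an empirical lower-tail quantile as VaR and the mean of that tail as CVaR; conventions for VaR differ between texts, so the definition used here and the sample values are assumptions.

```python
import math

def value_at_risk(returns, alpha):
    """Empirical alpha-quantile of the returns (lower tail)."""
    ordered = sorted(returns)
    k = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[k]

def conditional_value_at_risk(returns, alpha):
    """Mean of the returns that are at or below the alpha-quantile."""
    var = value_at_risk(returns, alpha)
    tail = [x for x in returns if x <= var]
    return sum(tail) / len(tail)

# Hypothetical return samples with one rare large loss.
samples = [-100.0, -5.0, 0.0, 1.0, 1.0, 2.0, 3.0, 4.0]
var_25 = value_at_risk(samples, 0.25)
cvar_25 = conditional_value_at_risk(samples, 0.25)
```

The mean of these samples says little about the rare loss of 100, while `cvar_25` is dominated by it.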
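2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Approaches to return distribution estimation
▸ Simulation approach
▸ If state s and action a are stored and T is made sufficiently large, many samples of the return accumulate and the return distribution can be estimated
▸ High computational cost
▸ Analytical approach
▸ A distributional Bellman equation that is solved analytically for the return distribution
▸ A nonparametric return distribution estimation algorithm that solves the distributional Bellman equation with Particle Smoothing
▸ https://pdfs.semanticscholar.org/1ec2/6e05c2577154213e1668ddd374e4da663309.pdf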
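The simulation approach listed above amounts to Monte Carlo rollouts. The sketch below fixes a starting state and action, rolls out a policy many times, and keeps each discounted return as one sample. The toy environment, the policy, and all parameter values are hypothetical.

```python
import random

def step(state, action, rng):
    """Hypothetical one-state environment: returns (reward, next_state)."""
    if action == "risky":
        reward = -10.0 if rng.random() < 0.1 else 2.0
    else:
        reward = 0.5
    return reward, state

def sample_return(state, action, policy, horizon, gamma, rng):
    """Discounted return of one rollout that starts with (state, action)."""
    total, discount = 0.0, 1.0
    s, a = state, action
    for _ in range(horizon):
        r, s = step(s, a, rng)
        total += discount * r
        discount *= gamma
        a = policy(s)  # follow the policy after the first action
    return total

def estimate_return_distribution(state, action, policy,
                                 n_samples=1000, horizon=50, gamma=0.9, seed=0):
    """Collect return samples; the cost grows with n_samples * horizon."""
    rng = random.Random(seed)
    return [sample_return(state, action, policy, horizon, gamma, rng)
            for _ in range(n_samples)]

def safe_policy(state):
    return "safe"

returns_safe = estimate_return_distribution(0, "safe", safe_policy)
returns_risky = estimate_return_distribution(0, "risky", safe_policy)
```

Every sample costs a full rollout, which is the computational cost the slide points out.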
2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Distributional Bellman equation
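The equation on this slide is not preserved in the transcript. One standard way to write a distributional Bellman equation is given below as a sketch; the notation (return \eta, policy \pi, discount \gamma with 0 < \gamma \le 1) is an assumption and may differ from the slide.

```latex
% Random-variable form: the return from s has the same distribution as
% the immediate reward plus the discounted return from the next state.
\eta(s) \overset{d}{=} r + \gamma\, \eta(s'),
\qquad a \sim \pi(\cdot \mid s),\quad s' \sim p(\cdot \mid s, a),\quad r \sim p(\cdot \mid s, a, s')

% Equivalent form for the cumulative distribution function of the return.
P(\eta \le x \mid s)
  = \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a)
    \int p(r \mid s, a, s')\,
    P\!\left(\eta \le \frac{x - r}{\gamma} \,\middle|\, s'\right) dr
```

Taking the expectation of both sides of the first line recovers the ordinary Bellman equation for the value function.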
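2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Nonparametric return distribution estimation
▸ The return distribution is approximated with particles
▸ http://latent-dynamics.net/02/09_Morimura.ppt.pdf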
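To make the particle idea concrete: a return distribution can be stored as a list of sampled values (particles), and one distributional backup draws a reward, draws a next-state particle, and combines them. The sketch below is a plain resampling illustration under assumed rewards and parameters, not the Particle Smoothing algorithm from the referenced paper.

```python
import random

def particle_backup(next_particles, reward_sampler, gamma, n_particles, rng):
    """One sample-based backup: each new particle is r + gamma * eta'."""
    return [reward_sampler(rng) + gamma * rng.choice(next_particles)
            for _ in range(n_particles)]

def risky_reward(rng):
    """Hypothetical reward: rare large loss, otherwise a small gain."""
    return -10.0 if rng.random() < 0.1 else 2.0

rng = random.Random(0)
terminal_particles = [0.0]  # the return after the terminal state is 0
particles = particle_backup(terminal_particles, risky_reward,
                            gamma=0.9, n_particles=500, rng=rng)
mean_return = sum(particles) / len(particles)
```

Because the particles keep the whole shape of the distribution, quantile-based risk measures can be read off them directly instead of only the mean.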