これからの強化学習 (Reinforcement Learning from Now On) 2.6
moyomot
May 19, 2017
Transcript
これからの強化学習 (Reinforcement Learning from Now On) 2.6: Risk-Sensitive Reinforcement Learning
GUNOSY Data Mining Study Group #121
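INTRODUCTION: What the Reinforcement Learning Covered So Far Cannot Solve
▸ Reinforcement learning aims to maximize the expected value of the reward (the return)
▸ Some problems cannot be formulated as maximizing (or minimizing) an expected value
▸ For example, when a large loss occurs with low probability and the user cares about avoiding that risk
▸ Maximizing the expected return is not a mechanism for actively avoiding the risk of large negative rewards
▸ In settings such as stock investment, we need to increase returns while avoiding large losses that occur with small probability
▸ This limitation arises because the objective uses no information about the return other than its expected value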
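INTRODUCTION: The Expected Value of a Lottery
▸ With high probability, you win 1 cent
▸ Most people judge the gain too small and the risk of losing 100 dollars too large, so they decline the bet
▸ http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.8264&rep=rep1&type=pdf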
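To make the gap between expected value and risk concrete, here is a small Python sketch. The slide does not give the probabilities, so the loss probability `p_loss` below is an assumption chosen purely for illustration: with it, the gamble has a positive expected value even though its worst case is a 100-dollar loss, which is exactly the information an expectation-only objective ignores.

```python
from fractions import Fraction

# Hypothetical gamble: the probabilities are illustrative assumptions, not from the slide.
p_loss = Fraction(1, 100_000)
gamble = [
    (Fraction(1, 100), 1 - p_loss),  # win 1 cent (0.01 dollars)
    (Fraction(-100), p_loss),        # lose 100 dollars
]

def expected_value(dist):
    """Mean of a discrete distribution given as (value, probability) pairs."""
    return sum(v * p for v, p in dist)

def worst_case(dist):
    """Smallest outcome that occurs with non-zero probability."""
    return min(v for v, p in dist if p > 0)

ev = expected_value(gamble)
# prints: expected value: 0.0089999 dollars, worst case: -100 dollars
print(f"expected value: {float(ev):.7f} dollars, worst case: {worst_case(gamble)} dollars")
```

Exact `Fraction` arithmetic is used so the tiny positive expectation is not blurred by floating-point rounding.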
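INTRODUCTION: Contents
▸ 2.6.1 Review of reinforcement learning (omitted)
▸ 2.6.2 Risk-sensitive reinforcement learning methods
  ▸ A form of worst-case evaluation
  ▸ Nonlinear utility functions and nonlinear transformation of the temporal-difference (TD) error
  ▸ Introducing risk measures other than the return
▸ 2.6.3 Return estimation for risk-sensitive reinforcement learning
  ▸ If the probability distribution of the return is known, Value-at-Risk and various other risk measures can be computed, enabling decisions based on those risk measures
▸ 2.6.4 Conclusion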
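2.6.2 Risk-Sensitive Reinforcement Learning Methods: A Form of Worst-Case Evaluation
▸ An approach that extends Q-learning
▸ Q-learning (review)
▸ Bellman equation
▸ TD (temporal-difference) learning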
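For reference, the tabular Q-learning update reviewed on this slide can be sketched as follows. The update moves Q(s, a) toward the TD target r + γ max_b Q(s', b). The toy environment `toy_step` and the values of `alpha` and `gamma` are hypothetical choices made only for this illustration.

```python
import random
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, terminal=False):
    """One tabular Q-learning step: move Q(s, a) toward the TD target
    r + gamma * max_b Q(s_next, b) (just r at a terminal transition)."""
    target = r if terminal else r + gamma * max(Q[(s_next, b)] for b in actions)
    td_error = target - Q[(s, a)]
    Q[(s, a)] += alpha * td_error
    return td_error

def toy_step(action):
    """Hypothetical one-state MDP: 'go' pays 1 and ends the episode,
    'stay' pays 0 and returns to state 0."""
    if action == "go":
        return 1.0, 0, True
    return 0.0, 0, False

random.seed(0)
actions = ["stay", "go"]
Q = defaultdict(float)
for _ in range(5000):
    a = random.choice(actions)  # uniform exploration is enough for this toy problem
    r, s_next, done = toy_step(a)
    q_learning_update(Q, 0, a, r, s_next, actions, terminal=done)

print(round(Q[(0, "go")], 3), round(Q[(0, "stay")], 3))  # approximately 1.0 and 0.9
```

The fixed point is Q(0, go) = 1 and Q(0, stay) = 0.9 · 1 = 0.9, which is the Bellman equation for this chain.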
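2.6.2 Risk-Sensitive Reinforcement Learning Methods: Q̂ (Q-hat) Learning, an Extension Based on the Maximin Policy (Heger)
▸ What is maximin?
▸ A decision strategy that maximizes the smallest (worst-case) gain that can be expected
▸ Formulation of this idea
▸ Keeps the risk of a large loss to a minimum
▸ Advantage: the TD learning of Q-learning can be reused

              Strategy A   Strategy B
Strategy A        100         -100
Strategy B         10          -10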
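The payoff table and the Q̂ idea can be sketched in Python under two assumptions: the table's rows are taken to be the decision maker's own strategy (the header label is not recoverable), and the Q̂ rule shown is the min-based TD update commonly associated with Heger's Q̂-learning, written from general knowledge rather than copied from the slide. The one-step task with actions `safe` and `risky` is hypothetical.

```python
import random
from collections import defaultdict

# Payoff table from the slide. Assumption: rows = our own strategy,
# columns = the other side's strategy.
payoff = {
    "A": {"A": 100, "B": -100},
    "B": {"A": 10, "B": -10},
}

def maximin_choice(table):
    """Choose the row whose worst-case (minimum) payoff is largest."""
    return max(table, key=lambda row: min(table[row].values()))

def maximax_choice(table):
    """For contrast: choose the row with the largest best-case payoff."""
    return max(table, key=lambda row: max(table[row].values()))

print(maximin_choice(payoff), maximax_choice(payoff))  # B A

def q_hat_update(Q_hat, s, a, r, s_next, actions, gamma=0.9, terminal=False):
    """Min-based update in the style of Heger's Q-hat learning: Q_hat(s, a) keeps the
    smallest TD target seen so far, so Q_hat must start at an optimistic upper bound."""
    target = r if terminal else r + gamma * max(Q_hat[(s_next, b)] for b in actions)
    Q_hat[(s, a)] = min(Q_hat[(s, a)], target)

# Hypothetical one-step task: 'safe' always pays 1; 'risky' pays 10 (p=0.9) or -10 (p=0.1).
rng = random.Random(0)
Q_hat = defaultdict(lambda: 100.0)  # 100 is an assumed upper bound on any return here
for _ in range(2000):
    a = rng.choice(["safe", "risky"])
    r = 1.0 if a == "safe" else (10.0 if rng.random() < 0.9 else -10.0)
    q_hat_update(Q_hat, "s", a, r, None, ["safe", "risky"], terminal=True)

best = max(["safe", "risky"], key=lambda b: Q_hat[("s", b)])
print(Q_hat[("s", "safe")], Q_hat[("s", "risky")], best)  # 1.0 -10.0 safe
```

Because Q̂ keeps the minimum target ever observed, a single -10 outcome is enough to rank `risky` below `safe`, even though `risky` has the higher mean (8 versus 1).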
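2.6.2 Risk-Sensitive Reinforcement Learning Methods: Approaches That Make the Utility Function or TD Error Nonlinear
▸ Approaches that use, as the risk measure, a nonlinear utility function of the kind used in finance and control theory
▸ In general, such a utility does not yield a Bellman equation, so TD learning cannot be applied to it directly
▸ Approaches that apply a nonlinear transformation to the TD error to reflect the user's risk preference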
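One widely cited way to transform the TD error is the asymmetric scaling of Mihatsch and Neuneier, where a parameter κ in (-1, 1) weights positive and negative TD errors differently. The sketch below assumes that form; the slide itself does not specify the transformation. The gamble paying +10 or -10 and the helper `learn_bandit_value` are hypothetical.

```python
import random

def transform_td_error(delta, kappa):
    """Asymmetric TD-error scaling (assumed form, after Mihatsch & Neuneier).
    kappa in (-1, 1): kappa > 0 shrinks positive errors and enlarges negative ones
    (risk-averse), kappa < 0 does the opposite (risk-seeking), kappa = 0 is plain TD."""
    if not -1.0 < kappa < 1.0:
        raise ValueError("kappa must lie in (-1, 1)")
    return (1.0 - kappa) * delta if delta > 0 else (1.0 + kappa) * delta

def learn_bandit_value(kappa, steps=30_000, alpha=0.001, seed=0):
    """Estimate the value of a hypothetical gamble paying +10 or -10 with equal
    probability (mean 0), using the transformed TD error."""
    rng = random.Random(seed)
    q = 0.0
    for _ in range(steps):
        r = 10.0 if rng.random() < 0.5 else -10.0
        q += alpha * transform_td_error(r - q, kappa)  # one-step task: target is r
    return q

print(round(learn_bandit_value(0.0), 1), round(learn_bandit_value(0.5), 1))  # near 0 and near -5
```

For this gamble the fixed point satisfies 0.5(1 - κ)(10 - q) + 0.5(1 + κ)(-10 - q) = 0, which gives q = -10κ. A risk-averse κ = 0.5 therefore settles near -5 instead of the risk-neutral 0, while the update keeps the ordinary TD form.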
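2.6.2 Risk-Sensitive Reinforcement Learning Methods: Approaches That Introduce Risk Measures Other Than the Return
▸ Approaches that account for risk factors unrelated to the reward
▸ Introduce a risk function ρ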
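As one concrete, assumed instance of a risk function ρ, the sketch below follows the spirit of Geibel and Wysotzki's formulation: ρ(s, a) is the probability of eventually entering an error state, learned by its own TD update alongside Q, and actions are ranked by ξ·Q - ρ. This pairing is an assumption, not something the slide states. The table name `Rho`, the weight `xi`, and the `shortcut`/`detour` task are hypothetical.

```python
import random
from collections import defaultdict

def update_value_and_risk(Q, Rho, s, a, r, s_next, actions, terminal, is_error,
                          xi=1.0, alpha=0.01, gamma=0.9):
    """Two TD updates: Q learns the discounted reward, Rho learns the probability of
    ending in a (hypothetical) error state. The bootstrap action is the one that is
    greedy for the combined score xi * Q - Rho."""
    if terminal:
        q_target, rho_target = r, (1.0 if is_error else 0.0)
    else:
        b = max(actions, key=lambda c: xi * Q[(s_next, c)] - Rho[(s_next, c)])
        q_target = r + gamma * Q[(s_next, b)]
        rho_target = Rho[(s_next, b)]  # undiscounted, so Rho stays a probability
    Q[(s, a)] += alpha * (q_target - Q[(s, a)])
    Rho[(s, a)] += alpha * (rho_target - Rho[(s, a)])

def preferred_action(xi, steps=20_000, seed=0):
    """Hypothetical one-step task: 'shortcut' pays 5 but hits the error state with
    probability 0.2; 'detour' pays 3 and never does."""
    rng = random.Random(seed)
    actions = ["shortcut", "detour"]
    Q, Rho = defaultdict(float), defaultdict(float)
    for _ in range(steps):
        a = rng.choice(actions)
        is_error = a == "shortcut" and rng.random() < 0.2
        r = 5.0 if a == "shortcut" else 3.0
        update_value_and_risk(Q, Rho, "s", a, r, None, actions,
                              terminal=True, is_error=is_error, xi=xi)
    return max(actions, key=lambda b: xi * Q[("s", b)] - Rho[("s", b)])

print(preferred_action(xi=1.0), preferred_action(xi=0.02))  # shortcut detour
```

Lowering ξ shifts the preference from the higher-paying `shortcut` to the safer `detour`: with ξ = 0.02 the scores are roughly 0.02·5 - 0.2 = -0.1 versus 0.02·3 - 0 = 0.06. The reward itself is unchanged; only the separate risk signal tips the decision.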
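2.6.3 Return Estimation for Risk-Sensitive Reinforcement Learning: Estimating the Return Is the Key
▸ Derive risk measures from the estimated return distribution
▸ http://latent-dynamics.net/02/09_Morimura.ppt.pdf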
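2.6.3 Return Estimation for Risk-Sensitive Reinforcement Learning: Approaches to Return Estimation
▸ Simulation approach
  ▸ Record the state s and action a; if T is made sufficiently large, many return samples are collected and the return can be estimated
  ▸ High computational cost
▸ Analytical approach
  ▸ A Bellman equation for the return, solved analytically
  ▸ A nonparametric return-estimation algorithm that solves this Bellman equation with particle smoothing
  ▸ https://pdfs.semanticscholar.org/1ec2/6e05c2577154213e1668ddd374e4da663309.pdf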
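The simulation approach can be sketched as Monte Carlo rollouts: sample many returns, then read risk measures such as Value-at-Risk directly off the empirical distribution. The environment in `sample_episode_return` is hypothetical, and VaR and CVaR are defined here on the lower tail of the return (the bad outcomes), which is one common convention among several.

```python
import math
import random

def sample_episode_return(rng, gamma=0.95, p_end=0.2):
    """One rollout of a hypothetical episodic task under a fixed policy: every step
    pays +1 with probability 0.9 or -20 with probability 0.1, and the episode stops
    with probability p_end after each step. Returns the discounted return."""
    g, discount = 0.0, 1.0
    while True:
        r = 1.0 if rng.random() < 0.9 else -20.0
        g += discount * r
        discount *= gamma
        if rng.random() < p_end:
            return g

def var_and_cvar(samples, level):
    """Empirical Value-at-Risk (lower `level`-quantile of the return) and
    CVaR (mean of the returns at or below that quantile)."""
    ordered = sorted(samples)
    k = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[k], sum(ordered[: k + 1]) / (k + 1)

rng = random.Random(0)
returns = [sample_episode_return(rng) for _ in range(10_000)]
mean = sum(returns) / len(returns)
var5, cvar5 = var_and_cvar(returns, 0.05)
print(f"mean={mean:.2f}  VaR_5%={var5:.2f}  CVaR_5%={cvar5:.2f}")
```

Estimating a 5% tail reliably needs many rollouts (10,000 here, for a single starting point), which illustrates the computational cost the slide points out.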
2.6.3 Return Estimation for Risk-Sensitive Reinforcement Learning: Bellman Equation
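The equation on this slide is not preserved in the transcript. A standard way to write a Bellman equation for the return itself, rather than its mean, is the distributional form below. Here the return Z^π(s, a) is a random variable, \overset{D}{=} denotes equality in distribution, and P and π are the transition kernel and the policy. This notation is an assumption and may differ from the slide's. Taking expectations of both sides recovers the ordinary Bellman equation for Q^π.

```latex
\begin{aligned}
Z^{\pi}(s,a) &\overset{D}{=} R(s,a,S') + \gamma\, Z^{\pi}(S',A'),
  \qquad S' \sim P(\cdot \mid s,a),\; A' \sim \pi(\cdot \mid S'),\\
Q^{\pi}(s,a) &= \mathbb{E}\big[Z^{\pi}(s,a)\big]
  = \mathbb{E}\big[R(s,a,S') + \gamma\, Q^{\pi}(S',A')\big].
\end{aligned}
```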
2.6.3 Return Estimation for Risk-Sensitive Reinforcement Learning: Nonparametric Return Estimation
▸ Approximate the return distribution with particles
▸ http://latent-dynamics.net/02/09_Morimura.ppt.pdf
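The particle idea can be illustrated with a simple sample-based update. The distribution of Z(s) is represented by a fixed number of particles, and each transition replaces a random fraction α of them with target samples r + γz'. This is a simplified stand-in, not the particle smoothing algorithm of the cited work. The function `particle_td_update` and the two-state chain are hypothetical.

```python
import random

def particle_td_update(particles, s, r, s_next, terminal, gamma, alpha, rng):
    """Mixture-style update of the particle set that approximates Z(s): each particle
    is replaced, with probability alpha, by a target sample r + gamma * z', where z'
    is drawn from the particles of Z(s_next) (z' = 0 at a terminal transition)."""
    source = [] if terminal else list(particles[s_next])  # snapshot before editing
    for i in range(len(particles[s])):
        if rng.random() < alpha:
            z_next = 0.0 if terminal else rng.choice(source)
            particles[s][i] = r + gamma * z_next

# Hypothetical two-step chain: state 0 -> state 1 with reward 0, then state 1 ends
# with reward +1 or -1 (equal probability). So Z(1) is +/-1 and Z(0) is +/-gamma.
rng = random.Random(0)
gamma, n_particles = 0.9, 200
particles = {0: [0.0] * n_particles, 1: [0.0] * n_particles}
for _ in range(3000):
    particle_td_update(particles, 1, rng.choice([1.0, -1.0]), None, True, gamma, 0.05, rng)
    particle_td_update(particles, 0, 0.0, 1, False, gamma, 0.05, rng)

print(sorted(set(round(z, 6) for z in particles[0])))  # [-0.9, 0.9]
```

Each update mixes the old particle set with samples from the Bellman target, so the particles for state 0 end up concentrated on ±γ = ±0.9. That matches the true two-point return distribution of this chain, something a single expected value (0 here) would hide.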