これからの強化学習2.6
moyomot
May 19, 2017
Transcript
これからの強化学習 2.6: Risk-Sensitive Reinforcement Learning
GUNOSY Data Mining Study Group #121
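INTRODUCTION: Problems that the reinforcement learning covered so far cannot solve
▸ Reinforcement learning aims to maximize the expected value of the reward (the return)
▸ Some cases cannot be formulated as maximizing (or minimizing) an expected value
▸ For example, an event is unlikely to occur but causes a large loss when it does, and the user cares about avoiding that risk
▸ There is no mechanism for actively avoiding the risk of a large negative reward
▸ In stock investment, for instance, profit has to be increased while avoiding large losses that occur with small probability
▸ The cause is that the return carries no information other than its expected value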
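INTRODUCTION: Expected value of a lottery
▸ With high probability you win 1 cent
▸ Most people judge the winnings to be small and the risk of losing 100 dollars to be large
▸ http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.8264&rep=rep1&type=pdf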
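INTRODUCTION: Contents
▸ 2.6.1 Review of reinforcement learning (omitted)
▸ 2.6.2 Risk-sensitive reinforcement learning methods
▸ A kind of worst-case evaluation
▸ Making the utility function or the temporal-difference (TD) error nonlinear
▸ Introducing risk measures other than the return
▸ 2.6.3 Return distribution estimation for risk-sensitive reinforcement learning
▸ Once the probability distribution of the return is known, risk measures such as Value-at-Risk can be computed and decisions can be based on them
▸ 2.6.4 Conclusion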
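2.6.2 Risk-sensitive reinforcement learning methods: A kind of worst-case evaluation
▸ A method that extends Q-learning
▸ Q-learning (review)
▸ Bellman equation
▸ TD (temporal-difference) learning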
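The equations on this review slide did not survive in the transcript. As a reminder of what is being extended, here is a minimal sketch of the textbook tabular Q-learning step. The function name, the state and action labels, and the values of `alpha` and `gamma` are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step in place and return the TD error."""
    # Greedy value of the next state under the current estimates.
    best_next = max(q[(next_state, a)] for a in actions)
    # TD error: one-step bootstrapped target minus the current estimate.
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


# Hypothetical two-action example; all estimates start at 0.0.
q = defaultdict(float)
actions = ["left", "right"]
delta = q_learning_update(q, "s0", "right", 1.0, "s1", actions)
```

Because the target is built from an expectation-style backup, the learned values say nothing about how widely the return is spread, which is the gap the risk-sensitive methods address.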
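2.6.2 Risk-sensitive reinforcement learning methods: Q-hat learning, an extension based on the maximin policy (Heger)
▸ What is maximin?
▸ A strategy that decides so that the smallest anticipated payoff is as large as possible
▸ Formulated as a problem (the expression itself is not preserved in the transcript)
▸ Keeps the risk of a large loss to a minimum
▸ Advantage: the TD learning of Q-learning can be reused
Payoff example (label partly lost in extraction: 関[?]vs[?]本)
             Strategy A   Strategy B
Strategy A      100         -100
Strategy B       10          -10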
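The maximin rule can be checked on the 2x2 payoff table of this slide. The sketch below assumes that rows are our own strategy and columns are the opponent's, which the transcript does not state. The `q_hat_update` function shows a commonly described reward-based form of Heger's Q-hat learning (optimistic initial values, estimates that can only decrease); the slide's own update rule is not preserved, so treat it as an illustration and not as the deck's formula.

```python
from collections import defaultdict

# Payoff table from the slide; the row/column roles are an assumption.
payoffs = {
    "A": {"A": 100, "B": -100},
    "B": {"A": 10, "B": -10},
}


def maximin_choice(payoffs):
    """Return the strategy whose worst-case payoff is largest."""
    worst_case = {own: min(row.values()) for own, row in payoffs.items()}
    best = max(worst_case, key=worst_case.get)
    return best, worst_case


def q_hat_update(q_hat, state, action, reward, next_state, actions, gamma=0.9):
    """Worst-case backup: keep the smallest one-step target seen so far."""
    target = reward + gamma * max(q_hat[(next_state, a)] for a in actions)
    q_hat[(state, action)] = min(q_hat[(state, action)], target)
    return q_hat[(state, action)]


choice, worst = maximin_choice(payoffs)

# Hypothetical usage: values start at an optimistic upper bound of 100.0.
q_hat = defaultdict(lambda: 100.0)
first = q_hat_update(q_hat, "s0", "a0", -5.0, "s1", ["a0", "a1"])
second = q_hat_update(q_hat, "s0", "a0", 10.0, "s1", ["a0", "a1"])
```

Strategy A has the higher best case, but its worst case is -100, so maximin picks strategy B with a worst case of -10. In the update, the later and more favorable transition does not raise the estimate again, which is what makes the evaluation a worst-case one.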
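2.6.2 Risk-sensitive reinforcement learning methods: Approaches that make the utility function or the TD error nonlinear
▸ An approach that uses, as the risk measure, the nonlinear utility functions used in finance and control theory
▸ With such a utility, a Bellman equation cannot be derived, so TD learning is not possible
▸ An approach that applies a nonlinear transformation to the TD error so that it reflects the user's risk preference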
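One known instance of transforming the TD error is the piecewise-linear scheme of Mihatsch and Neuneier, sketched below. Whether the slide refers to this exact transform is an assumption; the function names and the parameter name `kappa` are chosen here for illustration.

```python
def transform_td_error(td_error, kappa):
    """Scale positive and negative TD errors asymmetrically.

    kappa lies in (-1, 1). With kappa > 0, negative surprises are weighted
    more than positive ones (risk-averse); kappa = 0 leaves the TD error
    unchanged.
    """
    if not -1.0 < kappa < 1.0:
        raise ValueError("kappa must lie strictly between -1 and 1")
    if td_error > 0:
        return (1.0 - kappa) * td_error
    return (1.0 + kappa) * td_error


def risk_sensitive_update(q, key, td_error, kappa, alpha=0.1):
    """Apply a tabular value update that uses the transformed TD error."""
    q[key] = q.get(key, 0.0) + alpha * transform_td_error(td_error, kappa)
    return q[key]


# Hypothetical usage: the same magnitude of surprise moves the estimate
# three times as far downward as upward when kappa = 0.5.
up = transform_td_error(2.0, 0.5)
down = transform_td_error(-2.0, 0.5)
```

The update keeps the shape of ordinary TD learning, so an existing learner only needs the extra transform step.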
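2.6.2 Risk-sensitive reinforcement learning methods: Approaches that introduce risk measures other than the return
▸ An approach that accounts for risk factors unrelated to the reward
▸ Introduces a risk function ρ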
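2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Estimating the return distribution is the key
▸ Risk measures are derived from the return distribution
▸ http://latent-dynamics.net/02/09_Morimura.ppt.pdf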
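To illustrate how a risk measure falls out of a return distribution, the sketch below computes an empirical Value-at-Risk and a conditional Value-at-Risk from return samples. The sample set loosely follows the lottery example (win 1 cent, or lose 100 dollars); the 1% loss probability is an assumption made here, and the measures are expressed as lower-tail returns instead of positive losses.

```python
import math


def lower_tail(returns, alpha):
    """Return the worst ceil(alpha * n) samples, sorted ascending."""
    ordered = sorted(returns)
    # The small tolerance guards against float round-up in alpha * n.
    k = max(1, math.ceil(alpha * len(ordered) - 1e-12))
    return ordered[:k]


def value_at_risk(returns, alpha=0.05):
    """Empirical alpha-quantile of the return (best sample of the tail)."""
    return lower_tail(returns, alpha)[-1]


def conditional_value_at_risk(returns, alpha=0.05):
    """Mean return over the worst alpha fraction of samples."""
    tail = lower_tail(returns, alpha)
    return sum(tail) / len(tail)


# Assumed lottery: 99 outcomes of +0.01 dollars and one outcome of -100.
returns = [0.01] * 99 + [-100.0]
mean_return = sum(returns) / len(returns)
var_5 = value_at_risk(returns, 0.05)
cvar_5 = conditional_value_at_risk(returns, 0.05)
```

Under these assumptions the mean is about -0.99, the 5% quantile is still +0.01, and the 5% tail average is about -19.99, so the three numbers describe the same gamble very differently.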
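2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Approaches to return distribution estimation
▸ Simulation approach
▸ Store state s and action a; if T is made large enough, many return samples are collected and the return distribution can be estimated
▸ High computational cost
▸ Analytical approach
▸ A distributional Bellman equation that solves for the return distribution analytically
▸ A nonparametric return distribution estimation algorithm that solves the distributional Bellman equation with Particle Smoothing
▸ https://pdfs.semanticscholar.org/1ec2/6e05c2577154213e1668ddd374e4da663309.pdf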
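The simulation approach can be sketched as repeated rollouts from a fixed state and action, collecting one discounted return per rollout. Everything concrete here is hypothetical: the `toy_step` environment, the 1% loss probability, the policy, and the use of independent episodes in place of whatever sampling scheme the book describes.

```python
import random


def toy_step(state, action, rng):
    """Hypothetical one-state environment returning (next_state, reward)."""
    if action == "risky":
        # Assumed gamble: lose 100 with probability 0.01, else win 0.01.
        reward = -100.0 if rng.random() < 0.01 else 0.01
    else:
        reward = 0.0
    return state, reward


def sample_returns(step, policy, state, action, horizon, episodes,
                   gamma=0.9, seed=0):
    """Collect discounted returns of rollouts that start with (state, action)."""
    rng = random.Random(seed)
    returns = []
    for _ in range(episodes):
        s, a, total, discount = state, action, 0.0, 1.0
        for _ in range(horizon):
            s, r = step(s, a, rng)
            total += discount * r
            discount *= gamma
            a = policy(s)  # Follow the policy after the first action.
        returns.append(total)
    return returns


samples = sample_returns(toy_step, lambda s: "safe", "s0", "risky",
                         horizon=50, episodes=1000)
```

Each sample costs a full rollout of length `horizon`, which is where the high computational cost noted on the slide comes from.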
2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Distributional Bellman equation
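The equation on this slide is not preserved in the transcript. For orientation only, a standard way to write a distributional Bellman equation is shown below in generic notation (random return $Z^\pi$, reward $R$, discount $\gamma$). The notation is an assumption and may differ from the slide and from the cited paper, which works with return densities.

```latex
Z^{\pi}(s, a) \overset{D}{=} R(s, a) + \gamma \, Z^{\pi}(S', A'),
\qquad S' \sim p(\cdot \mid s, a), \quad A' \sim \pi(\cdot \mid S')

% Taking expectations on both sides recovers the usual Bellman equation:
Q^{\pi}(s, a) = \mathbb{E}\left[ R(s, a) \right]
  + \gamma \, \mathbb{E}\left[ Q^{\pi}(S', A') \right]
```

The equality holds in distribution, so the whole return distribution is propagated instead of only its mean.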
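2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Nonparametric return distribution estimation
▸ Approximate the return distribution with particles
▸ http://latent-dynamics.net/02/09_Morimura.ppt.pdf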
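The idea of approximating a return distribution with particles can be illustrated with a simple sample-based backup: each state keeps a set of return samples, and new samples are formed as reward plus discounted particle drawn from the successor state. This is a simplified sketch under assumptions made here, not the Particle Smoothing algorithm of the cited paper; the function name, the data layout, and the two-state chain are hypothetical.

```python
import random


def particle_backup(particles, transitions, gamma=0.9, n_particles=200,
                    sweeps=3, seed=0):
    """Refresh per-state return particles with sampled one-step backups.

    particles:   dict mapping state -> list of return samples
    transitions: dict mapping state -> function(rng) -> (reward, next_state),
                 where next_state is None at termination
    """
    rng = random.Random(seed)
    for _ in range(sweeps):
        updated = {}
        for state, sample_transition in transitions.items():
            new = []
            for _ in range(n_particles):
                reward, nxt = sample_transition(rng)
                # Draw the remaining return from the successor's particles.
                tail = 0.0 if nxt is None else rng.choice(particles[nxt])
                new.append(reward + gamma * tail)
            updated[state] = new
        particles = updated  # Synchronous sweep: swap in the new sets.
    return particles


# Hypothetical chain: s0 pays 1.0 and moves to s1; s1 pays +1 or -1, then ends.
transitions = {
    "s0": lambda rng: (1.0, "s1"),
    "s1": lambda rng: (rng.choice([1.0, -1.0]), None),
}
initial = {state: [0.0] for state in transitions}
result = particle_backup(initial, transitions)
```

After a few sweeps the particles at `s0` split between roughly 1.9 and 0.1, so the two-point shape of the return distribution is kept where an expected-value backup would report a single value of 1.0.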