これからの強化学習 (Reinforcement Learning from Now On), Section 2.6
moyomot
May 19, 2017
Transcript
これからの強化学習 (Reinforcement Learning from Now On) 2.6: Risk-Sensitive Reinforcement Learning. GUNOSY Data Mining Study Group #121
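INTRODUCTION: Problems that the reinforcement learning covered so far cannot solve
▸ Reinforcement learning aims to maximize the expected reward (the return)
▸ Some cases cannot be formulated as maximizing (or minimizing) an expected value
▸ Example: an event that is unlikely to occur but causes a large loss when it does, and the user cares about avoiding that risk
▸ Expected-value maximization has no mechanism for actively avoiding the risk of a large negative reward
▸ In settings such as stock investment, profit has to be raised while avoiding large losses that occur with small probability
▸ The reason: the return, as an expected value, carries no information beyond that expectation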
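INTRODUCTION: The expected value of a lottery
▸ With high probability you win 1 cent
▸ Most people would probably judge that the gain is small and the risk of losing 100 dollars is large
▸ http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.8264&rep=rep1&type=pdf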
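INTRODUCTION: Contents
▸ 2.6.1 Review of reinforcement learning (omitted)
▸ 2.6.2 Risk-sensitive reinforcement learning methods
▸ A kind of worst-case evaluation
▸ Making the utility function or the temporal-difference (TD) error nonlinear
▸ Introducing risk measures other than the return
▸ 2.6.3 Return distribution estimation for risk-sensitive reinforcement learning
▸ Once the probability distribution of the return is known, Value-at-Risk and various other risk measures can be computed, and decisions can be made based on those measures
▸ 2.6.4 Conclusion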
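2.6.2 Risk-sensitive reinforcement learning methods: A kind of worst-case evaluation
▸ An approach that extends Q-learning
▸ Q-learning (review)
▸ Bellman equation
▸ TD (temporal-difference) learning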
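The formulas on this review slide are not part of the transcript. As a reminder, the textbook forms of the Bellman optimality equation and the Q-learning TD update are sketched below. The notation (discount factor \gamma, step size \alpha, TD error \delta_t) is assumed here and may differ from the slide.

```latex
% Bellman optimality equation for the action-value function
Q^*(s,a) = \mathbb{E}\left[ r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \,\middle|\, s_t = s,\ a_t = a \right]

% Q-learning: TD error and update
\delta_t = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t), \qquad
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\, \delta_t
```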
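2.6.2 Risk-sensitive reinforcement learning methods: Q-hat learning, an extension based on the maximin policy (Heger)
▸ What is maximin?
▸ A strategy that makes the decision maximizing the smallest payoff that can be expected
▸ Formalized as a maximin objective
▸ Keeps the risk of a large loss to a minimum
▸ Advantage: the TD learning of Q-learning can still be applied
▸ Payoff table (two-player example):
               Strategy A   Strategy B
  Strategy A      100         -100
  Strategy B       10          -10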
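The maximin idea above can be sketched in a few lines of Python. The first function applies maximin to the payoff table from the slide, assuming that rows are our own strategy and columns are the opponent's (the transcript does not say which is which). The second function is a reward-based form of a worst-case Q-hat update in the spirit of Heger's method. The function names, the dictionary layout, and the requirement that Q-hat starts from an optimistic upper bound are assumptions of this sketch, not details taken from the slides.

```python
def maximin_choice(payoffs):
    """Return (strategy, guaranteed_payoff) under the maximin rule.

    payoffs[own][opponent] is the payoff of playing `own` against `opponent`.
    """
    # Worst payoff each of our strategies can lead to.
    worst = {own: min(row.values()) for own, row in payoffs.items()}
    # Pick the strategy whose worst case is the least bad.
    best = max(worst, key=worst.get)
    return best, worst[best]


def q_hat_update(q_hat, state, action, reward, next_state, gamma=0.9):
    """Worst-case (Q-hat style) update: the estimate can only decrease.

    Assumes q_hat was initialised with an optimistic upper bound, so the
    minimum over observed targets approaches the worst-case return.
    """
    target = reward + gamma * max(q_hat[next_state].values())
    q_hat[state][action] = min(q_hat[state][action], target)
    return q_hat[state][action]


if __name__ == "__main__":
    table = {
        "A": {"A": 100, "B": -100},
        "B": {"A": 10, "B": -10},
    }
    print(maximin_choice(table))
```

On the slide's table, strategy A can lose 100 while strategy B can lose at most 10, so the maximin rule selects B even though A has the higher best case.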
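2.6.2 Risk-sensitive reinforcement learning methods: Approaches that make the utility function or the TD error nonlinear
▸ One approach uses, as the risk measure, the nonlinear utility functions used in finance and control theory
▸ With this approach it is not possible to derive a Bellman equation and perform TD learning
▸ Another approach applies a nonlinear transformation to the TD error so that it reflects the user's risk preference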
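One well-known instance of transforming the TD error is a piecewise-linear scaling with a risk parameter kappa in (-1, 1), in the style of Mihatsch and Neuneier. The sketch below assumes that form; the slides do not state which transformation they have in mind, and the function names and dictionary-based Q-table are hypothetical.

```python
def transform_td_error(delta, kappa):
    """Piecewise-linear risk transform of a TD error.

    kappa > 0 down-weights positive surprises and up-weights negative ones
    (risk-averse); kappa < 0 does the opposite; kappa = 0 is ordinary TD.
    """
    if not -1.0 < kappa < 1.0:
        raise ValueError("kappa must lie in (-1, 1)")
    if delta > 0:
        return (1.0 - kappa) * delta
    return (1.0 + kappa) * delta


def risk_sensitive_q_update(q, state, action, reward, next_state,
                            alpha=0.1, gamma=0.9, kappa=0.5):
    """Q-learning step that uses the transformed TD error."""
    delta = reward + gamma * max(q[next_state].values()) - q[state][action]
    q[state][action] += alpha * transform_td_error(delta, kappa)
    return q[state][action]
```

With kappa = 0.5, a negative TD error of -2 is treated as -3 while a positive error of 2 is treated as 1, so the learned values lean toward pessimistic outcomes.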
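2.6.2 Risk-sensitive reinforcement learning methods: Approaches that introduce a risk measure other than the return
▸ An approach that takes into account risk factors unrelated to the reward
▸ Introduces a risk function ρ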
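2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Estimating the return distribution is the key
▸ Risk measures are derived from the return distribution
▸ http://latent-dynamics.net/02/09_Morimura.ppt.pdf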
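To illustrate how a risk measure follows from a return distribution, here is a minimal sketch that computes an empirical Value-at-Risk and conditional Value-at-Risk from return samples. It assumes the convention that VaR at level alpha is the lower alpha-quantile of the return (conventions differ between texts), and the sample data are hypothetical numbers loosely modelled on the lottery example.

```python
import math


def value_at_risk(returns, alpha=0.05):
    """Empirical lower alpha-quantile of the return samples."""
    if not returns:
        raise ValueError("returns must not be empty")
    ordered = sorted(returns)
    # Index of the smallest sample with at least alpha of the mass at or below it.
    k = max(math.ceil(alpha * len(ordered)) - 1, 0)
    return ordered[k]


def conditional_value_at_risk(returns, alpha=0.05):
    """Mean of the samples at or below the empirical VaR."""
    var = value_at_risk(returns, alpha)
    tail = [g for g in returns if g <= var]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    # Hypothetical samples: win 1 cent nine times out of ten, lose 100 dollars once.
    samples = [0.01] * 9 + [-100.0]
    print("mean  :", sum(samples) / len(samples))
    print("VaR10%:", value_at_risk(samples, 0.10))
    print("CVaR  :", conditional_value_at_risk(samples, 0.10))
```

The mean alone hides how the loss is distributed, whereas the tail measures expose the rare 100-dollar loss directly.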
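2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Approaches to return distribution estimation
▸ Simulation approach
▸ Record the state s and action a and make T sufficiently large; many samples of the return are then collected and the return distribution can be estimated
▸ High computational cost
▸ Analytical approach
▸ A distributional Bellman equation that solves for the return distribution analytically
▸ A nonparametric return distribution estimation algorithm that solves the distributional Bellman equation with particle smoothing
▸ https://pdfs.semanticscholar.org/1ec2/6e05c2577154213e1668ddd374e4da663309.pdf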
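The simulation approach above can be sketched as plain Monte Carlo sampling of discounted returns from a fixed state-action pair. The environment interface `step(state, action, rng) -> (next_state, reward, done)`, the `policy` callable, and all parameter names are hypothetical choices for this sketch. The need for many episodes is where the high computational cost comes from.

```python
import random


def sample_return(step, policy, state, action, rng, gamma=0.95, horizon=200):
    """Roll out one episode from (state, action) and return its discounted return."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        state, reward, done = step(state, action, rng)
        total += discount * reward
        discount *= gamma
        if done:
            break
        action = policy(state)  # follow the policy after the first action
    return total


def estimate_return_samples(step, policy, state, action,
                            n_episodes=1000, seed=0, gamma=0.95, horizon=200):
    """Collect sorted return samples, an empirical return distribution."""
    rng = random.Random(seed)
    return sorted(
        sample_return(step, policy, state, action, rng, gamma, horizon)
        for _ in range(n_episodes)
    )
```

The sorted samples can be fed directly into a quantile-based risk measure such as Value-at-Risk.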
2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Distributional Bellman equation
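The equation on this slide is not part of the transcript. One common way to write a distributional Bellman equation is sketched below, starting from the recursion of the return as a random variable. The notation (return \eta, policy \pi, discount 0 < \gamma \le 1) is assumed here, and the density form further assumes the return has a density; the slide's own formulation may differ.

```latex
% Return as a random variable and its one-step recursion
\eta_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} = r_{t+1} + \gamma\, \eta_{t+1}

% Resulting recursion for the return density given the state
p(\eta \mid s) = \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a)
  \int p(r \mid s, a, s')\, \frac{1}{\gamma}\,
  p\!\left( \frac{\eta - r}{\gamma} \,\middle|\, s' \right) dr
```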
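2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Nonparametric return distribution estimation
▸ The return distribution is approximated with particles
▸ http://latent-dynamics.net/02/09_Morimura.ppt.pdf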