これからの強化学習2.6
moyomot
May 19, 2017
Transcript
これからの強化学習 2.6: Risk-sensitive reinforcement learning
GUNOSY Data Mining Study Group #121
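INTRODUCTION: What the reinforcement learning covered so far cannot solve
▸ Reinforcement learning aims to maximize the expected reward (the return).
▸ Some cases cannot be formulated as maximizing (or minimizing) an expected value.
  ▸ A large loss can occur, even though its probability is low, and the user cares about avoiding that risk.
▸ Maximizing the expectation is not a mechanism that actively avoids the risk of a large negative reward.
  ▸ In settings such as stock investment, profit has to be raised while avoiding large losses that occur with small probability.
▸ The reason is that the objective carries no information about the return other than its expected value.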
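INTRODUCTION: Expected value of a lottery
▸ With high probability you win 1 cent.
▸ Many people would probably judge the winnings too small and the risk of losing 100 dollars too large.
▸ http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.8264&rep=rep1&type=pdf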
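The lottery argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the 1-cent win and the 100-dollar loss come from the slide, while the loss probability `p_loss` is a hypothetical value chosen so that the expected value stays positive.

```python
# Hypothetical lottery: only the 1-cent win and the 100-dollar loss come from
# the slide; the loss probability is an assumption made for illustration.
p_loss = 0.00005
outcomes = [
    (1.0 - p_loss, 0.01),   # (probability, payoff in dollars): win 1 cent
    (p_loss, -100.0),       # lose 100 dollars
]

# Expected value: the only quantity a risk-neutral objective looks at.
expected_value = sum(p * x for p, x in outcomes)

# Worst case: what a risk-averse player may care about instead.
worst_case = min(x for _, x in outcomes)

print(f"expected value: {expected_value:.7f} dollars")
print(f"worst case:     {worst_case:.2f} dollars")
```

With these assumed numbers the expectation is slightly positive, so an expectation-maximizing agent would play, even though the worst case is a 100-dollar loss.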
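INTRODUCTION: Contents
▸ 2.6.1 Review of reinforcement learning (omitted)
▸ 2.6.2 Risk-sensitive reinforcement learning methods
  ▸ A kind of worst-case evaluation
  ▸ Making the utility function or the temporal-difference (TD) error nonlinear
  ▸ Introducing risk measures other than the return
▸ 2.6.3 Return distribution estimation for risk-sensitive reinforcement learning
  ▸ Once the probability distribution of the return is known, various risk measures such as Value-at-Risk can be computed, and decisions can be based on those measures.
▸ 2.6.4 Conclusion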
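2.6.2 Risk-sensitive reinforcement learning methods: A kind of worst-case evaluation
▸ A method that extends Q-learning
▸ Q-learning (review)
  ▸ Bellman equation
  ▸ TD (temporal-difference) learning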
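The equations for this review slide are not present in the transcript. For reference, the standard textbook forms are sketched below; the symbols ($\alpha$ for the learning rate, $\gamma$ for the discount factor, $\delta_t$ for the TD error) are conventional notation assumed here, not taken from the slide.

```latex
% Bellman optimality equation for the action-value function
Q^{*}(s, a) = \mathbb{E}\left[\, r_{t+1} + \gamma \max_{a'} Q^{*}(s_{t+1}, a')
              \,\middle|\, s_t = s,\ a_t = a \,\right]

% TD error and the Q-learning update it drives
\delta_t = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t),
\qquad
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\, \delta_t
```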
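2.6.2 Risk-sensitive reinforcement learning methods: Q-hat learning, an extension based on the maximin policy (Heger)
▸ What maximin means
  ▸ A strategy that makes the decision whose smallest anticipated payoff is the largest
  ▸ Formulation: [expression not preserved in this transcript]
  ▸ Keeps the risk of a large loss to a minimum
▸ Advantage: the TD learning of Q-learning can be reused

Payoff table (caption partly illegible: 関… vs …本)
              Strategy A   Strategy B
Strategy A       100         -100
Strategy B        10          -10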
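The maximin rule applied to the payoff table can be sketched as follows. This assumes the rows are the decision maker's strategies and the columns the opponent's, which the transcript does not state. `q_hat_update` is a hedged reading of a worst-case backup in reward form (keep the smaller of the old estimate and the new one-step target, starting from an optimistic initial value); the function names and the reward-form convention are assumptions, not the slide's notation.

```python
import numpy as np

# Payoff table from the slide. Assumption: rows are our strategies (A, B),
# columns are the opponent's strategies (A, B).
payoff = np.array([[100, -100],
                   [10, -10]])
strategies = ["A", "B"]


def maximin_choice(table):
    """Return (row index, guaranteed payoff) of the maximin strategy."""
    worst_case = table.min(axis=1)      # worst outcome of each row
    best = int(worst_case.argmax())     # row whose worst outcome is largest
    return best, int(worst_case[best])


def q_hat_update(q_hat, s, a, r, s_next, gamma=0.9):
    """Worst-case backup (assumed reward form): never raise the estimate."""
    target = r + gamma * q_hat[s_next].max()
    q_hat[s, a] = min(q_hat[s, a], target)
    return q_hat


idx, guaranteed = maximin_choice(payoff)
print(f"maximin picks strategy {strategies[idx]} (worst case {guaranteed})")
```

Strategy A has the higher best case, but its worst case is -100, while strategy B never does worse than -10, so maximin selects B.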
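2.6.2 Risk-sensitive reinforcement learning methods: Approaches that make the utility function or the TD error nonlinear
▸ Use, as the risk measure, a nonlinear utility function of the kind used in finance and control theory.
  ▸ With such a utility function a Bellman equation cannot be derived, so TD learning is not possible.
▸ Apply a nonlinear transformation to the TD error so that it reflects the user's risk preference.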
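One known instance of transforming the TD error is the piecewise-linear weighting of Mihatsch and Neuneier, in which positive and negative TD errors are scaled differently by a parameter `kappa` in (-1, 1). The slide does not name a specific transform, so the sketch below is an illustration under that assumption, with hypothetical function names and a plain nested-list Q-table.

```python
def transform_td_error(delta, kappa):
    """Scale positive and negative TD errors asymmetrically.

    kappa > 0 weights negative surprises more (risk-averse),
    kappa < 0 weights positive surprises more (risk-seeking),
    kappa = 0 recovers the ordinary TD error.
    """
    if not -1.0 < kappa < 1.0:
        raise ValueError("kappa must lie strictly between -1 and 1")
    if delta > 0:
        return (1.0 - kappa) * delta
    return (1.0 + kappa) * delta


def risk_sensitive_q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9, kappa=0.5):
    """One Q-learning step that uses the transformed TD error."""
    delta = r + gamma * max(q[s_next]) - q[s][a]
    q[s][a] += alpha * transform_td_error(delta, kappa)
    return delta


# Usage: a two-state, two-action table initialized to zero.
q_table = [[0.0, 0.0], [0.0, 0.0]]
risk_sensitive_q_update(q_table, s=0, a=1, r=-1.0, s_next=1)
print(q_table)
```

The rest of the Q-learning loop is unchanged, which is the practical appeal of this approach: risk preference enters only through the scaling of the TD error.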
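2.6.2 Risk-sensitive reinforcement learning methods: Approaches that introduce risk measures other than the return
▸ Approaches that take into account risk factors unrelated to the reward
▸ Introduce a risk function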
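2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Estimating the return distribution is the key
▸ Risk measures are derived from the return distribution.
▸ http://latent-dynamics.net/02/09_Morimura.ppt.pdf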
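As an illustration of deriving risk measures from a return distribution, the sketch below computes Value-at-Risk and conditional Value-at-Risk from return samples. It assumes one common convention (VaR as the lower `alpha`-quantile of the return, CVaR as the mean of the samples at or below it); the function names and the synthetic Gaussian samples are hypothetical.

```python
import numpy as np


def value_at_risk(returns, alpha=0.05):
    """Lower alpha-quantile of the return samples (one common VaR convention)."""
    return float(np.quantile(np.asarray(returns, dtype=float), alpha))


def conditional_value_at_risk(returns, alpha=0.05):
    """Mean of the samples at or below the alpha-quantile (expected shortfall)."""
    returns = np.asarray(returns, dtype=float)
    threshold = np.quantile(returns, alpha)
    return float(returns[returns <= threshold].mean())


# Usage with synthetic return samples (hypothetical data).
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=2.0, size=10_000)
print("mean:", samples.mean())
print("VaR (5%):", value_at_risk(samples))
print("CVaR (5%):", conditional_value_at_risk(samples))
```

Two policies with the same mean return can differ sharply in these tail measures, which is why the full distribution, not just its expectation, has to be estimated.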
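2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Approaches to return distribution estimation
▸ Simulation approach
  ▸ Store the state s and action a and make T sufficiently large; many samples of the return then accumulate, and the return distribution can be estimated.
  ▸ High computational cost
▸ Analytical approach
  ▸ The distributional Bellman equation, which solves for the return distribution analytically
  ▸ A nonparametric return distribution estimation algorithm that solves the distributional Bellman equation with Particle Smoothing
  ▸ https://pdfs.semanticscholar.org/1ec2/6e05c2577154213e1668ddd374e4da663309.pdf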
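The simulation approach can be sketched as plain Monte Carlo rollouts: start from a fixed state-action pair, follow a policy, and record the discounted return of each rollout. Everything below is an assumption made for illustration, including the toy environment `toy_step`, the function names, and reading T as the rollout length `horizon`.

```python
import random


def toy_step(state, action, rng):
    """Hypothetical single-state task: action 0 is safe, action 1 is risky."""
    if action == 0:
        reward = 1.0
    else:
        reward = -10.0 if rng.random() < 0.1 else 3.0
    return state, reward, False  # next state, reward, episode finished?


def sample_return(step, state, action, policy, gamma, horizon, rng):
    """Roll out once from (state, action) and return the discounted return."""
    g, discount = 0.0, 1.0
    for _ in range(horizon):
        state, reward, done = step(state, action, rng)
        g += discount * reward
        discount *= gamma
        if done:
            break
        action = policy(state, rng)
    return g


def estimate_return_samples(step, state, action, policy,
                            n_episodes=1000, gamma=0.9, horizon=100, seed=0):
    """Collect return samples for one (state, action) pair."""
    rng = random.Random(seed)
    return [sample_return(step, state, action, policy, gamma, horizon, rng)
            for _ in range(n_episodes)]


# Usage: return samples for starting with the risky action, then acting safely.
samples = estimate_return_samples(toy_step, 0, 1, lambda s, rng: 0)
print(min(samples), max(samples))
```

Every sample costs a full rollout, and the procedure has to be repeated per state-action pair, which matches the slide's remark about computational cost.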
2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Distributional Bellman equation
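The equation itself is not present in the transcript. One standard way to write a distributional Bellman recursion is sketched below; the notation ($\eta$ for the return, $\pi$ for the policy, $0 < \gamma < 1$) is assumed here and may differ from the slide and from the cited paper.

```latex
% The return is a random variable that satisfies a one-step recursion
\eta_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} = r_{t+1} + \gamma\, \eta_{t+1}

% The same recursion for the cumulative distribution of the return under policy \pi:
% \eta \le x holds exactly when the next-step return is at most (x - r) / \gamma
P^{\pi}(\eta \le x \mid s)
  = \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a)
    \int p(r \mid s, a, s')\,
    P^{\pi}\!\left(\eta \le \frac{x - r}{\gamma} \,\middle|\, s'\right) dr
```

Taking the expectation of both sides of the first line recovers the ordinary Bellman equation for the value function.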
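2.6.3 Return distribution estimation for risk-sensitive reinforcement learning: Nonparametric return distribution estimation
▸ Approximate the return distribution with particles
▸ http://latent-dynamics.net/02/09_Morimura.ppt.pdf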
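A minimal sketch of the particle idea: represent the return distribution of each state by a set of sampled return values, and back it up through one observed transition by resampling particles of the next state. This is a simplified illustration, not the algorithm of the cited paper; the function name and parameters are hypothetical.

```python
import numpy as np


def particle_backup(next_particles, reward, gamma, rng,
                    n_particles=None, terminal=False):
    """One sample-based backup of a particle set.

    Each new particle is reward + gamma * (a particle drawn at random from the
    next state's set), following the recursion eta = r + gamma * eta'.
    """
    next_particles = np.asarray(next_particles, dtype=float)
    n = len(next_particles) if n_particles is None else n_particles
    if terminal:
        # No future return after a terminal transition.
        return np.full(n, float(reward))
    drawn = rng.choice(next_particles, size=n, replace=True)
    return reward + gamma * drawn


# Usage: back up a bimodal next-state return distribution through one step.
rng = np.random.default_rng(0)
next_state_particles = np.array([-10.0, 3.0, 3.0, 3.0, 3.0])
updated = particle_backup(next_state_particles, reward=1.0, gamma=0.9, rng=rng)
print(updated)
```

Because the particles keep the shape of the distribution (here, a rare large loss), quantile-based risk measures can be read directly from the updated set.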