Automatic Counterexample Generation for Control Systems Using Reinforcement Learning (強化学習による制御システムの自動反例生成)
Yoriyuki Yamagata
December 14, 2018
Slides from a talk at the 1st AI4SE Seminar
Transcript
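Automatic Counterexample Generation for Control Systems Using Reinforcement Learning
Yoriyuki Yamagata (National Institute of Advanced Industrial Science and Technology), Takumi Akazaki (Fujitsu Laboratories), Shuang Liu, Yihai Duan, Jianye Hao (Tianjin University)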
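Three-line summary
• We used reinforcement learning to automatically generate inputs that make a control system exhibit "anomalous behavior".
• Compared with methods that do not use reinforcement learning, the search is more efficient (in most cases).
• It is still far from practical use (I think).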
Reinforcement learning
[Diagram labels: agent, environment, action, reward, state, observation]
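Reinforcement learning
• The agent maximizes the expected discounted present value of the reward.
• This can be thought of as the expected total of all future rewards.
$$R_i = \mathbb{E}\left[ \sum_{k=i}^{\infty} \gamma^{k-i} r_k \right], \quad \gamma \le 1 : \text{constant}$$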
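As a small illustration of the formula on this slide, the sketch below computes the discounted sum for one finite, already observed reward sequence. The function name `discounted_return` is an assumption for illustration, and the expectation in the slide would correspond to averaging this quantity over many sampled episodes.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float, i: int = 0) -> float:
    """Sum of gamma**(k - i) * rewards[k] for k >= i, for one finite episode."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate backwards using G_k = r_k + gamma * G_{k+1}.
    for r in reversed(rewards[i:]):
        total = r + gamma * total
    return total


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25
```

With `gamma = 1` the function reduces to a plain sum of the remaining rewards.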
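Reinforcement learning methods
• Methods based on a Q-function (DQN and others)
  • Estimate the Q-function (a function that gives the expected reward from a state and an action)
• Actor-Critic (A3C and others)
  • The policy followed by the "actor" is updated by the "critic", which estimates the policy's performance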
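To make the idea of estimating a Q-function concrete, here is a minimal sketch of the standard tabular Q-learning update. This is a simplification chosen for illustration: DQN replaces the table with a neural network, and nothing here is taken from the talk's implementation. The names `q_update`, `alpha`, and the string states are assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.99,
) -> float:
    """Move Q(state, action) toward reward + gamma * max_a Q(next_state, a)."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    table: QTable = defaultdict(float)
    print(q_update(table, "s0", "up", 1.0, "s1", ["up", "down"], alpha=0.5, gamma=0.9))
```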
Example control system: automatic transmission
[Diagram labels: accelerator, brake, engine, gear]
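Automatic counterexample generation
Requirement: the engine speed stays at or below 4770 revolutions per minute
Automatically generate accelerator and brake patterns that violate the requirement
[Diagram labels: accelerator, brake, engine, gear]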
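The requirement on this slide can be checked against a recorded engine-speed trace in a few lines. The sketch below assumes the trace is a list of rpm samples. The helper names and the margin function are illustrative assumptions, with only the 4770 rpm bound taken from the slide.

```python
from typing import Sequence

RPM_LIMIT = 4770.0  # bound stated on the slide


def violates_requirement(engine_rpm: Sequence[float], limit: float = RPM_LIMIT) -> bool:
    """True if any sample exceeds the limit, i.e. the trace is a counterexample."""
    return any(w > limit for w in engine_rpm)


def margin(engine_rpm: Sequence[float], limit: float = RPM_LIMIT) -> float:
    """Distance of the peak from the limit; negative means the requirement is violated."""
    return limit - max(engine_rpm)


if __name__ == "__main__":
    trace = [1000.0, 3200.0, 4800.0]
    print(violates_requirement(trace), margin(trace))
```

A search procedure can use the margin as a score: the smaller it gets, the closer the inputs are to producing a counterexample.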
Automatic counterexample generation by optimization
It suffices to push the maximum engine speed as high as possible → optimization techniques can be used!
[Figure: plots of the driving inputs and of the resulting engine speed over time]
How existing optimization-based counterexample generation works
[Diagram labels: driving inputs, engine speed, maximum engine speed, optimization algorithm; 1 loop = 1 simulation]
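The loop on this slide, where each iteration runs one complete simulation and scores it by the peak engine speed, can be sketched as follows. Two things are assumptions made for illustration: `toy_simulate` is a hypothetical first-order stand-in for the real model, and plain random search stands in for the optimizer (the talk's baselines are simulated annealing and the cross-entropy method).

```python
import random
from typing import Callable, List, Optional, Tuple

Trace = List[float]


def toy_simulate(throttle: List[float]) -> Trace:
    """Hypothetical plant: engine rpm follows the throttle with a first-order lag."""
    rpm = 1000.0
    trace = []
    for u in throttle:
        target = 1000.0 + 4500.0 * u  # steady-state rpm for throttle u in [0, 1]
        rpm += 0.2 * (target - rpm)
        trace.append(rpm)
    return trace


def falsify_by_random_search(
    simulate: Callable[[List[float]], Trace],
    limit: float,
    steps: int = 30,
    budget: int = 200,
    seed: int = 0,
) -> Tuple[Optional[List[float]], float, int]:
    """Return (violating input or None, best peak seen, simulations used)."""
    rng = random.Random(seed)
    best_peak = float("-inf")
    for n in range(1, budget + 1):  # one iteration = one full simulation
        candidate = [rng.random() for _ in range(steps)]
        peak = max(simulate(candidate))
        best_peak = max(best_peak, peak)
        if peak > limit:
            return candidate, peak, n
    return None, best_peak, budget


if __name__ == "__main__":
    found, peak, used = falsify_by_random_search(toy_simulate, limit=4770.0)
    print(found is not None, round(peak, 1), used)
```

Note that the optimizer only sees one number per simulation, which is the point of contrast with the step-wise reinforcement learning loop.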
Automatic counterexample generation by reinforcement learning
[Diagram labels: system state, driving inputs, engine speed, reinforcement learning; 1 loop = 1 simulation step]
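In the reinforcement learning setting, the agent interacts with the simulation at every step instead of once per run. A minimal sketch of such a step-wise environment is given below. The class `ToyTransmissionEnv`, its dynamics, and the scaling constant are hypothetical, and the per-step reward uses the form `exp(omega) - 1` that the deck derives on its reward design slides.

```python
import math
from typing import Callable, Tuple


class ToyTransmissionEnv:
    """Hypothetical step-wise environment; not the Simulink model used in the talk."""

    def __init__(self, horizon: int = 30, scale: float = 5000.0) -> None:
        self.horizon = horizon
        self.scale = scale  # assumed constant that maps rpm to a small real number
        self.t = 0
        self.rpm = 1000.0

    def reset(self) -> float:
        self.t = 0
        self.rpm = 1000.0
        return self.rpm / self.scale

    def step(self, throttle: float) -> Tuple[float, float, bool]:
        """Advance one simulation step; return (observation, reward, done)."""
        throttle = min(1.0, max(0.0, throttle))
        self.rpm += 0.2 * (1000.0 + 4500.0 * throttle - self.rpm)
        self.t += 1
        obs = self.rpm / self.scale
        reward = math.exp(obs) - 1.0
        return obs, reward, self.t >= self.horizon


def run_episode(env: ToyTransmissionEnv, policy: Callable[[float], float]) -> float:
    """One loop iteration = one simulation step; returns the undiscounted reward sum."""
    obs = env.reset()
    total = 0.0
    done = False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total


if __name__ == "__main__":
    env = ToyTransmissionEnv()
    print(run_episode(env, lambda obs: 1.0), run_episode(env, lambda obs: 0.0))
```

A learning agent would replace the fixed `policy` lambdas and update itself after every `step` call.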
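Reward design
$$R_i = \mathbb{E}\left[ \sum_{k=i}^{\infty} \gamma^{k-i} r_k \right], \quad \gamma \le 1 : \text{constant}$$
Reinforcement learning maximizes the expected cumulative reward.
Counterexample generation maximizes the maximum engine speed:
$$\mathrm{rob}_i = \max_{k=i}^{T} \omega_k, \quad \omega_k : \text{engine speed}$$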
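Reward design: approximating the maximum by a sum
Use the approximation
$$\max\{x_1, \dots, x_n\} \sim \log \sum_{i=1}^{n} \{e^{x_i} - 1\}$$
which gives
$$\mathrm{rob}_i = \max_{k=i}^{T} \omega_k \sim \log \sum_{k=i}^{T} \{e^{\omega_k} - 1\}$$
Comparing this with
$$R_i = \mathbb{E}\left[ \sum_{k=i}^{\infty} \gamma^{k-i} r_k \right]$$
it suffices to set
$$r_i = e^{\omega_i} - 1, \quad \gamma = 1$$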
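The approximation on this slide is easy to check numerically. The sketch below uses made-up sample values and assumed function names. It suggests the estimate is close to the true maximum when one term clearly dominates the sum, and it makes no claim about how tight the approximation is in general.

```python
import math
from typing import Sequence


def step_reward(omega: float) -> float:
    """Per-step reward r = exp(omega) - 1 from the slide."""
    return math.expm1(omega)


def soft_max_estimate(omegas: Sequence[float]) -> float:
    """log of the undiscounted reward sum, used as a stand-in for max(omegas)."""
    total = sum(step_reward(w) for w in omegas)
    if total <= 0.0:
        raise ValueError("the reward sum must be positive to take its logarithm")
    return math.log(total)


if __name__ == "__main__":
    samples = [1.0, 2.0, 10.0]  # illustrative values only
    print(max(samples), soft_max_estimate(samples))
```

Because the logarithm is monotone, maximizing the reward sum with `gamma = 1` also maximizes this estimate of the peak.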
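In reality it is a little more complicated
• We want to generate counterexamples for a wider variety of properties
  • A subset of the properties that can be written as MTL logical formulas is supported
  • Deriving the reward becomes somewhat more complicated
• Normalization by scaling
  • Inputs and outputs are scaled to relatively small real numbers
• Real control systems run in continuous time, so time has to be divided into intervals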
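The last two points, scaling and dividing time into intervals, can be sketched with two small helpers. Both the names and the choice of min-max scaling with a zero-order hold are assumptions for illustration, since the slides do not say which scheme was used.

```python
from typing import List, Sequence


def scale(value: float, low: float, high: float) -> float:
    """Map a value from [low, high] to [0, 1] so the learner sees small real numbers."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return (value - low) / (high - low)


def piecewise_constant(actions: Sequence[float], dt: float, sim_dt: float) -> List[float]:
    """Hold each action for dt seconds, sampled every sim_dt seconds."""
    hold = round(dt / sim_dt)
    if hold < 1:
        raise ValueError("dt must be at least sim_dt")
    return [a for a in actions for _ in range(hold)]


if __name__ == "__main__":
    print(scale(2500.0, 0.0, 5000.0))
    print(piecewise_constant([0.1, 0.9], dt=1.0, sim_dt=0.5))
```

The agent then chooses one action per interval of length `dt`, and the simulator receives the resulting piecewise-constant signal.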
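Experiment: implementation
• Control system model
  • sldemo_autotrans, which ships with Matlab/Simulink
• Reinforcement learning
  • A3C and DDQN+NAF from ChainerRL
  • Hyperparameters were not tuned
• Existing methods
  • Simulated annealing (SA) and the cross-entropy method (CE) from S-Taliro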
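Experiment: setup
• For each property φ1 to φ9
• each algorithm (A3C, DDQN, SA, CE)
• was applied for 100 sessions under the same conditions
• Up to 200 simulations are allowed per session
• What has been learned is erased when a session ends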
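The protocol on this slide (100 independent sessions, at most 200 simulations each, nothing carried over between sessions) can be written as a small harness. The names `count_successes` and `make_attempt` are hypothetical. The factory is assumed to return a fresh falsifier whose call runs one simulation and reports whether a counterexample was found.

```python
from typing import Callable


def count_successes(
    make_attempt: Callable[[int], Callable[[], bool]],
    sessions: int = 100,
    max_simulations: int = 200,
) -> int:
    """Count the sessions that find a counterexample within the simulation budget."""
    successes = 0
    for session in range(sessions):
        attempt = make_attempt(session)  # fresh learner: nothing is carried over
        for _ in range(max_simulations):
            if attempt():  # one call = one simulation
                successes += 1
                break
    return successes


if __name__ == "__main__":
    def demo_factory(session: int) -> Callable[[], bool]:
        calls = {"n": 0}

        def attempt() -> bool:
            calls["n"] += 1
            # Illustrative rule: even-numbered sessions succeed on their third try.
            return session % 2 == 0 and calls["n"] == 3

        return attempt

    print(count_successes(demo_factory))
```

Counting per session in this way produces one integer between 0 and `sessions` for each pairing of property and algorithm.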
Counterexample generation successes
    A3C-1  A3C-5  A3C-10  DQN-1  DQN-5  DQN-10  CE-1  CE-5  CE-10  SA-1  SA-5  SA-10
φ1     80     60      70     98     80      90     2    23     14     0    13      8
φ2     47     42      42     99    100      92     5    22      5     0    16     26
φ3     68      0       0     52      0       0    83     0      0    23     0      0
φ4     72      0       0     48      0       0    84     0      0    21     0      0
φ5    100      1       0    100      0       0   100     0      0   100     0      0
φ6     62     71      78    100    100     100     0    98    100     0    26     79
φ7     37     34      41     52     99     100     0     0      0     0     0      0
φ8     35     21      36     93    100     100   100    99     97    21    77     94
φ9     38     53      57     68    100     100     0     0      4     0     0      2
[Figures: nine panels, fml1 to fml9, each plotting Number of Episodes (0 to 200) against Algorithm (A3C-1, A3C-5, A3C-10, DDQN-1, DDQN-5, DDQN-10, CE-1, CE-5, CE-10, SA-1, SA-5, SA-10)]
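Discussion
• DDQN delivers relatively stable performance across all tasks and settings
• For the properties concerning the gear, A3C and DDQN are inferior to CE
  • Probably because the gear changes discontinuously
• For the same number of simulations, reinforcement learning is [illegible], but this is [illegible] of the implementation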
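Three-line summary
• We used reinforcement learning to automatically generate inputs that make a control system exhibit "anomalous behavior".
• Compared with methods that do not use reinforcement learning, the search is more efficient (in most cases).
• It is still far from practical use (I think).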