Automatic Counterexample Generation for Control Systems Using Reinforcement Learning
by
Yoriyuki Yamagata
Slide 1
Slide 1 text
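Automatic Counterexample Generation for Control Systems Using Reinforcement Learning
Yoriyuki Yamagata (National Institute of Advanced Industrial Science and Technology), Takumi Akazaki (Fujitsu Laboratories), Shuang Liu, Yihai Duan, Jianye Hao (Tianjin University)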
Slide 2
Slide 2 text
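Summary in three lines
• We used reinforcement learning to automatically generate inputs that make a control system "behave strangely".
• Compared with methods that do not use reinforcement learning, it searches more efficiently (in most cases).
• It is still far from practical use (I think).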
Slide 3
Slide 3 text
Reinforcement learning
[Diagram labels: agent, environment, action, reward, state, observation]
Slide 4
Slide 4 text
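Reinforcement learning
• The agent maximizes the expected discounted present value of the reward.
• This can be thought of as the expected value of the total future reward:
$R_i = \mathbb{E}\left[\sum_{k=i}^{\infty} \gamma^{k-i} r_k\right], \quad \gamma \le 1 \text{ : constant}$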
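As a rough illustration of the formula on this slide, the sketch below computes the discounted sum for one finite, already observed reward sequence. It leaves out the expectation and the infinite horizon, and the function name `discounted_return` and the sample rewards are invented for this example.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k] for k = 0, 1, ... (one sampled trajectory)."""
    total = 0.0
    # Work backwards so that each step applies R_k = r_k + gamma * R_{k+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


rewards = [1.0, 2.0, 3.0]                             # hypothetical rewards
undiscounted = discounted_return(rewards, gamma=1.0)  # 1 + 2 + 3
discounted = discounted_return(rewards, gamma=0.5)    # 1 + 0.5*2 + 0.25*3
```

With `gamma = 1.0` the return is simply the sum of the rewards, which is the case used later in the talk for the reward design.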
Slide 5
Slide 5 text
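Reinforcement learning methods
• Methods based on a Q-function (DQN and others)
  • Estimate the Q-function (the function that gives the expected return from a state and an action).
• Actor-Critic (A3C and others)
  • The policy followed by the "actor" is updated step by step by the "critic", which estimates that policy's performance.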
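To make the Q-function idea concrete, here is a minimal sketch that assumes the textbook tabular Q-learning update. It is not DQN itself, which replaces the table with a neural network, and it is not the code used in the talk. The state names, action names, and the step size `alpha` are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=1.0):
    """Move Q(state, action) toward reward + gamma * max_a Q(next_state, a)."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)            # every Q-value starts at 0.0
actions = ["throttle", "brake"]   # hypothetical discrete actions
new_value = q_update(q, "s0", "throttle", reward=2.0,
                     next_state="s1", actions=actions)
```

After this single update the table holds an estimate for one state-action pair only; repeated updates over many steps are what let the estimate converge.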
Slide 6
Slide 6 text
Example control system: automatic transmission
[Diagram labels: throttle, brake, engine, gear]
Slide 7
Slide 7 text
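Automatic counterexample generation
Requirement: the engine speed stays at or below 4770 rpm.
Automatically generate throttle and brake patterns that violate the requirement.
[Diagram labels: throttle, brake, engine, gear]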
Slide 8
Slide 8 text
Automatic counterexample generation by optimization
It is enough to push the maximum engine speed as high as possible → optimization techniques can be used!
[Figure: driving inputs and the resulting engine speed, plotted over time]
Slide 9
Slide 9 text
How optimization-based automatic counterexample generation has worked so far
[Figure: feedback loop between the optimization algorithm and the simulation, labeled with driving inputs, engine speed, and peak engine speed]
One loop iteration = one simulation
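The loop on this slide can be sketched as follows. This is an illustration under stated assumptions: `simulate` is a made-up first-order toy engine, not the Simulink model, and plain random search stands in for the optimization algorithm. Only the 4770 rpm limit comes from the slides.

```python
import random

LIMIT_RPM = 4770.0  # requirement from the slides: engine speed <= 4770 rpm


def simulate(throttle):
    """Hypothetical toy engine: speed drifts toward 6000 * u at each step."""
    rpm, trace = 1000.0, []
    for u in throttle:  # u in [0, 1] is the throttle opening for one step
        rpm = 0.9 * rpm + 600.0 * u
        trace.append(rpm)
    return trace


def peak_speed(throttle):
    """Objective to maximize: highest engine speed in one full simulation."""
    return max(simulate(throttle))


def random_search(budget, segments=3, steps_per_segment=10, seed=0):
    """Each loop iteration runs one complete simulation."""
    rng = random.Random(seed)
    best_input, best_peak = None, float("-inf")
    for _ in range(budget):
        levels = [rng.random() for _ in range(segments)]
        # Piecewise-constant throttle signal built from a few random levels.
        candidate = [u for u in levels for _ in range(steps_per_segment)]
        peak = peak_speed(candidate)
        if peak > best_peak:
            best_input, best_peak = candidate, peak
        if best_peak > LIMIT_RPM:  # requirement violated: counterexample found
            break
    return best_input, best_peak


best_input, best_peak = random_search(budget=200)
falsified = best_peak > LIMIT_RPM
```

The optimizer only sees one number per simulation, the peak speed, which is the point the next slide contrasts with the step-by-step reinforcement learning loop.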
Slide 10
Slide 10 text
Automatic counterexample generation by reinforcement learning
[Figure: feedback loop between the reinforcement learning agent and the simulation, labeled with system state, driving inputs, and engine speed]
One loop iteration = one simulation step
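A minimal sketch of the step-wise loop, assuming a made-up toy engine wrapped in a `reset`/`step` interface. The class `ToyEngineEnv`, its dynamics, and the fixed full-throttle policy are all hypothetical; in the talk's setting a learning agent such as A3C or DDQN would take the place of the fixed policy.

```python
class ToyEngineEnv:
    """Hypothetical step-wise wrapper around a toy engine (not the Simulink model)."""

    def __init__(self, horizon=30):
        self.horizon = horizon
        self.rpm = 1000.0
        self.t = 0

    def reset(self):
        self.rpm, self.t = 1000.0, 0
        return self.rpm

    def step(self, throttle):
        # Advance the simulation by exactly one time step.
        self.rpm = 0.9 * self.rpm + 600.0 * throttle
        self.t += 1
        done = self.t >= self.horizon
        return self.rpm, done


def run_episode(env, policy):
    """One loop iteration = one simulation step; the policy sees the state each step."""
    obs, done, trace = env.reset(), False, []
    while not done:
        action = policy(obs)
        obs, done = env.step(action)
        trace.append(obs)
    return trace


# Fixed policy used only for illustration: always full throttle.
trace = run_episode(ToyEngineEnv(), policy=lambda rpm: 1.0)
```

Unlike the optimization loop, the input here is chosen one step at a time with the current system state in view.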
Slide 11
Slide 11 text
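Reward design
$R_i = \mathbb{E}\left[\sum_{k=i}^{\infty} \gamma^{k-i} r_k\right], \quad \gamma \le 1 \text{ : constant}$
Reinforcement learning maximizes the sum of the rewards.
Counterexample generation maximizes the maximum engine speed:
$\mathrm{rob}_i = \max_{k=i}^{T} \omega_k, \quad \omega_k \text{ : engine speed}$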
Slide 12
Slide 12 text
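Reward design: approximating the maximum by a sum
Use the approximation
$\max\{x_1, \dots, x_n\} \sim \log \sum_{i=1}^{n} \{e^{x_i} - 1\}$
which gives
$\mathrm{rob}_i = \max_{k=i}^{T} \omega_k \sim \log \sum_{k=i}^{T} \{e^{\omega_k} - 1\}$
Comparing this with
$R_i = \mathbb{E}\left[\sum_{k=i}^{\infty} \gamma^{k-i} r_k\right]$
it is enough to set
$r_i = e^{\omega_i} - 1, \quad \gamma = 1$
Because $\log$ is monotonic, maximizing the sum of these rewards also maximizes the approximated maximum.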
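The approximation on this slide can be checked numerically. The sketch below assumes small positive, already scaled engine speeds; the values in `speeds` are invented for illustration, and the helper name `soft_max_value` is not from the talk.

```python
import math


def soft_max_value(xs):
    """log(sum(exp(x) - 1)): the smooth stand-in for max(xs) from the slide.

    Assumes at least one x > 0, otherwise the sum is not positive.
    """
    return math.log(sum(math.exp(x) - 1.0 for x in xs))


speeds = [1.0, 3.0, 8.0]                       # hypothetical scaled engine speeds
rewards = [math.exp(w) - 1.0 for w in speeds]  # r_k = e^{omega_k} - 1
undiscounted_return = sum(rewards)             # gamma = 1, so a plain sum
approx_max = math.log(undiscounted_return)
exact_max = max(speeds)
gap = abs(approx_max - exact_max)
```

The largest term dominates the sum, so the logarithm of the return lands close to the true maximum. The approximation is tighter when the largest value stands well clear of the others.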
Slide 13
Slide 13 text
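In reality it is a little more complicated
• We want to generate counterexamples for a wider variety of properties.
  • A subset of the properties that can be written as MTL formulas can be handled.
  • The derivation of the reward becomes a little more complicated.
• Normalization by scaling
  • Inputs and outputs are scaled to relatively small real numbers.
• Real control systems run in continuous time, so time has to be divided into intervals.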
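One reason scaling matters for the exponential reward can be shown directly. This is a sketch under assumptions: the normalization constant `RPM_MAX`, the 30-unit horizon, and the step size are hypothetical and not taken from the talk.

```python
import math

RPM_MAX = 5000.0  # hypothetical normalization constant


def scale(rpm):
    """Map a raw engine speed to a small real number."""
    return rpm / RPM_MAX


def reward(rpm):
    """Exponential reward from the reward-design slide, applied to the scaled speed."""
    return math.exp(scale(rpm)) - 1.0


def sample_times(horizon, dt):
    """Split continuous time [0, horizon] into decision points spaced dt apart."""
    n = int(round(horizon / dt))
    return [k * dt for k in range(n + 1)]


# Without scaling, the exponential of a raw rpm value does not fit in a float.
try:
    math.exp(4770.0)
    overflowed = False
except OverflowError:
    overflowed = True

scaled_reward = reward(4770.0)
decision_points = sample_times(horizon=30, dt=5)
```

With scaling the reward stays in a range a learner can work with, and the decision points fix when the agent is allowed to change its input.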
Slide 14
Slide 14 text
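Experiment: implementation
• Control system model
  • sldemo_autotrans, which ships with Matlab/Simulink
• Reinforcement learning
  • A3C and DDQN+NAF from ChainerRL
  • No hyperparameter tuning was done.
• Existing methods
  • Simulated annealing (SA) and the Cross Entropy method (CE) from S-Taliro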
Slide 15
Slide 15 text
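Experiment: setup
• Each algorithm (A3C, DDQN, SA, CE) was applied to each property φ1 to φ9 for 100 sessions under the same conditions.
• Each session allows at most 200 simulations.
• Whatever was learned is discarded when a session ends.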
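The protocol on this slide can be sketched as a small harness. Everything here is an assumption made for illustration: the `try_once` interface, the factory argument, and the coin-flip placeholder are hypothetical and produce no real experimental data.

```python
import random


def run_sessions(make_falsifier, sessions=100, max_simulations=200):
    """Count sessions that find a counterexample within the simulation budget."""
    successes, episodes_needed = 0, []
    for s in range(sessions):
        # A fresh instance per session: nothing learned carries over.
        falsifier = make_falsifier(seed=s)
        for episode in range(1, max_simulations + 1):
            if falsifier.try_once():  # one simulation; True means violation found
                successes += 1
                episodes_needed.append(episode)
                break
    return successes, episodes_needed


class CoinFlipFalsifier:
    """Placeholder that 'succeeds' with a fixed probability; purely illustrative."""

    def __init__(self, seed, p=0.05):
        self.rng = random.Random(seed)
        self.p = p

    def try_once(self):
        return self.rng.random() < self.p


successes, episodes_needed = run_sessions(CoinFlipFalsifier)
```

The two returned values correspond to the two kinds of results a harness like this would report: how many sessions succeeded, and how many simulations each successful session needed.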
Slide 16
Slide 16 text
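Successful counterexample generations per 100 sessions

| Property | A3C-1 | A3C-5 | A3C-10 | DQN-1 | DQN-5 | DQN-10 | CE-1 | CE-5 | CE-10 | SA-1 | SA-5 | SA-10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| φ1 | 80 | 60 | 70 | 98 | 80 | 90 | 2 | 23 | 14 | 0 | 13 | 8 |
| φ2 | 47 | 42 | 42 | 99 | 100 | 92 | 5 | 22 | 5 | 0 | 16 | 26 |
| φ3 | 68 | 0 | 0 | 52 | 0 | 0 | 83 | 0 | 0 | 23 | 0 | 0 |
| φ4 | 72 | 0 | 0 | 48 | 0 | 0 | 84 | 0 | 0 | 21 | 0 | 0 |
| φ5 | 100 | 1 | 0 | 100 | 0 | 0 | 100 | 0 | 0 | 100 | 0 | 0 |
| φ6 | 62 | 71 | 78 | 100 | 100 | 100 | 0 | 98 | 100 | 0 | 26 | 79 |
| φ7 | 37 | 34 | 41 | 52 | 99 | 100 | 0 | 0 | 0 | 0 | 0 | 0 |
| φ8 | 35 | 21 | 36 | 93 | 100 | 100 | 100 | 99 | 97 | 21 | 77 | 94 |
| φ9 | 38 | 53 | 57 | 68 | 100 | 100 | 0 | 0 | 4 | 0 | 0 | 2 |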
Slide 17
Slide 17 text
[Figure "fml1": Number of Episodes (0 to 200) by Algorithm: A3C-1, A3C-5, A3C-10, DDQN-1, DDQN-5, DDQN-10, CE-1, CE-5, CE-10, SA-1, SA-5, SA-10]
Slide 18
Slide 18 text
[Figure "fml2": Number of Episodes (0 to 200) by Algorithm: A3C-1, A3C-5, A3C-10, DDQN-1, DDQN-5, DDQN-10, CE-1, CE-5, CE-10, SA-1, SA-5, SA-10]
Slide 19
Slide 19 text
[Figure "fml3": Number of Episodes (0 to 200) by Algorithm: A3C-1, A3C-5, A3C-10, DDQN-1, DDQN-5, DDQN-10, CE-1, CE-5, CE-10, SA-1, SA-5, SA-10]
Slide 20
Slide 20 text
[Figure "fml4": Number of Episodes (0 to 200) by Algorithm: A3C-1, A3C-5, A3C-10, DDQN-1, DDQN-5, DDQN-10, CE-1, CE-5, CE-10, SA-1, SA-5, SA-10]
Slide 21
Slide 21 text
[Figure "fml5": Number of Episodes (0 to 200) by Algorithm: A3C-1, A3C-5, A3C-10, DDQN-1, DDQN-5, DDQN-10, CE-1, CE-5, CE-10, SA-1, SA-5, SA-10]
Slide 22
Slide 22 text
[Figure "fml6": Number of Episodes (0 to 200) by Algorithm: A3C-1, A3C-5, A3C-10, DDQN-1, DDQN-5, DDQN-10, CE-1, CE-5, CE-10, SA-1, SA-5, SA-10]
Slide 23
Slide 23 text
[Figure "fml7": Number of Episodes (0 to 200) by Algorithm: A3C-1, A3C-5, A3C-10, DDQN-1, DDQN-5, DDQN-10, CE-1, CE-5, CE-10, SA-1, SA-5, SA-10]
Slide 24
Slide 24 text
[Figure "fml8": Number of Episodes (0 to 200) by Algorithm: A3C-1, A3C-5, A3C-10, DDQN-1, DDQN-5, DDQN-10, CE-1, CE-5, CE-10, SA-1, SA-5, SA-10]
Slide 25
Slide 25 text
[Figure "fml9": Number of Episodes (0 to 200) by Algorithm: A3C-1, A3C-5, A3C-10, DDQN-1, DDQN-5, DDQN-10, CE-1, CE-5, CE-10, SA-1, SA-5, SA-10]
Slide 26
Slide 26 text
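Discussion
• DDQN performs relatively stably across all tasks and settings.
• On the properties that involve the gear, A3C and DDQN fall behind CE.
  • This is probably because the gear changes discontinuously.
• For the same number of simulations, reinforcement learning is slower, but this is an implementation issue.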
Slide 27
Slide 27 text
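Summary in three lines
• We used reinforcement learning to automatically generate inputs that make a control system "behave strangely".
• Compared with methods that do not use reinforcement learning, it searches more efficiently (in most cases).
• It is still far from practical use (I think).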