Automatic Counterexample Generation for Control Systems Using Reinforcement Learning
by
Yoriyuki Yamagata
Slide 1
Slide 1 text
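Automatic Counterexample Generation for Control Systems Using Reinforcement Learning
Yoriyuki Yamagata (National Institute of Advanced Industrial Science and Technology), Takumi Akazaki (Fujitsu Laboratories), Shuang Liu, Yihai Duan, Jianye Hao (Tianjin University)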
Slide 2
Slide 2 text
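Three-line summary
• We used reinforcement learning to automatically generate inputs that make a control system "behave strangely"
• Compared with methods that do not use reinforcement learning, the search is more efficient (in most cases)
• It is still far from practical use (I think)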
Slide 3
Slide 3 text
Reinforcement learning
[Diagram labels: agent, environment, action, reward, state, observation]
Slide 4
Slide 4 text
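Reinforcement learning
• The agent maximizes the expected discounted present value of the reward
• This can be thought of as the expected total of all future rewards
$R_i = \mathbb{E}\left[\sum_{k=i}^{\infty} \gamma^{k-i} r_k\right]$, $\gamma \le 1$: constant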
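The discounted return on this slide can be sketched for a single finite episode as follows. This is a minimal illustration, not code from the talk: the function name `discounted_return` and the sample rewards are assumptions, and the expectation in $R_i$ is replaced by one sampled reward sequence.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float, i: int = 0) -> float:
    """Return sum_{k >= i} gamma**(k - i) * rewards[k] for one finite episode."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate from the last step backwards: G_k = r_k + gamma * G_{k+1}.
    for r in reversed(rewards[i:]):
        total = r + gamma * total
    return total


rewards = [1.0, 2.0, 3.0]  # hypothetical per-step rewards
print(discounted_return(rewards, gamma=0.5))  # 1 + 0.5*2 + 0.25*3 = 2.75
print(discounted_return(rewards, gamma=1.0))  # plain sum = 6.0
```

With `gamma=1.0` the return is simply the sum of the remaining rewards, which is the case used later in the reward design.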
Slide 5
Slide 5 text
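Reinforcement learning methods
• Methods based on the Q-function (DQN, etc.)
  • Estimate the Q-function (the function that gives the expected reward for a state and an action)
• Actor-Critic (A3C, etc.)
  • The policy followed by the "actor" is updated by the "critic", which estimates the policy's performance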
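To make "estimating the Q-function" concrete, here is a sketch of the textbook one-step Q-learning update with a lookup table. It is not the DQN or DDQN implementation used in the talk (those replace the table with a neural network); the state and action names, `alpha`, and `gamma` are hypothetical values chosen for illustration.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)            # every unseen (state, action) pair starts at 0
actions = ["accelerate", "brake"]  # hypothetical discrete actions
print(q_update(q, state=0, action="accelerate", reward=1.0, next_state=1, actions=actions))
```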
Slide 6
Slide 6 text
Example control system: automatic transmission
[Diagram labels: accelerator, brake, engine, gear]
Slide 7
Slide 7 text
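Automatic counterexample generation
Requirement: the engine speed is at most 4770 rpm
Automatically generate accelerator and brake patterns that violate the requirement
[Diagram labels: accelerator, brake, engine, gear]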
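Checking a simulated trace against this requirement can be sketched as below. The 4770 rpm limit comes from the slide; the function names, the sample traces, and the choice of "limit minus peak" as a robustness margin are assumptions made for illustration.

```python
RPM_LIMIT = 4770.0  # requirement from the slide: engine speed <= 4770 rpm


def robustness(engine_rpm_trace, limit=RPM_LIMIT):
    """Margin by which the trace satisfies the requirement (negative = violated)."""
    return limit - max(engine_rpm_trace)


def is_counterexample(engine_rpm_trace, limit=RPM_LIMIT):
    """True if the engine speed exceeds the limit at some point of the trace."""
    return robustness(engine_rpm_trace, limit) < 0.0


safe_trace = [1000.0, 4000.0, 4500.0]   # hypothetical simulation output
bad_trace = [1000.0, 4800.0]            # hypothetical simulation output
print(robustness(safe_trace), is_counterexample(safe_trace))
print(robustness(bad_trace), is_counterexample(bad_trace))
```

A falsifier then only has to find driving inputs whose trace makes the margin negative.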
Slide 8
Slide 8 text
Automatic counterexample generation by optimization
It suffices to push the maximum engine speed as high as possible
→ Optimization techniques can be used!
[Figure: plots of the driving inputs and the resulting engine speed]
Slide 9
Slide 9 text
How optimization-based automatic counterexample generation has worked so far
[Diagram labels: driving inputs, engine speed, maximum engine speed, optimization algorithm]
1 loop = 1 simulation
Slide 10
Slide 10 text
Automatic counterexample generation by reinforcement learning
[Diagram labels: system state, driving inputs, engine speed, reinforcement learning]
1 loop = 1 simulation step
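The difference between the two loop structures can be sketched with a toy plant. Everything here is hypothetical: the first-order `step` model stands in for the real simulator, random search stands in for the optimization algorithm, and a fixed full-throttle policy stands in for a learned agent.

```python
import random


def step(rpm, throttle, dt=1.0):
    """Hypothetical toy plant: rpm relaxes toward 6000 * throttle."""
    return rpm + dt * 0.2 * (6000.0 * throttle - rpm)


def simulate(throttles, rpm0=1000.0):
    """Run one whole simulation and return the rpm trace."""
    trace, rpm = [], rpm0
    for u in throttles:
        rpm = step(rpm, u)
        trace.append(rpm)
    return trace


def optimization_loop(n_simulations, horizon, rng):
    """Optimization style: one loop iteration = one complete simulation."""
    best = float("-inf")
    for _ in range(n_simulations):
        throttles = [rng.random() for _ in range(horizon)]  # whole input chosen up front
        best = max(best, max(simulate(throttles)))           # feedback only after the run
    return best


def rl_style_episode(policy, horizon, rpm0=1000.0):
    """RL style: one loop iteration = one simulation step."""
    rpm, trace = rpm0, []
    for _ in range(horizon):
        u = policy(rpm)     # the agent observes the current state
        rpm = step(rpm, u)  # the plant advances by a single step
        trace.append(rpm)   # a per-step reward would be computed here
    return trace


print(optimization_loop(20, 30, random.Random(0)))
print(max(rl_style_episode(lambda rpm: 1.0, 30)))
```

The per-step loop lets the input at each step depend on the observed system state, whereas the per-simulation loop fixes the whole input before seeing any response.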
Slide 11
Slide 11 text
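Reward design
$R_i = \mathbb{E}\left[\sum_{k=i}^{\infty} \gamma^{k-i} r_k\right]$, $\gamma \le 1$: constant
Reinforcement learning maximizes the sum of the rewards
Counterexample generation maximizes the maximum engine speed
$\mathrm{rob}_i = \max_{k=i}^{T} \omega_k$, $\omega_k$: engine speed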
Slide 12
Slide 12 text
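Reward design: approximating the maximum by a sum
Use $\max\{x_1, \dots, x_n\} \sim \log \sum_{i=1}^{n} \{e^{x_i} - 1\}$
$\mathrm{rob}_i = \max_{k=i}^{T} \omega_k \sim \log \sum_{k=i}^{T} \{e^{\omega_k} - 1\}$
Comparing this with $R_i = \mathbb{E}\left[\sum_{k=i}^{\infty} \gamma^{k-i} r_k\right]$, it suffices to set
$r_i = e^{\omega_i} - 1$, $\gamma = 1$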
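A quick numerical check of this approximation, as a sketch: the function names and sample values are assumptions, and the values are taken to be positive so that the logarithm is defined.

```python
import math


def smooth_max(xs):
    """log sum_i (exp(x_i) - 1), the slide's smooth stand-in for max(xs)."""
    return math.log(sum(math.expm1(x) for x in xs))


def step_reward(omega):
    """Per-step reward r_i = exp(omega_i) - 1 from the slide."""
    return math.expm1(omega)


omegas = [1.0, 5.0, 3.0]                      # hypothetical, already small values
total = sum(step_reward(w) for w in omegas)   # undiscounted return (gamma = 1)
print(max(omegas), smooth_max(omegas), math.log(total))
```

For positive inputs the approximation lies between `log(exp(max) - 1)` and `max + log(n)`, so the largest value dominates. Raw engine speeds in the thousands would overflow `math.expm1`, so the values fed to the reward need to be small.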
Slide 13
Slide 13 text
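In reality it is a little more complicated
• We want to generate counterexamples for a wider variety of properties
  • A subset of the properties expressible as formulas of the logic MTL can be handled
  • Deriving the reward becomes a little more complicated
• Normalization by scaling
  • Inputs and outputs are scaled to relatively small real numbers
• Real control systems run in continuous time, so time has to be divided into steps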
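The last two points can be sketched as follows. This is only an illustration under stated assumptions: min-max scaling is one possible normalization, the range 0 to 5000 rpm and the step length `dt=5.0` are hypothetical, and a piecewise-constant signal is one simple way to turn per-step actions into a continuous-time input.

```python
def scale(value, low, high):
    """Map value from [low, high] to [0, 1] (min-max scaling)."""
    return (value - low) / (high - low)


def piecewise_constant(actions, dt):
    """Turn one action per time step into an input signal u(t) for t >= 0."""
    def u(t):
        # Index of the step containing t; hold the last action afterwards.
        k = min(int(t // dt), len(actions) - 1)
        return actions[k]
    return u


print(scale(4770.0, low=0.0, high=5000.0))        # engine speed as a small real number
u = piecewise_constant([0.0, 1.0, 0.5], dt=5.0)   # three steps of 5 time units each
print(u(0.0), u(4.9), u(5.0), u(12.0))
```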
Slide 14
Slide 14 text
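Experiment: implementation
• Control system model
  • sldemo_autotrans, which ships with Matlab/Simulink
• Reinforcement learning
  • A3C and DDQN+NAF from ChainerRL
  • Hyperparameters were not tuned
• Existing methods
  • Simulated annealing (SA) and the Cross Entropy method (CE) from S-Taliro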
Slide 15
Slide 15 text
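Experiment: setup
• Properties: φ1 to φ9
• Algorithms: A3C, DDQN, SA, CE
• Each algorithm was applied to each property for 100 sessions under identical conditions
  • Up to 200 simulations are allowed per session
  • Everything learned is erased when a session ends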
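The session protocol above can be sketched as a small harness. The limits of 100 sessions and 200 simulations come from the slide; the function names and the stub falsifier are hypothetical and stand in for the real algorithms and simulator.

```python
def run_session(try_once, max_simulations=200):
    """Return how many simulations were needed to find a counterexample, or None."""
    for n in range(1, max_simulations + 1):
        if try_once():  # one simulation; True means the requirement was violated
            return n
    return None


def success_count(make_falsifier, sessions=100, max_simulations=200):
    """Count successful sessions; a fresh falsifier is built for every session."""
    successes = 0
    for _ in range(sessions):
        try_once = make_falsifier()  # nothing learned carries over between sessions
        if run_session(try_once, max_simulations) is not None:
            successes += 1
    return successes


def make_counter_falsifier(succeed_at):
    """Hypothetical stub: each session succeeds on its `succeed_at`-th simulation."""
    def make():
        calls = 0

        def try_once():
            nonlocal calls
            calls += 1
            return calls >= succeed_at
        return try_once
    return make


print(success_count(make_counter_falsifier(50)))
print(success_count(make_counter_falsifier(201)))
```

A stub that needs more than 200 simulations never succeeds within a session, which is how a zero entry in a results table would arise under this protocol.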
Slide 16
Slide 16 text
Counterexample generation successes
    A3C-1  A3C-5 A3C-10  DQN-1  DQN-5 DQN-10   CE-1   CE-5  CE-10   SA-1   SA-5  SA-10
φ1     80     60     70     98     80     90      2     23     14      0     13      8
φ2     47     42     42     99    100     92      5     22      5      0     16     26
φ3     68      0      0     52      0      0     83      0      0     23      0      0
φ4     72      0      0     48      0      0     84      0      0     21      0      0
φ5    100      1      0    100      0      0    100      0      0    100      0      0
φ6     62     71     78    100    100    100      0     98    100      0     26     79
φ7     37     34     41     52     99    100      0      0      0      0      0      0
φ8     35     21     36     93    100    100    100     99     97     21     77     94
φ9     38     53     57     68    100    100      0      0      4      0      0      2
Slide 17
Slide 17 text
[Figure: Number of Episodes (0 to 200) by Algorithm (A3C, DDQN, CE, SA, each with suffix -1, -5, -10) for fml1]
Slide 18
Slide 18 text
[Figure: Number of Episodes (0 to 200) by Algorithm (A3C, DDQN, CE, SA, each with suffix -1, -5, -10) for fml2]
Slide 19
Slide 19 text
[Figure: Number of Episodes (0 to 200) by Algorithm (A3C, DDQN, CE, SA, each with suffix -1, -5, -10) for fml3]
Slide 20
Slide 20 text
[Figure: Number of Episodes (0 to 200) by Algorithm (A3C, DDQN, CE, SA, each with suffix -1, -5, -10) for fml4]
Slide 21
Slide 21 text
[Figure: Number of Episodes (0 to 200) by Algorithm (A3C, DDQN, CE, SA, each with suffix -1, -5, -10) for fml5]
Slide 22
Slide 22 text
[Figure: Number of Episodes (0 to 200) by Algorithm (A3C, DDQN, CE, SA, each with suffix -1, -5, -10) for fml6]
Slide 23
Slide 23 text
[Figure: Number of Episodes (0 to 200) by Algorithm (A3C, DDQN, CE, SA, each with suffix -1, -5, -10) for fml7]
Slide 24
Slide 24 text
[Figure: Number of Episodes (0 to 200) by Algorithm (A3C, DDQN, CE, SA, each with suffix -1, -5, -10) for fml8]
Slide 25
Slide 25 text
[Figure: Number of Episodes (0 to 200) by Algorithm (A3C, DDQN, CE, SA, each with suffix -1, -5, -10) for fml9]
Slide 26
Slide 26 text
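Discussion
• DDQN performs relatively consistently across all tasks and settings
• For properties concerning the gear, A3C and DDQN are inferior to CE
  • Probably because the gear changes discontinuously
• With the same number of simulations, reinforcement learning is [...], but this is [...] of the implementation [...]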
Slide 27
Slide 27 text
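Three-line summary
• We used reinforcement learning to automatically generate inputs that make a control system "behave strangely"
• Compared with methods that do not use reinforcement learning, the search is more efficient (in most cases)
• It is still far from practical use (I think)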