Automatic Counterexample Generation for Control Systems Using Reinforcement Learning

Slides from the 1st AI4SE Seminar

Yoriyuki Yamagata

December 14, 2018

Transcript

  1. Automatic counterexample generation by optimization
    It suffices to push the maximum engine speed as high as possible
    → optimization techniques can be applied!
    [Figure: plots of the driving operations and the engine speed]
  2. How optimization-based automatic counterexample generation has worked so far
    [Figure: an optimization algorithm in a loop with the simulation; labels: driving operations, engine speed, maximum engine speed]
    1 loop = 1 simulation
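The "1 loop = 1 simulation" structure can be sketched as follows. This is only a minimal illustration: `simulate` is a made-up toy model (not sldemo_autotrans), plain random search stands in for the optimizer, and the names `falsify_by_random_search`, `limit`, `horizon` and `budget` are assumptions introduced here.

```python
import random

def simulate(throttle):
    """Toy stand-in for one full simulation run (hypothetical model).

    Takes a complete input signal and returns the engine-speed trace.
    """
    speed = 1000.0
    trace = []
    for u in throttle:
        # Speed rises with throttle and relaxes toward 1000 otherwise.
        speed += 2.0 * u - 0.05 * (speed - 1000.0)
        trace.append(speed)
    return trace

def falsify_by_random_search(limit, horizon=30, budget=200, seed=0):
    """Each loop iteration picks a whole input signal and runs one simulation."""
    rng = random.Random(seed)
    best = float("-inf")
    for n in range(1, budget + 1):
        throttle = [rng.uniform(0.0, 100.0) for _ in range(horizon)]
        peak = max(simulate(throttle))   # objective: maximum engine speed
        best = max(best, peak)
        if peak > limit:                 # "speed stays below limit" is violated
            return n, throttle, peak
    return None, None, best              # no counterexample within the budget

n, throttle, peak = falsify_by_random_search(limit=3000.0)
print(n, peak)
```

The optimizer only sees the peak of each finished run, so it gets one piece of feedback per simulation.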
  3. Automatic counterexample generation by reinforcement learning
    [Figure: reinforcement learning in a loop with the simulation; labels: system state, driving operations, engine speed]
    1 loop = 1 simulation step
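The "1 loop = 1 simulation step" structure can be sketched with a tiny tabular Q-learner. Everything here is an assumption made for illustration: `ToyEngine` is a made-up model, the two-valued throttle, the state discretisation, the scaled-speed reward and the discount 0.9 are arbitrary, and the deck itself uses ChainerRL agents rather than this learner.

```python
import random

class ToyEngine:
    """Hypothetical step-wise simulator (not sldemo_autotrans)."""

    def reset(self):
        self.speed = 1000.0
        return self.speed

    def step(self, throttle):
        # Advance the simulation by a single step under the chosen input.
        self.speed += 2.0 * throttle - 0.05 * (self.speed - 1000.0)
        return self.speed

def run_episode(env, q, rng, horizon=30, eps=0.1, alpha=0.5, gamma=0.9):
    """One episode; the learner acts and updates once per simulation step."""
    actions = (0.0, 100.0)
    state = int(env.reset() // 500)          # coarse discretisation of the state
    peak = float("-inf")
    for _ in range(horizon):
        if rng.random() < eps:
            a = rng.randrange(len(actions))  # explore
        else:                                # exploit the current estimates
            a = max(range(len(actions)), key=lambda i: q.get((state, i), 0.0))
        speed = env.step(actions[a])
        reward = speed / 5000.0              # assumed reward: scaled engine speed
        nxt = int(speed // 500)
        target = reward + gamma * max(q.get((nxt, i), 0.0)
                                      for i in range(len(actions)))
        old = q.get((state, a), 0.0)
        q[(state, a)] = old + alpha * (target - old)
        state = nxt
        peak = max(peak, speed)
    return peak

rng, q, env = random.Random(0), {}, ToyEngine()
peaks = [run_episode(env, q, rng) for _ in range(50)]
print(max(peaks))
```

Unlike the per-simulation loop, the learner here observes the system state and receives feedback at every step of a run.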
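  4. Reward design
    $R_i = \mathbb{E}\left[\sum_{k=i}^{\infty} \gamma^{k-i} r_k\right]$, $\gamma \le 1$: constant
    Reinforcement learning maximizes the sum of rewards
    Counterexample generation maximizes the maximum engine speed
    $\mathrm{rob}_i = \max_{k=i}^{T} \omega_k$, $\omega_k$: engine speed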
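The mismatch between the two objectives is easy to see on a short trace. The sketch below computes the return for a single finite trace (a sample value, not the expectation) and the peak value from step i onward. The function names and the numbers are made up for illustration.

```python
def discounted_return(rewards, i, gamma):
    """Sample value of sum_{k>=i} gamma^(k-i) * r_k for one finite trace."""
    return sum(gamma ** (k - i) * r for k, r in enumerate(rewards) if k >= i)

def rob(omega, i):
    """max_{k=i..T} omega_k: the peak engine speed from step i onward."""
    return max(omega[i:])

omega = [1.0, 3.0, 2.0]                    # made-up engine speeds
print(discounted_return(omega, 0, 1.0))    # adds the values up
print(rob(omega, 0))                       # keeps only the largest one
```

Using the engine speed directly as the reward would therefore optimize a sum, not the maximum that counterexample generation cares about.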
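  5. Reward design: approximating the maximum by a sum
    Use $\max\{x_1, \dots, x_n\} \sim \log \sum_{i=1}^{n} \{e^{x_i} - 1\}$
    $\mathrm{rob}_i = \max_{k=i}^{T} \omega_k \sim \log \sum_{k=i}^{T} \{e^{\omega_k} - 1\}$
    Comparing with $R_i = \mathbb{E}\left[\sum_{k=i}^{\infty} \gamma^{k-i} r_k\right]$, it suffices to set
    $r_i = e^{\omega_i} - 1$, $\gamma = 1$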
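A quick numerical check of this approximation, as a sketch: the values in `omega` are made up and assumed to be already scaled to a small range, since raw speeds in the thousands would overflow `math.exp`. That scaling is an assumption of this example, not something stated on the slide.

```python
import math

def soft_max(xs):
    """log of sum_i (exp(x_i) - 1), the slide's stand-in for max(xs)."""
    return math.log(sum(math.exp(x) - 1.0 for x in xs))

def reward(omega_i):
    """Per-step reward r_i = exp(omega_i) - 1."""
    return math.exp(omega_i) - 1.0

omega = [2.0, 9.0, 4.0, 7.5]             # made-up, already scaled engine speeds
total = sum(reward(w) for w in omega)    # undiscounted return (gamma = 1)
print(max(omega), soft_max(omega), math.log(total))
```

Because the logarithm is monotone, maximizing the undiscounted sum of these rewards also maximizes the approximation of the peak speed.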
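  6. Experiments: implementation
    • Control system model
      • sldemo_autotrans, which ships with Matlab/Simulink
    • Reinforcement learning
      • A3C and DDQN+NAF from ChainerRL
      • No hyperparameter tuning was performed
    • Existing methods
      • Simulated annealing (SA) and the cross-entropy method (CE) from S-Taliro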
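  7. Experiments: setup
    • For each property φ1 to φ9,
    • each algorithm (A3C, DDQN, SA, CE)
    • was applied for 100 sessions under identical conditions
    • Up to 200 simulations are allowed per session
    • Everything learned is discarded when a session ends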
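The session protocol above can be sketched as a small harness. The names `run_sessions`, `make_searcher` and `violates` are hypothetical, and the stand-in "simulation" is just a random draw. The point is only the shape of the loop: a fresh searcher per session and a hard cap on simulations.

```python
import random

def run_sessions(make_searcher, violates, sessions=100, budget=200, seed=0):
    """Count sessions in which a counterexample is found within the budget.

    make_searcher(rng) must build a fresh searcher, so nothing learned in
    one session carries over to the next.
    """
    successes = 0
    for s in range(sessions):
        rng = random.Random(seed + s)
        propose = make_searcher(rng)     # fresh state for every session
        for _ in range(budget):          # at most `budget` simulations
            if violates(propose()):
                successes += 1
                break                    # session ends at the first counterexample
    return successes

# Hypothetical stand-ins: one "simulation" is one random draw, and the
# property counts as violated when the draw exceeds 0.99.
def uniform_searcher(rng):
    return rng.random

print(run_sessions(uniform_searcher, lambda x: x > 0.99))
```

With 100 sessions, the returned count can be read directly as a success rate in percent.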
  8. Success rate of counterexample generation
        A3C-1  A3C-5  A3C-10  DQN-1  DQN-5  DQN-10  CE-1  CE-5  CE-10  SA-1  SA-5  SA-10
    φ1     80     60      70     98     80      90     2    23     14     0    13      8
    φ2     47     42      42     99    100      92     5    22      5     0    16     26
    φ3     68      0       0     52      0       0    83     0      0    23     0      0
    φ4     72      0       0     48      0       0    84     0      0    21     0      0
    φ5    100      1       0    100      0       0   100     0      0   100     0      0
    φ6     62     71      78    100    100     100     0    98    100     0    26     79
    φ7     37     34      41     52     99     100     0     0      0     0     0      0
    φ8     35     21      36     93    100     100   100    99     97    21    77     94
    φ9     38     53      57     68    100     100     0     0      4     0     0      2
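To read the table programmatically, the values can be transcribed as below. The numbers are copied from the slide, while `best_columns` is a helper name introduced here. It lists, for each property, the configurations that reach the highest success rate.

```python
COLUMNS = ["A3C-1", "A3C-5", "A3C-10", "DQN-1", "DQN-5", "DQN-10",
           "CE-1", "CE-5", "CE-10", "SA-1", "SA-5", "SA-10"]

# Success rates transcribed from the slide, one row per property.
ROWS = {
    "φ1": [80, 60, 70, 98, 80, 90, 2, 23, 14, 0, 13, 8],
    "φ2": [47, 42, 42, 99, 100, 92, 5, 22, 5, 0, 16, 26],
    "φ3": [68, 0, 0, 52, 0, 0, 83, 0, 0, 23, 0, 0],
    "φ4": [72, 0, 0, 48, 0, 0, 84, 0, 0, 21, 0, 0],
    "φ5": [100, 1, 0, 100, 0, 0, 100, 0, 0, 100, 0, 0],
    "φ6": [62, 71, 78, 100, 100, 100, 0, 98, 100, 0, 26, 79],
    "φ7": [37, 34, 41, 52, 99, 100, 0, 0, 0, 0, 0, 0],
    "φ8": [35, 21, 36, 93, 100, 100, 100, 99, 97, 21, 77, 94],
    "φ9": [38, 53, 57, 68, 100, 100, 0, 0, 4, 0, 0, 2],
}

def best_columns(row):
    """Return every configuration that attains the row's highest value."""
    top = max(row)
    return [c for c, v in zip(COLUMNS, row) if v == top]

for name, row in ROWS.items():
    print(name, max(row), best_columns(row))
```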
  9. [Plot for fml1. Axes: Algorithm (A3C-1, A3C-5, A3C-10, DDQN-1, DDQN-5, DDQN-10, CE-1, CE-5, CE-10, SA-1, SA-5, SA-10) vs. Number of Episodes (0 to 200)]
  10. [Plot for fml2. Axes: Algorithm vs. Number of Episodes (0 to 200)]
  11. [Plot for fml3. Axes: Algorithm vs. Number of Episodes (0 to 200)]
  12. [Plot for fml4. Axes: Algorithm vs. Number of Episodes (0 to 200)]
  13. [Plot for fml5. Axes: Algorithm vs. Number of Episodes (0 to 200)]
  14. [Plot for fml6. Axes: Algorithm vs. Number of Episodes (0 to 200)]
  15. [Plot for fml7. Axes: Algorithm vs. Number of Episodes (0 to 200)]
  16. [Plot for fml8. Axes: Algorithm vs. Number of Episodes (0 to 200)]
  17. [Plot for fml9. Axes: Algorithm vs. Number of Episodes (0 to 200)]