Automatic Counterexample Generation for Control Systems Using Reinforcement Learning

Slides from a talk at the 1st AI4SE Seminar

Yoriyuki Yamagata

December 14, 2018

Transcript

  1. Automatic counterexample generation by optimization
    To find a counterexample, it is enough to push the maximum engine speed as high as possible → optimization techniques can be used!
    [Figure: plots of the driving operations (input) and the resulting engine speed (output)]
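  2. How automatic counterexample generation by optimization has worked so far
    The optimization algorithm chooses the driving operations and receives the highest engine speed reached. One loop = one simulation.
    [Figure: plots of the driving operations and the engine speed, connected in a loop through the optimization algorithm]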
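  3. Automatic counterexample generation by reinforcement learning
    The reinforcement learner observes the system state and the engine speed and chooses the driving operations. One loop = one simulation step.
    [Figure: plots of the driving operations and the engine speed, connected in a loop through the reinforcement learner]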
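The difference between the two loops can be sketched in a few lines of Python. This is only an illustration: `toy_step` is a made-up one-step dynamics that stands in for the real model, and `policy` and `observe` are hypothetical callbacks, not part of any library named in the slides.

```python
def toy_step(speed, throttle):
    """Hypothetical one-step dynamics, NOT the model used in the talk."""
    return 0.9 * speed + 500.0 * throttle


def run_whole_simulation(throttles, initial_speed=1000.0):
    """Optimization style: the whole input signal is fixed up front and the
    optimizer only sees the peak engine speed after the simulation ends."""
    speed = peak = initial_speed
    for throttle in throttles:
        speed = toy_step(speed, throttle)
        peak = max(peak, speed)
    return peak


def run_stepwise(policy, observe, steps=30, initial_speed=1000.0):
    """Reinforcement-learning style: the agent picks each input from the
    current state and receives feedback after every simulation step."""
    speed = peak = initial_speed
    for _ in range(steps):
        throttle = policy(speed)      # decision based on the current state
        speed = toy_step(speed, throttle)
        observe(speed)                # per-step feedback to the learner
        peak = max(peak, speed)
    return peak


if __name__ == "__main__":
    seen = []
    print(run_whole_simulation([1.0] * 30))
    print(run_stepwise(lambda s: 1.0, seen.append), len(seen))
```

Both functions drive the same toy system; what changes is how often the search procedure gets to look at it.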
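  4. Designing the return
    $R_i = \left[\sum_{k=i}^{\infty} \gamma^{k-i} r_k\right]$, $\gamma \le 1$: constant
    Reinforcement learning maximizes the sum of rewards.
    Counterexample generation maximizes the maximum engine speed:
    $\mathrm{rob}_i = \max_{k=i}^{T} \omega_k$, $\omega_k$: engine speed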
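To make the mismatch concrete, here is a minimal Python sketch of the two quantities for a single finite episode, so the infinite sum is simply truncated at the end of the episode. The function names and the sample numbers are illustrative assumptions, not taken from the talk.

```python
def discounted_return(rewards, i, gamma=1.0):
    """R_i: sum over k >= i of gamma**(k - i) * r_k, for one finite episode."""
    return sum(gamma ** (k - i) * r for k, r in enumerate(rewards) if k >= i)


def rob(speeds, i):
    """rob_i: the maximum of omega_k over k = i, ..., T."""
    return max(speeds[i:])


if __name__ == "__main__":
    rewards = [1.0, 2.0, 3.0]                   # made-up rewards
    speeds = [1000.0, 2500.0, 4200.0, 3100.0]   # made-up engine speeds
    print(discounted_return(rewards, 0, gamma=0.5))
    print(rob(speeds, 0), rob(speeds, 3))
```

The return is a sum over the remaining steps, while `rob` is a maximum over them, which is why the reward has to be designed so that the sum tracks the maximum.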
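  5. Designing the return: approximating the maximum by a sum
    Use $\max\{x_1, \ldots, x_n\} \sim \log \sum_{i=1}^{n} \{e^{x_i} - 1\}$:
    $\mathrm{rob}_i = \max_{k=i}^{T} \omega_k \sim \log \sum_{k=i}^{T} \{e^{\omega_k} - 1\}$
    Comparing this with $R_i = \left[\sum_{k=i}^{\infty} \gamma^{k-i} r_k\right]$, it suffices to set $r_i = e^{\omega_i} - 1$ and $\gamma = 1$.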
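A quick numerical check of this approximation, as a sketch in plain Python. The function names are made up for illustration. The inputs are small made-up values: feeding raw engine speeds in the thousands into `math.exp` would overflow, so some rescaling would be needed in practice, and how the talk handles that is not stated here.

```python
import math


def log_sum_approx(xs):
    """log of the sum of (e**x - 1), used as a stand-in for max(xs)."""
    return math.log(sum(math.exp(x) - 1.0 for x in xs))


def step_reward(omega):
    """Per-step reward r_i = e**omega_i - 1 (used with gamma = 1)."""
    return math.exp(omega) - 1.0


if __name__ == "__main__":
    xs = [1.0, 2.0, 10.0]
    print(max(xs), log_sum_approx(xs))
    # With gamma = 1 the return is the plain sum of step rewards,
    # and its logarithm is exactly the approximation above.
    print(math.log(sum(step_reward(x) for x in xs)))
```

The approximation is tight when one value clearly dominates and looser when several values are close to the maximum.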
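  6. Experiment: implementation
    • Control system model
      • sldemo_autotrans, which ships with Matlab/Simulink
    • Reinforcement learning
      • A3C and DDQN+NAF from ChainerRL
      • No hyperparameter tuning was done
    • Existing methods
      • Simulated annealing (SA) and the Cross Entropy method (CE) from S-Taliro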
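  7. Experiment: setup
    • For each property φ1 to φ9,
    • each algorithm (A3C, DDQN, SA, CE)
    • was applied for 100 sessions under identical conditions
    • Up to 200 simulations are allowed per session
    • Whatever was learned is erased when a session ends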
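The session protocol above can be sketched as a small harness. This is a hypothetical illustration, not the experiment code: `make_falsifier`, `simulate`, and the toy classes are invented names, and only the two constants come from the slide.

```python
import random

MAX_SIMULATIONS = 200   # per session (from the slide)
NUM_SESSIONS = 100      # per property and algorithm (from the slide)


def run_session(make_falsifier, simulate, max_simulations=MAX_SIMULATIONS):
    """Return how many simulations were needed to find a counterexample,
    or None if the budget ran out."""
    falsifier = make_falsifier()   # fresh learner: nothing carries over
    for n in range(1, max_simulations + 1):
        candidate = falsifier.propose()
        violated, feedback = simulate(candidate)
        if violated:
            return n
        falsifier.update(candidate, feedback)
    return None


def success_count(make_falsifier, simulate, num_sessions=NUM_SESSIONS):
    """Number of sessions that found a counterexample within the budget."""
    results = [run_session(make_falsifier, simulate) for _ in range(num_sessions)]
    return sum(r is not None for r in results)


class RandomFalsifier:
    """Toy stand-in for A3C, DDQN, SA or CE: proposes random inputs."""

    def __init__(self, rng):
        self.rng = rng

    def propose(self):
        return self.rng.random()

    def update(self, candidate, feedback):
        pass  # a real method would learn from the feedback here


def toy_simulate(u):
    """Toy system: the property counts as violated when the input exceeds 0.99."""
    return u > 0.99, u


if __name__ == "__main__":
    rng = random.Random(0)
    print(success_count(lambda: RandomFalsifier(rng), toy_simulate))
```

Creating the falsifier inside `run_session` is what enforces the rule that learned state is discarded between sessions.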
  8. Success rate of counterexample generation

          A3C-1  A3C-5  A3C-10  DQN-1  DQN-5  DQN-10  CE-1  CE-5  CE-10  SA-1  SA-5  SA-10
    φ1       80     60      70     98     80      90     2    23     14     0    13      8
    φ2       47     42      42     99    100      92     5    22      5     0    16     26
    φ3       68      0       0     52      0       0    83     0      0    23     0      0
    φ4       72      0       0     48      0       0    84     0      0    21     0      0
    φ5      100      1       0    100      0       0   100     0      0   100     0      0
    φ6       62     71      78    100    100     100     0    98    100     0    26     79
    φ7       37     34      41     52     99     100     0     0      0     0     0      0
    φ8       35     21      36     93    100     100   100    99     97    21    77     94
    φ9       38     53      57     68    100     100     0     0      4     0     0      2
  9. [Plot for fml1: Number of Episodes (axis 0 to 200) by Algorithm (A3C-1, A3C-5, A3C-10, DDQN-1, DDQN-5, DDQN-10, CE-1, CE-5, CE-10, SA-1, SA-5, SA-10)]
  10. [Plot for fml2: same axes and algorithms]
  11. [Plot for fml3: same axes and algorithms]
  12. [Plot for fml4: same axes and algorithms]
  13. [Plot for fml5: same axes and algorithms]
  14. [Plot for fml6: same axes and algorithms]
  15. [Plot for fml7: same axes and algorithms]
  16. [Plot for fml8: same axes and algorithms]
  17. [Plot for fml9: same axes and algorithms]