Slide 1

Slide 1 text

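Automatic Counterexample Generation for Control Systems Using Reinforcement Learning
Yoriyuki Yamagata (National Institute of Advanced Industrial Science and Technology), Takumi Akazaki (Fujitsu Laboratories), Shuang Liu, Yihai Duan, Jianye Hao (Tianjin University)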

Slide 2

Slide 2 text

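Three-line summary
• We used reinforcement learning to automatically generate inputs that make a control system "behave strangely"
• Compared with methods that do not use reinforcement learning, it searches more efficiently (in most cases)
• It is still far from practical use (I think)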

Slide 3

Slide 3 text

Reinforcement learning
[Diagram labels: agent, environment, action, reward, state, observation]

Slide 4

Slide 4 text

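Reinforcement learning
• The agent maximizes the expected discounted present value of the reward
• This can be thought of as the expected value of the sum of all future rewards
$R_i = \mathbb{E}\left[\sum_{k=i}^{\infty} \gamma^{k-i} r_k\right]$, $\gamma \le 1$: constant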
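The discounted sum inside the expectation can be computed for a single finite reward sequence as follows. This is only a minimal sketch: the function name `discounted_return` and the sample rewards are illustrative assumptions, and it evaluates one trajectory rather than the expectation over trajectories.

```python
def discounted_return(rewards, gamma, i=0):
    """Return sum over k >= i of gamma**(k - i) * rewards[k] for a finite list."""
    total = 0.0
    # Accumulate from the last reward backwards: total = r_k + gamma * total.
    for r in reversed(rewards[i:]):
        total = r + gamma * total
    return total


rewards = [1.0, 2.0, 3.0]  # made-up rewards r_0, r_1, r_2
print(discounted_return(rewards, gamma=0.5))       # 1 + 0.5 * 2 + 0.25 * 3
print(discounted_return(rewards, gamma=1.0, i=1))  # 2 + 3 (no discounting)
```

With `gamma=1.0` the value is the plain sum of the remaining rewards, which is the case used later in the reward design.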

Slide 5

Slide 5 text

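Reinforcement learning methods
• Methods based on a Q-function (DQN, etc.)
  • Directly estimate the Q-function (a function that gives the expected reward from a state and an action)
• Actor-Critic (A3C, etc.)
  • The policy followed by the "actor" is updated by the "critic", which estimates the policy's performance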
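To make "directly estimating the Q-function" concrete, here is a sketch of one tabular Q-learning update. This is a deliberate simplification and not what DQN does internally: DQN replaces the table with a neural network. The function name, the dictionary representation, and the default `alpha` and `gamma` are assumptions made for illustration.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the new estimate.

    q maps (state, action) pairs to estimated expected reward; missing
    entries are treated as 0.0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    # Move the old estimate toward the bootstrapped target.
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


q = {}
q_update(q, "s", "a", reward=1.0, next_state="t", actions=["a", "b"], alpha=0.5, gamma=0.9)
print(q)
```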

Slide 6

Slide 6 text

Example control system: automatic transmission
[Diagram labels: throttle, brake, speed, engine speed, gear]

Slide 7

Slide 7 text

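Automatic counterexample generation
Requirement: engine speed must not exceed 4770 rpm
Automatically generate throttle and brake patterns that violate the requirement
[Diagram labels: throttle, brake, speed, engine speed, gear]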
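Checking whether a simulated trace is a counterexample to this requirement can be sketched as below. The names `margin` and `is_counterexample` and the sample traces are assumptions for illustration; only the 4770 rpm limit comes from the slide.

```python
RPM_LIMIT = 4770.0  # requirement from the slide: engine speed <= 4770 rpm


def margin(engine_speeds, limit=RPM_LIMIT):
    """Distance to violation: a negative value means the limit was exceeded."""
    return limit - max(engine_speeds)


def is_counterexample(engine_speeds, limit=RPM_LIMIT):
    """True if some sample of the trace exceeds the limit."""
    return margin(engine_speeds, limit) < 0


print(is_counterexample([1000.0, 4800.0, 3000.0]))  # one sample above the limit
print(is_counterexample([1000.0, 4770.0]))          # exactly at the limit is allowed
```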

Slide 8

Slide 8 text

Automatic counterexample generation by optimization
It is enough to push the maximum engine speed as high as possible
→ Optimization techniques can be used!
[Figure: plots of the driving operations and the resulting engine speed]

Slide 9

Slide 9 text

How existing optimization-based automatic counterexample generation works
[Figure: the optimization algorithm proposes driving operations, a simulation produces the engine speed, and the maximum engine speed is fed back to the optimization algorithm]
1 loop = 1 simulation
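The "1 loop = 1 simulation" structure can be sketched as follows. Everything here is a stand-in chosen for illustration: `toy_simulate` uses made-up dynamics instead of the real model, and plain random search takes the place of an actual optimizer such as simulated annealing or the cross-entropy method.

```python
import random


def toy_simulate(throttle):
    """Hypothetical stand-in for the plant model: returns an engine-speed trace."""
    speed, trace = 1000.0, []
    for u in throttle:
        speed += 400.0 * u - 0.05 * (speed - 1000.0)  # made-up dynamics
        trace.append(speed)
    return trace


def falsify_by_random_search(n_simulations, horizon=30, limit=4770.0, seed=0):
    """Return (best input, its peak engine speed, number of simulations used)."""
    rng = random.Random(seed)
    best_input, best_peak = None, float("-inf")
    for n in range(1, n_simulations + 1):
        # One loop iteration = one complete simulation of a whole input signal.
        candidate = [rng.random() for _ in range(horizon)]
        peak = max(toy_simulate(candidate))
        if peak > best_peak:
            best_input, best_peak = candidate, peak
        if best_peak > limit:
            return best_input, best_peak, n  # counterexample found
    return best_input, best_peak, n_simulations


_, peak, used = falsify_by_random_search(200)
print(peak, used)
```

The optimizer only sees one number per simulation (the peak), which is the feedback the slide labels as the maximum engine speed.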

Slide 10

Slide 10 text

Automatic counterexample generation by reinforcement learning
[Figure: the reinforcement learning agent observes the system state (engine speed) and chooses the next driving operation]
1 loop = 1 simulation step
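The "1 loop = 1 simulation step" structure can be sketched as an agent-environment interaction loop. This is an illustration under stated assumptions: `ToyPlant` uses made-up dynamics, `RandomAgent` is a placeholder that does not learn (a learning agent such as A3C or DDQN would update itself in `observe`), and the `scale` factor and reward shape are hypothetical choices.

```python
import math
import random


class ToyPlant:
    """Hypothetical one-state plant; not the model used in the experiments."""

    def __init__(self):
        self.speed = 1000.0

    def step(self, throttle):
        self.speed += 400.0 * throttle - 0.05 * (self.speed - 1000.0)
        return self.speed


class RandomAgent:
    """Placeholder policy; a learning agent would update itself in observe()."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.random()

    def observe(self, reward):
        pass


def run_episode(agent, steps=30, limit=4770.0, scale=1e-3):
    """Run one simulation, consulting the agent at every step."""
    plant, speed, trace = ToyPlant(), 1000.0, []
    for _ in range(steps):
        # One loop iteration = one simulation step.
        action = agent.act(speed)
        speed = plant.step(action)
        agent.observe(math.exp(scale * speed) - 1.0)  # per-step reward
        trace.append(speed)
        if speed > limit:
            break  # requirement violated: stop early
    return trace


print(len(run_episode(RandomAgent(seed=0))))
```

Unlike the whole-simulation loop, the agent here can react to the observed state in the middle of a run.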

Slide 11

Slide 11 text

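Reward design
$R_i = \mathbb{E}\left[\sum_{k=i}^{\infty} \gamma^{k-i} r_k\right]$, $\gamma \le 1$: constant
Reinforcement learning maximizes the sum of rewards
Counterexample generation maximizes the maximum engine speed
$\mathrm{rob}_i = \max_{k=i}^{T} \omega_k$, $\omega_k$: engine speed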

Slide 12

Slide 12 text

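Reward design: approximating the maximum by a sum
Use $\max\{x_1, \dots, x_n\} \sim \log \sum_{i=1}^{n} \{e^{x_i} - 1\}$
$\mathrm{rob}_i = \max_{k=i}^{T} \omega_k \sim \log \sum_{k=i}^{T} \{e^{\omega_k} - 1\}$
Comparing this with $R_i = \mathbb{E}\left[\sum_{k=i}^{\infty} \gamma^{k-i} r_k\right]$, it suffices to set $r_i = e^{\omega_i} - 1$ and $\gamma = 1$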
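A quick numeric check of this approximation is sketched below. The function names and sample values are assumptions for illustration, and the inputs are taken to be positive so that the logarithm is defined. The approximation works because the largest exponential dominates the sum.

```python
import math


def soft_max_value(xs):
    """log(sum(exp(x) - 1)): the smooth stand-in for max(xs) used in the slide."""
    return math.log(sum(math.exp(x) - 1.0 for x in xs))


def step_reward(omega):
    """Per-step reward r_i = exp(omega_i) - 1, to be summed with gamma = 1."""
    return math.exp(omega) - 1.0


xs = [1.0, 2.0, 8.0]  # made-up, already scaled values
print(max(xs), soft_max_value(xs))

# Summing the per-step rewards and taking the log gives the same quantity.
print(math.log(sum(step_reward(x) for x in xs)))
```

Since the logarithm is monotonic, maximizing the undiscounted sum of `step_reward` values also maximizes the approximated maximum.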

Slide 13

Slide 13 text

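In reality it is a little more complicated
• We want to generate counterexamples for a wider variety of properties
  • A subset of the properties expressible as MTL (metric temporal logic) formulas is supported
  • The derivation of the reward becomes a little more complicated
• Normalization by scaling
  • Input and output values are scaled to relatively small real numbers
• Real control systems run in continuous time, so time must be divided into steps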
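The last two points (scaling and dividing time into steps) can be sketched as follows. The normalization range, the step length `dt`, and both function names are assumptions for illustration; the slides do not state the actual values used.

```python
def scale(value, low, high):
    """Map value from the assumed range [low, high] linearly onto [0, 1]."""
    return (value - low) / (high - low)


def piecewise_constant(actions, dt, t):
    """Value at continuous time t when action k is held on [k*dt, (k+1)*dt).

    After the last interval the final action is held.
    """
    k = min(int(t // dt), len(actions) - 1)
    return actions[k]


print(scale(4770.0, 0.0, 5000.0))                     # engine speed on a 0..1 scale
print(piecewise_constant([0.2, 0.8, 0.5], 5.0, 7.3))  # second action is active at t = 7.3
```

Holding each chosen action for a fixed interval turns the continuous-time input signal into a finite sequence of decisions that a step-wise learner can produce.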

Slide 14

Slide 14 text

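Experiments: implementation
• Control system model
  • sldemo_autotrans, which ships with Matlab/Simulink
• Reinforcement learning
  • A3C and DDQN+NAF from ChainerRL
  • No hyperparameter tuning was done
• Existing methods
  • Simulated annealing (SA) and the Cross Entropy method (CE) from S-Taliro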

Slide 15

Slide 15 text

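Experiments: setup
• To each property φ1 to φ9,
• each algorithm (A3C, DDQN, SA, CE)
• was applied for 100 sessions under identical conditions
• Each session allows at most 200 simulations
• Whatever was learned is discarded when a session ends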
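The session protocol above can be sketched as a small harness. The interface (`make_falsifier`, `run_one_simulation`) and the `CoinFlipFalsifier` stand-in are hypothetical and are not the experiment code; only the numbers 100 and 200 come from the slide.

```python
import random


class CoinFlipFalsifier:
    """Hypothetical stand-in: each simulation 'succeeds' with probability p."""

    def __init__(self, seed, p=0.01):
        self.rng = random.Random(seed)
        self.p = p

    def run_one_simulation(self):
        return self.rng.random() < self.p


def success_rate(make_falsifier, n_sessions=100, max_simulations=200):
    """Fraction of sessions that find a counterexample within the budget."""
    successes = 0
    for session in range(n_sessions):
        # A fresh object per session: nothing learned is carried over.
        falsifier = make_falsifier(session)
        for _ in range(max_simulations):
            if falsifier.run_one_simulation():
                successes += 1
                break  # the session ends at the first counterexample
    return successes / n_sessions


print(success_rate(CoinFlipFalsifier))
```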

Slide 16

Slide 16 text

Success rate of counterexample generation

      A3C-1  A3C-5  A3C-10  DDQN-1  DDQN-5  DDQN-10  CE-1  CE-5  CE-10  SA-1  SA-5  SA-10
φ1       80     60      70      98      80       90     2    23     14     0    13      8
φ2       47     42      42      99     100       92     5    22      5     0    16     26
φ3       68      0       0      52       0        0    83     0      0    23     0      0
φ4       72      0       0      48       0        0    84     0      0    21     0      0
φ5      100      1       0     100       0        0   100     0      0   100     0      0
φ6       62     71      78     100     100      100     0    98    100     0    26     79
φ7       37     34      41      52      99      100     0     0      0     0     0      0
φ8       35     21      36      93     100      100   100    99     97    21    77     94
φ9       38     53      57      68     100      100     0     0      4     0     0      2

Slide 17

Slide 17 text

[Figure: number of episodes (axis 0 to 200) by algorithm (A3C, DDQN, CE, SA, each with settings 1, 5, and 10) for fml1]

Slide 18

Slide 18 text

[Figure: number of episodes (axis 0 to 200) by algorithm (A3C, DDQN, CE, SA, each with settings 1, 5, and 10) for fml2]

Slide 19

Slide 19 text

[Figure: number of episodes (axis 0 to 200) by algorithm (A3C, DDQN, CE, SA, each with settings 1, 5, and 10) for fml3]

Slide 20

Slide 20 text

[Figure: number of episodes (axis 0 to 200) by algorithm (A3C, DDQN, CE, SA, each with settings 1, 5, and 10) for fml4]

Slide 21

Slide 21 text

[Figure: number of episodes (axis 0 to 200) by algorithm (A3C, DDQN, CE, SA, each with settings 1, 5, and 10) for fml5]

Slide 22

Slide 22 text

[Figure: number of episodes (axis 0 to 200) by algorithm (A3C, DDQN, CE, SA, each with settings 1, 5, and 10) for fml6]

Slide 23

Slide 23 text

[Figure: number of episodes (axis 0 to 200) by algorithm (A3C, DDQN, CE, SA, each with settings 1, 5, and 10) for fml7]

Slide 24

Slide 24 text

[Figure: number of episodes (axis 0 to 200) by algorithm (A3C, DDQN, CE, SA, each with settings 1, 5, and 10) for fml8]

Slide 25

Slide 25 text

[Figure: number of episodes (axis 0 to 200) by algorithm (A3C, DDQN, CE, SA, each with settings 1, 5, and 10) for fml9]

Slide 26

Slide 26 text

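Discussion
• DDQN performs relatively stably across all tasks and settings
• For properties involving the gear, A3C and DDQN are inferior to CE
  • Probably because the gear changes discontinuously
• For the same number of simulations, reinforcement learning runs somewhat slower, but this is an implementation issue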

Slide 27

Slide 27 text

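Three-line summary
• We used reinforcement learning to automatically generate inputs that make a control system "behave strangely"
• Compared with methods that do not use reinforcement learning, it searches more efficiently (in most cases)
• It is still far from practical use (I think)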