Slide 1

Slide 1 text

Distributed Prioritized Experience Replay
梅本 晴弥
Horgan, Dan, et al. "Distributed prioritized experience replay." arXiv preprint arXiv:1803.00933 (2018).

Slide 2

Slide 2 text

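Contents
- Reinforcement learning
- Research background and objectives
- Related work
- Proposed method
- Evaluation experiments
- Analysis
- Summary and discussion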

Slide 3

Slide 3 text

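What is reinforcement learning?
A method in which the model acts in various ways on its own and learns the actions that yield good rewards.
Practical example: AlphaGo, which learned how to play Go.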

Slide 4

Slide 4 text

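Elements of reinforcement learning
Policy
<Example>
- Action: in a given Go board position, play the move judged most likely to win.
- Result: win or lose.
- Update of the reward function: keep using the move if it wins, stop using it if it loses.
Repeating this cycle, the model learns which move tends to win in which board position.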

Slide 5

Slide 5 text

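Contents
- Reinforcement learning
- Research background and objectives
- Related work
- Proposed method
- Evaluation experiments
- Analysis
- Summary and discussion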

Slide 6

Slide 6 text

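Research background
Models that make effective use of powerful computing resources are on the rise:
- Gorila
- A3C
- GPU Advantage Actor Critic
Current reinforcement learning methods: at present, most models assume a single machine.
This creates a need for models that use many machines.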

Slide 7

Slide 7 text

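Research objectives
Propose the reinforcement learning method Ape-X:
- a distributed system combined with prioritized experience replay
- a combination of recent algorithms
- fine adjustments needed for practical operation
Analyze how the parameters of the proposed method affect learning:
- the number of workers that generate experience
- the number of experiences retained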

Slide 8

Slide 8 text

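Contents
- Reinforcement learning
- Research background and objectives
- Related work
- Proposed method
- Evaluation experiments
- Analysis
- Summary and discussion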

Slide 9

Slide 9 text

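Related work: distributed stochastic gradient descent
Methods that compute deep-learning gradients in parallel; both synchronous and asynchronous update schemes have been proposed.
Nair et al. applied these to reinforcement learning:
- distributed asynchronous gradient updates
- distributed experience generation
Strong results have also been obtained on a single machine with multithreading.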

Slide 10

Slide 10 text

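Related work: distributed importance sampling
A technique commonly used to speed up learning:
- Sampling by priority introduces bias.
- The gradient change for low-probability samples is made larger to compensate.
Alain et al. applied it to supervised learning and succeeded in applying it to a distributed system.
Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, and Yoshua Bengio. Variance reduction in SGD by distributed importance sampling. arXiv preprint arXiv:1511.06481, 2015.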

Slide 11

Slide 11 text

Related work: Experience Replay
A technique that stores generated experience and reuses it many times for learning:
- Generated experience can be used efficiently.
- Keeping experience from older policies helps prevent overfitting.
Prioritized Experience Replay:
- A technique that replays useful experience more often.
- Priorities are assigned using the TD error.
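As a rough illustration of TD-error-based prioritization, the sketch below converts absolute TD errors into sampling probabilities with the proportional scheme P(i) = p_i^alpha / sum_k p_k^alpha. The function name, the exponent `alpha = 0.6`, and the small constant `eps` are assumptions made for this example; the slide does not specify them.

```python
def priorities_to_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Map TD errors to sampling probabilities (proportional prioritization).

    alpha = 0 gives uniform sampling; larger alpha favors large TD errors.
    eps keeps zero-error experiences sampleable. Both values are assumptions.
    """
    scaled = [(abs(delta) + eps) ** alpha for delta in td_errors]
    total = sum(scaled)
    return [s / total for s in scaled]
```

Experiences with a larger absolute TD error receive a larger share of the probability mass, which is what "replaying useful experience more often" amounts to in this scheme.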

Slide 12

Slide 12 text

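Contents
- Reinforcement learning
- Research background and objectives
- Related work
- Proposed method
- Evaluation experiments
- Analysis
- Summary and discussion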

Slide 13

Slide 13 text

Proposed method: overview of Ape-X
[Figure: Actor (Network, Environment), Replay (Experiences), Learner (Network)]
Reinforcement learning is split into three roles.

Slide 14

Slide 14 text

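Proposed method: Actor
- Each actor has its own action-value network and environment.
- It acts according to its policy and observes state transitions.
- It assigns a priority to each transition and sends it to the Replay Memory.
- Actors do not train the action-value network.
A large number of actors act independently and generate experience in large quantities.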
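The actor's role can be sketched as the loop below: act with a local copy of the Q-values, compute an initial priority from the one-step TD error, and queue the transition for the replay memory without ever updating the Q-values. `CoinFlipEnv`, `run_actor`, and the tabular `q_table` are hypothetical stand-ins chosen to keep the example small; the method described in the slides uses neural networks and Atari environments.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: one state, action 1 pays 1.0, action 0 pays 0.0."""

    def reset(self):
        return 0

    def step(self, action):
        # Returns (next_state, reward, done); episodes never end in this toy case.
        return 0, float(action), False


def run_actor(env, q_table, epsilon, steps, gamma, rng, eps_priority=1e-6):
    """Generate prioritized transitions; the actor never modifies q_table."""
    outbox = []  # stands in for the channel to the shared replay memory
    state = env.reset()
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max((0, 1), key=lambda a: q_table[(state, a)])
        next_state, reward, done = env.step(action)
        bootstrap = 0.0 if done else gamma * max(q_table[(next_state, a)] for a in (0, 1))
        # Initial priority = |TD error| under the actor's local Q-values.
        priority = abs(reward + bootstrap - q_table[(state, action)]) + eps_priority
        outbox.append((priority, (state, action, reward, next_state, done)))
        state = env.reset() if done else next_state
    return outbox
```

Computing the priority on the actor side means new experience arrives in the replay memory already ranked, so the learner does not have to evaluate every incoming transition itself.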

Slide 15

Slide 15 text

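Proposed method: Replay Memory
Holds the experience sent by the actors:
- There is a single Replay Memory for the whole system.
- An upper limit is set on the number of experiences that can be held.
- When the limit is exceeded, experiences are removed in FIFO order.
It holds a large amount of experience for the Learner to learn from.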
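A minimal sketch of a capacity-limited replay memory with FIFO eviction, assuming a plain in-process buffer. The class name `ReplayMemory` and its methods are illustrative; a real distributed deployment would sit behind a network service and use a structure that supports fast priority sampling.

```python
from collections import deque


class ReplayMemory:
    """Fixed-capacity store; the oldest experience is dropped first (FIFO)."""

    def __init__(self, capacity):
        # deque(maxlen=...) discards items from the left once it is full.
        self.buffer = deque(maxlen=capacity)

    def add(self, transition, priority):
        self.buffer.append((priority, transition))

    def __len__(self):
        return len(self.buffer)
```

For example, adding five transitions to a memory of capacity three leaves only the three most recent ones.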

Slide 16

Slide 16 text

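Proposed method: Learner
- Samples experience according to priority and learns from it.
- Recomputes the priority of each experience used for learning.
- Sends parameters to the actors at regular intervals.
Useful experience is learned from preferentially.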
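The sample-then-reprioritize cycle of the learner can be sketched as follows. `learner_step` and the callback `compute_td_error` are hypothetical names; the gradient step and the periodic parameter broadcast to the actors are left out, and the exponent `alpha = 0.6` is an assumed value.

```python
import random


def learner_step(priorities, batch_size, compute_td_error, rng, alpha=0.6, eps=1e-6):
    """Sample indices in proportion to priority**alpha, then refresh their priorities."""
    weights = [p ** alpha for p in priorities]
    indices = rng.choices(range(len(priorities)), weights=weights, k=batch_size)
    for i in indices:
        # A real learner would also take a gradient step on this batch here.
        priorities[i] = abs(compute_td_error(i)) + eps
    return indices
```

Because priorities are rewritten after each use, an experience that the network has already fit well (small TD error) is sampled less often on later steps.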

Slide 17

Slide 17 text

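Proposed method: summary of the Ape-X overview
[Figure: Actor (Network, Environment), Replay (Experiences), Learner (Network)]
- Actor: generates large amounts of experience in parallel.
- Replay: holds a large amount of experience.
- Learner: learns so as to increase reward.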

Slide 18

Slide 18 text

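Proposed method: features
Does not require a large number of GPUs:
- The Learner runs on a machine equipped with a GPU (one).
- Actors run on CPU-only machines (many).
Efficient use of experience:
- The replay memory is shared by the whole system.
- Each experience carries a priority.
A useful discovery made by a single actor is shared with the whole system.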

Slide 19

Slide 19 text

Proposed method: the Learner's model
- Learning algorithm: Double Deep Q-Network with a multi-step bootstrap target
- Q-function approximator: Dueling Network
- Data sampling: Prioritized Experience Replay
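To make "Double Deep Q-Network with a multi-step bootstrap target" concrete, the sketch below computes an n-step return and bootstraps from the state reached after n steps, choosing the action with the online network and evaluating it with the target network. The function name and the use of plain lists of Q-values are assumptions for illustration.

```python
def multistep_double_q_target(rewards, gamma, q_online_next, q_target_next):
    """n-step double-Q target.

    rewards:        the n rewards observed after the current step, in order
    q_online_next:  online-network Q-values at the state reached after n steps
    q_target_next:  target-network Q-values at that same state
    """
    n = len(rewards)
    n_step_return = sum((gamma ** k) * r for k, r in enumerate(rewards))
    # Double Q-learning: select the action with the online network...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...but evaluate it with the target network.
    return n_step_return + (gamma ** n) * q_target_next[best_action]
```

With rewards `[1, 1, 1]`, `gamma = 0.5`, online values `[0.0, 2.0]` and target values `[10.0, 4.0]`, the online network picks action 1, so the target is 1.75 + 0.125 * 4 = 2.25 rather than the larger value the target network alone would suggest.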

Slide 20

Slide 20 text

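Proposed method: other detailed settings
- Each actor follows an ε-greedy policy with its own ε.
  - ε-greedy acts randomly with probability ε.
  - Acting randomly helps prevent overfitting.
  - Setting ε separately for each actor ensures diversity.
- Because sampling is based on priority, importance sampling is used to correct the resulting bias in the distribution.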
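Both settings can be sketched in a few lines. The per-actor schedule below, ε_i = ε^(1 + α·i/(N−1)) with ε = 0.4 and α = 7, is the one reported in the Ape-X paper rather than on the slide, and the importance-sampling exponent `beta = 0.4` and all function names are assumptions for this example.

```python
import random


def actor_epsilons(num_actors, base_eps=0.4, alpha=7.0):
    """One exploration rate per actor, from base_eps down to base_eps**(1 + alpha)."""
    if num_actors == 1:
        return [base_eps]
    return [base_eps ** (1 + alpha * i / (num_actors - 1)) for i in range(num_actors)]


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def importance_weights(sample_probs, memory_size, beta=0.4):
    """Weights (1 / (N * P(i)))**beta, scaled so the largest weight is 1."""
    raw = [(1.0 / (memory_size * p)) ** beta for p in sample_probs]
    top = max(raw)
    return [w / top for w in raw]
```

Experiences that were sampled with high probability receive a weight below 1, which shrinks their gradient contribution and offsets the fact that they are drawn more often.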

Slide 21

Slide 21 text

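Contents
- Reinforcement learning
- Research background and objectives
- Related work
- Proposed method
- Evaluation experiments
- Analysis
- Summary and discussion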

Slide 22

Slide 22 text

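Evaluation experiments: experimental setup
- Experiments use Atari games (e.g., Breakout).
- Number of actors: […] ([…] CPU per actor)
- Experience generated per actor: […] FPS
- Experience generated overall: […]K FPS ([…] Repeat)
- Gradient updates: […] per second
- Experience is compressed as PNG before being stored, to reduce its size.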

Slide 23

Slide 23 text

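Evaluation experiments: performance comparison at the end of training
[Figure: score against training time]
- Median score over […] games.
- […] corresponds to the human score.
- Both the final score and the training time improve substantially over existing methods.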

Slide 24

Slide 24 text

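Evaluation experiments: reward over time
[Figure: reward against training time]
- Mean reward obtained on […] games.
- Compared with the other methods, the reward obtained grows faster.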

Slide 25

Slide 25 text

Evaluation experiments: results
[Table: comparison of methods; contents not recoverable from the extraction]
- Ape-X records the highest score.
- Distributed training also shortens the training time considerably.

Slide 26

Slide 26 text

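Contents
- Reinforcement learning
- Research background and objectives
- Related work
- Proposed method
- Evaluation experiments
- Analysis
- Summary and discussion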

Slide 27

Slide 27 text

Analysis: relationship between the number of actors and reward
The more actors there are, the better the reward obtained.

Slide 28

Slide 28 text

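Analysis: relationship between Replay Memory and reward
The larger the capacity, the better the reward obtained, by a comparatively modest margin.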

Slide 29

Slide 29 text

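Analysis: recent experience
Does learning from more recent experience contribute to the score?
- Recent experience is based on the most recent parameters.
- To test this, actors duplicate the experience they send, so that more copies are sent.
- As a result, newer experience is sampled more often.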

Slide 30

Slide 30 text

Analysis: relationship between recent experience and reward
Learning from recent experience is not linked to reward.

Slide 31

Slide 31 text

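Analysis: summary of results
- Increasing the number of actors increases reward.
  - It helps prevent falling into local optima.
  - Extensive exploration yields useful experience.
- Increasing the Replay Memory increases reward.
  - Useful experience can be retained longer.
- Recent experience makes no direct contribution to reward.
  - Padding the memory with duplicated experience lowers diversity and degrades performance.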

Slide 32

Slide 32 text

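Contents
- Reinforcement learning
- Research background and objectives
- Related work
- Proposed method
- Evaluation experiments
- Analysis
- Summary and discussion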

Slide 33

Slide 33 text

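Summary and discussion: summary
- A framework combining distribution with prioritized experience replay was proposed.
- Ape-X showed the best performance in both wall-clock learning speed and final performance.
- Overfitting is a major problem in reinforcement learning; this work showed that the simple approach of generating data in large quantities is effective.
- In the future, ways of using data more efficiently should be explored.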

Slide 34

Slide 34 text

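Summary and discussion: discussion
- Ape-X is a method for collecting experience quickly and in large quantities.
- In complex tasks there is a very large number of states $s_t$.
- Generating a large amount of experience covered the states $s_t$ broadly, and learning progressed.
- At present, unknown actions are experienced through random exploration.
- A possible direction: a method that focuses exploration on states $s_t$ that occur infrequently.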

Slide 35

Slide 35 text

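Appendix: TD error (temporal difference error)
Update rule for the Q function in Q-learning:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a' \in A} Q(s_{t+1}, a') - Q(s_t, a_t) \right)$$
- $s_t$: state at time $t$
- $a_t$: action at time $t$
- $Q(s_t, a_t)$: estimated reward when action $a_t$ is taken in state $s_t$
- $r_t$: reward at time $t$
- $\alpha$: learning rate
- $\gamma$: discount rate
TD error: the difference between the estimated reward and the actual reward, which is the term in parentheses in the update rule.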
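The Q-learning update and its TD error can be written out for a tabular Q function as in the sketch below. The dictionary representation, the default of 0.0 for unseen state-action pairs, and the default values of `alpha` and `gamma` are assumptions made to keep the example self-contained.

```python
def q_learning_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    """Apply one Q-learning update in place and return the TD error.

    q maps (state, action) pairs to estimated values; missing entries count as 0.0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    current = q.get((state, action), 0.0)
    # TD error: bootstrapped target minus the current estimate.
    td_error = reward + gamma * best_next - current
    q[(state, action)] = current + alpha * td_error
    return td_error
```

The returned TD error is the quantity whose absolute value serves as the priority in prioritized experience replay.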