Distributed prioritized experience replay

umeco
July 03, 2018

Research paper readings in my laboratory

Transcript

 1. Distributed Prioritized Experience Replay
    Presenter: umeco
    Horgan, Dan, et al. "Distributed prioritized experience replay." arXiv preprint arXiv:1803.00933 (2018).
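 2. Elements of reinforcement learning
    Policy
    Example: in a given Go board position, play the move judged most likely to win.
    - Action: play the move.
    - Result: win or lose.
    - Update of the reward function: keep using the move if it wins, stop using it if it loses.
    Repeating this cycle teaches the agent which move tends to win in which position.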
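 3. Research background
    Current reinforcement learning methods
    Models that make effective use of powerful computing resources are on the rise:
    - Gorila
    - A3C
    - GPU Advantage Actor-Critic
    At present, most models assume a single machine.
    This motivates models that use many machines.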
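 4. Related work: distributed stochastic gradient descent
    Methods that compute deep learning gradients in parallel.
    Both synchronous and asynchronous update schemes have been proposed.
    Nair et al. applied these to reinforcement learning:
    - distributed asynchronous gradient updates
    - distributed experience generation
    Strong results have also been reported on a single machine with multithreading.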
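 5. Related work: distributed importance sampling
    A technique commonly used to speed up training.
    - Sampling by priority introduces bias.
    - The correction enlarges the gradient contribution of low-probability samples.
    Alain et al. applied it to supervised learning and successfully extended it to distributed systems.
    Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, and Yoshua Bengio. Variance reduction in SGD by distributed importance sampling. arXiv preprint arXiv:1511.06481, 2015.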
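The bias correction mentioned on this slide can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function name `importance_weights`, the exponent `beta`, and the normalization by the largest weight follow the convention used in prioritized experience replay and are not taken from the slide itself.

```python
def importance_weights(probs, buffer_size, beta=0.4):
    """Weights that offset non-uniform sampling probabilities.

    probs: probability with which each sampled item was drawn.
    buffer_size: number of items it was drawn from.
    beta: 0 means no correction, 1 means full correction (assumed default).
    """
    # A sample drawn with low probability gets a larger weight.
    raw = [(1.0 / (buffer_size * p)) ** beta for p in probs]
    # Normalize by the largest weight so updates are only ever scaled down.
    top = max(raw)
    return [w / top for w in raw]


weights = importance_weights([0.5, 0.25, 0.25], buffer_size=3, beta=1.0)
```

With full correction (`beta=1.0`), the item sampled twice as often receives half the weight of the other two.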
 6. Related work: experience replay
    Experience Replay: store generated experiences and reuse them for training many times.
    - Generated experiences are used efficiently.
    - Keeping experiences from older policies helps prevent overfitting.
    Prioritized Experience Replay: replay useful experiences more often.
    - Priorities are assigned using the TD error.
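A minimal sketch of TD-error-based prioritization follows. It assumes the proportional variant, where the sampling probability grows with |TD error|; the class name `PrioritizedReplay` and the values of `alpha` and `eps` are hypothetical choices for illustration, and the list-based storage is far simpler than what a real system would use.

```python
import random


class PrioritizedReplay:
    """Toy proportional prioritized replay (list-based, O(n) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (assumed)
        self.eps = eps      # keeps zero-error transitions sampleable
        self.items = []
        self.priorities = []

    def _priority(self, td_error):
        return (abs(td_error) + self.eps) ** self.alpha

    def add(self, transition, td_error):
        if len(self.items) >= self.capacity:
            # Drop the oldest transition when the buffer is full.
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(self._priority(td_error))

    def sample(self, k, rng=random):
        # Draw indices with probability proportional to priority.
        idx = rng.choices(range(len(self.items)), weights=self.priorities, k=k)
        return idx, [self.items[i] for i in idx]

    def update(self, idx, td_errors):
        # Refresh priorities after the learner recomputes TD errors.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = self._priority(err)
```

Transitions with a large TD error dominate the sampled batches until their priorities are refreshed with smaller errors.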
 7. Proposed method: overview of Ape-X
    [Diagram: Learner (Network), Replay (Experiences), Actor (Network), Environment]
    Reinforcement learning is split into separate roles.
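 8. Proposed method: summary of the Ape-X overview
    [Diagram: Learner (Network), Replay (Experiences), Actor (Network), Environment]
    - Actor: generates large amounts of experience in parallel.
    - Replay: holds large amounts of experience.
    - Learner: learns to increase the reward.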
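The data flow between the three roles can be sketched in a single process. This is a toy under loud assumptions: `actor_step` is a made-up stand-in for an environment interaction, actors run in a loop instead of in parallel, and the learner samples uniformly instead of by priority and performs no actual training.

```python
import random
from collections import deque


def actor_step(actor_id, rng):
    """Hypothetical stand-in for one actor/environment interaction."""
    state = rng.randrange(5)
    action = rng.randrange(2)
    reward = 1.0 if action == state % 2 else 0.0
    return {"actor": actor_id, "state": state, "action": action, "reward": reward}


def run(num_actors=4, steps=10, capacity=32, batch_size=8, seed=0):
    rng = random.Random(seed)
    replay = deque(maxlen=capacity)  # shared replay memory
    batches = []                     # what the learner would train on
    for _ in range(steps):
        # Actors: every actor adds experience to the same replay memory.
        for actor_id in range(num_actors):
            replay.append(actor_step(actor_id, rng))
        # Learner: draws a batch from the shared memory once enough exists.
        if len(replay) >= batch_size:
            batches.append(rng.sample(list(replay), batch_size))
    return replay, batches


replay, batches = run()
```

The point of the sketch is only the separation of roles: experience generation and learning communicate through the replay memory and nothing else.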
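 9. Proposed method: features
    Does not require many GPUs
    - The Learner runs on a machine equipped with a GPU.
    - Actors run on CPU-only machines (many of them).
    Efficient use of experience
    - The replay memory is shared across the whole system.
    - Each experience is assigned a priority.
    A useful discovery made by any one Actor is shared with the whole system.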
 10. Proposed method: the Learner's model
    - Learning algorithm: Double Deep Q-Network with a multi-step bootstrap target
    - Q-function approximator: Dueling Network
    - Data sampling: Prioritized Experience Replay
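As a hedged illustration of the learning target named on this slide, the sketch below computes a multi-step return bootstrapped in the Double Q-learning style: the online network picks the next action and the target network evaluates it. The function name and the plain-list inputs are assumptions for illustration; a real learner would operate on batched network outputs.

```python
def n_step_double_q_target(rewards, gamma, q_online_next, q_target_next,
                           terminal=False):
    """Multi-step bootstrap target with Double Q-learning action selection.

    rewards: the n rewards collected after the starting state.
    q_online_next / q_target_next: action values at the state reached after
    n steps, from the online and the target network respectively.
    """
    n = len(rewards)
    # Discounted sum of the n observed rewards.
    ret = sum((gamma ** k) * r for k, r in enumerate(rewards))
    if terminal:
        return ret  # no bootstrap past the end of an episode
    # Double Q-learning: select with the online values, evaluate with the target values.
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return ret + (gamma ** n) * q_target_next[best]


target = n_step_double_q_target([1.0, 0.0, 2.0], 0.5, [0.1, 0.9], [4.0, 8.0])
```

Here the rewards contribute 1 + 0 + 0.25 * 2 = 1.5 and the bootstrap term contributes 0.125 * 8 = 1.0.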
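 11. Proposed method: other detailed settings
    - Each Actor follows an ε-greedy policy with its own value of ε.
      - ε-greedy acts randomly with probability ε.
      - Acting randomly helps prevent overfitting.
      - Setting ε per Actor ensures diversity.
    - Because sampling is based on priority, importance sampling is used to correct the resulting bias in the distribution.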
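 12. Evaluation: experimental settings
    - Experiments use Atari games (e.g. Breakout).
    - Number of Actors: [value missing] (CPUs per Actor: [value missing])
    - Experience generated per Actor: [value missing] FPS
    - Experience generated overall: [value missing]K FPS (Repeat: [value missing])
    - Gradient updates: [value missing] per second
    - Experiences are stored PNG-compressed to reduce memory usage.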
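 13. Evaluation: reward over time
    [Figure: reward versus training time]
    - Average reward obtained over [number missing] games.
    - Compared with other methods, the reward grows faster and reaches higher values.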
 14. Evaluation: experimental results
    [Table: comparison of results across methods; contents not recoverable]
    - Ape-X recorded the highest score.
    - Distributed training also shortened training time substantially.
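 15. Analysis: summary of results
    - Increasing the number of Actors increases the reward.
      - It helps prevent falling into local optima.
      - Large-scale exploration yields useful experiences.
    - Enlarging the replay memory increases the reward.
      - Useful experiences can be retained longer.
    - How recent the experience is makes no direct contribution to the reward.
      - Padding the memory with duplicated experiences lowers diversity and degrades performance.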
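 16. Appendix: the Q-function update rule in Q-learning
    $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a' \in A} Q(s_{t+1}, a') - Q(s_t, a_t) \right)$
    - $s_t$: state at time $t$
    - $a_t$: action at time $t$
    - $Q(s_t, a_t)$: estimated reward for taking action $a_t$ in state $s_t$
    - $r_t$: reward at time $t$
    - $\alpha$: learning rate
    - $\gamma$: discount rate
    TD error (Temporal Difference error): the difference between the estimated reward and the actual reward, i.e. the term in parentheses in the update rule.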
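The update rule above can be written as a small tabular sketch. The dictionary-based table, the function name `q_update`, and the example states and actions are hypothetical; the default `alpha` and `gamma` are arbitrary illustration values.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Apply one Q-learning update in place and return the TD error."""
    # Best estimated value available from the next state.
    best_next = max(Q[(s_next, b)] for b in actions)
    # TD error: bootstrapped target minus the current estimate.
    td_error = r + gamma * best_next - Q[(s, a)]
    # Move the estimate a fraction alpha toward the target.
    Q[(s, a)] += alpha * td_error
    return td_error


Q = defaultdict(float)  # unseen state-action pairs start at 0
Q[("s1", "left")] = 2.0
td = q_update(Q, "s0", "right", 1.0, "s1", ["left", "right"], alpha=0.5, gamma=0.9)
```

In this example the target is 1.0 + 0.9 * 2.0 = 2.8, so the TD error is 2.8 and the estimate for ("s0", "right") moves from 0 to 1.4.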