Distributed prioritized experience replay
umeco
July 03, 2018
Research paper readings in my laboratory
Transcript
Distributed prioritized experience replay
梅本
Horgan, Dan, et al. "Distributed prioritized experience replay." arXiv preprint arXiv:1803.00933 (2018).
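Contents
- Reinforcement learning
- Research background and objectives
- Related work
- Proposed method
- Evaluation experiments
- Analysis
- Summary and discussion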
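What is reinforcement learning?
A method in which a model acts in various ways on its own and learns which actions earn good rewards.
Practical example: AlphaGo, which learned how to play Go.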
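Elements of reinforcement learning
Policy
Example:
- Action: on a given Go board position, play the move judged most likely to win.
- Result: win or lose.
- Update of the reward function: use this move if it wins, and avoid it if it loses.
Repeating this cycle teaches the model which move tends to win in which position.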
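Research background
Current reinforcement learning methods
- Models that make effective use of powerful computing resources are on the rise:
  - Gorila
  - A3C
  - GPU Advantage Actor-Critic
- At present, most models assume a single machine.
A model that uses many machines is needed.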
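Research objectives
Propose Ape-X, a reinforcement learning method
- Distributed system + prioritized experience replay
- A combination of state-of-the-art algorithms
- Fine-grained modifications for practical operation
Analyze how the proposed method's parameters affect learning
- Number of workers generating experience
- Number of experiences retained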
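Related work: distributed stochastic gradient descent
Methods that compute deep learning gradients in parallel
Both synchronous and asynchronous update schemes have been proposed.
Nair et al. applied these to reinforcement learning:
- Distributed asynchronous gradient updates
- Distributed experience generation
Strong results on a single machine with multithreading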
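Related work: distributed importance sampling
A technique often used to improve learning
- Sampling by priority introduces bias.
- It enlarges the gradient change for low-probability samples.
Alain et al. applied it to supervised learning and succeeded in applying it to a distributed system.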
Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, and Yoshua Bengio. Variance reduction in sgd by distributed importance sampling. arXiv preprint arXiv:1511.06481, 2015.
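Related work
Experience Replay
A technique that stores generated experience and reuses it for learning many times
- Generated experience can be used efficiently.
- Using experience from older policies helps prevent overfitting.
Prioritized Experience Replay
- A technique that replays useful experience more often
- Priorities are assigned using the TD error.
Lin, Long-Ji. "Self-improving reactive agents based on reinforcement learning, planning and teaching." Machine Learning, 1992.
Schaul, Tom, John Quan, Ioannis Antonoglou, and David Silver. "Prioritized experience replay." International Conference on Learning Representations, 2016.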
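The prioritization idea on this slide can be sketched as follows. This is a minimal illustration rather than the method from the cited papers: the function names `priority_from_td_error` and `sample_indices`, the exponent `alpha`, and the small constant `eps` are assumptions introduced here. Each stored transition receives a priority derived from the magnitude of its TD error, and transitions are then drawn with probability proportional to that priority.

```python
import random


def priority_from_td_error(td_error, alpha=0.6, eps=1e-6):
    """Map a TD error to a positive priority (alpha and eps are assumed values)."""
    return (abs(td_error) + eps) ** alpha


def sample_indices(priorities, batch_size, rng=random):
    """Draw indices with probability proportional to priority (with replacement)."""
    return rng.choices(range(len(priorities)), weights=priorities, k=batch_size)


# Toy TD errors for four stored transitions (made-up numbers).
td_errors = [0.01, 2.0, 0.5, 0.0]
priorities = [priority_from_td_error(d) for d in td_errors]
batch = sample_indices(priorities, batch_size=8, rng=random.Random(0))
```

Transitions with a large TD error are replayed more often, while `eps` keeps transitions with zero error from never being sampled.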
Proposed method: overview of Ape-X
Reinforcement learning is split into three roles.
Diagram labels: Learner, Network, Replay, Experiences, Actor, Network, Environment
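Proposed method: Actor
Many actors act independently and generate large amounts of experience.
- Each actor holds its own action-value network and environment.
- It acts according to its policy and observes state transitions.
- It assigns a priority to each transition and sends it to the Replay Memory.
- Actors do not train the action-value network.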
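Proposed method: Replay Memory
Holds the large volume of experience that the Learner trains on.
- Stores the experience sent by the actors.
- The whole system has a single Replay Memory.
- An upper limit is set on the number of experiences it can hold.
- When the limit is exceeded, experiences are deleted in FIFO order.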
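A capacity-limited store with FIFO eviction, as described for the Replay Memory, might look like the sketch below. The class name `ReplayMemory` and its methods are hypothetical and chosen only for illustration; the slides do not say how the real system implements eviction.

```python
from collections import deque


class ReplayMemory:
    """Bounded store of (priority, transition) pairs with FIFO eviction."""

    def __init__(self, capacity):
        # A deque with maxlen drops its oldest item when a new one
        # is appended to a full buffer, which gives FIFO deletion.
        self._items = deque(maxlen=capacity)

    def add(self, transition, priority):
        self._items.append((priority, transition))

    def __len__(self):
        return len(self._items)

    def transitions(self):
        return [t for _, t in self._items]


memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.add(transition=("state", step), priority=1.0)
```

After five insertions into a memory of capacity three, only the three newest transitions remain.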
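Proposed method: Learner
Learns preferentially from useful experience.
- Samples experience according to priority and learns from it.
- Recomputes the priority of each experience used for learning.
- Sends parameters to the actors at fixed intervals.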
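The three Learner responsibilities listed above can be laid out as a toy loop. This is a sketch under stated assumptions: `learner_steps`, `td_error_fn`, and `sync_every` are hypothetical names, the gradient step is left as a comment, and the priority formula (absolute TD error plus a small constant) is an illustrative choice, not something the slides specify.

```python
import random


def learner_steps(priorities, td_error_fn, num_steps, batch_size,
                  sync_every, rng):
    """Run a toy learner loop and return the steps at which parameters
    would be sent to the actors."""
    sync_steps = []
    for step in range(1, num_steps + 1):
        # 1. Sample experience indices in proportion to their priority.
        batch = rng.choices(range(len(priorities)), weights=priorities,
                            k=batch_size)
        # 2. A real learner would take a gradient step on the batch here.
        # 3. Recompute the priority of every experience that was used.
        for i in batch:
            priorities[i] = abs(td_error_fn(i)) + 1e-6
        # 4. Send parameters to the actors at a fixed interval.
        if step % sync_every == 0:
            sync_steps.append(step)
    return sync_steps


priorities = [1.0, 1.0, 1.0, 1.0]
sync_steps = learner_steps(priorities, td_error_fn=lambda i: 0.1 * i,
                           num_steps=10, batch_size=2, sync_every=4,
                           rng=random.Random(0))
```

Only the experiences that were actually sampled get a fresh priority; the rest keep the value they had when they were stored.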
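Proposed method: summary of the Ape-X overview
Diagram labels: Learner, Network, Replay, Experiences, Actor, Network, Environment
- Actors generate large amounts of experience in parallel.
- The Replay Memory holds large amounts of experience.
- The Learner learns so as to increase reward.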
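Proposed method: features
Does not require large numbers of GPUs
- The Learner runs on a machine equipped with a GPU (one).
- Actors run on CPU-only machines (many).
Efficient use of experience
- The Replay Memory is shared across the whole system.
- Experiences are assigned priorities.
A useful discovery made by one actor is shared with the whole system.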
Proposed method: the Learner's model
- Learning algorithm: Double Deep Q-Network with a multi-step bootstrap target
- Q-function approximator: Dueling Network
- Data sampling: Prioritized Experience Replay
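Proposed method: other detailed settings
- Each actor follows an ε-greedy policy with its own individually set ε.
  - ε-greedy takes a random action with probability ε.
  - Acting randomly helps prevent overfitting.
  - Setting ε per actor ensures diversity.
- Because experience is sampled according to priority, importance sampling is used to correct the resulting bias.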
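The two detailed settings of the proposed method, per-actor ε-greedy exploration and importance-sampling correction, can be sketched as below. The ε values in `actor_epsilons` are hypothetical, and the weight formula (N · P(i))^(-β), normalised by its maximum, is the form used in the Prioritized Experience Replay paper; the slides do not say which form the deck's author had in mind, so treat it as an assumption.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def importance_weights(sample_probs, memory_size, beta=1.0):
    """Weights (N * P(i)) ** -beta, normalised so the largest is 1."""
    raw = [(memory_size * p) ** -beta for p in sample_probs]
    largest = max(raw)
    return [w / largest for w in raw]


# Hypothetical exploration rates, one epsilon per actor.
actor_epsilons = [0.4, 0.1, 0.01]
rng = random.Random(0)
actions = [epsilon_greedy([0.0, 1.0, 0.5], eps, rng) for eps in actor_epsilons]

# Experiences sampled with higher probability receive smaller weights.
weights = importance_weights([0.5, 0.25, 0.25], memory_size=3)
```

Giving each actor a different ε means some actors explore heavily while others mostly exploit, and the weights scale down updates from experiences that prioritized sampling over-represents.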
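Evaluation experiments: experimental setup
- Experiments: Atari games (e.g., Breakout)
- Number of actors: […]; […] CPU per actor
- Experience generated per actor: […] FPS
- Experience generated overall: […]K FPS ([…] Repeat)
- Gradient updates: […] per second
- Experiences are PNG-compressed before storage to reduce their size.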
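Evaluation experiments: performance comparison at the end of training
Figure axes: training time, score
- Median score over […] games
- Human score
Both the final score and the training time improve greatly over existing methods.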
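Evaluation experiments: reward over time
Figure axes: training time, reward
- Average reward obtained on […] games
- Compared with other methods, Ape-X increases the obtained reward faster.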
Evaluation experiments: results
[Results table not recoverable from the extracted text]
- Ape-X recorded the highest score.
- Distributed learning greatly shortened the training time.
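Analysis: number of actors and reward
The more actors, the better the reward obtained.
Analysis: Replay Memory and reward
The larger the capacity, the comparatively better the reward obtained.
Analysis: the most recent experience
Does learning from more recent experience contribute to the score?
- The most recent experience is based on the most recent parameters.
- Actors duplicate the experience they send so that more of it arrives.
- As a result, newer experience is sampled more often.
Analysis: the most recent experience and reward
Learning from the most recent experience is not tied to reward.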
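Analysis: summary of results
- Increasing the number of actors increases reward.
  - More actors help prevent falling into local optima.
  - Extensive exploration yields useful experience.
- Enlarging the Replay Memory increases reward.
  - Useful experience could be retained longer.
- The most recent experience does not contribute to reward.
  - Padding the data with duplicated experience reduces diversity and lowers performance.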
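Summary and discussion: summary
- Proposed a framework that combines distributed training with prioritized experience replay.
- Ape-X showed the best performance in both wall-clock training time and final performance.
- Overfitting is a major problem in reinforcement learning, and this work showed that the simple approach of generating large amounts of data is effective.
- In the future, ways of using data more efficiently should be explored.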
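Summary and discussion: discussion
Ape-X is a method for collecting experience quickly and in large quantities.
- Complex tasks contain a very large number of states.
- Generating large amounts of experience covered those states broadly, so learning progressed.
- At present, unknown actions are encountered through random exploration.
- A method that focuses exploration on states that occur infrequently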
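Q-function update rule in Q-learning
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a' \in A} Q(s_{t+1}, a') - Q(s_t, a_t) \right)$$
- $s_t$: state at time $t$
- $a_t$: action at time $t$
- $Q(s_t, a_t)$: estimated reward when action $a_t$ is taken in state $s_t$
- $r_t$: reward at time $t$
- $\alpha$: learning rate
- $\gamma$: discount rate
TD error (Temporal Difference): the difference between the estimated reward and the actual reward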
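As a worked instance of the update rule, the sketch below applies one tabular Q-learning step and returns the TD error. The function name `q_learning_update`, the dictionary-based Q table, the default of 0.0 for unseen state-action pairs, and the toy numbers are all assumptions made for illustration; Ape-X itself approximates Q with a neural network rather than a table.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # max over a' of Q(s_{t+1}, a'); unseen pairs default to 0.0
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    # TD error: r_{t+1} + gamma * max Q(s_{t+1}, a') - Q(s_t, a_t)
    td_error = reward + gamma * best_next - q.get((state, action), 0.0)
    # Q(s_t, a_t) <- Q(s_t, a_t) + alpha * TD error
    q[(state, action)] = q.get((state, action), 0.0) + alpha * td_error
    return td_error


q = {("s1", "left"): 1.0, ("s1", "right"): 2.0}
td = q_learning_update(q, state="s0", action="right", reward=1.0,
                       next_state="s1", actions=["left", "right"],
                       alpha=0.5, gamma=0.9)
```

With these numbers the TD error is 1.0 + 0.9 × 2.0 − 0 = 2.8, so the stored value for ("s0", "right") moves from 0 to 0.5 × 2.8 = 1.4. The magnitude of this TD error is what prioritized replay uses to rank experiences.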