Distributed prioritized experience replay
umeco
July 03, 2018
Research
Research paper readings in my laboratory
Transcript
Distributed Prioritized Experience Replay
Horgan, Dan, et al. "Distributed prioritized experience replay." arXiv preprint arXiv:1803.00933 (2018).
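Contents: Reinforcement learning / Research background and objectives / Related work / Proposed method / Evaluation experiments / Analysis / Summary and discussion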
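What is reinforcement learning? A method in which a model acts in various ways on its own and gradually learns the actions that yield good rewards. Practical example: AlphaGo, which learned how to play Go.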
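Elements of reinforcement learning
Figure: policy → action → result → update of the estimated-reward function
Example: on a given Go board position, play the move believed most likely to win; the game is then won or lost. By repeating "use this move if it wins, avoid it if it loses," the agent learns which move to play in which position to win more often.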
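Research background
- Models that make effective use of powerful computing resources have emerged: Gorila, A3C, GPU Advantage Actor-Critic (GA3C)
- Current situation: most current reinforcement learning methods assume a single machine
- Hence the need for models that use many machines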
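Research objectives
- Propose Ape-X, a reinforcement learning method
  - Distributed system + prioritized experience replay
  - Combination of recent algorithms
  - Small modifications for practical operation
- Analyze how the method's parameters affect learning
  - Number of workers that generate experience
  - Amount of experience retained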
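Related work: distributed stochastic gradient descent
- Methods that compute deep-learning gradients in parallel; both synchronous and asynchronous update schemes have been proposed
- Nair et al. applied these ideas to reinforcement learning:
  - Distributed, asynchronous gradient updates
  - Distributed experience generation
- A single-machine, multi-threaded approach also achieved strong results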
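Related work: distributed importance sampling
- Importance sampling is a technique commonly used to improve learning
  - Sampling according to priorities introduces bias
  - The correction enlarges the gradient change for low-probability samples
- Alain et al. applied it to supervised learning and succeeded in applying it to a distributed system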
Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, and Yoshua Bengio. Variance reduction in sgd by distributed importance sampling. arXiv preprint arXiv:1511.06481, 2015.
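The bias correction described above can be sketched in a few lines. This is a minimal illustration, assuming the weighting w_i = (1 / (N · P(i)))^β that prioritized experience replay uses; the function names and the β parameter are our own choices, not taken from the slides or from Alain et al.

```python
def importance_weights(probs, indices, beta=1.0):
    """Importance-sampling weights for items drawn with non-uniform probability.

    probs   -- sampling probability P(i) of every item (assumed to sum to 1)
    indices -- indices of the items that were actually sampled
    beta    -- correction strength: 0 = no correction, 1 = full correction
    """
    n = len(probs)
    # w_i = (1 / (N * P(i)))^beta: rarely sampled items get larger weights,
    # so their gradient contribution is scaled up to offset how seldom they appear.
    raw = [(1.0 / (n * probs[i])) ** beta for i in indices]
    # Normalizing by the largest weight keeps every weight <= 1 (steps only shrink).
    top = max(raw)
    return [w / top for w in raw]


def expected_weighted_mean(values, probs):
    """Expectation of w_i * v_i under P, with beta = 1 and no normalization.

    With full correction this equals the plain uniform mean of the values,
    which is the sense in which importance sampling removes the sampling bias.
    """
    n = len(values)
    return sum(p * (1.0 / (n * p)) * v for p, v in zip(probs, values))


probs = [0.5, 0.25, 0.25]
print(importance_weights(probs, [0, 1]))               # [0.5, 1.0]
print(expected_weighted_mean([1.0, 2.0, 3.0], probs))  # ~2.0, the uniform mean
```

The item sampled with probability 0.5 gets half the weight of the one sampled with probability 0.25, which is exactly the "enlarge the update for low-probability samples" effect.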
Related work: experience replay
Experience Replay
- Stores generated experience and reuses it for learning many times
- Generated experience can be used efficiently
- Mixing in experience from older policies helps prevent overfitting
Prioritized Experience Replay
- Replays useful experience more often
- Assigns priorities using the TD error
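Prioritized replay as summarized above can be sketched with its proportional variant: priority p_i = |δ_i| + ε and sampling probability P(i) = p_i^α / Σ_k p_k^α. The formulas follow the prioritized experience replay method; the default α and ε values and the function names are illustrative assumptions.

```python
import random


def priorities_from_td(td_errors, eps=1e-6):
    # p_i = |delta_i| + eps; the small eps keeps zero-error transitions sampleable
    return [abs(d) + eps for d in td_errors]


def sampling_probs(priorities, alpha=0.6):
    # P(i) = p_i^alpha / sum_k p_k^alpha
    # alpha = 0 gives uniform replay, alpha = 1 fully proportional replay
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_indices(probs, batch_size, rng):
    # Draw a batch with replacement; higher-priority transitions appear more often
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


td = [0.0, 1.0, 3.0]
probs = sampling_probs(priorities_from_td(td, eps=0.0), alpha=1.0)
print(probs)  # [0.0, 0.25, 0.75]
batch = sample_indices(probs, 8, random.Random(0))
print(batch)  # only indices 1 and 2 can appear
```

Setting eps to 0 here only makes the probabilities easy to read; with the default eps, the zero-error transition would keep a tiny chance of being replayed.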
Proposed method: overview of Ape-X
Figure: Learner (Network), Replay (Experiences), Actor (Network, Environment)
Reinforcement learning is split into three roles.
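Proposed method: Actor
- Each actor has its own action-value network and environment
- Acts according to its policy and observes state transitions
- Assigns a priority to each transition and sends it to the replay memory
- Actors do not train the action-value network
A large number of actors act independently and generate a large volume of experience.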
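Proposed method: Replay Memory
- Stores the experience sent from the actors
- A single replay memory is shared by the whole system
- An upper limit is set on how much experience can be stored
- When the limit is exceeded, the oldest experience is removed first (FIFO)
Holds a large amount of experience for the learner to train on.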
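The replay memory's fixed capacity and FIFO eviction can be sketched as a ring buffer. This is an illustrative sketch with hypothetical class and method names; it keeps priorities in a plain list, whereas a practical implementation would typically use a sum-tree or similar structure so that prioritized sampling does not cost O(N) per draw.

```python
class ReplayMemory:
    """Bounded replay memory with FIFO eviction (illustrative sketch).

    Entries are (priority, transition) pairs. Once `capacity` is reached,
    each new entry overwrites the oldest one, i.e. first in, first out.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.priorities = []
        self.transitions = []
        self.next_slot = 0  # where the next entry goes; the oldest entry once full

    def add(self, priority, transition):
        if len(self.transitions) < self.capacity:
            self.priorities.append(priority)
            self.transitions.append(transition)
        else:
            # Memory is full: overwrite the oldest entry
            self.priorities[self.next_slot] = priority
            self.transitions[self.next_slot] = transition
        self.next_slot = (self.next_slot + 1) % self.capacity

    def add_batch(self, batch):
        # Actors send several (priority, transition) pairs at once
        for priority, transition in batch:
            self.add(priority, transition)

    def update_priorities(self, indices, new_priorities):
        # Used by the learner after it recomputes TD errors
        for i, p in zip(indices, new_priorities):
            self.priorities[i] = p

    def __len__(self):
        return len(self.transitions)


memory = ReplayMemory(capacity=3)
memory.add_batch([(float(i), f"t{i}") for i in range(5)])
print(len(memory), memory.transitions)  # 3 ['t3', 't4', 't2']
```

After five insertions into a memory of size three, t0 and t1 (the two oldest) have been overwritten, which is the FIFO behaviour the slide describes.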
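Proposed method: Learner
- Samples experience according to priority and learns from it
- Recomputes the priorities of the experience used for learning
- Sends parameters to the actors at regular intervals
Learns preferentially from useful experience.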
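A minimal sketch of the learner loop described above, assuming a tabular Q stored as q[state][action] in place of the paper's neural network. The names learner_step and run_learner are hypothetical, importance-sampling weights are left out for brevity, and "sending parameters to the actors" is modelled by appending a copy of the table to a list that actors would read.

```python
import random


def learner_step(q, memory, priorities, batch_size, lr, gamma, rng, eps=1e-6):
    """One simplified learner update on a tabular Q (q[state][action])."""
    # 1. Sample transitions in proportion to their priorities
    batch = rng.choices(range(len(memory)), weights=priorities, k=batch_size)
    for i in batch:
        s, a, r, s_next, done = memory[i]
        target = r if done else r + gamma * max(q[s_next])
        td = target - q[s][a]
        # 2. Learn from the sampled transition
        q[s][a] += lr * td
        # 3. Recompute its priority from the TD error of this update
        priorities[i] = abs(td) + eps


def run_learner(q, memory, priorities, steps, publish_every, rng, **step_kwargs):
    published = []  # stands in for sending parameters to the actors
    for step in range(1, steps + 1):
        learner_step(q, memory, priorities, rng=rng, **step_kwargs)
        if step % publish_every == 0:
            published.append([row[:] for row in q])  # copy, so later updates don't leak
    return published


q = [[0.0, 0.0], [0.0, 0.0]]
memory = [(0, 1, 1.0, 1, True)]  # (state, action, reward, next_state, done)
priorities = [1.0]
snapshots = run_learner(q, memory, priorities, steps=4, publish_every=2,
                        rng=random.Random(0), batch_size=1, lr=0.5, gamma=0.9)
print([snap[0][1] for snap in snapshots])  # [0.75, 0.9375]
```

As the estimate approaches the target, the TD error and therefore the priority shrink, so well-learned transitions are replayed less often.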
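Proposed method: summary of the Ape-X overview
Figure: Learner (Network), Replay (Experiences), Actor (Network, Environment)
- Actors: generate large amounts of experience in parallel
- Replay: holds a large amount of experience
- Learner: learns so as to increase reward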
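The actor role (act with a local copy of the network, attach a priority, ship transitions to the shared replay) can be sketched as a small class. Everything here is a hypothetical illustration: the Q "network" is a dict of values, the shared replay is a plain list, and priorities use a one-step TD error, whereas Ape-X actors compute priorities from multi-step targets.

```python
import random


class Actor:
    """Hypothetical Ape-X-style actor: acts with its own Q copy, never trains it."""

    def __init__(self, q_values, n_actions, epsilon, gamma, send_every, replay, seed=0):
        self.q = dict(q_values)       # local copy: {(state, action): value}
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.gamma = gamma
        self.send_every = send_every  # transitions buffered before each send
        self.replay = replay          # shared replay memory (a plain list here)
        self.local_buffer = []
        self.rng = random.Random(seed)

    def value(self, state, action):
        return self.q.get((state, action), 0.0)

    def act(self, state):
        # epsilon-greedy with respect to the local, read-only Q values
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.value(state, a))

    def priority(self, s, a, r, s_next, done):
        # One-step |TD error| under the actor's (possibly stale) parameters
        if done:
            target = r
        else:
            best_next = max(self.value(s_next, b) for b in range(self.n_actions))
            target = r + self.gamma * best_next
        return abs(target - self.value(s, a))

    def observe(self, s, a, r, s_next, done):
        transition = (s, a, r, s_next, done)
        self.local_buffer.append((self.priority(*transition), transition))
        if len(self.local_buffer) >= self.send_every:
            self.replay.extend(self.local_buffer)  # send (priority, transition) pairs
            self.local_buffer.clear()

    def set_params(self, q_values):
        # Called when the learner publishes new parameters
        self.q = dict(q_values)


replay = []
q = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 0.5}
actor = Actor(q, n_actions=2, epsilon=0.0, gamma=0.5, send_every=2, replay=replay)
print(actor.act(0))                 # 1 (greedy, since epsilon = 0)
actor.observe(0, 1, 1.0, 1, False)  # buffered locally, nothing sent yet
actor.observe(1, 0, 0.0, 0, True)   # buffer full: both transitions are sent
print([p for p, _ in replay])       # [1.0, 2.0]
```

Computing the initial priority on the actor side means new transitions enter the replay with a meaningful priority instead of a placeholder, without adding work to the single learner.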
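Proposed method: key features
Does not require many GPUs
- Learner: runs on a machine with a GPU (one learner)
- Actors: run on CPU-only machines (many actors)
Efficient use of experience
- The replay memory is shared by the whole system
- Priorities are assigned to experience
A useful discovery made by one actor is shared by the whole system.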
Proposed method: the learner's model
- Learning algorithm: Double Deep Q-Network with multi-step bootstrap targets
- Q-function approximator: Dueling Network
- Data sampling: Prioritized Experience Replay
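The learner's target combines multi-step returns with Double DQN, and the dueling network combines a state value with per-action advantages. The sketch below shows both computations on plain Python lists, assuming the forms G = Σ_{k<n} γ^k r_{t+k+1} + γ^n Q_target(s_{t+n}, argmax_a Q_online(s_{t+n}, a)) and Q = V + A - mean(A); the function names are hypothetical and the networks are replaced by precomputed Q values.

```python
def dueling_q(value, advantages):
    # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def n_step_double_q_target(rewards, gamma, q_online_next, q_target_next, done):
    """Multi-step Double-DQN target for one transition.

    rewards       -- the n rewards observed after taking a_t in s_t
    q_online_next -- online-network Q values at s_{t+n}
    q_target_next -- target-network Q values at s_{t+n}
    done          -- True if the episode ended within the n steps
    """
    n = len(rewards)
    g = sum((gamma ** k) * r for k, r in enumerate(rewards))  # discounted n-step return
    if done:
        return g  # no bootstrap past the end of the episode
    # Double DQN: the online network picks the action, the target network scores it
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return g + (gamma ** n) * q_target_next[best]


print(dueling_q(1.0, [1.0, 2.0, 3.0]))  # [0.0, 1.0, 2.0]
print(n_step_double_q_target([1.0, 1.0, 1.0], 0.5, [0.0, 5.0], [10.0, 2.0], done=False))  # 2.0
```

In the example the target network alone would bootstrap from its maximum (10.0), but Double DQN uses the action the online network prefers (index 1) and evaluates it as 2.0, which reduces overestimation.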
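Proposed method: other details
- Each actor follows ε-greedy with its own individually set ε
  - With probability ε, the actor takes a random action
  - Acting randomly helps prevent overfitting
  - Setting ε per actor ensures diversity
- Because experience is sampled by priority, the resulting bias is corrected with importance sampling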
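The per-actor ε-greedy setting from the proposed method can be illustrated as below. The schedule ε_i = ε^(1 + α·i/(N-1)) with ε = 0.4 and α = 7 is, to the best of our knowledge, the one the Ape-X paper uses for Atari; the slides do not state it, so treat the exact formula and values as an assumption.

```python
import random


def actor_epsilons(num_actors, base_eps=0.4, alpha=7.0):
    # epsilon_i = base_eps ** (1 + alpha * i / (N - 1)), for i = 0 .. N-1
    # Actor 0 explores the most; the last actor is almost greedy.
    if num_actors == 1:
        return [base_eps]
    return [base_eps ** (1 + alpha * i / (num_actors - 1)) for i in range(num_actors)]


def epsilon_greedy(q_values, epsilon, rng):
    # With probability epsilon take a uniformly random action, otherwise the greedy one
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


eps = actor_epsilons(8)
print([round(e, 4) for e in eps])  # [0.4, 0.16, 0.064, 0.0256, 0.0102, 0.0041, 0.0016, 0.0007]
print(epsilon_greedy([0.1, 0.9, 0.3], eps[-1], random.Random(0)))  # 1
```

With 8 actors and α = 7 the exponent becomes i + 1, so each actor explores 0.4 times as often as the previous one; this spread is what gives the actor population its diversity.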
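Evaluation: experimental setup
- Task: Atari games (e.g., Breakout)
- Actors: 360 actors, one CPU each
- Experience generated per actor: ~139 FPS
- Total experience generated: ~50K FPS (with an action repeat of 4)
- Gradient updates: ~19 per second
- Experience is compressed as PNG before storage to reduce memory use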
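Evaluation: performance comparison at the end of training
Table columns: training time, score
- Score: median over the games, relative to human scores
Ape-X substantially improves on existing methods in both final score and training time.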
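Evaluation: reward over time (x-axis: training time, y-axis: reward)
- Average reward obtained in each of the games shown
- Compared with the other methods, Ape-X raises the obtained reward sooner and higher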
Evaluation: experimental results
(Results table; values not recoverable from this transcript.)
- Ape-X achieved the highest score
- Distributed learning greatly shortened training time
Analysis: number of actors vs. reward. The more actors, the better the reward obtained.
Analysis: replay memory capacity vs. reward. The larger the capacity, the better the reward obtained, relatively speaking.
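Analysis: does learning from the most recent experience contribute to the score?
- Most recent experience: experience generated with the latest parameters
- The experience each actor sends is duplicated so that more of it is sent
- As a result, newer experience is sampled more often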
Analysis: most recent experience vs. reward. Learning more from the most recent experience is not tied to higher reward.
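Analysis: summary of results
- Adding actors increases reward
  - Helps avoid getting stuck in local optima
  - Extensive exploration yields useful experience
- Enlarging the replay memory increases reward
  - Useful experience can be retained for longer
- The most recent experience makes no direct contribution to reward
  - Padding the experience with duplicates reduces diversity and lowers performance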
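Summary and discussion: summary
- Proposed a framework that combines distributed execution with prioritized experience replay
- Ape-X achieved the best performance in both wall-clock training time and final performance
- Overfitting is a major problem in reinforcement learning; this work showed that the simple approach of generating large amounts of data is effective
- Future work should look for ways to use data more efficiently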
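Summary and discussion: discussion
- Ape-X is a method for collecting large amounts of experience quickly
- Complex tasks have a huge number of states s
- Generating a large amount of experience covered the states s broadly, which allowed learning to progress
- Currently, unknown actions are experienced only through random exploration
- A possible direction: a method that explores rarely occurring states s more intensively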
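Appendix: the Q-learning update rule for the Q-function
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a' \in \mathcal{A}} Q(s_{t+1}, a') - Q(s_t, a_t) \right)$$
- $s_t$: state at time $t$
- $a_t$: action at time $t$
- $Q(s_t, a_t)$: estimated reward for taking action $a_t$ in state $s_t$
- $r_t$: reward at time $t$
- $\alpha$: learning rate
- $\gamma$: discount factor
- TD (temporal-difference) error: the gap between the estimated reward and the reward actually observed plus the discounted estimate for the next state, $\delta_t = r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)$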
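The Q-learning update maps directly onto a tabular implementation. This is a minimal sketch, assuming a small discrete state and action space stored as nested dicts; Ape-X replaces the table with a neural network (Double DQN with a dueling head), so this only illustrates the update rule and the TD error it produces.

```python
def q_learning_update(q, s, a, r_next, s_next, alpha, gamma, terminal=False):
    """Apply Q(s,a) <- Q(s,a) + alpha * (r_{t+1} + gamma * max_a' Q(s_{t+1}, a') - Q(s,a)).

    q is a dict of dicts: q[state][action] -> estimated value.
    Returns the TD error used for the update.
    """
    # No bootstrapping from the next state if the episode has ended
    best_next = 0.0 if terminal else max(q[s_next].values())
    td_error = r_next + gamma * best_next - q[s][a]
    q[s][a] += alpha * td_error
    return td_error


q = {"s0": {"left": 0.0, "right": 0.0},
     "s1": {"left": 1.0, "right": 2.0}}
td = q_learning_update(q, "s0", "right", r_next=1.0, s_next="s1", alpha=0.5, gamma=0.5)
print(td, q["s0"]["right"])  # 2.0 1.0
```

The returned TD error is the same quantity that prioritized experience replay uses (in absolute value) as a transition's priority.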