Distributed prioritized experience replay
umeco
July 03, 2018
Research
Research paper readings in my laboratory
Transcript
Distributed prioritized experience replay
Horgan, Dan, et al. "Distributed prioritized experience replay." arXiv preprint arXiv:1803.00933 (2018).
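Contents: Reinforcement learning / Research background and objectives / Related work / Proposed method / Evaluation experiments / Analysis / Summary and discussion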
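What is reinforcement learning?
A method in which a model acts in various ways on its own and learns which actions yield good rewards.
Practical example: AlphaGo, which learns how to play Go.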
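Elements of reinforcement learning
Policy (example): at a given Go board position, play the move judged most likely to win. The game is then won or lost.
If the move leads to a win, keep using it; if it leads to a loss, stop using it. Repeating this teaches the model which move tends to win at which board position.
Action -> result -> update of the reward function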
Research background
Models that make effective use of powerful compute resources are on the rise:
- Gorila
- A3C
- GPU Advantage Actor Critic
Current reinforcement learning methods: most models today assume a single machine.
This motivates models that use many machines.
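Research objectives
Propose the reinforcement learning method Ape-X:
- Distributed system + prioritized experience replay
- A combination of state-of-the-art algorithms
- Small adjustments needed for practical operation
Analyze how the proposed method's parameters affect learning:
- Number of workers generating experience
- Amount of experience retained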
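Related work: Distributed stochastic gradient descent
Methods that compute deep learning gradients in parallel; both synchronous and asynchronous update schemes have been proposed.
Nair et al. applied these to reinforcement learning:
- Distributed asynchronous gradient updates
- Distributed experience generation
Strong results have also been obtained on a single machine with multithreading.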
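Related work: Distributed importance sampling
A technique commonly used to improve learning.
- Sampling by priority introduces bias
- Gradient changes are made larger for low-probability samples
Alain et al. applied it to supervised learning and succeeded in extending it to a distributed system.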
Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, and Yoshua Bengio. Variance reduction in sgd by distributed importance sampling. arXiv preprint arXiv:1511.06481, 2015.
Related work: Experience Replay
A method that stores generated experience and reuses it for learning many times.
- Generated experience is used efficiently
- Replaying experience from older policies helps prevent overfitting
Prioritized Experience Replay
- A method that replays useful experience more often
- Priorities are assigned using the TD error
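As a rough illustration of assigning priorities from the TD error, the sketch below samples stored transitions with probability proportional to their absolute TD error raised to a power. The function name `sample_prioritized`, the exponent `alpha=0.6`, and the small constant `eps` are assumptions chosen for the example, not values taken from the slides.

```python
import random


def sample_prioritized(td_errors, batch_size, alpha=0.6, eps=1e-6, rng=random):
    """Sample indices with probability proportional to (|TD error| + eps) ** alpha.

    alpha and eps are illustrative defaults (assumptions), not slide values.
    """
    # Larger absolute TD error -> larger priority.
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    probs = [p / total for p in priorities]
    # Draw indices with replacement according to the priority distribution.
    indices = rng.choices(range(len(td_errors)), weights=probs, k=batch_size)
    return indices, probs


if __name__ == "__main__":
    idx, probs = sample_prioritized([0.1, 2.0, 0.5], batch_size=5)
    print(idx, probs)
```

Transitions with a large TD error are drawn more often, which is the sense in which useful experience is replayed more.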
Proposed method: Overview of Ape-X
[Figure: Learner (Network), Replay (Experiences), Actor (Network, Environment)]
Reinforcement learning is split into three roles.
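Proposed method: Actor
A large number of Actors act independently and generate experience in large quantities.
- Each Actor holds its own action-value network and environment
- It acts according to its policy and observes state transitions
- It assigns a priority to each transition and sends it to the Replay Memory
- Actors do not train the action-value network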
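Proposed method: Replay Memory
Holds, in large quantities, the experience the Learner trains on.
- Stores experience sent by the Actors
- A single Replay Memory serves the whole system
- An upper limit is set on how much experience can be held
- When the limit is exceeded, experience is removed in FIFO order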
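The capacity limit with FIFO removal can be sketched with a bounded deque. This is a minimal single-process illustration; the class name `ReplayMemory` and its methods are hypothetical, and a real distributed replay would add prioritized sampling and network transport on top.

```python
from collections import deque


class ReplayMemory:
    """Bounded store of (priority, transition) pairs with FIFO eviction."""

    def __init__(self, capacity):
        # deque(maxlen=...) drops the oldest item when a new one is appended
        # to a full buffer, which gives FIFO removal.
        self.buffer = deque(maxlen=capacity)

    def add(self, transition, priority):
        self.buffer.append((priority, transition))

    def __len__(self):
        return len(self.buffer)


if __name__ == "__main__":
    memory = ReplayMemory(capacity=3)
    for step in range(5):
        memory.add(transition=step, priority=1.0)
    print(len(memory), list(memory.buffer))
```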
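Proposed method: Learner
Learns preferentially from useful experience.
- Samples experience according to priority and learns from it
- Recomputes the priority of experience used for learning
- Sends parameters to the Actors at fixed intervals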
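Proposed method: Summary of the Ape-X overview
[Figure: Learner (Network), Replay (Experiences), Actor (Network, Environment)]
- Actor: generates experience in parallel and in large quantities
- Replay Memory: holds a large amount of experience
- Learner: learns so as to increase reward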
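Proposed method: Characteristics
Does not require many GPUs
- The Learner runs on a GPU-equipped machine (one)
- Actors run on CPU-only machines (many)
Efficient use of experience
- The Replay Memory is shared by the whole system
- Priorities are assigned to experience
A useful discovery made by a single Actor is shared with the whole system.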
Proposed method: The Learner's model
- Learning algorithm: Double Deep Q Network with a multi-step bootstrap target
- Q-function approximator: Dueling Network
- Data sampling: Prioritized Experience Replay
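To make the combination of Double Q-learning and a multi-step bootstrap target more concrete, here is a small sketch of how such a target is usually computed: the n-step discounted return, plus a bootstrap value where the online network picks the action and the target network evaluates it. The function name and the plain-list inputs are assumptions for illustration; real implementations operate on batched tensors.

```python
def n_step_double_q_target(rewards, gamma, q_online_next, q_target_next):
    """Multi-step Double Q-learning target (illustrative sketch).

    rewards:        the n rewards observed after the current state
    q_online_next:  online-network Q-values at the state n steps ahead
    q_target_next:  target-network Q-values at that same state
    """
    n = len(rewards)
    # Discounted sum of the n observed rewards.
    n_step_return = sum(gamma ** k * r for k, r in enumerate(rewards))
    # Double Q-learning: select the action with the online network ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... and evaluate that action with the target network.
    return n_step_return + gamma ** n * q_target_next[best_action]


if __name__ == "__main__":
    print(n_step_double_q_target([1.0, 1.0, 1.0], 0.5, [0.0, 2.0], [10.0, 4.0]))
```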
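Proposed method: Other detailed settings
- Each Actor follows an ε-greedy policy with its own ε
  - ε-greedy acts randomly with probability ε
  - Acting randomly helps prevent overfitting
  - Setting ε per Actor ensures diversity
- Because sampling follows priorities, the resulting bias is corrected with importance sampling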
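The per-Actor ε-greedy setting and the importance sampling correction from the proposed method's detailed settings can be sketched as follows. The evenly spaced ε schedule in `actor_epsilons`, its bounds, and the default `beta` are hypothetical choices for illustration; the paper's own schedule is not reproduced here.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Act randomly with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def actor_epsilons(num_actors, low=0.01, high=0.4):
    """Hypothetical schedule: one epsilon per Actor, evenly spaced from high to low."""
    if num_actors == 1:
        return [high]
    step = (high - low) / (num_actors - 1)
    return [high - i * step for i in range(num_actors)]


def importance_weights(probs, beta=0.4):
    """Weights (N * P(i)) ** -beta, normalized by the maximum weight.

    probs are the sampling probabilities of the drawn transitions;
    beta is an illustrative default (assumption).
    """
    n = len(probs)
    weights = [(n * p) ** -beta for p in probs]
    max_w = max(weights)
    return [w / max_w for w in weights]


if __name__ == "__main__":
    for eps in actor_epsilons(4):
        print(eps, epsilon_greedy([0.1, 0.9, 0.3], eps))
    print(importance_weights([0.5, 0.25, 0.25], beta=1.0))
```

Transitions that were sampled with high probability receive a smaller weight, which offsets the bias introduced by prioritized sampling.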
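Evaluation experiments: Experimental settings
- Experiments use Atari games (e.g. Breakout)
- Number of Actors: one CPU per Actor
- Experience generated per Actor: measured in FPS
- Total experience generated: measured in K FPS (with action repeat)
- Gradient updates: counted per second
- Experience is PNG-compressed before storage to reduce its size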
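Evaluation experiments: Performance at the end of training
[Figure: score versus training time]
- Median score across the games
- Human score as a reference point
- Both final score and training time improve substantially over existing methods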
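Evaluation experiments: Reward over time
[Figure: reward versus training time]
- Average reward obtained across the games shown
- Compared with other methods, Ape-X increases the obtained reward faster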
Evaluation experiments: Results
[Table: comparison of methods]
- Ape-X recorded the highest score
- Distributed learning greatly reduced training time
Analysis: Relationship between the number of Actors and reward
The more Actors, the better the reward obtained.
Analysis: Relationship between Replay Memory and reward
The larger the capacity, the comparatively better the reward obtained.
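Analysis: Does learning from more recent experience contribute to the score?
The most recent experience is based on the most recent parameters.
Experience sent by the Actors is duplicated so that more of it is sent; as a result, newer experience is sampled more often.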
Analysis: Relationship between recent experience and reward
Learning from the most recent experience is not tied to reward.
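Analysis: Summary of results
- Increasing the number of Actors increases reward
  - It helps avoid falling into local optima
  - A large amount of exploration yields useful experience
- Increasing the Replay Memory increases reward
  - Useful experience can be retained longer
- Recent experience makes no direct contribution to reward
  - Padding the experience removes diversity and lowers performance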
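Summary and discussion: Summary
- Proposed a framework combining distribution with prioritized experience replay
- Ape-X showed the best performance in both wall-clock training time and final performance
- Overfitting is a major problem in reinforcement learning; this work showed that the simple approach of generating large amounts of data is effective
- In the future, ways of using data more efficiently should be explored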
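Summary and discussion: Discussion
Ape-X is a method for collecting a large amount of experience quickly.
Complex tasks have a very large number of states $s_t$.
Generating a large amount of experience covered the states $s_t$ broadly, and learning progressed.
At present, unknown actions are experienced through random exploration.
A possible direction: methods that focus exploration on states $s_t$ that occur infrequently.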
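Q-learning update rule for the Q function
\[ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a' \in A} Q(s_{t+1}, a') - Q(s_t, a_t) \right) \]
$s_t$: state at time $t$
$a_t$: action at time $t$
$Q(s_t, a_t)$: estimated reward when taking action $a_t$ in state $s_t$
$r_t$: reward at time $t$
$\alpha$: learning rate
$\gamma$: discount rate
TD error (Temporal Difference error): the difference between the estimated reward and the actual reward, i.e. the term in parentheses in the update rule.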
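The Q-learning update and its TD error can be written out as a tabular sketch. The dictionary-of-lists Q table and the function name `q_learning_update` are assumptions made for illustration; Ape-X itself uses a neural network rather than a table.

```python
def q_learning_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one tabular Q-learning update in place and return the TD error.

    q maps each state to a list of Q-values, one per action.
    """
    # TD error: (reward + discounted best next value) minus the current estimate.
    td_error = reward + gamma * max(q[next_state]) - q[state][action]
    # Move the estimate toward the target by a fraction alpha of the TD error.
    q[state][action] += alpha * td_error
    return td_error


if __name__ == "__main__":
    q_table = {0: [0.0, 0.0], 1: [1.0, 2.0]}
    err = q_learning_update(q_table, state=0, action=1, reward=1.0,
                            next_state=1, alpha=0.5, gamma=0.5)
    print(err, q_table[0])
```

The returned TD error is the quantity that prioritized experience replay uses to rank transitions.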