Distributed prioritized experience replay
umeco
July 03, 2018
Research paper readings in my laboratory
Transcript
Distributed prioritized experience replay
梅本 (Umemoto)
Horgan, Dan, et al. "Distributed prioritized experience replay." arXiv preprint arXiv:1803.00933 (2018).
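Contents
- Reinforcement learning
- Research background and objectives
- Related work
- Proposed method
- Evaluation experiments
- Analysis
- Summary and discussion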
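What is reinforcement learning?
A method in which a model acts in various ways on its own and learns the actions that yield good rewards.
Practical example: AlphaGo, which learned how to play Go.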
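Elements of reinforcement learning
Policy. Example: on a given Go board position, play the move believed most likely to win.
Action (play the move) → result (win or lose) → update of the reward function.
By repeating "use this move if it wins, avoid it if it loses", the model learns which move makes winning easier on which board position.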
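Research background: current reinforcement learning methods
Models that make effective use of powerful computing resources are on the rise:
- Gorila
- A3C
- GPU Advantage Actor Critic
At present, many models assume a single machine.
Hence the need for models that use many machines.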
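Research objectives
Propose the reinforcement learning method Ape-X:
- distributed system + prioritized experience replay
- a combination of state-of-the-art algorithms
- fine adjustments for practical operation
Analyze how the parameters of the proposed method affect learning:
- number of workers that generate experience
- number of experiences retained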
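Related work: distributed stochastic gradient descent
Methods that compute deep learning gradients in parallel; both synchronous and asynchronous update schemes have been proposed.
Nair et al. applied these to reinforcement learning:
- distributed asynchronous gradient updates
- distributed experience generation
High results have also been obtained on a single machine with multiple threads.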
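Related work: distributed importance sampling
A technique commonly used to improve training.
- Sampling by priority introduces bias.
- Enlarges the gradient change on [...] samples.
Alain et al. applied it to supervised learning and succeeded in applying it to a distributed system.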
Guillaume Alain, Alex Lamb, Chinnadhurai Sankar, Aaron Courville, and Yoshua Bengio. Variance reduction in sgd by distributed importance sampling. arXiv preprint arXiv:1511.06481, 2015.
Related work: Experience Replay
A method that stores generated experience and reuses it many times for learning.
- Generated experience can be used efficiently.
- Keeping experience from older policies helps prevent overfitting.
Prioritized Experience Replay
- A method that replays useful experience more often.
- Priorities are assigned using the TD error.
Proposed method: overview of Ape-X
[Diagram: Actor (Network, Environment), Replay (Experiences), Learner (Network)]
Reinforcement learning is divided into separate roles.
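Proposed method: Actor
A large number of actors act independently and generate experience in bulk.
- Each actor holds its own action-value network and environment.
- It acts according to its policy and observes the state transitions.
- It assigns a priority to each transition and sends it to the Replay Memory.
- Actors do not train the action-value network.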
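Proposed method: Replay Memory
Holds in bulk the experience that the Learner trains on, as sent by the actors.
- A single Replay Memory serves the whole system.
- An upper limit is set on the number of experiences held.
- When the limit is exceeded, experiences are deleted in FIFO order.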
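The capacity limit and FIFO deletion described on this slide can be sketched with a bounded deque. This is a minimal illustration and not the authors' implementation; the class name `ReplayMemory`, the method `add`, and the tuple layout of a transition are assumptions made for the example.

```python
from collections import deque


class ReplayMemory:
    """Bounded store of (priority, transition) pairs with FIFO eviction.

    Hypothetical helper for illustration; names are not from the slides.
    """

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry once it is full.
        self.buffer = deque(maxlen=capacity)

    def add(self, transition, priority):
        self.buffer.append((priority, transition))

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=3)
for step in range(5):
    # Transition layout (state, action, reward, next_state) is an assumption.
    memory.add(("s%d" % step, "a", 0.0, "s%d" % (step + 1)), priority=1.0)

# The state stored in the oldest surviving transition.
oldest_state = memory.buffer[0][1][0]
```

After five insertions into a memory of capacity three, only the three newest transitions remain, which is the FIFO behaviour the slide describes.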
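Proposed method: Learner
Learns preferentially from useful experience.
- Samples experience according to priority and trains on it.
- Recomputes the priority of each experience used for training.
- Sends parameters to the actors at regular intervals.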
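The Learner's two priority-related steps, sampling by priority and recomputing priorities after training, might look like the sketch below. The function names, the exponent `alpha`, and the small constant `eps` are assumptions for illustration; the slides only say that sampling follows priority and that priorities are recomputed.

```python
import random


def sample_indices(priorities, batch_size, alpha=0.6, rng=random):
    """Draw indices with probability proportional to priority ** alpha.

    alpha is an assumed prioritization exponent, not a value from the slides.
    """
    weights = [p ** alpha for p in priorities]
    return rng.choices(range(len(priorities)), weights=weights, k=batch_size)


def update_priorities(priorities, indices, td_errors, eps=1e-6):
    """Overwrite the priority of each replayed item with its new |TD error|."""
    for i, delta in zip(indices, td_errors):
        priorities[i] = abs(delta) + eps  # eps keeps every item sampleable
    return priorities


priorities = [1.0, 0.0, 3.0]
batch = sample_indices(priorities, batch_size=100, rng=random.Random(0))
priorities = update_priorities(priorities, [0, 2], [-0.5, 2.0])
```

An item with priority zero is never drawn here, which is why a small `eps` is commonly added when priorities are refreshed.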
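Proposed method: summary of the Ape-X overview
[Diagram: Actor (Network, Environment), Replay (Experiences), Learner (Network)]
- Actor: generates experience in bulk, in parallel.
- Replay: holds a large amount of experience.
- Learner: learns so as to increase reward.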
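Proposed method: features
Does not require a large number of GPUs
- The Learner runs on a machine equipped with a GPU (a single one).
- Actors run on CPU-only machines (many of them).
Efficient use of experience
- The Replay Memory is shared by the whole system.
- Priorities are assigned to experiences.
A useful discovery made by one actor is shared with the whole system.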
Proposed method: the Learner's model
- Learning algorithm: Double Deep Q Network, multi-step bootstrap target
- Q-function approximator: Dueling Network
- Data sampling: Prioritized Experience Replay
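As a rough sketch of how a multi-step bootstrap target combines with Double Q-learning: the discounted sum of n observed rewards is added to a bootstrapped value in which the online network selects the action and the target network evaluates it. The function name and the use of plain lists in place of network outputs are assumptions for illustration.

```python
def n_step_double_q_target(rewards, gamma, q_online_next, q_target_next):
    """n-step return bootstrapped with a Double-Q estimate.

    rewards: the n rewards observed after (s_t, a_t)
    q_online_next, q_target_next: action values at s_{t+n} from the online
    and target networks (plain lists here, standing in for network outputs)
    """
    n = len(rewards)
    discounted = sum((gamma ** k) * r for k, r in enumerate(rewards))
    # Double Q-learning: online values pick the action, target values score it.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return discounted + (gamma ** n) * q_target_next[best_action]


target = n_step_double_q_target(
    rewards=[1.0, 0.0, 2.0],
    gamma=0.5,
    q_online_next=[0.2, 0.9],
    q_target_next=[4.0, 8.0],
)
```

With these toy numbers the reward part is 1.0 + 0.0 + 0.25 × 2.0 = 1.5 and the bootstrap part is 0.125 × 8.0 = 1.0, so `target` is 2.5.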
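Proposed method: other detailed settings
- Each actor follows an individually configured ε-greedy policy.
  - ε-greedy: act randomly with probability ε.
  - Acting randomly helps prevent overfitting.
  - Setting ε per actor ensures diversity.
- Because sampling is based on priority, the resulting bias is corrected with importance sampling.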
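The per-actor ε-greedy exploration and the importance-sampling correction from the detailed-settings slide can be sketched as follows. The slide gives no concrete values, so the geometric schedule in `actor_epsilons` and the defaults `base`, `alpha` and `beta` are assumptions chosen only to make the example concrete.

```python
import random


def actor_epsilons(num_actors, base=0.4, alpha=7.0):
    """One exploration rate per actor, from base down to base ** (1 + alpha).

    The schedule and its constants are illustrative assumptions.
    """
    if num_actors == 1:
        return [base]
    return [base ** (1 + alpha * i / (num_actors - 1)) for i in range(num_actors)]


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def importance_weights(sample_probs, buffer_size, beta=0.4):
    """Weights that offset non-uniform sampling, scaled so the largest is 1."""
    raw = [(buffer_size * p) ** (-beta) for p in sample_probs]
    top = max(raw)
    return [w / top for w in raw]


epsilons = actor_epsilons(4)
greedy_action = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0, rng=random.Random(0))
weights = importance_weights([0.5, 0.25, 0.25], buffer_size=3)
```

Giving each actor a different ε spreads the actors between mostly random and mostly greedy behaviour, which is one way to obtain the diversity the slide mentions. Items that were sampled with higher probability receive a smaller weight in the loss.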
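Evaluation experiments: experimental settings
- Experiments use Atari games (e.g. Breakout).
- Number of actors: [...]; [...] CPU per actor
- Experience generated per actor: [...] FPS
- Experience generated overall: [...] KFPS (Repeat [...])
- Gradient updates: [...] per second
- Experiences are compressed as PNG before storage to reduce capacity.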
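Evaluation experiments: performance comparison at the end of training
[Figure: score versus training time]
- Median score across [...] games
- Relative to human score
- Both final score and training time improve greatly over existing methods.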
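Evaluation experiments: reward over time
[Figure: reward versus training time]
- Average reward obtained across [...] games
- Compared with other methods, Ape-X increases the obtained reward faster.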
Evaluation experiments: results
[Table: results comparison, contents not recoverable]
- Ape-X recorded the highest score.
- Distributed training also shortened training time substantially.
Analysis: number of actors versus reward
The more actors, the better the reward obtained.
Analysis: Replay Memory versus reward
The larger the capacity, the better the reward obtained, comparatively.
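Analysis: most recent experience
Does learning from more recent experience contribute to the score?
- The most recent experience is based on the most recent parameters.
- Experience sent by the actors is duplicated so that more of it is sent.
- Newer experience is therefore sampled more often.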
Analysis: most recent experience versus reward
Learning from the most recent experience is not tied to reward.
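Analysis: summary of results
- Increasing the number of actors increases reward.
  - It helps prevent falling into local optima.
  - Large-scale exploration yields useful experience.
- Increasing the Replay Memory increases reward.
  - Useful experience can be retained longer.
- The most recent experience makes no direct contribution to reward.
  - Padding the experience by duplication removes diversity and lowers performance.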
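Summary and discussion: summary
- Proposed a framework combining distributed training with prioritized experience replay.
- Ape-X showed the best performance in both wall-clock training time and final performance.
- Overfitting is a major problem in reinforcement learning; this work showed that the simple approach of generating data in bulk is effective.
- In the future, methods that use data more efficiently should be explored.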
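Summary and discussion: discussion
- Ape-X is a method for collecting experience quickly and in bulk.
- In complex tasks there is a very large number of states $s_t$.
- Generating experience in bulk covered the states $s_t$ broadly, so learning progressed.
- At present, unknown actions are experienced through random exploration.
- A method that focuses exploration on rarely occurring states $s_t$ is a possible next step.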
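Q-learning update rule for the Q function

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a' \in A} Q(s_{t+1}, a') - Q(s_t, a_t) \right)$$

- $s_t$: state at time $t$
- $a_t$: action at time $t$
- $Q(s_t, a_t)$: estimated reward when taking action $a_t$ in state $s_t$
- $r_t$: reward at time $t$
- $\alpha$: learning rate
- $\gamma$: discount rate

TD error (Temporal Difference error): the difference between the estimated reward and the actual reward.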
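A tabular version of the Q-learning update and its TD error can be written in a few lines. This is a minimal sketch: the dictionary-based Q table, the function name, and the toy states and actions are assumptions for illustration, whereas Ape-X itself uses a neural network as the Q-function approximator.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Apply one Q-learning update in place and return the TD error."""
    best_next = max(q[(next_state, a)] for a in actions)
    # TD error: bootstrapped target minus the current estimate.
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
q[("s1", "right")] = 2.0
delta = q_learning_update(q, "s0", "right", reward=1.0, next_state="s1",
                          actions=["left", "right"], alpha=0.5, gamma=0.5)
```

Here the target is 1.0 + 0.5 × 2.0 = 2.0 and the current estimate is 0.0, so the TD error is 2.0 and the entry for ("s0", "right") moves halfway to the target, to 1.0. The absolute value of this TD error is the quantity that prioritized replay uses as a priority.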