TensorFlow & DeepMind Lab & UNREAL
Kosuke Miyoshi
April 20, 2017
Solving DeepMind Lab's 3D mazes with the UNREAL algorithm implemented in TensorFlow
Transcript
TensorFlow & DeepMind Lab. narrative nights Inc., Kosuke Miyoshi. TensorFlow User Group
DeepMind Lab
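UNREAL: builds on the A3C reinforcement learning algorithm and combines it with auxiliary tasks that use Experience Replay, for about a 10x speedup in learning on 3D mazes (the average the paper reports).
Reinforcement Learning with Unsupervised Auxiliary Tasks
Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki et al. (DeepMind, 2016)
Animal dreams
• While dreaming, animals replay events they experienced, consolidating memories from the hippocampus into the neocortex.
• They dream especially often about events tied to positive or negative rewards, and learn from them.
• Example: "I saw a lion at the watering hole and ended up in danger."
• UNREAL takes this as its inspiration.
Reinforcement learning
[Figure: the agent sends an Action (⬆ ➡ ⬇) to the environment; the environment returns a state s and a reward r.]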
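Lineage of UNREAL: DQN, then A3C, then UNREAL
A3C (Asynchronous Advantage Actor-Critic)
• Runs multiple environments asynchronously in parallel, which makes learning both faster and more stable.
A3C network architecture
[Figure: Conv, Conv, FC, LSTM, followed by two heads. A softmax head outputs the policy π (the probability of taking each Action: ⬆ ➡ ⬇); a linear head outputs V (the value of the current state).]
Each local network computes only the gradient dθ from its own experience. It does not apply that gradient to its own weights; instead, each gradient is applied individually to the global weights θ. The global weights are then copied back into each local network's weights.
[Figure: several local networks each send dθ to the global weights θ.]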
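The update scheme on this slide can be sketched on a toy problem. This is a minimal illustration, not the deck's TensorFlow code: the quadratic loss, the names `global_theta`, `local_gradient`, `worker` and `train`, and the use of a lock are all assumptions made for the example (asynchronous implementations often apply updates without locking).

```python
import threading

# Hypothetical toy problem: minimise (theta - TARGET)^2 with several workers.
TARGET = 3.0
LEARNING_RATE = 0.05

global_theta = [0.0]     # stands in for the global network's weights
lock = threading.Lock()  # assumption: keeps each read/update atomic


def local_gradient(theta):
    """Gradient of (theta - TARGET)^2 evaluated at a worker's local copy."""
    return 2.0 * (theta - TARGET)


def worker(steps):
    for _ in range(steps):
        # 1. Copy the global weights into the local network.
        with lock:
            local_theta = global_theta[0]
        # 2. Compute only the gradient; the local copy is never updated.
        d_theta = local_gradient(local_theta)
        # 3. Apply the gradient to the global weights.
        with lock:
            global_theta[0] -= LEARNING_RATE * d_theta


def train(num_workers=4, steps=200):
    threads = [threading.Thread(target=worker, args=(steps,))
               for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return global_theta[0]
```

Calling `train()` should drive the shared weight close to `TARGET`, even though each worker may compute its gradient from a slightly stale copy.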
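Gradients of Policy (π) and V
R = r_t + γ r_{t+1} + ... + γ^(k-1) r_{t+k-1} + γ^k V(s_{t+k})
V network: dθv = ∂(R - V(s_t; θv))² / ∂θv
Policy network: dθ = ∇θ log π(a_t | s_t; θ) * (R - V(s_t; θv))
• V is updated so that it moves closer to R.
• If R - V is positive, the update raises the probability of the action that was taken.
• If R - V is negative, the update lowers the probability of the action that was taken.
Note: in the notation above, V uses gradient descent and Policy uses gradient ascent: θv = θv - α * dθv, θ = θ + α * dθ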
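As a rough illustration of the quantities on this slide, the sketch below computes the discounted return R backwards from a bootstrap value and then the difference R - V whose sign drives the policy update. The function names and the sample numbers are hypothetical, and plain Python lists stand in for tensors.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns R for each step, computed backwards from V(s_{t+k})."""
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """R - V per step: positive reinforces the action taken, negative discourages it."""
    return [r - v for r, v in zip(returns, values)]


rewards = [0.0, 0.0, 1.0]  # hypothetical rollout
values = [0.5, 0.6, 0.7]   # hypothetical V estimates for the same steps
returns = n_step_returns(rewards, bootstrap_value=0.0, gamma=0.5)
print(returns)  # [0.25, 0.5, 1.0]
print(advantages(returns, values))
```

In this example the first two advantages are negative and the last is positive, so only the final action would have its probability raised.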
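UNREAL (UNsupervised REinforcement and Auxiliary Learning)
• Adds auxiliary tasks to A3C that make effective use of Experience Replay, speeding up learning further.
• Pixel Control
• Reward Prediction
• Value Function Replay
Experience Replay
• Stores a large number of [state, Action, reward, next state] tuples and trains the network on samples drawn from them.
• DQN did not train stably without it.
• A3C does not use it.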
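A replay buffer of the kind described on the Experience Replay slide could look like the sketch below. The class name `ReplayBuffer`, its methods, and the fixed capacity are assumptions for illustration; the deck does not show its own buffer code.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self.frames = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.frames.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Draw batch_size distinct tuples uniformly at random."""
        return rng.sample(list(self.frames), batch_size)

    def __len__(self):
        return len(self.frames)
```

Uniform sampling like this is the unskewed case; a task that needs skewed sampling would add its own selection logic on top.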
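Pixel Control
• The goal is to make the agent act so that the pixels on screen change as much as possible.
• An auxiliary task that treats the change in screen pixels as a pseudo-reward.
Pixel Control
• The screen is divided into a 20x20 grid of pixel cells, and Q-learning is run for each cell.
• Q-learning uses a Dueling Network.
• Output: 20x20 grid x number of actions. Q is the discounted sum of rewards when the reward is the mean pixel change in each cell.
Note: the Q values learned for Pixel Control are not used to select actions.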
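The pseudo-reward for Pixel Control can be sketched as the mean absolute pixel change inside each grid cell. This is a simplified illustration under stated assumptions: frames are greyscale nested lists of equal size, and the function name and `cell_size` parameter are hypothetical. With a cell size of 4 on an 80x80 input, the same code would yield a 20x20 grid.

```python
def pixel_change_rewards(prev_frame, frame, cell_size):
    """Mean absolute pixel change per cell between two greyscale frames."""
    height, width = len(frame), len(frame[0])
    rows, cols = height // cell_size, width // cell_size
    totals = [[0.0] * cols for _ in range(rows)]
    # Accumulate the absolute change of every pixel into its cell.
    for y in range(rows * cell_size):
        for x in range(cols * cell_size):
            change = abs(frame[y][x] - prev_frame[y][x])
            totals[y // cell_size][x // cell_size] += change
    area = cell_size * cell_size
    return [[total / area for total in row] for row in totals]


prev_frame = [[0.0] * 4 for _ in range(4)]
frame = [[1.0, 1.0, 0.0, 0.0],
         [1.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
print(pixel_change_rewards(prev_frame, frame, cell_size=2))  # [[1.0, 0.0], [0.0, 0.0]]
```

Each entry of the result would serve as the reward for that cell's Q-learning target.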
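Reward Prediction
• An auxiliary task that takes consecutive frames from Experience Replay and predicts whether the reward of the next frame is positive, negative, or zero.
• Sampling is skewed so that frames with a nonzero reward (+ or -) and frames with zero reward are drawn in equal proportion. Useful reward events are rare, so this makes them get sampled often.
Reward Prediction
[Figure: the network predicts whether the next reward is +, -, or 0.]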
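The skewed sampling for Reward Prediction might be sketched as follows. The 50/50 split between zero and nonzero rewards, the fallback to uniform sampling, the class numbering, and both function names are assumptions made for this example rather than details taken from the deck's code.

```python
import random


def sample_reward_prediction_index(rewards, rng=random):
    """Pick a frame index so zero and nonzero rewards are equally likely."""
    zero = [i for i, r in enumerate(rewards) if r == 0]
    nonzero = [i for i, r in enumerate(rewards) if r != 0]
    if not zero or not nonzero:
        # Only one kind of frame is stored, so fall back to uniform sampling.
        return rng.randrange(len(rewards))
    pool = nonzero if rng.random() < 0.5 else zero
    return rng.choice(pool)


def reward_class(reward):
    """Three-way prediction target: 0 = zero, 1 = positive, 2 = negative."""
    if reward > 0:
        return 1
    if reward < 0:
        return 2
    return 0
```

With 99 zero-reward frames and one rewarding frame, the rewarding frame would be chosen about half the time instead of 1% of the time.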
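Value Function Replay
• Re-runs the state value (V) estimation that A3C already performs (the Critic side of Actor-Critic) on frames sampled from Experience Replay.
• Unlike Reward Prediction, the sampling is not skewed.
The auxiliary tasks do not affect action selection directly. However, they share the Conv and LSTM weights with the base A3C network, so adding them yields feature representations that are effective for solving those tasks, and that indirectly influences action selection.
Loss function
[Figure: the total loss is the sum of the base A3C loss, the Value Function Replay loss, the Pixel Control loss (summed over the 20x20 grid cells), and the Reward Prediction loss.]
Comparison with A3C
• About 10x faster on average in the DeepMind Lab environments.
[Figure: picking up an apple gives a reward; reaching the warp point gives a reward and teleports the agent to a random location.]
Reproduction video: see the YouTube link in the slide PDF.
Note: the URL is not clickable when viewed on Speaker Deck, so download the PDF and click the link there.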
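Pixel Control
[Figure panels: per-cell pixel change from the previous frame; Q value of each cell; Q value for the Action taken.]
Policy (π)
• The probability of taking each action: forward, backward, turn left/right, strafe left/right.
• As training progresses, the policy comes to pick a single Action with nearly 100% probability.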
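The policy output shown here is a probability for each action, produced by a softmax over the network's action scores. A minimal sketch, assuming hypothetical score values and function names:

```python
import math
import random


def softmax(logits):
    """Convert raw action scores into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probabilities, rng=random):
    """Draw one action index according to the policy's probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


early = softmax([0.1, 0.0, -0.1])  # hypothetical scores early in training
late = softmax([10.0, 0.0, 0.0])   # hypothetical scores late in training
print(max(early), max(late))
```

When one score dominates, as in the `late` example, the softmax puts almost all probability on one action, which matches the behaviour the slide describes for a trained policy.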
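Value Function
• The value of the current state; it rises as the agent approaches the warp point.
Reward Prediction
• The network predicts that a positive reward is coming.
Source
• https://github.com/miyosuda/unreal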