TensorFlow & DeepMind Lab & UNREAL
Kosuke Miyoshi
April 20, 2017
Solving 3D mazes in DeepMind Lab with the UNREAL algorithm implemented in TensorFlow.
Transcript
TensorFlow & DeepMind Lab
Kosuke Miyoshi, narrative nights Inc.
TensorFlow User Group
DeepMind Lab
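UNREAL
Builds on the A3C reinforcement learning algorithm and combines it with auxiliary tasks that use Experience Replay, achieving roughly a 10x speedup in learning on 3D mazes.
Paper: "Reinforcement Learning with Unsupervised Auxiliary Tasks", Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki et al. (DeepMind, 2016)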
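Animal dreams
• While dreaming, animals replay events they have experienced and consolidate memories from the hippocampus into the neocortex.
• They dream especially often about events tied to positive or negative rewards, and learn from them.
• Example: "I saw a lion at the watering hole and ended up in danger."
• UNREAL takes its cue from this.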
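Reinforcement learning
Diagram: the agent sends an Action (⬆ ➡ ⬇) to the environment, and the environment returns the state s and the reward r.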
From DQN to UNREAL: DQN → A3C → UNREAL
A3C (Asynchronous Advantage Actor-Critic)
• Runs multiple environments asynchronously and in parallel, which speeds up and stabilizes learning.
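A3C network architecture
Conv → Conv → FC → LSTM, followed by two output heads:
• Policy π (softmax): the probability of taking each Action (⬆ ➡ ⬇)
• V (linear): the value of the current state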
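• Each Local Network computes only the gradient dθ from its own experience. It does not apply dθ to its own weights; it applies dθ individually to the Global weights θ.
• The Global weights are then copied back into the weights of each Local Network.
Diagram: several Local Networks each send dθ to the shared Global weights θ.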
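The update cycle described on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the deck's TensorFlow code: the "network" is a single weight fitted to a toy regression target, the workers take turns sequentially rather than running in real threads, and `local_gradient` and `run_workers` are hypothetical names.

```python
import random


def local_gradient(weights, sample):
    """Gradient of the toy loss (w * x - y)^2 with respect to w."""
    x, y = sample
    return [2.0 * (weights[0] * x - y) * x]


def run_workers(num_workers=4, steps=200, lr=0.05, seed=0):
    """Simulate workers that push gradients to shared global weights."""
    rng = random.Random(seed)
    global_weights = [0.0]  # shared parameters (theta)
    local_weights = [list(global_weights) for _ in range(num_workers)]
    for _ in range(steps):
        # Workers act in arbitrary order (a stand-in for real asynchrony).
        worker = rng.randrange(num_workers)
        # Copy the Global weights into this worker's Local weights.
        local_weights[worker] = list(global_weights)
        # The worker computes a gradient from its own experience only.
        x = rng.uniform(-1.0, 1.0)
        grad = local_gradient(local_weights[worker], (x, 3.0 * x))
        # The gradient is applied to the Global weights, not the Local ones.
        global_weights = [w - lr * g for w, g in zip(global_weights, grad)]
    return global_weights
```

In this toy setup the shared weight drifts toward the true slope of 3.0 even though no worker ever updates its own copy directly.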
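Gradients for Policy (π) and V
$R = r_t + \gamma r_{t+1} + \dots + \gamma^{k-1} r_{t+k-1} + \gamma^{k} V(s_{t+k}; \theta_v)$
V network: $d\theta_v = \partial (R - V(s_t; \theta_v))^2 / \partial \theta_v$
Policy network: $d\theta = \nabla_{\theta} \log \pi(a_t \mid s_t; \theta) \, (R - V(s_t; \theta_v))$
• V is updated so that it moves closer to R.
• If (R - V) is positive, the update increases the probability of the action that was taken.
• If (R - V) is negative, the update decreases the probability of the action that was taken.
Note: in the notation above, V is trained by Gradient Descent and Policy by Gradient Ascent: θv = θv - α * dθv, θ = θ + α * dθ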
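As a rough illustration of the quantities on this slide, the sketch below computes the discounted return R for each step of a short rollout, and the difference (R - V) that decides whether the taken action becomes more or less likely. The function names and the default discount factor are assumptions made for the example.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted return R for every step of a rollout.

    bootstrap_value is the value estimate V of the state after the last
    reward (use 0.0 if the episode ended there).
    """
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """(R - V) per step: positive means the taken action did better than expected."""
    return [ret - value for ret, value in zip(returns, values)]
```

For example, `n_step_returns([0, 0, 1], 0.0, gamma=0.5)` gives `[0.25, 0.5, 1.0]`, and with value estimates of 0.5 at every step the advantages are `[-0.25, 0.0, 0.5]`.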
UNREAL (UNsupervised REinforcement and Auxiliary Learning)
• Adds auxiliary tasks to A3C that make effective use of Experience Replay, speeding up learning further.
• Pixel Control
• Reward Prediction
• Value Function Replay
Experience Replay
• Stores a large number of [state, Action, reward, next state] tuples and trains the network on samples drawn from them.
• DQN did not learn stably without it.
• A3C does not use it.
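A minimal sketch of such a buffer, assuming a fixed capacity with the oldest entries discarded first and uniform random sampling. The class name `ReplayBuffer` and its methods are hypothetical and are not taken from the deck's source code.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity, seed=None):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._frames = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self._frames.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Draw batch_size distinct transitions uniformly at random."""
        return self._rng.sample(list(self._frames), batch_size)

    def __len__(self):
        return len(self._frames)
```

Sampling from a large store breaks the correlation between consecutive frames, which is the stabilizing effect the slide attributes to DQN.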
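Pixel Control
• Goal: make the agent act so that the pixels on screen change as much as possible.
• An auxiliary task that uses the change in screen pixels as a pseudo-reward.
• The screen is divided into a grid of pixel cells, and Q-learning is performed for each cell.
• The Q-learning uses a Dueling Network.
• For each cell and each action, Q is the discounted sum of rewards, where the reward is the mean pixel change within that cell.
Note: the Q values obtained from Pixel Control are not used to select actions.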
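The pseudo-reward itself is simple to compute. The sketch below assumes grayscale frames stored as 2D lists of floats and a square cell size passed in by the caller; the function name `pixel_change` and the cell size are illustrative assumptions, since the grid dimensions did not survive in the transcript.

```python
def pixel_change(prev_frame, frame, cell):
    """Mean absolute pixel change inside each cell x cell block of the frame.

    Returns a 2D list with one pseudo-reward per grid cell. Rows or columns
    that do not fill a whole cell are ignored.
    """
    height, width = len(frame), len(frame[0])
    rows, cols = height // cell, width // cell
    grid = []
    for gy in range(rows):
        row = []
        for gx in range(cols):
            total = 0.0
            for y in range(gy * cell, (gy + 1) * cell):
                for x in range(gx * cell, (gx + 1) * cell):
                    total += abs(frame[y][x] - prev_frame[y][x])
            row.append(total / (cell * cell))
        grid.append(row)
    return grid
```

Each entry of the returned grid would serve as the reward for that cell's Q-learning target.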
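Reward Prediction
• An auxiliary task that takes consecutive frames from Experience Replay and predicts whether the reward at the next frame is positive, negative, or zero.
• The rewards to predict are sampled so that zero and nonzero (+ or -) rewards appear in a fixed ratio. Informative reward events are sampled frequently even though they are rare.
• Output: a prediction of whether the next reward is +, -, or 0.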
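The skewed sampling can be sketched as follows. The numbers here are assumptions: `history=3` input frames and an equal chance of drawing a zero or nonzero reward follow the UNREAL paper's description, not the garbled slide text, and both function names are hypothetical.

```python
import random


def sample_target_index(rewards, rng, history=3, nonzero_prob=0.5):
    """Pick the index of the reward to predict, oversampling nonzero rewards.

    Only indices with at least `history` preceding frames are eligible, so
    len(rewards) must be greater than `history`.
    """
    candidates = range(history, len(rewards))
    zero = [t for t in candidates if rewards[t] == 0]
    nonzero = [t for t in candidates if rewards[t] != 0]
    if not nonzero:
        return rng.choice(zero)
    if not zero:
        return rng.choice(nonzero)
    pool = nonzero if rng.random() < nonzero_prob else zero
    return rng.choice(pool)


def reward_class(reward):
    """Three-way classification target: 0 = zero, 1 = positive, 2 = negative."""
    if reward == 0:
        return 0
    return 1 if reward > 0 else 2
```

With a buffer in which only one reward in a hundred is nonzero, roughly half of the sampled targets still land on that rare event.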
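Value Function Replay
• Repeats the state value (V) estimation that A3C already performs (the Critic side of Actor-Critic), this time on frames sampled from Experience Replay.
• Unlike Reward Prediction, the sampling is not deliberately skewed.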
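The auxiliary tasks do not affect Action selection directly. However, they share the Conv and LSTM weights with the base A3C network, so adding them yields feature representations that are effective for solving those tasks, and these in turn influence Action selection indirectly.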
Loss function
The total loss is the sum of four terms: Base A3C, Value Function Replay, Pixel Control (over the grid), and Reward Prediction.
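The slide's formula did not survive extraction. For reference, the combined loss as given in the UNREAL paper has the form below; the weights $\lambda$ are the paper's notation and are an assumption about what the slide showed, with $c$ ranging over the Pixel Control cells.

```latex
\mathcal{L}_{\mathrm{UNREAL}}
  = \mathcal{L}_{\mathrm{A3C}}
  + \lambda_{\mathrm{VR}} \, \mathcal{L}_{\mathrm{VR}}
  + \lambda_{\mathrm{PC}} \sum_{c} \mathcal{L}_{Q}^{(c)}
  + \lambda_{\mathrm{RP}} \, \mathcal{L}_{\mathrm{RP}}
```

Because all four terms are minimized through the shared Conv and LSTM layers, the auxiliary losses shape the features that the A3C policy and value heads read.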
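Comparison with A3C
• On average, about a 10x speedup in the DeepMind Lab environments.
Task: picking up an apple gives a reward; reaching the warp point gives a reward and warps the agent to a random location.
Reproduction video: available on YouTube. Note: URL links are not clickable when the deck is viewed on Speaker Deck, so download the PDF and click the link there.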
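Pixel Control: the pixel change of each grid cell relative to the previous frame, the Q value of each cell, and the Q value for the Action that was taken.
Policy (π): the probability of taking each action (move forward, move backward, turn left/right, strafe left/right). As training progresses, the agent comes to choose its Action with near certainty.
Value Function: the current state value, which rises as the agent approaches the warp point.
Reward Prediction: the network predicts that a positive reward is coming.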
Source
• https://github.com/miyosuda/unreal