TensorFlow & DeepMind Lab & UNREAL
Kosuke Miyoshi
April 20, 2017
Solving DeepMind Lab's 3D mazes with the UNREAL algorithm implemented in TensorFlow
Transcript
TensorFlow & DeepMind Lab
Narrative Nights Inc., Kosuke Miyoshi
TensorFlow User Group
DeepMind Lab
UNREAL: builds on the A3C reinforcement learning algorithm and adds auxiliary tasks that use Experience Replay, achieving a 10x speedup in learning on 3D mazes.
REINFORCEMENT LEARNING WITH UNSUPERVISED AUXILIARY TASKS
Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki et al. (DeepMind, 2016)
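Animal dreams
• While dreaming, animals replay events they experienced, which helps consolidate memories in the hippocampus and neocortex.
• They dream especially often about events tied to positive or negative rewards, and learn from them.
• e.g. "I spotted a lion at the watering hole and was in danger."
• UNREAL takes its inspiration from this.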
Reinforcement learning: the agent sends an Action (⬆ ➡ ⬇) to the environment, which returns the state s and the reward r.
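To make the diagram concrete, here is a minimal sketch of the agent/environment loop. `CorridorEnv` and `run_episode` are hypothetical names invented for illustration; the toy corridor stands in for an environment and is not DeepMind Lab's API.

```python
class CorridorEnv:
    """Hypothetical 1-D corridor: states 0..length-1, reward 1.0 at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the state is clamped to the corridor.
        self.state = max(0, min(self.length - 1, self.state + action))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Agent-environment loop: observe s, pick an action, receive r and the next s."""
    s = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        a = policy(s)
        s, r, done = env.step(a)
        total_reward += r
        if done:
            break
    return total_reward
```

With an always-move-right policy, `run_episode(CorridorEnv(), lambda s: +1)` returns `1.0`.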
The path to UNREAL: DQN → A3C → UNREAL
A3C (Asynchronous Advantage Actor-Critic)
• Runs multiple environments asynchronously in parallel, which both speeds up and stabilizes learning.
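Network outputs: π (Policy), the probability of taking each action (⬆ ➡ ⬇), computed with a softmax; V, the value of the current state, computed with a linear layer.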
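A minimal sketch of the two output heads, assuming a plain feature vector `features` stands in for the LSTM output: π is a softmax over linear logits and V is a single linear unit. The weight layout (`w_pi`, `w_v`) and the helper names are hypothetical.

```python
import math
import random


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_and_value(features, w_pi, w_v):
    """Policy head: pi = softmax(W_pi h). Value head: V = w_v . h (linear)."""
    logits = [sum(w * h for w, h in zip(row, features)) for row in w_pi]
    pi = softmax(logits)
    v = sum(w * h for w, h in zip(w_v, features))
    return pi, v


def sample_action(pi, rng=random):
    # Draw action index i with probability pi[i].
    return rng.choices(range(len(pi)), weights=pi, k=1)[0]
```

Sampling from π rather than taking the argmax keeps the agent exploring while the policy is still uncertain.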
A3C network architecture: Conv → Conv → FC → LSTM
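Each local network computes only the gradients dθ from its own learning. Rather than applying them to its own weights, it applies them individually to the global weights θ. The global weights are then copied back into each local network's weights.
[Figure: several local networks each sending dθ to the shared global weights θ]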
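The global/local weight scheme can be sketched with Python threads. This is a toy under stated assumptions: a hypothetical quadratic loss replaces the real actor-critic loss, and `GlobalParams`, `worker` and `train_async` are illustrative names. Each worker copies the global weights, computes only dθ, and applies it to the global θ.

```python
import threading


class GlobalParams:
    """Shared global weights theta, guarded by a lock in this toy example."""

    def __init__(self, theta):
        self.theta = list(theta)
        self.lock = threading.Lock()

    def snapshot(self):
        # Copy the global weights into a worker's local weights.
        with self.lock:
            return list(self.theta)

    def apply_gradients(self, grads, lr):
        # Apply a worker's d_theta to the global weights (gradient descent step).
        with self.lock:
            self.theta = [t - lr * g for t, g in zip(self.theta, grads)]


def local_gradient(theta, target):
    # Hypothetical loss 0.5 * ||theta - target||^2; its gradient is theta - target.
    return [t - y for t, y in zip(theta, target)]


def worker(global_params, target, steps, lr):
    for _ in range(steps):
        local_theta = global_params.snapshot()       # global -> local copy
        grads = local_gradient(local_theta, target)  # compute d_theta only
        global_params.apply_gradients(grads, lr)     # push d_theta to global


def train_async(n_workers=4, steps=200, lr=0.05):
    target = [3.0, -1.0]
    params = GlobalParams([0.0, 0.0])
    threads = [
        threading.Thread(target=worker, args=(params, target, steps, lr))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return params.snapshot()
```

The lock is only for simplicity here; the A3C paper applies updates to the shared parameters without locking.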
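Gradients of the policy π and of V
$$R = \sum_{i=0}^{k-1} \gamma^{i} r_{t+i} + \gamma^{k} V(s_{t+k}; \theta_v)$$
$$d\theta_v = \frac{\partial \left(R - V(s_t; \theta_v)\right)^2}{\partial \theta_v}$$
$$d\theta = \nabla_{\theta} \log \pi(a_t \mid s_t; \theta)\,\left(R - V(s_t; \theta_v)\right)$$
• V network: update V so that it moves closer to R.
• Policy network: if R − V is positive, update so that the action taken becomes more likely; if R − V is negative, update so that it becomes less likely.
※ In this notation, V is trained by gradient descent and the policy by gradient ascent: θv = θv − α·dθv, θ = θ + α·dθ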
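The return R and the advantage R − V in these gradients can be computed backwards over a rollout. A minimal sketch; the function names and the default γ = 0.99 are assumptions for illustration.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns R_i = r_i + gamma * R_{i+1}, seeded with V(s_{t+k}).

    rewards: [r_t, ..., r_{t+k-1}] from one rollout.
    bootstrap_value: V(s_{t+k}; theta_v), or 0.0 if the episode ended.
    """
    returns = []
    R = bootstrap_value
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    return returns


def advantages(returns, values):
    # R - V: positive -> make the taken action more likely, negative -> less likely.
    return [R - v for R, v in zip(returns, values)]
```

For example, `n_step_returns([0.0, 0.0, 1.0], 0.0, gamma=0.5)` gives `[0.25, 0.5, 1.0]`.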
UNREAL
• Adds to A3C auxiliary tasks that make effective use of Experience Replay, speeding up learning further:
  • Pixel Control
  • Reward Prediction
  • Value Function Replay
UNREAL = UNsupervised REinforcement and Auxiliary Learning
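Experience Replay
• Store a large number of [state, action, reward, next state] tuples and train the network on samples drawn from them.
• Without it, DQN's training did not stabilize.
• A3C does not use it.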
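A minimal replay-buffer sketch that stores [state, action, reward, next state] tuples and samples them uniformly. `ReplayBuffer` is a hypothetical class, and evicting the oldest entry via `deque(maxlen=...)` is an assumption, not necessarily what DQN or UNREAL implementations do.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # When full, appending drops the oldest transition.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks up correlated consecutive steps.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```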
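Pixel Control
• We want the agent to act so that the pixels on screen change as much as possible.
• An auxiliary task that treats the change in screen pixels as a pseudo-reward.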
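A sketch of the pixel-change pseudo-reward: the average absolute change from the previous frame inside each non-overlapping cell. Assumptions: frames are grayscale 2-D lists whose sides are multiples of `cell`, and the function name is hypothetical. The UNREAL paper applies this idea to an 80x80 crop with 4x4 cells, which yields the 20x20 grid.

```python
def pixel_change_rewards(prev_frame, frame, cell):
    """Average absolute pixel change per non-overlapping cell x cell block.

    Both frames are 2-D lists of grayscale values with the same shape;
    height and width are assumed to be multiples of `cell`.
    """
    h, w = len(frame), len(frame[0])
    grid = []
    for gy in range(0, h, cell):
        row = []
        for gx in range(0, w, cell):
            total = 0.0
            for y in range(gy, gy + cell):
                for x in range(gx, gx + cell):
                    total += abs(frame[y][x] - prev_frame[y][x])
            row.append(total / (cell * cell))
        grid.append(row)
    return grid
```

For a 4x4 frame where one pixel in the top-left 2x2 cell rises by 4 and one in the bottom-right cell rises by 8, `cell=2` yields `[[1.0, 0.0], [0.0, 2.0]]`.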
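Pixel Control
• Split the screen into a grid of pixel cells (20x20 in the UNREAL paper) and run Q-learning separately for each cell.
• Q-learning uses a Dueling Network.
※ The Q values learned by Pixel Control are not used to select actions.
[Figure: a grid of cells with one Q per action; each Q is the total discounted return when the reward is the average pixel change in that cell]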
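Per-cell Q values can be assembled with the Dueling Network rule Q = V + A − mean(A), applied independently in every cell. The sketch also shows a one-step Q-learning target r + γ·max Q for brevity; the UNREAL paper uses n-step Q-learning, and all names here are hypothetical.

```python
def dueling_q(value_grid, advantage_grids):
    """Combine dueling streams per cell: Q_a = V + A_a - mean_a(A_a).

    value_grid: H x W list of state values.
    advantage_grids: one H x W grid of advantages per action.
    Returns one H x W grid of Q values per action.
    """
    n_actions = len(advantage_grids)
    h, w = len(value_grid), len(value_grid[0])
    q = [[[0.0] * w for _ in range(h)] for _ in range(n_actions)]
    for y in range(h):
        for x in range(w):
            mean_adv = sum(advantage_grids[a][y][x] for a in range(n_actions)) / n_actions
            for a in range(n_actions):
                q[a][y][x] = value_grid[y][x] + advantage_grids[a][y][x] - mean_adv
    return q


def q_learning_targets(reward_grid, next_q, gamma=0.9):
    """Simplified one-step target per cell: r + gamma * max_a Q(s', a)."""
    n_actions = len(next_q)
    return [
        [reward_grid[y][x] + gamma * max(next_q[a][y][x] for a in range(n_actions))
         for x in range(len(reward_grid[0]))]
        for y in range(len(reward_grid))
    ]
```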
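Reward Prediction
• An auxiliary task that takes consecutive frames (three in the paper) from Experience Replay and predicts whether the reward that follows them is positive, negative, or zero.
• Samples are drawn so that zero rewards and nonzero (+ or −) rewards appear in a 1:1 ratio, so useful reward events are sampled often even though they are rare.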
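A sketch of the skewed sampling, assuming a flat list of per-step rewards where `rewards[i]` is the reward that follows frames `i-3 .. i-1` (an indexing convention chosen for illustration). Zero and nonzero targets are each drawn with probability 0.5, and the label is a 3-way class. Function names are hypothetical.

```python
import random


def reward_class(r):
    # 3-way label: 0 = zero reward, 1 = positive, 2 = negative.
    if r > 0:
        return 1
    if r < 0:
        return 2
    return 0


def sample_reward_prediction(rewards, history=3, rng=random):
    """Return (frame_indices, label) for one Reward Prediction example.

    Assumed indexing: rewards[i] follows frames i-history .. i-1.
    Needs len(rewards) > history, otherwise rng.choice raises IndexError.
    """
    candidates = range(history, len(rewards))
    nonzero = [i for i in candidates if rewards[i] != 0]
    zero = [i for i in candidates if rewards[i] == 0]
    if nonzero and zero:
        # Pick the zero or nonzero pool with probability 0.5 each.
        pool = nonzero if rng.random() < 0.5 else zero
    else:
        pool = nonzero or zero  # fall back when one kind is absent
    i = rng.choice(pool)
    return list(range(i - history, i)), reward_class(rewards[i])
```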
Reward Prediction: predict whether the next reward will be zero, positive, or negative.
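Value Function Replay
• Repeats the state-value (V) estimation that A3C already performs (the Critic side of Actor-Critic), this time on frames sampled from Experience Replay.
• Unlike Reward Prediction, the sampling is not deliberately skewed.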
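The auxiliary tasks do not influence action selection directly. However, they share the Conv and LSTM weights with the base A3C, so adding them yields feature representations that are effective for solving those tasks, and through those features they influence action selection indirectly.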
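Loss function
$$L_{\text{UNREAL}}(\theta) = L_{\text{A3C}} + \lambda_{\text{VR}} L_{\text{VR}} + \lambda_{\text{PC}} \sum_{c} L_{Q}^{(c)} + \lambda_{\text{RP}} L_{\text{RP}}$$
Terms: base A3C, Value Function Replay, Pixel Control (summed over the cells c of the grid), Reward Prediction.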
Comparison with A3C: on average a 10x speedup in learning in the DeepMind Lab environment.
Picking up an apple gives a reward; reaching the warp point gives a reward and warps the agent to a random location.
Reproduction video: https://youtu.be/…
※ If you are viewing this on Speaker Deck, the URL link cannot be clicked, so please download the PDF and click it there.
Pixel Control panels: pixel change of each cell from the previous frame; Q of each cell (the Q for the action taken).
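Policy π: probability of taking each action (move forward, move backward, turn left/right, strafe left/right). As learning progresses, the policy becomes nearly deterministic, choosing each action with probability close to 100%.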
Value Function: the value of the current state. It rises as the agent approaches the warp point.
Reward Prediction: the agent is predicting that a positive reward is coming.
Source
• https://github.com/miyosuda/unreal