TensorFlow & DeepMind Lab & UNREAL
Kosuke Miyoshi
April 20, 2017
Technology
Solving DeepMind Lab 3D mazes with the UNREAL algorithm implemented in TensorFlow
Transcript
TensorFlow & DeepMind Lab
Kosuke Miyoshi, narrative nights Inc.
TensorFlow User Group
DeepMind Lab
UNREAL: builds on the A3C reinforcement learning algorithm and combines it with auxiliary tasks that use Experience Replay, speeding up learning in 3D mazes by about 10x.
Reinforcement Learning with Unsupervised Auxiliary Tasks
Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki et al. (DeepMind, 2016)
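Animal dreams
• In dreams, animals replay events they have experienced while consolidating memories between the hippocampus and the neocortex.
• They dream especially often about events tied to positive or negative rewards, and learn from them.
• Example: "I saw a lion at the watering hole and ended up in danger."
• UNREAL takes its cue from this.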
Reinforcement learning: the agent sends an Action (⬆ ➡ ⬇) to the environment, and the environment returns a state s and a reward r.
The path to UNREAL: DQN → A3C → UNREAL
A3C (Asynchronous Advantage Actor-Critic)
• Runs multiple environments asynchronously and in parallel, which speeds up and stabilizes learning.
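A3C network structure: Conv → Conv → FC → LSTM, followed by two output heads.
• Policy π (softmax): the probability of taking each Action (⬆ ➡ ⬇)
• V (linear): the value of the current state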
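As a rough illustration of the two output heads, the sketch below applies a softmax to one linear layer for the policy and a plain linear layer for V. It is pure Python with hypothetical names (`linear`, `softmax`, `heads`) and toy weights; the Conv and LSTM layers that would produce `features` are assumed and not shown.

```python
import math


def linear(features, weights, bias):
    """Dense layer: one output per weight row (hypothetical helper)."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def heads(features, policy_w, policy_b, value_w, value_b):
    """Return (action probabilities, state value) from shared features."""
    policy = softmax(linear(features, policy_w, policy_b))
    value = linear(features, [value_w], [value_b])[0]
    return policy, value


# Toy usage: 2 features, 3 actions, all-zero policy weights give a uniform policy.
probs, v = heads([1.0, 2.0], [[0.0, 0.0]] * 3, [0.0] * 3, [0.5, 0.25], 0.0)
```

Both heads read the same `features`, which is what lets the auxiliary tasks influence the policy through shared weights.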
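Each local network computes only the gradients (dθ) from its own learning. Instead of applying them to its own weights, it applies them individually to the global weights (θ). The global weights are then copied back into each local network's weights.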
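The update scheme above can be sketched in a few lines of plain Python. This is a single-threaded illustration under stated assumptions: `GlobalNet`, `LocalNet`, `apply_gradients`, and `sync_from` are hypothetical names, the loss is a toy quadratic, and real A3C runs each local network in its own thread.

```python
class GlobalNet:
    """Holds the shared weights θ (toy container, not a real network)."""

    def __init__(self, weights):
        self.weights = list(weights)

    def apply_gradients(self, grads, lr):
        # Each worker's gradient dθ is applied to the global weights on its own.
        self.weights = [w - lr * g for w, g in zip(self.weights, grads)]


class LocalNet:
    """A worker copy; its weights change only by syncing from the global net."""

    def __init__(self, global_net):
        self.weights = list(global_net.weights)

    def sync_from(self, global_net):
        self.weights = list(global_net.weights)

    def compute_gradients(self, target):
        # Toy loss 0.5 * (w - target)^2 per weight, so the gradient is w - target.
        return [w - target for w in self.weights]


global_net = GlobalNet([0.0, 0.0])
workers = [LocalNet(global_net) for _ in range(3)]
targets = [1.0, 2.0, 3.0]  # each worker sees different toy data

for _ in range(100):
    for worker, target in zip(workers, targets):
        worker.sync_from(global_net)               # copy θ to the local net
        grads = worker.compute_gradients(target)   # local net only computes dθ
        global_net.apply_gradients(grads, lr=0.1)  # dθ goes to the global θ
```

Because workers never apply gradients to their own weights, the global weights remain the single source of truth.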
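Gradients of the policy (π) and V
• Update V so that it moves closer to the return R.
• If R - V is positive, update so that the probability of the action taken increases.
• If R - V is negative, update so that the probability of the action taken decreases.
Note: in this notation V uses gradient descent and the policy uses gradient ascent.
V network: θv = θv - α * dθv
Policy network: θ = θ + α * dθ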
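A minimal sketch of the quantities behind these rules, assuming R is the bootstrapped n-step discounted return used in standard A3C. The function names, the discount `gamma`, and the sample numbers are made up for illustration.

```python
def n_step_returns(rewards, bootstrap_value, gamma):
    """Discounted return R for each step, bootstrapped from V of the last state."""
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """R - V per step: positive raises the action's probability, negative lowers it."""
    return [r - v for r, v in zip(returns, values)]


rewards = [0.0, 0.0, 1.0]   # toy rollout: reward arrives on the last step
values = [0.5, 0.6, 0.9]    # toy V predictions for the same steps
returns = n_step_returns(rewards, bootstrap_value=0.0, gamma=0.5)
adv = advantages(returns, values)
```

Here V overestimated the first two steps, so their R - V is negative, while the last step's is positive.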
UNREAL (UNsupervised REinforcement and Auxiliary Learning)
• Adds auxiliary tasks that make effective use of Experience Replay to A3C, speeding up learning further.
• Auxiliary tasks: Pixel Control, Reward Prediction, Value Function Replay
Experience Replay
• Stores a large number of [state, Action, reward, next state] tuples and trains the network on samples drawn from them.
• DQN did not train stably without it.
• A3C does not use it.
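A minimal replay buffer along these lines, assuming a fixed capacity with oldest-first eviction. The class name `ReplayBuffer` and its methods are hypothetical and not taken from the author's code.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest entry once capacity is reached.
        self.frames = deque(maxlen=capacity)

    def __len__(self):
        return len(self.frames)

    def add(self, state, action, reward, next_state):
        self.frames.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        """Draw batch_size distinct stored tuples uniformly at random."""
        return rng.sample(list(self.frames), batch_size)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(state=step, action=0, reward=0.0, next_state=step + 1)
batch = buffer.sample(2)
```

Uniform sampling like this matches Value Function Replay; Reward Prediction skews the sampling instead.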
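Pixel Control
• The aim is to make the agent act so that the pixels on screen change as much as possible.
• An auxiliary task that uses the change in screen pixels as a pseudo-reward.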
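Pixel Control
• Divide the screen into a grid of pixel cells (20x20 in the UNREAL paper) and run Q-learning for each cell.
• The Q-learning uses a Dueling Network.
• Output: one Q per grid cell and action, where Q is the discounted sum of rewards when each cell's mean pixel change is used as the reward.
Note: the Q values obtained from Pixel Control are not used for action selection.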
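The pseudo-reward itself can be sketched as follows, assuming grayscale frames stored as 2D lists and a hypothetical `pixel_change` helper. Only the per-cell reward is shown; the per-cell Q-learning and the Dueling Network are left out.

```python
def pixel_change(prev_frame, frame, cell):
    """Mean absolute pixel change per grid cell between two grayscale frames."""
    height, width = len(frame), len(frame[0])
    rows, cols = height // cell, width // cell
    totals = [[0.0] * cols for _ in range(rows)]
    for y in range(rows * cell):
        for x in range(cols * cell):
            totals[y // cell][x // cell] += abs(frame[y][x] - prev_frame[y][x])
    area = cell * cell
    return [[total / area for total in row] for row in totals]


# Toy usage: a 4x4 frame split into 2x2 cells; only the top-left cell changes.
previous = [[0.0] * 4 for _ in range(4)]
current = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
cell_rewards = pixel_change(previous, current, cell=2)
```

Each entry of `cell_rewards` plays the role of the reward for that cell's own Q-learning problem.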
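Reward Prediction
• An auxiliary task that takes consecutive frames from Experience Replay and predicts whether the reward of the following frame is positive, negative, or zero.
• Samples are drawn so that zero and non-zero (+ or -) rewards appear in a fixed ratio (equal probability in the UNREAL paper). Useful reward events are therefore sampled frequently even though they are rare.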
Reward Prediction: predicts whether the next reward is +, -, or 0.
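The skewed sampling can be sketched like this. The helper names, the `history` parameter (frames fed to the predictor), and the default `p_nonzero=0.5` are assumptions for illustration; 0.5 mirrors the equal-probability sampling described in the UNREAL paper.

```python
import random


def reward_class(reward):
    """Three-way target: 0 for zero, 1 for positive, 2 for negative reward."""
    if reward > 0:
        return 1
    if reward < 0:
        return 2
    return 0


def sample_rp_index(rewards, history, rng, p_nonzero=0.5):
    """Pick the index whose reward is predicted from the `history` frames before it.

    Rewarding frames are chosen with probability p_nonzero, however rare they are.
    """
    candidates = range(history, len(rewards))
    nonzero = [i for i in candidates if rewards[i] != 0]
    zero = [i for i in candidates if rewards[i] == 0]
    if nonzero and (not zero or rng.random() < p_nonzero):
        return rng.choice(nonzero)
    if not zero:
        raise ValueError("not enough frames stored to sample from")
    return rng.choice(zero)


# Toy usage: one rewarding frame among 100 stored frames.
rng = random.Random(0)
stored_rewards = [0.0] * 50 + [1.0] + [0.0] * 49
index = sample_rp_index(stored_rewards, history=3, rng=rng)
target = reward_class(stored_rewards[index])
```

With uniform sampling the rewarding frame would be picked about 1% of the time; here it is picked about half the time.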
Value Function Replay
• Repeats the state-value (V) estimation done in A3C (the Critic side of Actor-Critic) on frames sampled from Experience Replay.
• Unlike Reward Prediction, the sampling is not deliberately skewed.
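The auxiliary tasks do not directly affect Action selection. However, they share the Conv and LSTM weights with the base A3C, so adding them yields feature representations that are effective for solving those tasks, which in turn influences Action selection indirectly.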
Loss function terms: Base A3C, Value Function Replay, Pixel Control (per grid cell), Reward Prediction
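In the UNREAL paper these terms are combined as a weighted sum. The λ weights and the sum over grid cells c below follow the paper's notation and are not visible in this transcript:

```latex
\mathcal{L}_{\mathrm{UNREAL}}(\theta)
  = \mathcal{L}_{\mathrm{A3C}}
  + \lambda_{\mathrm{VR}} \, \mathcal{L}_{\mathrm{VR}}
  + \lambda_{\mathrm{PC}} \sum_{c} \mathcal{L}_{Q}^{(c)}
  + \lambda_{\mathrm{RP}} \, \mathcal{L}_{\mathrm{RP}}
```

Here VR is Value Function Replay, PC is Pixel Control, and RP is Reward Prediction.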
Comparison with A3C: on average about 10x faster learning in DeepMind Lab environments.
Picking up an apple gives a reward. Reaching the warp point gives a reward and warps the agent to a random location.
Reproduction and verification video: YouTube link. Note: the URL cannot be clicked when viewing on Speaker Deck, so download the PDF to follow it.
Pixel Control: each grid cell's pixel change from the previous frame, and each grid cell's Q (the Q for the Action taken).
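Policy (π): the probability of taking each action (forward, backward, turn left/right, strafe left/right). As learning progresses, the agent comes to choose each Action with near-certain probability.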
Value Function: the value of the current state. It rises as the agent approaches the warp point.
Reward Prediction: the network predicts that a positive reward is coming.
Source
• https://github.com/miyosuda/unreal