Metalearning shared Hierarchy
Wonseok Jung
August 28, 2018
Paper review
Transcript
Meta Learning Shared Hierarchy Wonseok Jung Reinforcement Learning
Wonseok Jung
City University of New York, Baruch College, Data Science Major
ConnexionAI, AI Researcher
Deep Learning College, Reinforcement Learning Researcher
Modulabs CTRL Leader
Reinforcement Learning, Object Detection, Chatbot
Github: https://github.com/wonseokjung
Facebook: https://www.facebook.com/wsjung
Blog: https://wonseokjung.github.io
Contents: 1. Introduction 2. Problem Statement 3. Algorithm 4. Experiments
META LEARNING SHARED HIERARCHIES
1. INTRODUCTION
1. UTILIZE PRIOR KNOWLEDGE Utilize prior knowledge Master new task
1.1 BUT REINFORCEMENT… How about Reinforcement Learning?
1.2 SOLVE EACH TASK INDEPENDENTLY AND FROM SCRATCH SUPERMARIO WITH
R.L https://www.youtube.com/watch?v=IjvbhwuCaF0
1.3 ISSUES Sharing information: Task1 (θ1), Task2 (θ2), Task3 (θ3)
1.4 MASTER POLICY Master Policy; Sub1, Sub2, Sub3; θ1, θ2, θ3
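The master/sub-policy structure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the deck's or the paper's implementation: the class name `HierarchicalPolicy`, the toy master, the constant sub-policies, and the re-selection interval `n` are all assumptions made for the example.

```python
from typing import Callable, List

SubPolicy = Callable[[int], int]  # toy type: maps a state to an action

class HierarchicalPolicy:
    """Master policy that picks one sub-policy and lets it act for n steps."""

    def __init__(self, master: Callable[[int], int], subs: List[SubPolicy], n: int):
        self.master = master  # maps a state to a sub-policy index
        self.subs = subs
        self.n = n            # assumed re-selection interval (hypothetical)
        self.steps = 0
        self.active = 0

    def act(self, state: int) -> int:
        # Query the master only every n steps; otherwise keep the active sub-policy.
        if self.steps % self.n == 0:
            self.active = self.master(state)
        self.steps += 1
        return self.subs[self.active](state)

# Toy sub-policies: each one always returns a fixed action.
subs = [lambda s: 0, lambda s: 1, lambda s: 2]
# Toy master: the chosen sub-policy index advances with every third state.
policy = HierarchicalPolicy(master=lambda s: (s // 3) % len(subs), subs=subs, n=3)
actions = [policy.act(state) for state in range(6)]
print(actions)
```

In this toy run the master is consulted at steps 0 and 3 only, so the first three actions come from one sub-policy and the next three from another.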
1.5 MLSH Meta Learning Shared Hierarchies
2. PROBLEM STATEMENT
2.1 NOTATION (REINFORCEMENT LEARNING)
t: time step
s: state
a: action
r: reward
S: set of states
A: set of actions
R: set of rewards
S_0: start state
P(s′, r ∣ s, a): transition function
γ: discount factor
π: policy
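As a small worked example of how the discount factor γ and the rewards r combine, the sketch below computes a discounted return, the sum over t of γ^t · r_t. The function name and the sample reward list are made up for illustration; the slide itself only lists the symbols.

```python
from typing import Sequence

def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return the sum over t of gamma**t * rewards[t]."""
    total = 0.0
    # Iterate backwards so each step needs one multiply and one add.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Hypothetical three-step episode with rewards 1, 0, 2 and gamma = 0.5.
example = discounted_return([1.0, 0.0, 2.0], gamma=0.5)
print(example)
```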
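2.2 NOTATION
P_M: distribution over MDPs
π_{θ,ϕ}(a ∣ s): the agent's policy, updated through the parameter vectors θ and ϕ
ϕ: parameters shared across tasks
θ: per-task parameters, updated while the agent learns the current task M
[Figure: agent-environment loop. The agent receives state S_t and reward R_t and takes action A_t; the environment returns reward R_{t+1} and state S_{t+1}.]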
2.3 OBJECTIVE MDP
2.4 NEW MDP [Figure: agent-environment loop (A_t, R_t, S_t, R_{t+1}, S_{t+1}) for a new MDP. Labels: Tap the ball, Positive Reward.]
2.5 NEW MDP-2 [Figure: agent-environment loop (A_t, R_t, S_t, R_{t+1}, S_{t+1}) for another new MDP. Labels: Reward, Penalty.]
2.6 FIND THE SHARED PARAMETERS maximize_ϕ E_{M∼P_M, t=0...T−1}[R]
2.7 STRUCTURE
3. ALGORITHM
3.1 MLSH ALGORITHM
3.2 MLSH ALGORITHM Two main components
3.3 MLSH ALGORITHM Warmup period, Joint update period
3.4 MLSH ALGORITHM Warmup period, Joint update period
3.5 MLSH ALGORITHM Warmup period: update θ. Joint update period: update θ, ϕ.
3.6 MLSH ALGORITHM-2 Warmup period: update θ. Joint update period: update θ, ϕ.
3.7 MLSH ALGORITHM-WARMUP update
3.8 MLSH ALGORITHM-JOINT UPDATE PERIOD update
3.8 MLSH ALGORITHM update
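The two periods named on these slides can be laid out as a training schedule. The sketch below shows only the control flow the slides describe: θ alone is updated during the warmup period, and θ and ϕ are both updated during the joint update period. Everything else is a placeholder assumed for illustration: the function names, the counts `warmup_iters` and `joint_iters`, looping over task indices in place of sampling M from P_M, and the counter callbacks that stand in for real rollouts and gradient steps.

```python
from typing import Callable, List, Tuple

def mlsh_schedule(
    num_tasks: int,
    warmup_iters: int,
    joint_iters: int,
    update_theta: Callable[[int], None],
    update_phi: Callable[[int], None],
) -> List[Tuple[int, str, str]]:
    """Run the warmup / joint-update schedule and log which parameters were updated."""
    log: List[Tuple[int, str, str]] = []
    for task in range(num_tasks):      # stands in for sampling a task M ~ P_M
        # A real implementation would presumably reset theta here (assumption).
        for _ in range(warmup_iters):  # warmup period: theta only
            update_theta(task)
            log.append((task, "warmup", "theta"))
        for _ in range(joint_iters):   # joint update period: theta and phi
            update_theta(task)
            update_phi(task)
            log.append((task, "joint", "theta+phi"))
    return log

counts = {"theta": 0, "phi": 0}

def fake_update_theta(task: int) -> None:
    counts["theta"] += 1  # placeholder for a gradient step on theta

def fake_update_phi(task: int) -> None:
    counts["phi"] += 1    # placeholder for a gradient step on phi

log = mlsh_schedule(2, warmup_iters=2, joint_iters=3,
                    update_theta=fake_update_theta, update_phi=fake_update_phi)
print(counts)
```

Passing the updates in as callbacks keeps the schedule separate from whatever optimizer is used, so the same loop can be checked with counters, as here, before plugging in a real learner.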
4. EXPERIMENTS
4.1 2D MOVING BANDITS TASK
4.2 RESULT (2D BALL)
4.3 WALKING, CRAWLING
4.4 WALKING, CRAWLING