Wonseok Jung
August 28, 2018
Metalearning shared Hierarchy
Paper review
Transcript
Meta Learning Shared Hierarchy Wonseok Jung Reinforcement Learning
Wonseok Jung
City University of New York, Baruch College, Data Science Major
Connexion AI, AI Researcher
Deep Learning College, Reinforcement Learning Researcher
Modulabs CTRL Leader
Reinforcement Learning, Object Detection, Chatbot
Github: https://github.com/wonseokjung
Facebook: https://www.facebook.com/wsjung
Blog: https://wonseokjung.github.io
Contents 1. Introduction 2. Problem Statement 3. Algorithm 4. Experiments
META LEARNING SHARED HIERARCHIES
1. INTRODUCTION
1. UTILIZE PRIOR KNOWLEDGE Utilize prior knowledge, master new task
1.1 BUT REINFORCEMENT… How about Reinforcement Learning?
1.2 SOLVE EACH TASK INDEPENDENTLY AND FROM SCRATCH SUPERMARIO WITH R.L https://www.youtube.com/watch?v=IjvbhwuCaF0
1.3 ISSUES Sharing information Task1 Task2 Task3 θ1 θ2 θ3
1.4 MASTER POLICY Master Policy Sub1 Sub2 Sub3 θ1 θ2 θ3
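The master-policy slide can be illustrated with a minimal sketch in which a master selects one of several sub-policies and the chosen sub-policy produces the action. Everything named here is hypothetical: the class `HierarchicalPolicy`, the 1-D world, the three toy sub-policies, and in particular the `period` parameter (the slide does not say how often the master re-selects; a fixed interval is assumed for illustration).

```python
class HierarchicalPolicy:
    """Master policy that picks one of several sub-policies.

    Assumption (not stated on the slide): the master re-selects a
    sub-policy every `period` time steps.
    """

    def __init__(self, sub_policies, master, period=3):
        self.sub_policies = sub_policies  # candidates the master chooses from
        self.master = master              # maps a state to a sub-policy index
        self.period = period
        self.t = 0
        self.active = 0

    def act(self, state):
        # The master chooses a new sub-policy at the start of each period.
        if self.t % self.period == 0:
            self.active = self.master(state)
        self.t += 1
        # The active sub-policy produces the primitive action.
        return self.sub_policies[self.active](state)


# Hypothetical sub-policies for a 1-D world: move left, stay, move right.
subs = [lambda s: -1, lambda s: 0, lambda s: +1]


def master(state):
    """Hypothetical master: pick the sub-policy that heads toward 0."""
    if state > 0:
        return 0
    if state < 0:
        return 2
    return 1


policy = HierarchicalPolicy(subs, master, period=3)
state = 4
trajectory = [state]
for _ in range(6):
    state += policy.act(state)
    trajectory.append(state)

print(trajectory)
```

Because the master only decides every third step in this sketch, the agent keeps moving left past 0 until the next decision point, which shows the coarser time scale at which the master operates.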
1.5 MLSH Metalearning shared hierarchies
2. PROBLEM STATEMENT
2.1 NOTATION REINFORCEMENT LEARNING
t: time step
a: action
s: state
r: reward
P(s′, r ∣ s, a): transition function
S: set of states
A: set of actions
R: set of rewards
S0: start state
γ: discount factor
π: policy
2.2 NOTATION
P_M: distribution over MDPs
π_{θ,ϕ}(a∣s): the agent's policy, parameterized by the vectors θ and ϕ
ϕ: parameters shared across tasks
θ: per-task parameters, updated while the agent learns the current task M
2.3 OBJECTIVE MDP [Diagram: agent-environment loop. Agent, Action At, Environment, State St, Reward Rt, next state St+1, next reward Rt+1.]
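The agent-environment loop on this slide, together with the discount factor γ from the notation, can be sketched as follows. `ChainEnv`, `run_episode`, and `discounted_return` are hypothetical names introduced for illustration, and the four-state chain is a made-up toy MDP, not one from the deck.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one episode, computed backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


class ChainEnv:
    """Hypothetical toy MDP: states 0..3, reward 1 on reaching state 3."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Clamp the next state to the valid range 0..3.
        self.state = max(0, min(3, self.state + action))
        done = self.state == 3
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=10):
    """Agent picks A_t from S_t; environment returns S_{t+1} and R_{t+1}."""
    state = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


rewards = run_episode(ChainEnv(), policy=lambda s: 1)  # always move right
g = discounted_return(rewards, gamma=0.5)
print(rewards, g)
```

With γ = 0.5 the single reward arrives two steps after the first one, so it is discounted by 0.5² in the return.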
2.4 NEW MDP [Diagram: agent-environment loop. Environment, Reward, At, Rt, St, Rt+1, St+1.] Tap the ball, Positive Reward. New MDP
2.5 NEW MDP-2 SUPERMARIO WITH R.L [Diagram: agent-environment loop. Action, Agent, Environment, Reward, At, Rt, State St, Rt+1, St+1.] Reward, Penalty. Another New MDP
2.6 FIND SHARING PARAMETER $\text{maximize}_{\phi}\; \mathbb{E}_{M \sim P_M,\; t = 0 \ldots T-1}[R]$
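One way to read this objective is as an average over sampled tasks: draw M from P_M, let the agent interact for T steps, and record the return R. The sketch below estimates that expectation by Monte Carlo. All names (`estimate_objective`, `sample_task`, `lifetime_return`) and the toy task distribution are hypothetical, and treating R as the total reward collected over the T steps in a task is an assumption made here, not something the slide spells out.

```python
import random


def estimate_objective(sample_task, lifetime_return, phi, n_tasks, seed=0):
    """Monte Carlo estimate of E_{M ~ P_M}[R] for fixed shared parameters phi."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_tasks):
        task = sample_task(rng)              # M ~ P_M
        total += lifetime_return(task, phi)  # R over t = 0 .. T-1
    return total / n_tasks


def sample_task(rng):
    """Hypothetical P_M: the goal lies to the left (-1) or the right (+1)."""
    return rng.choice([-1, 1])


def lifetime_return(task, phi, T=5):
    """Hypothetical return: 1 reward per step if some shared behaviour in
    phi moves toward the goal, otherwise 0 (assumed for illustration)."""
    return float(T) if task in phi else 0.0


good = estimate_objective(sample_task, lifetime_return, phi={-1, 1}, n_tasks=100)
poor = estimate_objective(sample_task, lifetime_return, phi={1}, n_tasks=100)
print(good, poor)
```

In this toy setting, shared parameters that cover both kinds of task score higher than ones that cover only one, which is the sense in which ϕ is chosen to do well across the whole task distribution.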
2.7 STRUCTURE
3. ALGORITHM
3.1 MLSH ALGORITHM
3.2 MLSH ALGORITHM Two main components
3.3 MLSH ALGORITHM Warmup period, joint update period
3.4 MLSH ALGORITHM Warmup period, joint update period
3.5 MLSH ALGORITHM Warmup period: θ update. Joint update period: θ, ϕ update
3.6 MLSH ALGORITHM-2 Warmup period: θ update. Joint update period: θ, ϕ update
3.7 MLSH ALGORITHM-WARMUP update
3.8 MLSH ALGORITHM- JOINT UPDATE PERIOD update
3.8 MLSH ALGORITHM update
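The two periods named on these slides can be sketched as a training schedule: during warmup only θ is updated, and during the joint update period both θ and ϕ are updated. The function `mlsh_training` and its callback parameters are hypothetical placeholders, not the author's code. Re-initialising θ whenever a new task is sampled is an assumption made here, and the update callbacks stand in for whatever policy-gradient step is actually used.

```python
def mlsh_training(sample_task, init_theta, update_theta, update_phi,
                  phi, iterations, warmup, joint):
    """Schedule sketch: warmup updates theta only, joint updates theta and phi."""
    log = []
    for _ in range(iterations):
        task = sample_task()
        theta = init_theta()  # assumption: theta starts fresh for each task
        for _ in range(warmup):
            # Warmup period: phi is held fixed while theta adapts to the task.
            theta = update_theta(task, theta, phi)
            log.append("theta")
        for _ in range(joint):
            # Joint update period: both parameter sets are updated.
            theta = update_theta(task, theta, phi)
            phi = update_phi(task, theta, phi)
            log.append("theta+phi")
    return phi, log


# Hypothetical stand-ins: each "update" just counts how often it was called.
phi, log = mlsh_training(
    sample_task=lambda: 0,
    init_theta=lambda: 0,
    update_theta=lambda task, theta, phi: theta + 1,
    update_phi=lambda task, theta, phi: phi + 1,
    phi=0, iterations=2, warmup=3, joint=2,
)
print(phi, log)
```

With two sampled tasks, three warmup steps, and two joint steps, ϕ is updated four times in total while θ is updated ten times, so the shared parameters change more slowly than the per-task ones.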
4. EXPERIMENTS
4.1 2D MOVING BANDITS TASK
4.2 RESULT (2D BALL)
4.3 WALKING, CRAWLING
4.4 WALKING, CRAWLING