Probabilistic Active Meta-Learning

Probabilistic Active Meta-Learning
Jean Kaddour, et al., University College London, NeurIPS 2020
May 19, 2021

Table of Contents
1 Abstract & Introduction
2 Probabilistic Meta-Learning
3 Probabilistic Active Meta-Learning
  Extending the Meta-Learning Model
  Ranking Candidates in Latent Space
4 Experiments
  Observed Task Parameters
  Partially-Observed/Noisy Task Parameters
  High-Dimensional (Pixel) Task Parameters
5 Conclusion


Abstract & Introduction
• Meta-learning algorithms use prior experience about tasks to learn new, related tasks efficiently
• Typically, a set of training tasks is assumed to be given or randomly chosen
• However, exploring the task domain is impractical in many real-world applications, and uniform sampling is often sub-optimal

Abstract & Introduction
• The main contribution is a probabilistic active meta-learning (PAML) algorithm
• PAML improves data efficiency by selecting which task to learn next based on prior experience

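Probabilistic Meta-Learning
• Meta-learning models deal with multiple task-specific datasets
• Tasks $\mathcal{T}_i$, $i = 1, \dots, N$
• Observations $\mathcal{D}_{\mathcal{T}_i} = \{(\mathbf{x}_{ij}, \mathbf{y}_{ij})\}_{j=1}^{M_i}$
• Tasks and data are distributed as $\mathcal{T}_i \sim p(\mathcal{T})$ and $\mathcal{D}_{\mathcal{T}_i} \sim p(\mathbf{Y}_i \mid \mathbf{X}_i, \mathcal{T}_i)$
• $\mathbf{X}_i, \mathbf{Y}_i$: matrices collecting the inputs and targets of task $\mathcal{T}_i$
• The joint distribution over task $\mathcal{T}_i$ and data $\mathcal{D}_{\mathcal{T}_i}$:
$$p(\mathbf{Y}_i, \mathcal{T}_i \mid \mathbf{X}_i) = p(\mathbf{Y}_i \mid \mathcal{T}_i, \mathbf{X}_i)\, p(\mathcal{T}_i) \qquad (1)$$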
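The generative process in Eq (1) can be illustrated with ancestral sampling: first draw a task, then draw its data given that task. The following is a minimal sketch, assuming a hypothetical toy task family (a sine wave whose frequency and offset play the role of $\mathcal{T}_i$, drawn from a uniform $p(\mathcal{T})$) with Gaussian observation noise. The function name `sample_tasks_and_data` and all parameter ranges are illustrative, not taken from the paper.

```python
import numpy as np

def sample_tasks_and_data(n_tasks, n_points, noise_std=0.1, seed=0):
    """Ancestral sampling: T_i ~ p(T), then D_{T_i} ~ p(Y_i | X_i, T_i).

    Hypothetical toy task family: T_i = (frequency, offset) of a sine wave.
    """
    rng = np.random.default_rng(seed)
    tasks, datasets = [], []
    for _ in range(n_tasks):
        # T_i ~ p(T): uniform prior over (frequency, offset)
        freq, offset = rng.uniform([0.5, -1.0], [2.0, 1.0])
        # X_i: M_i = n_points inputs for task i
        x = rng.uniform(-3.0, 3.0, size=n_points)
        # Y_i ~ p(Y_i | X_i, T_i): task-dependent function plus Gaussian noise
        y = np.sin(freq * x) + offset + noise_std * rng.standard_normal(n_points)
        tasks.append((freq, offset))
        datasets.append((x, y))
    return tasks, datasets

tasks, datasets = sample_tasks_and_data(n_tasks=3, n_points=5)
print(len(tasks), datasets[0][0].shape)  # 3 (5,)
```

For simplicity every task here receives the same number of points; in general $M_i$ may differ from task to task.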

Probabilistic Meta-Learning
• Model the task specification with a local latent variable
• This variable is kept distinct from the global model parameters $\theta$
• $\theta$ is shared among all tasks
• Learn a continuous latent representation $\mathbf{h}_i \in \mathbb{R}^Q$ of task $\mathcal{T}_i$

Probabilistic Meta-Learning
• Figure 2(a): Graphical model of a supervised learning problem with inputs $\mathbf{x}$ and targets $\mathbf{y}$; the global parameters $\theta$ are shared across all tasks

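Probabilistic Meta-Learning
• Formulate the probabilistic model:
• $\mathbf{H}$: matrix of the latent task variables $\mathbf{h}_i$
$$p(\mathbf{Y}, \mathbf{H}, \theta \mid \mathbf{X}) = \prod_{i=1}^{N} \Big[ p(\mathbf{h}_i) \prod_{j=1}^{M_i} p(\mathbf{y}_{ij} \mid \mathbf{x}_{ij}, \mathbf{h}_i, \theta) \Big]\, p(\theta) \qquad (2)$$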

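Probabilistic Meta-Learning
• Eq (2) is amenable to scalable approximate inference using stochastic variational inference
• $q_\phi(\cdot)$: a Gaussian distribution
• Approximate inference:
$$p_\theta(\mathbf{H} \mid \mathbf{Y}, \mathbf{X}) = \frac{p_\theta(\mathbf{Y}, \mathbf{H} \mid \mathbf{X})}{p_\theta(\mathbf{Y} \mid \mathbf{X})} \approx q_\phi(\mathbf{H}) = \prod_{i=1}^{N} q_\phi(\mathbf{h}_i) \qquad (3)$$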

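Probabilistic Meta-Learning
• To learn the model parameters $\theta$ and the variational parameters $\phi$, the intractability of the model evidence $p_\theta(\mathbf{Y} \mid \mathbf{X})$ is finessed by maximizing a lower bound on the evidence (ELBO):
$$\log p_\theta(\mathbf{Y} \mid \mathbf{X}) \ge \mathbb{E}_{q_\phi(\mathbf{H})}\left[\log \frac{p_\theta(\mathbf{Y}, \mathbf{H} \mid \mathbf{X})}{q_\phi(\mathbf{H})}\right] = \mathbb{E}_{q_\phi(\mathbf{H})}\left[\log p_\theta(\mathbf{Y} \mid \mathbf{H}, \mathbf{X}) + \log \frac{p(\mathbf{H})}{q_\phi(\mathbf{H})}\right] =: \mathcal{L}_{\mathrm{ML}}(\theta, \phi) \qquad (4)$$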

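Probabilistic Meta-Learning
• Using the factorization of $q_\phi(\mathbf{H})$ in Eq (3), the meta-learning objective $\mathcal{L}_{\mathrm{ML}}(\theta, \phi)$ from Eq (4) (to be maximized) becomes:
$$\mathcal{L}_{\mathrm{ML}}(\theta, \phi) = \sum_{i=1}^{N} \sum_{j=1}^{M_i} \mathbb{E}_{q_\phi(\mathbf{h}_i)}\left[\log p_\theta(\mathbf{y}_{ij} \mid \mathbf{x}_{ij}, \mathbf{h}_i)\right] - \sum_{i=1}^{N} \mathrm{KL}\left[q_\phi(\mathbf{h}_i) \,\|\, p(\mathbf{h}_i)\right] \qquad (5)$$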
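Eq (5) can be estimated with Monte Carlo samples of $\mathbf{h}_i$ drawn via the reparameterization trick, while the KL term between a diagonal Gaussian $q_\phi(\mathbf{h}_i)$ and a standard-normal prior has a closed form. The sketch below assumes $p(\mathbf{h}_i) = \mathcal{N}(\mathbf{0}, \mathbf{I})$, a Gaussian likelihood with fixed noise, and a hypothetical linear `decoder` standing in for the neural network in $p_\theta$; `elbo_ml` and the other names are illustrative. Plain NumPy only evaluates the objective; optimizing $\theta$ and $\phi$ would in practice use an autodiff framework.

```python
import numpy as np

def decoder(x, h, theta):
    """Hypothetical stand-in for f_theta(x, h); x: (M,), h: (S, Q) -> (S, M)."""
    w_x, w_h, b = theta
    return w_x * x[None, :] + (h @ w_h)[:, None] + b

def gaussian_log_lik(y, mean, noise_std):
    """Elementwise log N(y; mean, noise_std^2)."""
    return -0.5 * np.log(2 * np.pi * noise_std**2) - 0.5 * ((y - mean) / noise_std) ** 2

def kl_diag_gaussian_std_normal(mu, log_std):
    """Closed-form KL[N(mu, diag(sigma^2)) || N(0, I)] with sigma = exp(log_std)."""
    return 0.5 * np.sum(np.exp(2 * log_std) + mu**2 - 1.0 - 2 * log_std)

def elbo_ml(datasets, mus, log_stds, theta, noise_std=0.1, n_samples=16, seed=0):
    """Monte Carlo estimate of L_ML(theta, phi) in Eq. (5); phi = (mus, log_stds)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for (x, y), mu, log_std in zip(datasets, mus, log_stds):
        # reparameterized samples h_i = mu_i + sigma_i * eps, shape (S, Q)
        h = mu + np.exp(log_std) * rng.standard_normal((n_samples, mu.shape[0]))
        # sum over the M_i data points, then average over the S samples
        ll = gaussian_log_lik(y[None, :], decoder(x, h, theta), noise_std)
        total += ll.sum(axis=1).mean() - kl_diag_gaussian_std_normal(mu, log_std)
    return total
```

When the decoder ignores $\mathbf{h}$ (all-zero `w_h`), the Monte Carlo term is exact; otherwise increasing `n_samples` reduces the variance of the estimate.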

Probabilistic Meta-Learning
• At test time, the aim is to use the meta-model to make predictions $\mathbf{Y}_*$ given test inputs $\mathbf{X}_*$ for an unseen task $\mathcal{T}_*$ (scenario: few-shot learning)
• Predictions with optimized parameters:
$$p_\theta(\mathbf{Y}_* \mid \mathbf{X}_*) = \mathbb{E}_{q_\phi(\mathbf{h}_*)}\left[p_\theta(\mathbf{Y}_* \mid \mathbf{X}_*, \mathbf{h}_*)\right] \qquad (6)$$
• Without any observations from the new task, zero-shot predictions are still possible by replacing the variational posterior $q_\phi(\mathbf{h}_*)$ with the prior $p(\mathbf{h}_*)$
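Because Eq (6) is an expectation over $\mathbf{h}_*$, it can be approximated by averaging the likelihood over samples of $\mathbf{h}_*$. With a Gaussian likelihood, the resulting mixture has mean $\mathbb{E}[f]$ and variance $\sigma^2 + \mathrm{Var}[f]$ by the law of total variance. This sketch assumes a hypothetical linear `decoder`, fixed observation noise, a diagonal Gaussian $q_\phi(\mathbf{h}_*)$, and a standard-normal prior; `predictive_moments` is an illustrative name. Passing `mu`/`log_std` samples from $q_\phi(\mathbf{h}_*)$ (few-shot); omitting them samples from the prior (zero-shot).

```python
import numpy as np

def decoder(x, h, theta):
    """Hypothetical stand-in for f_theta(x, h); x: (M,), h: (S, Q) -> (S, M)."""
    w_x, w_h, b = theta
    return w_x * x[None, :] + (h @ w_h)[:, None] + b

def predictive_moments(x_star, theta, mu=None, log_std=None, latent_dim=2,
                       noise_std=0.1, n_samples=1000, seed=0):
    """Mean and variance of the Monte Carlo approximation to Eq. (6).

    Few-shot: h_* ~ q_phi(h_*) = N(mu, diag(exp(log_std)^2)).
    Zero-shot (mu is None): h_* ~ p(h_*) = N(0, I).
    """
    rng = np.random.default_rng(seed)
    if mu is None:
        mu, log_std = np.zeros(latent_dim), np.zeros(latent_dim)
    h = mu + np.exp(log_std) * rng.standard_normal((n_samples, mu.shape[0]))
    f = decoder(x_star, h, theta)  # one function draw per latent sample, (S, M)
    mean = f.mean(axis=0)
    # law of total variance: observation noise + spread induced by h_*
    var = noise_std**2 + f.var(axis=0)
    return mean, var
```

A tight posterior $q_\phi(\mathbf{h}_*)$ yields a smaller predictive variance than the prior, which is the intended effect of observing a few points from the new task.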

Probabilistic Active Meta-Learning
• Goal: explore a given task domain
• Task descriptors (task-descriptive observations) make it possible to select which task to learn next
• Task descriptors of task $\mathcal{T}_i$: $\boldsymbol{\psi}_i$
• The active meta-learning algorithm should be able either
  • to make a discrete selection from a set of task descriptors, or
  • to generate a valid continuous parameterization

Probabilistic Active Meta-Learning
• Figure 2(b): additional task descriptors $\boldsymbol{\psi}_i$ that are conditioned on the task-specific latent variables $\mathbf{h}_i$

Probabilistic Active Meta-Learning
• Figure 3

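Extending the Meta-Learning Model
• Extend Eq (2) with the task descriptors:
• $\boldsymbol{\Psi}$: matrix of the task descriptors $\boldsymbol{\psi}_i$
$$p_\theta(\mathbf{Y}, \mathbf{H}, \boldsymbol{\Psi} \mid \mathbf{X}) = \prod_{i=1}^{N} p_\theta(\boldsymbol{\psi}_i \mid \mathbf{h}_i)\, p(\mathbf{h}_i) \prod_{j=1}^{M_i} p_\theta(\mathbf{y}_{ij} \mid \mathbf{x}_{ij}, \mathbf{h}_i) \qquad (7)$$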

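Extending the Meta-Learning Model
• Maximize a lower bound on the log-marginal likelihood (the inequality in Eq (9) follows from Jensen's inequality):
$$
\begin{aligned}
\log p_\theta(\mathbf{Y}, \boldsymbol{\Psi} \mid \mathbf{X}) &= \log \mathbb{E}_{q_\phi(\mathbf{H})}\left[p_\theta(\mathbf{Y} \mid \mathbf{H}, \mathbf{X})\, p_\theta(\boldsymbol{\Psi} \mid \mathbf{H})\, \frac{p(\mathbf{H})}{q_\phi(\mathbf{H})}\right] && (8)\\
&\ge \mathbb{E}_{q_\phi(\mathbf{H})}\left[\log p_\theta(\mathbf{Y} \mid \mathbf{H}, \mathbf{X}) + \log p_\theta(\boldsymbol{\Psi} \mid \mathbf{H}) + \log \frac{p(\mathbf{H})}{q_\phi(\mathbf{H})}\right] && (9)\\
&= \mathcal{L}_{\mathrm{ML}}(\theta, \phi) + \sum_{i=1}^{N} \mathbb{E}_{q_\phi(\mathbf{h}_i)}\left[\log p_\theta(\boldsymbol{\psi}_i \mid \mathbf{h}_i)\right] =: \mathcal{L}_{\mathrm{PAML}}(\theta, \phi) && (10)
\end{aligned}
$$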
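The extra term in Eq (10) is the expected log-likelihood of each task descriptor under $q_\phi(\mathbf{h}_i)$. Assuming, purely for illustration, a linear-Gaussian descriptor model $p_\theta(\boldsymbol{\psi}_i \mid \mathbf{h}_i) = \mathcal{N}(\mathbf{A}\mathbf{h}_i + \mathbf{c}, s^2\mathbf{I})$ and a diagonal Gaussian $q_\phi(\mathbf{h}_i) = \mathcal{N}(\boldsymbol{\mu}_i, \mathrm{diag}(\boldsymbol{\sigma}_i^2))$, the expectation has a closed form, since $\mathbb{E}\|\boldsymbol{\psi}_i - \mathbf{A}\mathbf{h}_i - \mathbf{c}\|^2 = \|\boldsymbol{\psi}_i - \mathbf{A}\boldsymbol{\mu}_i - \mathbf{c}\|^2 + \mathrm{tr}(\mathbf{A}\,\mathrm{diag}(\boldsymbol{\sigma}_i^2)\,\mathbf{A}^\top)$. `A`, `c`, `s` and the function names are hypothetical; a nonlinear decoder (for example, for pixel descriptors) would need a Monte Carlo estimate instead.

```python
import numpy as np

def expected_descriptor_loglik(psi, mu, log_std, A, c, s):
    """Closed-form E_{q(h)}[log N(psi; A h + c, s^2 I)] for q(h) = N(mu, diag(sigma^2))."""
    sigma2 = np.exp(2 * log_std)
    d = psi.shape[0]
    resid = psi - (A @ mu + c)
    # tr(A diag(sigma^2) A^T) = sum_k sigma_k^2 * ||A[:, k]||^2
    trace_term = np.sum(sigma2 * np.sum(A**2, axis=0))
    return -0.5 * d * np.log(2 * np.pi * s**2) - 0.5 * (resid @ resid + trace_term) / s**2

def paml_objective(l_ml, descriptors, mus, log_stds, A, c, s=0.1):
    """Eq. (10): L_PAML = L_ML + sum_i E_q[log p_theta(psi_i | h_i)].

    l_ml is a precomputed value of the meta-learning objective L_ML.
    """
    return l_ml + sum(expected_descriptor_loglik(psi, mu, ls, A, c, s)
                      for psi, mu, ls in zip(descriptors, mus, log_stds))
```

The descriptor term rewards latent embeddings from which the observed descriptors can be reconstructed, which is what pulls tasks with similar descriptors close together in latent space.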

Extending the Meta-Learning Model
• Take advantage of learned task similarities/differences, captured by latent embeddings that represent the full task configuration $\mathcal{T}$
• Eq (10): the task-descriptor term encourages similar tasks to lie close together in latent space

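Ranking Candidates in Latent Space
• To rank candidates in latent space, we define the utility of a candidate $\mathbf{h}_*$ as the self-information (surprisal) associated with $\mathbf{h}_*$:
$$u(\mathbf{h}_*) := -\log \sum_{i=1}^{N} q_{\phi_i}(\mathbf{h}_*) + \log N \qquad (11)$$
• Equivalently, $u(\mathbf{h}_*) = -\log \frac{1}{N} \sum_{i=1}^{N} q_{\phi_i}(\mathbf{h}_*)$: the surprisal of $\mathbf{h}_*$ under an equally weighted mixture of the approximate posteriors of the $N$ observed tasks, so candidates far from all previously learned tasks receive high utility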
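Eq (11) is the negative log of an average of densities, so it can be evaluated stably in log space with log-sum-exp. The sketch assumes that each candidate has already been mapped to a point $\mathbf{h}_*$ in latent space and that every $q_{\phi_i}$ is a diagonal Gaussian given by a mean and a log standard deviation; `utility` and `select_next_task` are illustrative names, not the paper's code. Taking the argmax over a finite candidate set corresponds to the discrete-selection case.

```python
import numpy as np

def diag_gauss_logpdf(h, mu, log_std):
    """log N(h; mu, diag(exp(log_std)^2)) summed over latent dims; h: (C, Q) -> (C,)."""
    z = (h - mu) / np.exp(log_std)
    return -0.5 * np.sum(z**2 + 2 * log_std + np.log(2 * np.pi), axis=-1)

def utility(h_cand, mus, log_stds):
    """Eq. (11): u(h*) = -log sum_i q_{phi_i}(h*) + log N, for candidates of shape (C, Q)."""
    log_q = np.stack([diag_gauss_logpdf(h_cand, m, s) for m, s in zip(mus, log_stds)])  # (N, C)
    n = log_q.shape[0]
    # log-sum-exp over the N observed tasks for numerical stability
    peak = log_q.max(axis=0)
    log_sum = peak + np.log(np.exp(log_q - peak).sum(axis=0))
    return -log_sum + np.log(n)

def select_next_task(h_cand, mus, log_stds):
    """Index of the candidate with the highest utility (the most surprising one)."""
    return int(np.argmax(utility(h_cand, mus, log_stds)))
```

For a continuous parameterization one could instead maximize $u$ over $\mathbf{h}_*$ with an optimizer; that variant is not shown here.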

Experiments
• Assess whether PAML speeds up learning of task domains by learning a meta-model for the dynamics of simulated robotic systems
• Performance measures:
  • Negative log-likelihood (NLL): considers the full posterior predictive distribution at a test input
  • Root mean squared error (RMSE): considers only the predictive mean
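Both measures are straightforward to compute once a model outputs a predictive mean and variance per test input. A minimal sketch, assuming the predictive distribution at each test input is summarized by a Gaussian with a per-point mean and variance (for example, moment-matched); the function names are illustrative.

```python
import numpy as np

def gaussian_nll(y, mean, var):
    """Average negative log-likelihood of targets y under N(mean, var)."""
    return float(np.mean(0.5 * np.log(2 * np.pi * var) + 0.5 * (y - mean) ** 2 / var))

def rmse(y, mean):
    """Root mean squared error; uses only the predictive mean."""
    return float(np.sqrt(np.mean((y - mean) ** 2)))
```

Two models with identical predictive means have the same RMSE, but NLL penalizes the one whose variance is over- or under-confident.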

Experiments
• We consider three robotic systems in the experiments:
  • Cart-pole
  • Pendubot
  • Cart-double-pole

Experiments
• Compare PAML to:
  • Uniform sampling (UNI)
  • Latin hypercube sampling (LHS)
  • Oracle
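The two sampling baselines differ in how they cover the task-parameter space. The sketch below assumes the task parameters live in an axis-aligned box `[low, high]`; the function names and the box itself are illustrative, not the paper's code. UNI draws every coordinate independently, while LHS splits each dimension into `n` equal strata and places exactly one sample in each stratum per dimension.

```python
import numpy as np

def uniform_sampling(n, low, high, rng):
    """UNI: n task parameters drawn independently and uniformly in [low, high]."""
    low, high = np.asarray(low, float), np.asarray(high, float)
    return low + (high - low) * rng.random((n, low.size))

def latin_hypercube_sampling(n, low, high, rng):
    """LHS: each of the n equal strata per dimension is used exactly once."""
    low, high = np.asarray(low, float), np.asarray(high, float)
    d = low.size
    # one random point inside stratum i for every dimension, values in [0, 1)
    u = (np.arange(n)[:, None] + rng.random((n, d))) / n
    # shuffle strata independently per dimension to decorrelate the columns
    for k in range(d):
        u[:, k] = rng.permutation(u[:, k])
    return low + (high - low) * u

rng = np.random.default_rng(0)
print(latin_hypercube_sampling(4, [0.0, 0.0], [1.0, 1.0], rng).shape)  # (4, 2)
```

SciPy also provides a maintained implementation in `scipy.stats.qmc.LatinHypercube`.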

Exp: Observed Task Parameters
• PAML performs significantly better than UNI and LHS on the test tasks

Exp: Partially-Observed/Noisy Task Parameters
• In Figure 5(a), PAML achieves lower prediction errors in fewer trials than the baselines
• The error of PAML after one added task is approximately matched by the baselines only after about five added tasks
• In Figure 5(b), PAML also makes better predictions than the baselines

Exp: Partially-Observed/Noisy Task Parameters
• To select tasks efficiently, PAML needs to learn to effectively ignore the superfluous dimension
• One extra dimension $\epsilon \in [0.5, 5.0]$ is added to the observations

Exp: High-Dimensional (Pixel) Task Parameters
• PAML does not have access to the task parameters in this experiment
• Instead, it observes indirect pixel task descriptors of a cart-pole system
• In Figure 8, PAML consistently selects more informative cart-pole images and approaches the oracle performance significantly faster than UNI

Conclusion
• Proposed a general and data-efficient learning algorithm that combines ideas from active learning and meta-learning
• Extended ideas from meta-learning to incorporate task descriptors for active learning of a task domain, where the algorithm can choose which task to learn next by taking advantage of prior experience
• Took advantage of learned latent task embeddings to find a meaningful space in which to express task similarities

Others
• YouTube: NeurIPS 2020: Probabilistic Active Meta-Learning https://www.youtube.com/watch?v=ipN-bK6Od3U

