Slide 1

Probabilistic Active Meta-Learning
Jean Kaddour et al., University College London, NeurIPS 2020
May 19, 2021

Slide 2

Table of Contents

1. Abstract & Introduction
2. Probabilistic Meta-Learning
3. Probabilistic Active Meta-Learning
   - Extending the Meta-Learning Model
   - Ranking Candidates in Latent Space
4. Experiments
   - Observed Task Parameters
   - Partially-Observed/Noisy Task Parameters
   - High-Dimensional (Pixel) Task Parameters
5. Conclusion

Slide 3

Section 1: Abstract & Introduction

Slide 4

Abstract & Introduction

• Meta-learning algorithms use prior experience with tasks to learn new, related tasks efficiently
• Typically, a set of training tasks is assumed to be given or is chosen at random
• However, exploring the task domain is impractical in many real-world applications, and uniform sampling is often sub-optimal

Slide 5

Abstract & Introduction

• The main contribution is a probabilistic active meta-learning (PAML) algorithm
• PAML improves data efficiency by selecting which tasks to learn next based on prior experience

Slide 6

Section 2: Probabilistic Meta-Learning

Slide 7

Probabilistic Meta-Learning

• Meta-learning models deal with multiple task-specific datasets
• Tasks $\mathcal{T}_i$ ($i = 1, \dots, N$)
• Observations $\mathcal{D}_{\mathcal{T}_i} = \{(x^i_j, y^i_j)\}$ ($j = 1, \dots, M_i$)
• Distributions: $\mathcal{T}_i \sim p(\mathcal{T})$ and $\mathcal{D}_{\mathcal{T}_i} \sim p(Y_i \mid X_i, \mathcal{T}_i)$
• $X_i$, $Y_i$: matrices of the inputs and targets

The joint distribution over task $\mathcal{T}_i$ and data $\mathcal{D}_{\mathcal{T}_i}$:

$$p(Y_i, \mathcal{T}_i \mid X_i) = p(Y_i \mid \mathcal{T}_i, X_i)\, p(\mathcal{T}_i) \tag{1}$$

Slide 8

Probabilistic Meta-Learning

• Model the task specification by a local latent variable
• Made distinct from the global model parameters $\theta$
• $\theta$ is shared among all tasks
• Learn a continuous latent representation $h_i \in \mathbb{R}^Q$ of task $\mathcal{T}_i$

Slide 9

Probabilistic Meta-Learning

• Figure 2(a): Graphical model in the context of a supervised learning problem with inputs $x$ and targets $y$; the global parameters $\theta$ are shared across all tasks

Slide 10

Probabilistic Meta-Learning

• Formulate the probabilistic model
• $H$: a matrix of the latent task variables $h_i$

The Probabilistic Model:

$$p(Y, H, \theta \mid X) = \prod_{i=1}^{N} p(h_i) \prod_{j=1}^{M_i} p(y^i_j \mid x^i_j, h_i, \theta)\; p(\theta) \tag{2}$$
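To make the factorization in Eq. (2) concrete, here is a minimal sketch of ancestral sampling from the model. The `decoder` network, the standard-normal prior on $h_i$, and the Gaussian noise scale are illustrative assumptions, not details taken from the paper.

```python
import torch

def sample_tasks(decoder, theta, n_tasks=10, m_points=20, q_dim=2, x_dim=1):
    """Ancestral sampling from the generative model of Eq. (2).

    decoder(x, h, theta) -> mean of p(y | x, h, theta); a hypothetical network.
    theta: global parameters shared by all tasks.
    """
    datasets = []
    for _ in range(n_tasks):
        h_i = torch.randn(q_dim)              # h_i ~ p(h_i), assumed N(0, I)
        x_i = torch.rand(m_points, x_dim)     # inputs; their distribution is not modeled
        mean = decoder(x_i, h_i, theta)       # mean of p(y_ij | x_ij, h_i, theta)
        y_i = mean + 0.1 * torch.randn_like(mean)  # assumed Gaussian observation noise
        datasets.append((x_i, y_i))
    return datasets
```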

Slide 11

Probabilistic Meta-Learning

• Eq. (2) is amenable to scalable approximate inference using stochastic variational inference
• $q_\phi(\cdot)$: a Gaussian distribution

Approximate Inference:

$$p_\theta(H \mid Y, X) = \frac{p_\theta(Y, H \mid X)}{p_\theta(Y \mid X)} \approx q_\phi(H) = \prod_{i=1}^{N} q_\phi(h_i) \tag{3}$$

Slide 12

Probabilistic Meta-Learning

• To learn the model parameters $\theta$ and variational parameters $\phi$, the intractability of the model evidence $p_\theta(Y \mid X)$ is finessed by maximizing a lower bound on the evidence (ELBO)

ELBO:

$$\log p_\theta(Y \mid X) \ge \mathbb{E}_{q_\phi(H)}\!\left[\log \frac{p_\theta(Y, H \mid X)}{q_\phi(H)}\right] = \mathbb{E}_{q_\phi(H)}\!\left[\log p_\theta(Y \mid H, X) + \log \frac{p(H)}{q_\phi(H)}\right] =: \mathcal{L}_{\mathrm{ML}}(\theta, \phi) \tag{4}$$

Slide 13

Probabilistic Meta-Learning

• Formulate the loss function $\mathcal{L}_{\mathrm{ML}}(\theta, \phi)$ from Eq. (4):

Loss Function for Meta-Learning:

$$\mathcal{L}_{\mathrm{ML}}(\theta, \phi) = \sum_{i=1}^{N} \sum_{j=1}^{M_i} \mathbb{E}_{q_\phi(h_i)}\!\left[\log p_\theta(y^i_j \mid x^i_j, h_i)\right] - \sum_{i=1}^{N} \mathrm{KL}\!\left[q_\phi(h_i) \,\|\, p(h_i)\right] \tag{5}$$
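As a concrete illustration of Eq. (5), below is a minimal PyTorch-style sketch of a Monte-Carlo estimate of the per-task ELBO terms. The decoder, the diagonal-Gaussian variational family, and the unit observation noise are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.distributions as dist

def meta_elbo_loss(decoder, q_mu, q_logvar, x, y, n_samples=8):
    """Monte-Carlo estimate of -L_ML for a single task (Eq. 5).

    decoder(x, h) -> mean of p(y | x, h); a hypothetical network.
    q_mu, q_logvar: parameters of the diagonal Gaussian q_phi(h_i).
    """
    q = dist.Normal(q_mu, torch.exp(0.5 * q_logvar))                    # q_phi(h_i)
    prior = dist.Normal(torch.zeros_like(q_mu), torch.ones_like(q_mu))  # p(h_i)

    # Expected log-likelihood E_q[log p(y | x, h)] via reparameterized samples
    exp_ll = 0.0
    for _ in range(n_samples):
        h = q.rsample()                       # reparameterization trick
        y_pred = decoder(x, h)                # mean of p(y | x, h)
        exp_ll = exp_ll + dist.Normal(y_pred, 1.0).log_prob(y).sum()
    exp_ll = exp_ll / n_samples

    # KL[q_phi(h_i) || p(h_i)], closed form for two Gaussians
    kl = dist.kl_divergence(q, prior).sum()

    return -(exp_ll - kl)  # negate so that minimizing the loss maximizes the ELBO
```

Summing this loss over tasks and optimizing $\theta$ (the decoder weights) and $\phi$ (the per-task `q_mu`, `q_logvar`) jointly recovers the training objective of Eq. (5).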

Slide 14

Probabilistic Meta-Learning

• The aim is to use the meta-model to make predictions $Y_*$ given test inputs $X_*$ at test time, when faced with an unseen task $\mathcal{T}_*$ (scenario: few-shot learning)

Predictions with Optimized Parameters:

$$p_\theta(Y_* \mid X_*) = \mathbb{E}_{q_\phi(h_*)}\!\left[p_\theta(Y_* \mid X_*, h_*)\right] \tag{6}$$

• Without any observations from the new task, it is still possible to make zero-shot predictions by replacing the variational posterior $q_\phi(h_*)$ with the prior $p(h_*)$
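Given a trained model, Eq. (6) can be estimated by averaging decoder outputs over samples of the task variable. A small sketch reusing the names from the previous block; `zero_shot=True` swaps the posterior for the prior, as described above.

```python
def predict(decoder, q_star, x_star, n_samples=32, zero_shot=False):
    """Monte-Carlo estimate of Eq. (6) at test inputs x_star.

    q_star: variational posterior q_phi(h_*) for the new task (a dist.Normal).
    zero_shot=True replaces q_phi(h_*) with the prior p(h_*) = N(0, I).
    """
    prior = dist.Normal(torch.zeros_like(q_star.mean), torch.ones_like(q_star.mean))
    source = prior if zero_shot else q_star
    preds = torch.stack([decoder(x_star, source.sample()) for _ in range(n_samples)])
    return preds.mean(0), preds.std(0)  # predictive mean and a simple spread estimate
```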

Slide 15

Section 3: Probabilistic Active Meta-Learning

Slide 16

Probabilistic Active Meta-Learning

• Explore a given task domain
• Task descriptors (task-descriptive observations) make it possible to select which task to learn next
• Task descriptor of task $\mathcal{T}_i$: $\psi_i$
• The algorithm for active meta-learning needs
  • to make a discrete selection from a set of task descriptors, and
  • to generate a valid continuous parameterization

Slide 17

Probabilistic Active Meta-Learning

• Figure 2(b): additional task descriptors $\psi_i$ that are conditioned on the task-specific latent variables $h_i$

Slide 18

Probabilistic Active Meta-Learning

• Figure 3

Slide 19

Extending the Meta-Learning Model

• Rewrite Eq. (2) with latent task descriptors
• $\Psi$: a matrix of the task descriptors $\psi_i$

Extended Meta-Learning Model:

$$p_\theta(Y, H, \Psi \mid X) = \prod_{i=1}^{N} p_\theta(\psi_i \mid h_i)\, p(h_i) \prod_{j=1}^{M_i} p_\theta(y^i_j \mid x^i_j, h_i) \tag{7}$$

Slide 20

Extending the Meta-Learning Model

• Maximize a lower bound on the log-marginal likelihood:

$$\log p_\theta(Y, \Psi \mid X) = \log \mathbb{E}_{q_\phi(H)}\!\left[p_\theta(Y \mid H, X)\, p_\theta(\Psi \mid H)\, \frac{p(H)}{q_\phi(H)}\right] \tag{8}$$

$$\ge \mathbb{E}_{q_\phi(H)}\!\left[\log p_\theta(Y \mid H, X) + \log p_\theta(\Psi \mid H) + \log \frac{p(H)}{q_\phi(H)}\right] \tag{9}$$

$$= \mathcal{L}_{\mathrm{ML}}(\theta, \phi) + \sum_{i=1}^{N} \mathbb{E}_{q_\phi(h_i)}\!\left[\log p_\theta(\psi_i \mid h_i)\right] =: \mathcal{L}_{\mathrm{PAML}}(\theta, \phi) \tag{10}$$
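Relative to Eq. (5), the only new ingredient in Eq. (10) is the expected descriptor log-likelihood. Below is a minimal sketch extending the earlier `meta_elbo_loss`; `descriptor_decoder` (a network for $p_\theta(\psi_i \mid h_i)$) and its unit-variance Gaussian likelihood are assumptions for illustration.

```python
def paml_loss(decoder, descriptor_decoder, q_mu, q_logvar, x, y, psi, n_samples=8):
    """Monte-Carlo estimate of -L_PAML for a single task (Eq. 10).

    descriptor_decoder(h) -> mean of p_theta(psi_i | h_i); a hypothetical network.
    """
    base = meta_elbo_loss(decoder, q_mu, q_logvar, x, y, n_samples)  # -L_ML terms

    q = dist.Normal(q_mu, torch.exp(0.5 * q_logvar))
    # Extra term of Eq. (10): E_q[log p_theta(psi_i | h_i)]
    desc_ll = 0.0
    for _ in range(n_samples):
        h = q.rsample()
        desc_ll = desc_ll + dist.Normal(descriptor_decoder(h), 1.0).log_prob(psi).sum()
    desc_ll = desc_ll / n_samples

    return base - desc_ll  # minimizing this maximizes L_ML plus the descriptor term
```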

Slide 21

Extending the Meta-Learning Model

• Take advantage of learned task similarities/differences that represent the full task configuration $\mathcal{T}$
• Eq. (10): two tasks that are similar are encouraged to be closer in latent space

Slide 22

Ranking Candidates in Latent Space

• To rank candidates in latent space, define the utility of a candidate $h_*$ as the self-information/surprisal associated with $h_*$:

Utility Function:

$$u(h_*) := -\log \sum_{i=1}^{N} q_{\phi_i}(h_*) + \log N \tag{11}$$
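A small sketch of Eq. (11) and the resulting candidate ranking, assuming each task posterior $q_{\phi_i}$ is a diagonal Gaussian stored as rows of `q_mus` and `q_logvars`. The candidate with the highest surprisal lies in the least-covered region of the latent task space, which is the task selected next.

```python
import math
import torch
import torch.distributions as dist

def utility(h_star, q_mus, q_logvars):
    """Eq. (11): surprisal of candidate h_* under the mixture (1/N) sum_i q_{phi_i}.

    q_mus, q_logvars: tensors of shape (N, Q), one row per already-learned task.
    """
    q = dist.Normal(q_mus, torch.exp(0.5 * q_logvars))  # N diagonal Gaussians
    log_probs = q.log_prob(h_star).sum(-1)              # log q_{phi_i}(h_*), shape (N,)
    return -torch.logsumexp(log_probs, dim=0) + math.log(len(q_mus))

def select_next_task(candidates, q_mus, q_logvars):
    """Pick the candidate latent embedding with the highest utility."""
    scores = torch.stack([utility(h, q_mus, q_logvars) for h in candidates])
    return candidates[int(scores.argmax())]
```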

Slide 23

Section 4: Experiments

Slide 24

Experiments

• Assess whether PAML speeds up learning of task domains by learning a meta-model for the dynamics of simulated robotic systems
• Performance measures:
  • Negative log-likelihood (NLL): considers the full posterior predictive distribution at a test input
  • Root mean squared error (RMSE): considers only the predictive mean

Slide 25

Experiments

• Three robotic systems are considered in the experiments:
  • Cart-pole
  • Pendubot
  • Cart-double-pole

Slide 26

Experiments

• PAML is compared to:
  • Uniform sampling (UNI)
  • Latin hypercube sampling (LHS)
  • An oracle

Slide 27

Exp: Observed Task Parameters

• PAML performs significantly better than UNI and LHS in terms of performance on the test tasks

Slide 28

Exp: Partially-Observed/Noisy Task Parameters

• In Figure 5(a), PAML achieves lower prediction errors in fewer trials than the baselines
• The error PAML reaches after one added task is only approximately matched by the baselines after about five added tasks
• In Figure 5(b), PAML makes better predictions than the baselines

Slide 29

Exp: Partially-Observed/Noisy Task Parameters

• To select tasks efficiently, PAML needs to learn to effectively ignore the superfluous dimension
• Setup: one extra dimension $\epsilon \in [0.5, 5.0]$ is added to the observations

Slide 30

Exp: High-Dimensional (Pixel) Task Parameters

• PAML does not have access to the task parameters in this experiment
• Instead, it observes indirect pixel task descriptors of a cart-pole system
• In Figure 8, PAML consistently selects more informative cart-pole images and approaches the oracle's performance significantly faster than UNI

Slide 31

Section 5: Conclusion

Slide 32

Conclusion

• Proposed a general and data-efficient learning algorithm that combines ideas from active learning and meta-learning
• Extended ideas from meta-learning to incorporate task descriptors for active learning of a task domain
  • The algorithm can choose which task to learn next by taking advantage of prior experience
• Takes advantage of learned latent task embeddings to find a meaningful space in which to express task similarities

Slide 33

Others

• YouTube: NeurIPS 2020: Probabilistic Active Meta-Learning
  https://www.youtube.com/watch?v=ipN-bK6Od3U