
Probabilistic Active Meta-Learning


since1998

May 19, 2021



Transcript

1. Table of Contents
  1 Abstract & Introduction
  2 Probabilistic Meta-Learning
  3 Probabilistic Active Meta-Learning
    • Extending the Meta-Learning Model
    • Ranking Candidates in Latent Space
  4 Experiments
    • Observed Task Parameters
    • Partially-Observed/Noisy Task Parameters
    • High-Dimensional (Pixel) Task Parameters
  5 Conclusion
2. Abstract & Introduction
  • Meta-learning algorithms use prior experience about tasks to learn new, related tasks efficiently
  • Typically, a set of training tasks is assumed to be given or is chosen at random
  • However, exploring the task domain is impractical in many real-world applications, and uniform sampling is often sub-optimal
3. Abstract & Introduction
  • The main contribution is a probabilistic active meta-learning (PAML) algorithm
  • PAML improves data efficiency by selecting which tasks to learn next based on prior experience
4. Probabilistic Meta-Learning
  • Meta-learning models deal with multiple task-specific datasets
  • Tasks $\mathcal{T}_i$ ($i = 1, \dots, N$)
  • Observations $\mathcal{D}_{\mathcal{T}_i} = \{(x_j^i, y_j^i)\}$ ($j = 1, \dots, M_i$)
  • Distributions: $\mathcal{T}_i \sim p(\mathcal{T})$, $\mathcal{D}_{\mathcal{T}_i} \sim p(Y_i \mid X_i, \mathcal{T}_i)$
  • $X_i$, $Y_i$: matrices of inputs and targets
  • The joint distribution over task $\mathcal{T}_i$ and data $\mathcal{D}_{\mathcal{T}_i}$:

  The Joint Distribution
  $$p(Y_i, \mathcal{T}_i \mid X_i) = p(Y_i \mid \mathcal{T}_i, X_i)\, p(\mathcal{T}_i) \tag{1}$$
5. Probabilistic Meta-Learning
  • Model the task specification with a local latent variable, kept distinct from the global model parameters $\theta$
  • $\theta$ is shared among all tasks
  • Learn a continuous latent representation $h_i \in \mathbb{R}^Q$ of task $\mathcal{T}_i$
6. Probabilistic Meta-Learning
  • Figure 2(a): graphical model in the context of a supervised learning problem with inputs $x$ and targets $y$; the global parameters $\theta$ are shared across all tasks
7. Probabilistic Meta-Learning
  • Formulate the probabilistic model, where $H$ is the matrix of the latent task variables $h_i$:

  The Probabilistic Model
  $$p(Y, H, \theta \mid X) = \prod_{i=1}^{N} p(h_i) \prod_{j=1}^{M_i} p(y_j^i \mid x_j^i, h_i, \theta)\, p(\theta) \tag{2}$$
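To make the factorization in Eq. (2) concrete, here is a minimal PyTorch sketch assuming a Gaussian likelihood with a small neural-network mean and a standard-normal prior $p(h_i)$; the `Decoder` name, layer sizes, and fixed noise scale are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class Decoder(nn.Module):
    """Assumed likelihood network p_theta(y | x, h_i) for the sketch of Eq (2)."""
    def __init__(self, x_dim=1, h_dim=2, y_dim=1, noise_std=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + h_dim, 64), nn.ReLU(), nn.Linear(64, y_dim)
        )
        self.noise_std = noise_std

    def log_joint(self, x, y, h):
        # log p(y | x, h, theta) + log p(h) for a single task,
        # with a standard-normal prior p(h_i) as in Eq (2).
        mean = self.net(torch.cat([x, h.expand(x.shape[0], -1)], dim=-1))
        log_lik = Normal(mean, self.noise_std).log_prob(y).sum()
        log_prior = Normal(0.0, 1.0).log_prob(h).sum()
        return log_lik + log_prior
```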
8. Probabilistic Meta-Learning
  • Eq. (2) is amenable to scalable approximate inference using stochastic variational inference
  • $q_\phi(\cdot)$: a Gaussian distribution

  Approximate Inference
  $$p_\theta(H \mid Y, X) \approx q_\phi(H) = \prod_{i=1}^{N} q_\phi(h_i) \tag{3}$$
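One possible parameterization of the mean-field Gaussian posterior in Eq. (3): a free mean and log-standard-deviation per training task (this per-task, non-amortized parameterization is an assumption for the sketch).

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class TaskPosterior(nn.Module):
    """Mean-field Gaussian q_phi(H) = prod_i q_phi(h_i) from Eq (3)."""
    def __init__(self, num_tasks, h_dim):
        super().__init__()
        self.mean = nn.Parameter(torch.zeros(num_tasks, h_dim))
        self.log_std = nn.Parameter(torch.zeros(num_tasks, h_dim))

    def dist(self, i):
        # q_phi(h_i): a diagonal Gaussian over task i's latent embedding.
        return Normal(self.mean[i], self.log_std[i].exp())

    def rsample(self, i):
        # Reparameterized sample, so gradients flow to phi during SVI.
        return self.dist(i).rsample()
```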
9. Probabilistic Meta-Learning
  • To learn the model parameters $\theta$ and variational parameters $\phi$, the intractability of the model evidence $p_\theta(Y \mid X)$ is finessed by maximizing a lower bound on the evidence (ELBO)

  ELBO
  $$\log p_\theta(Y \mid X) \ge \mathbb{E}_{q_\phi(H)}\!\left[\log \frac{p_\theta(Y, H \mid X)}{q_\phi(H)}\right] = \mathbb{E}_{q_\phi(H)}\!\left[\log p_\theta(Y \mid H, X) + \log \frac{p(H)}{q_\phi(H)}\right] =: \mathcal{L}_{\mathrm{ML}}(\theta, \phi) \tag{4}$$
10. Probabilistic Meta-Learning
  • Formulate the loss function $\mathcal{L}_{\mathrm{ML}}(\theta, \phi)$ from Eq. (4):

  Loss Function for Meta-Learning
  $$\mathcal{L}_{\mathrm{ML}}(\theta, \phi) = \sum_{i=1}^{N} \sum_{j=1}^{M_i} \mathbb{E}_{q_\phi(h_i)}\!\left[\log p_\theta(y_j^i \mid x_j^i, h_i)\right] - \sum_{i=1}^{N} \mathrm{KL}\!\left[q_\phi(h_i) \,\|\, p(h_i)\right] \tag{5}$$
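A one-sample Monte Carlo sketch of Eq. (5), reusing the hypothetical `Decoder` and `TaskPosterior` sketches above; multi-sample estimates and mini-batching over tasks are omitted for brevity.

```python
import torch
from torch.distributions import Normal, kl_divergence

def meta_elbo(decoder, posterior, tasks):
    """L_ML(theta, phi) from Eq (5); `tasks` is a list of (x_i, y_i) pairs."""
    prior = Normal(0.0, 1.0)
    elbo = 0.0
    for i, (x, y) in enumerate(tasks):
        h = posterior.rsample(i)  # h_i ~ q_phi(h_i)
        mean = decoder.net(torch.cat([x, h.expand(x.shape[0], -1)], dim=-1))
        elbo = elbo + Normal(mean, decoder.noise_std).log_prob(y).sum()
        elbo = elbo - kl_divergence(posterior.dist(i), prior).sum()
    return elbo  # maximize, e.g. by minimizing -elbo with a stochastic optimizer
```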
11. Probabilistic Meta-Learning
  • The aim is to use the meta-model to make predictions $Y^*$ given test inputs $X^*$ at test time, when faced with an unseen task $\mathcal{T}^*$ (scenario: few-shot learning)

  Predictions with Optimized Parameters
  $$p_\theta(Y^* \mid X^*) = \mathbb{E}_{q_\phi(h^*)}\!\left[p_\theta(Y^* \mid X^*, h^*)\right] \tag{6}$$

  • Without any observations from the new task, it is possible to make zero-shot predictions by replacing the variational posterior $q_\phi(h^*)$ with the prior $p(h^*)$
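A sketch of a Monte Carlo approximation of Eq. (6), assuming the same hypothetical decoder: pass the fitted $q_\phi(h^*)$ for few-shot prediction, or the prior $p(h^*) = \mathcal{N}(0, I)$ for zero-shot prediction.

```python
import torch
from torch.distributions import Normal

@torch.no_grad()
def predict_mean(decoder, x_star, h_dist, num_samples=100):
    """Monte Carlo estimate of the predictive mean under Eq (6)."""
    means = []
    for _ in range(num_samples):
        h = h_dist.sample()  # h* ~ q_phi(h*) (few-shot) or p(h*) (zero-shot)
        inp = torch.cat([x_star, h.expand(x_star.shape[0], -1)], dim=-1)
        means.append(decoder.net(inp))
    return torch.stack(means).mean(dim=0)

# Zero-shot usage: replace the variational posterior with the prior, e.g.
# predict_mean(decoder, x_star, Normal(torch.zeros(2), torch.ones(2)))
```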
12. Probabilistic Active Meta-Learning
  • Explore a given task domain
  • Task descriptors (task-descriptive observations) make it possible to select which task to learn next
  • Task descriptor of task $\mathcal{T}_i$: $\psi_i$
  • The algorithm for active meta-learning must be able to:
    • Make a discrete selection from a set of task descriptors, or
    • Generate a valid continuous parameterization (a selection-step sketch follows below)
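A high-level sketch of one selection step in the discrete-candidate setting; `infer_h` (mapping a candidate descriptor $\psi^*$ to a latent embedding) and `utility_fn` (the surprisal of Eq. (11), defined on a later slide) are assumed callables, not names from the paper.

```python
def paml_select(candidate_descriptors, infer_h, utility_fn):
    """One active-selection step (sketch): embed each candidate task
    descriptor psi* into latent space, score it, and return the index
    of the most informative candidate."""
    scores = [utility_fn(infer_h(psi)) for psi in candidate_descriptors]
    return max(range(len(scores)), key=lambda i: scores[i])
```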
13. Probabilistic Active Meta-Learning
  • Figure 2(b): additional task descriptors $\psi_i$ that are conditioned on the task-specific latent variables $h_i$
14. Extending the Meta-Learning Model
  • Rewrite Eq. (2) to include the task descriptors, where $\Psi$ is the matrix of task descriptors $\psi_i$:

  Extending the Meta-Learning Model
  $$p_\theta(Y, H, \Psi \mid X) = \prod_{i=1}^{N} p_\theta(\psi_i \mid h_i)\, p(h_i) \prod_{j=1}^{M_i} p_\theta(y_j^i \mid x_j^i, h_i) \tag{7}$$
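The new factor in Eq. (7) is $p_\theta(\psi_i \mid h_i)$; a minimal sketch, assuming a Gaussian descriptor likelihood with a linear mean (the pixel-descriptor experiment later in the deck would need a deeper decoder).

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class DescriptorDecoder(nn.Module):
    """Assumed Gaussian model of p_theta(psi_i | h_i) from Eq (7)."""
    def __init__(self, h_dim=2, psi_dim=2, noise_std=0.1):
        super().__init__()
        self.net = nn.Linear(h_dim, psi_dim)  # mean of the descriptor
        self.noise_std = noise_std

    def log_prob(self, psi, h):
        # log p_theta(psi | h) under a fixed-noise Gaussian.
        return Normal(self.net(h), self.noise_std).log_prob(psi).sum()
```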
15. Extending the Meta-Learning Model
  • Maximize a lower bound on the log-marginal likelihood

  Extending the Meta-Learning Model
  $$\log p_\theta(Y, \Psi \mid X) = \log \mathbb{E}_{q_\phi(H)}\!\left[p_\theta(Y \mid H, X)\, p_\theta(\Psi \mid H)\, \frac{p(H)}{q_\phi(H)}\right] \tag{8}$$
  $$\ge \mathbb{E}_{q_\phi(H)}\!\left[\log p_\theta(Y \mid H, X) + \log p_\theta(\Psi \mid H) + \log \frac{p(H)}{q_\phi(H)}\right] \tag{9}$$
  $$= \mathcal{L}_{\mathrm{ML}}(\theta, \phi) + \sum_{i=1}^{N} \mathbb{E}_{q_\phi(h_i)}\!\left[\log p_\theta(\psi_i \mid h_i)\right] =: \mathcal{L}_{\mathrm{PAML}}(\theta, \phi) \tag{10}$$
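A sketch of Eq. (10): $\mathcal{L}_{\mathrm{PAML}}$ is the meta-learning ELBO plus a one-sample Monte Carlo estimate of the expected descriptor log-likelihood, reusing the hypothetical `meta_elbo` and `DescriptorDecoder` sketches above.

```python
def paml_loss(decoder, desc_decoder, posterior, tasks, descriptors):
    """L_PAML(theta, phi) = L_ML + sum_i E_q[log p_theta(psi_i | h_i)], Eq (10)."""
    objective = meta_elbo(decoder, posterior, tasks)
    for i, psi in enumerate(descriptors):
        h = posterior.rsample(i)                     # h_i ~ q_phi(h_i)
        objective = objective + desc_decoder.log_prob(psi, h)
    return objective  # maximize jointly over theta and phi
```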
16. Extending the Meta-Learning Model
  • Take advantage of learned task similarities/differences that represent the full task configuration $\mathcal{T}$
  • Eq. (10): two tasks that are similar are encouraged to lie closer together in latent space
17. Ranking Candidates in Latent Space
  • To rank candidates in latent space, define the utility of a candidate $h^*$ as the self-information (surprisal) associated with $h^*$:

  Utility Function
  $$u(h^*) := -\log \sum_{i=1}^{N} q_{\phi_i}(h^*) + \log N \tag{11}$$
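A numerically stable sketch of Eq. (11), assuming the Gaussian `TaskPosterior` above: the utility of a candidate $h^*$ is its surprisal under the uniform mixture of the $N$ training-task posteriors.

```python
import math
import torch
from torch.distributions import Normal

def utility(h_star, posterior):
    """u(h*) = -log sum_i q_phi_i(h*) + log N, Eq (11)."""
    n = posterior.mean.shape[0]
    # log q_phi_i(h*) for all i at once; sum over latent dimensions.
    log_q = Normal(posterior.mean, posterior.log_std.exp()).log_prob(h_star).sum(-1)
    return -torch.logsumexp(log_q, dim=0) + math.log(n)
```

A candidate far from all previously learned task embeddings has low mixture density and therefore high surprisal, so maximizing $u$ picks the most novel task.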
18. Experiments
  • Assess whether PAML speeds up learning task domains by learning a meta-model for the dynamics of simulated robotic systems
  • Performance measures:
    • Negative log-likelihood (NLL): considers the full posterior predictive distribution at a test input
    • Root mean squared error (RMSE): considers only the predictive mean
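A sketch of the two measures, assuming a Gaussian posterior predictive summarized by a per-point mean and standard deviation; NLL scores the full predictive distribution, while RMSE uses only its mean.

```python
import torch
from torch.distributions import Normal

def nll(pred_mean, pred_std, y_true):
    """Average negative log-likelihood of the test targets."""
    return -Normal(pred_mean, pred_std).log_prob(y_true).mean()

def rmse(pred_mean, y_true):
    """Root mean squared error of the predictive mean."""
    return ((pred_mean - y_true) ** 2).mean().sqrt()
```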
19. Experiments
  • We consider three robotic systems in the experiments:
    • Cart-pole
    • Pendubot
    • Cart-double-pole
20. Experiments
  • Compare PAML to:
    • Uniform sampling (UNI)
    • Latin hypercube sampling (LHS)
    • Oracle
21. Exp: Observed Task Parameters
  • PAML performs significantly better than UNI and LHS in terms of performance on the test tasks
22. Exp: Partially-Observed/Noisy Task Parameters
  • In Figure 5(a), PAML achieves lower prediction errors in fewer trials than the baselines
  • The error after one task added by PAML is approximately matched by the baselines only after about five added tasks
  • In Figure 5(b), PAML achieves better predictions than the baselines
23. Exp: Partially-Observed/Noisy Task Parameters
  • One superfluous dimension $\epsilon \in [0.5, 5.0]$ is added to the observations
  • To select tasks efficiently, PAML needs to learn to effectively ignore this superfluous dimension
24. Exp: High-Dimensional (Pixel) Task Parameters
  • PAML does not have access to the task parameters in this experiment, but observes indirect pixel task descriptors of a cart-pole system
  • In Figure 8, PAML consistently selects more informative cart-pole images and approaches the oracle's performance significantly faster than UNI
25. Conclusion
  • Proposed a general and data-efficient learning algorithm, combining ideas from active learning and meta-learning
  • Extended ideas from meta-learning to incorporate task descriptors for active learning of a task domain, where the algorithm can choose which task to learn next by taking advantage of prior experience
  • Took advantage of learned latent task embeddings to find a meaningful space in which to express task similarities