Meta-learning from Tasks with Heterogeneous Attribute Spaces


since1998 · August 09, 2021


Transcript

  1. So far, many meta-learning methods have been proposed:
     • Model-Agnostic Meta-Learning (MAML)
     • Probabilistic Active Meta-Learning (PAML)
     • Meta reinforcement learning, etc.
     However, all of these methods assume that the training and target tasks share the same attribute (feature) space, so they are inapplicable when attribute sizes differ across tasks. Example of tasks with different attribute sizes: $y = wx$ vs. $y = w_1 x_1 + w_2 x_2$. (Preface, 3/34)
  2. We propose a heterogeneous meta-learning method that trains a model on tasks with various attribute spaces. It can solve unseen tasks whose attribute spaces differ from those of the training tasks, given a few labeled instances. (Preface, 4/34)
  3. Training phase. $\{\mathcal{D}_d\}_{d=1}^{D}$: datasets of multiple tasks with heterogeneous attribute spaces.
     • $\mathcal{D}_d = \{(\mathbf{x}_{dn}, \mathbf{y}_{dn})\}_{n=1}^{N_d}$: the set of pairs of observed attribute vectors $\mathbf{x}_{dn} \in \mathbb{R}^{I_d}$ and response vectors $\mathbf{y}_{dn} \in \mathbb{R}^{J_d}$ of the $n$th instance in task $d$
     • $N_d$ is the number of instances, $I_d$ the number of attributes, and $J_d$ the number of responses
     • The numbers of instances, attributes, and responses can differ across tasks: $N_d \neq N_{d'}$, $I_d \neq I_{d'}$, and $J_d \neq J_{d'}$ are allowed
     (Proposed Method: Dataset, 5/34)
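To make the dataset notation concrete, here is a minimal sketch (NumPy assumed; `make_task` and the shapes are illustrative, not from the paper) of training data in which $N_d$, $I_d$, and $J_d$ all differ across tasks:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(n_instances, n_attributes, n_responses):
    """One task D_d: attribute matrix X (N_d x I_d) and responses Y (N_d x J_d)."""
    X = rng.normal(size=(n_instances, n_attributes))
    Y = rng.normal(size=(n_instances, n_responses))
    return X, Y

# Three tasks whose numbers of instances, attributes, and responses all differ.
datasets = [make_task(50, 1, 1), make_task(80, 2, 1), make_task(30, 5, 3)]
for d, (X, Y) in enumerate(datasets):
    print(f"task {d}: N_d={X.shape[0]}, I_d={X.shape[1]}, J_d={Y.shape[1]}")
```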
  4. Test phase. $\mathcal{D}_{d^*} = \{(\mathbf{x}_{d^*n}, \mathbf{y}_{d^*n})\}_{n=1}^{N_{d^*}}$: dataset of a target task (support set).
     • $N_{d^*}$: the number of instances, which is small
     • The target task is not contained in the given training tasks: $d^* \notin \{1, \dots, D\}$
     We want to predict the response $\mathbf{y}_{d^*n}$ for an observed attribute vector $\mathbf{x}_{d^*n}$ (query) in the target task. (Proposed Method: Dataset, 6/34)
  5. Our method is composed of two networks:
     • Inference network: infers latent representations of each attribute and each response from a few labeled instances
     • Prediction network: predicts the responses of unlabeled instances using the inferred representations
     (Proposed Method: Model, 7/34)
  6. $S = \{(\mathbf{x}_n, \mathbf{y}_n)\}_{n=1}^{N}$: support set in a task.
     • $\mathbf{x}_n = (x_{ni})_{i=1}^{I}$: $I$-dimensional observed attribute vector
     • $\mathbf{y}_n = (y_{nj})_{j=1}^{J}$: $J$-dimensional observed response vector
     Categorical values are handled with one-hot encoding. (Proposed Method: Model, 8/34)
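For the one-hot encoding of categorical values mentioned above, a minimal sketch (NumPy assumed; `one_hot` is a hypothetical helper):

```python
import numpy as np

def one_hot(codes, num_classes):
    """Map integer category codes to one-hot rows, one binary attribute per class."""
    out = np.zeros((len(codes), num_classes))
    out[np.arange(len(codes)), codes] = 1.0
    return out

# A 3-level categorical attribute becomes 3 binary attributes.
print(one_hot(np.array([0, 2, 1]), num_classes=3))
```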
  7. $V = \{\mathbf{v}_i\}_{i=1}^{I}$: latent attribute vectors.
     • $\mathbf{v}_i$: the representation of the $i$th attribute
     $C = \{\mathbf{c}_j\}_{j=1}^{J}$: latent response vectors.
     • $\mathbf{c}_j$: the representation of the $j$th response
     $\mathbf{z}$: latent instance vector. (Proposed Method: Model, 9/34)
  8. Input: support set $S$ in a task. Output: latent attribute vectors $V$ and latent response vectors $C$. (Inference Network, 10/34)
  9. First, we calculate the initial attribute representation $\bar{\mathbf{v}}_i$ and initial response representation $\bar{\mathbf{c}}_j$.
     • $f_{\bar{v}}, g_{\bar{v}}, f_{\bar{c}}, g_{\bar{c}}$: feed-forward neural networks
     $$\bar{\mathbf{v}}_i = g_{\bar{v}}\Big(\frac{1}{N}\sum_{n=1}^{N} f_{\bar{v}}(x_{ni})\Big), \quad \bar{\mathbf{c}}_j = g_{\bar{c}}\Big(\frac{1}{N}\sum_{n=1}^{N} f_{\bar{c}}(y_{nj})\Big) \tag{1}$$
     (Inference Network, 11/34)
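A minimal PyTorch sketch of Eq. (1), assuming two-layer networks with 32 hidden units and a 32-dimensional latent space (both are illustrative assumptions; the later snippets reuse `mlp` and these networks):

```python
import torch
import torch.nn as nn

def mlp(d_in, d_out, hidden=32):
    """A small feed-forward network standing in for the f/g networks."""
    return nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(),
                         nn.Linear(hidden, d_out))

f_vbar, g_vbar = mlp(1, 32), mlp(32, 32)   # f_v-bar, g_v-bar
f_cbar, g_cbar = mlp(1, 32), mlp(32, 32)   # f_c-bar, g_c-bar

def initial_representations(X, Y):
    """Eq. (1). X: (N, I) support attributes, Y: (N, J) support responses."""
    # Feed each scalar x_ni to f_v-bar, average over the N instances, apply g_v-bar.
    v_bar = g_vbar(f_vbar(X.T.unsqueeze(-1)).mean(dim=1))  # (I, 32)
    c_bar = g_cbar(f_cbar(Y.T.unsqueeze(-1)).mean(dim=1))  # (J, 32)
    return v_bar, c_bar
```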
  10. The initial attribute representations $\bar{\mathbf{v}}_i$ and response representations $\bar{\mathbf{c}}_j$ do not contain information about their relationships with the other attributes and responses. ⇒ By concatenating the representations with their values, we incorporate information on each attribute and response using a permutation-invariant neural network. (Inference Network, 12/34)
  11. Next, we calculate the representation $\mathbf{u}_n$ for the $n$th instance.
     • $f_u, g_u$: feed-forward neural networks
     $$\mathbf{u}_n = g_u\Big(\frac{1}{I}\sum_{i=1}^{I} f_u([\bar{\mathbf{v}}_i, x_{ni}]) + \frac{1}{J}\sum_{j=1}^{J} f_u([\bar{\mathbf{c}}_j, y_{nj}])\Big) \tag{2}$$
     (Inference Network, 13/34)
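Continuing the sketch above, Eq. (2) concatenates each initial representation with its observed value and mean-pools over attributes and responses (the dimensions are the same illustrative assumptions):

```python
f_u, g_u = mlp(32 + 1, 32), mlp(32, 32)

def instance_representations(X, Y, v_bar, c_bar):
    """Eq. (2). X: (N, I), Y: (N, J), v_bar: (I, 32), c_bar: (J, 32) -> u: (N, 32)."""
    N, I = X.shape
    J = Y.shape[1]
    # Build [v_bar_i, x_ni] for every (n, i) and [c_bar_j, y_nj] for every (n, j).
    xv = torch.cat([v_bar.expand(N, I, 32), X.unsqueeze(-1)], dim=-1)  # (N, I, 33)
    yc = torch.cat([c_bar.expand(N, J, 32), Y.unsqueeze(-1)], dim=-1)  # (N, J, 33)
    # Average over attributes and over responses, then combine.
    return g_u(f_u(xv).mean(dim=1) + f_u(yc).mean(dim=1))              # (N, 32)
```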
  12. Finally, we calculate the attribute representation $\mathbf{v}_i$ and response representation $\mathbf{c}_j$.
     • $f_v, g_v, f_c, g_c$: feed-forward neural networks
     $$\mathbf{v}_i = g_v\Big(\frac{1}{N}\sum_{n=1}^{N} f_v([\mathbf{u}_n, x_{ni}])\Big), \quad \mathbf{c}_j = g_c\Big(\frac{1}{N}\sum_{n=1}^{N} f_c([\mathbf{u}_n, y_{nj}])\Big) \tag{3}$$
     (Inference Network, 14/34)
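Continuing the same sketch, Eq. (3) pools over instances, pairing each $\mathbf{u}_n$ with the corresponding observed value:

```python
f_v, g_v = mlp(32 + 1, 32), mlp(32, 32)
f_c, g_c = mlp(32 + 1, 32), mlp(32, 32)

def final_representations(X, Y, u):
    """Eq. (3). X: (N, I), Y: (N, J), u: (N, 32) -> v: (I, 32), c: (J, 32)."""
    N, I = X.shape
    J = Y.shape[1]
    # Pair u_n with x_ni (resp. y_nj) for every (n, i) and (n, j).
    ux = torch.cat([u.unsqueeze(1).expand(N, I, 32), X.unsqueeze(-1)], dim=-1)
    uy = torch.cat([u.unsqueeze(1).expand(N, J, 32), Y.unsqueeze(-1)], dim=-1)
    v = g_v(f_v(ux).mean(dim=0))   # average over the N instances -> (I, 32)
    c = g_c(f_c(uy).mean(dim=0))   # (J, 32)
    return v, c
```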
  13. Input: query $\mathbf{x}$, latent attribute vectors $V$, and latent response vectors $C$. Output: the predicted response $\hat{\mathbf{y}}$. (Prediction Network, 15/34)
  14. We obtain the latent instance vector $\mathbf{z}$ given the observed attribute vector $\mathbf{x} = (x_i)_{i=1}^{I}$ and the latent attribute vectors $V$.
     • $f_z, g_z$: feed-forward neural networks
     $$\mathbf{z} = g_z\Big(\frac{1}{I}\sum_{i=1}^{I} f_z([\mathbf{v}_i, x_i])\Big) \tag{4}$$
     (Prediction Network, 16/34)
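Continuing the sketch, Eq. (4) embeds a query by pairing each latent attribute vector with the query's value for that attribute:

```python
f_z, g_z = mlp(32 + 1, 32), mlp(32, 32)

def latent_instance(x, v):
    """Eq. (4). x: (I,) query attribute vector, v: (I, 32) -> z: (32,)."""
    xv = torch.cat([v, x.unsqueeze(-1)], dim=-1)   # (I, 33): rows [v_i, x_i]
    return g_z(f_z(xv).mean(dim=0))                # average over the I attributes
```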
  15. We predict the response $\hat{\mathbf{y}}$ for query $\mathbf{x}$ with the latent instance vector $\mathbf{z}$ and latent response vectors $C$.
     • $f_y$: feed-forward neural network
     • $\Phi$: parameters
     $$\hat{y}_j(\mathbf{x}, S; \Phi) = f_y([\mathbf{c}_j, \mathbf{z}]) \tag{5}$$
     The prediction depends on the support set $S$ and the parameters $\Phi$ of all the neural networks above. (Prediction Network, 17/34)
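And Eq. (5), still under the same illustrative dimensions, predicts one scalar per response from $[\mathbf{c}_j, \mathbf{z}]$:

```python
f_y = mlp(32 + 32, 1)

def predict(x, v, c):
    """Eq. (5). x: (I,), v: (I, 32), c: (J, 32) -> y_hat: (J,)."""
    z = latent_instance(x, v)                               # Eq. (4)
    cz = torch.cat([c, z.expand(c.shape[0], 32)], dim=-1)   # (J, 64): rows [c_j, z]
    return f_y(cz).squeeze(-1)                              # y_hat_j = f_y([c_j, z])
```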
  16. We estimate the neural network parameters $\Phi$ by minimizing the loss.
     • Support set $S$ and query set $Q$ are generated from the training datasets $\{\mathcal{D}_d\}_{d=1}^{D}$
     • $N_Q, J_Q$: the numbers of instances / responses in query set $Q$
     $$\hat{\Phi} = \arg\min_{\Phi} \mathbb{E}_{\mathcal{D}_d}\big[\mathbb{E}_{(S,Q)\sim\mathcal{D}_d}[E(Q \mid S; \Phi)]\big] \tag{6}$$
     $$E(Q \mid S; \Phi) = \frac{1}{N_Q J_Q} \sum_{(\mathbf{x},\mathbf{y}) \in Q} \sum_{j=1}^{J_Q} \big(y_j - \hat{y}_j(\mathbf{x}, S; \Phi)\big)^2 \tag{7}$$
     (Training, 18/34)
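A minimal episodic training loop for Eqs. (6)-(7), reusing the functions from the sketches above; `sample_episode` is a hypothetical support/query sampler, and the plain squared-error loss mirrors Eq. (7):

```python
modules = [f_vbar, g_vbar, f_cbar, g_cbar, f_u, g_u,
           f_v, g_v, f_c, g_c, f_z, g_z, f_y]
params = [p for m in modules for p in m.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)   # learning rate from the settings slide

def episode_loss(S_X, S_Y, Q_X, Q_Y):
    """E(Q | S; Phi): mean squared error on the query set given the support set."""
    v_bar, c_bar = initial_representations(S_X, S_Y)        # Eq. (1)
    u = instance_representations(S_X, S_Y, v_bar, c_bar)    # Eq. (2)
    v, c = final_representations(S_X, S_Y, u)               # Eq. (3)
    preds = torch.stack([predict(x, v, c) for x in Q_X])    # Eqs. (4)-(5)
    return ((Q_Y - preds) ** 2).mean()

for step in range(1000):
    S_X, S_Y, Q_X, Q_Y = sample_episode()  # hypothetical: draws a task, splits S/Q
    loss = episode_loss(S_X, S_Y, Q_X, Q_Y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```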
  17. We first evaluated the proposed method on simple synthetic regression tasks with one- or two-dimensional attribute spaces and one-dimensional response spaces.
     • One third of the tasks were generated from a one-dimensional linear model $y = w_d x$
     • One third of the tasks were generated from a one-dimensional sine curve $y = \sin(x + 3 w_d)$
     • The remaining third of the tasks were generated from the two-dimensional model $y = w_{d1} x_1 + \sin(x_2 + 3 w_{d2})$
     [Figure: example tasks for $y = w_d x$, $y = \sin(x + 3 w_d)$, $y = w_{d1} x_1 + \sin(x_2 + 3 w_{d2})$, and all tasks together] (Experiments: Synthetic Data, 20/34)
  18. Attributes & parameters.
     Attributes $x, x_1, x_2$: generated uniformly at random from $[-3, 3]$.
     Task-specific model parameters $w_d, w_{d1}, w_{d2}$: generated uniformly at random from $[-1, 1]$. (Experiments: Synthetic Data, 21/34)
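A minimal sketch (NumPy assumed) of the synthetic task generator described on the two slides above; drawing one task type uniformly per task is an assumption standing in for the exact one-third split:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_synthetic_task(n_instances):
    """Draw one task: X with 1 or 2 attributes, y with a single response."""
    kind = rng.integers(3)
    if kind == 0:     # one-dimensional linear model: y = w_d x
        w = rng.uniform(-1, 1)
        X = rng.uniform(-3, 3, size=(n_instances, 1))
        y = w * X[:, 0]
    elif kind == 1:   # one-dimensional sine curve: y = sin(x + 3 w_d)
        w = rng.uniform(-1, 1)
        X = rng.uniform(-3, 3, size=(n_instances, 1))
        y = np.sin(X[:, 0] + 3 * w)
    else:             # two-dimensional model: y = w_d1 x_1 + sin(x_2 + 3 w_d2)
        w1, w2 = rng.uniform(-1, 1, size=2)
        X = rng.uniform(-3, 3, size=(n_instances, 2))
        y = w1 * X[:, 0] + np.sin(X[:, 1] + 3 * w2)
    return X, y.reshape(-1, 1)
```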
  19. Tasks: 10,000 training tasks, 30 validation tasks, 300 target tasks.
     Instances: number of support instances $N_S$ = 5; number of query instances $N_Q$ = 27. (Experiments: Synthetic Data, 22/34)
  20. Networks. $(f_{\bar{v}}, f_{\bar{c}})$, $(g_{\bar{v}}, g_{\bar{c}})$, $(f_v, f_c)$, $(g_v, g_c)$: three-layered feed-forward neural networks with 32 hidden units.
     $f_y$: a three-layered feed-forward neural network with one unit in the output layer and 32 hidden units in the other layers.
     Activation function: $\mathrm{ReLU}(x) = \max(0, x)$. (Experiments: Proposed Method Settings, 23/34)
  21. Other settings: Adam with learning rate $10^{-3}$; dropout rate 0.1; batch size $B = 256$. (Experiments: Proposed Method Settings, 24/34)
  22. Results of the proposed method on six target tasks:
     • two-dimensional linear relationships (a), (b)
     • two-dimensional nonlinear relationships (c), (d)
     • a three-dimensional relationship, all handled with a single model
     Red circles: the five target support instances. Blue crosses: true target query instances. Green crosses: target query instances predicted by the proposed method. (Results, 25/34)
  23. t-SNE visualization of the latent attribute vectors $\mathbf{v}_{di}$ for target support sets in the synthetic datasets.
     Red: $x$ in $y = w_d x$. Green: $x$ in $y = \sin(x + 3 w_d)$. Blue and magenta: $x_1$ and $x_2$ in $y = w_{d1} x_1 + \sin(x_2 + 3 w_{d2})$.
     Latent attribute vectors with the same attribute property were located close to each other. (Results, 26/34)
  24. Data: OpenML, an open online platform for machine learning. Instances per dataset: 10 to 300. Attributes: 2 to 30.
     Tasks: 37 training tasks, 5 validation tasks, 17 target tasks (59 tasks in total). (Experiments: OpenML, 27/34)
  25. Instances: number of support instances $N_S$ = 3; number of query instances $N_Q$ = 29.
     Other: batch size $B = 37$; the other settings are the same as for the simple synthetic regression tasks. (Experiments: OpenML, 28/34)
  26. Compared methods:
     • DS (deep sets)
     • FT (fine-tuning)
     • MAML (model-agnostic meta-learning)
     • NP (conditional neural process)
     • Ridge (linear regression with L2 regularization)
     • Lasso (linear regression with L1 regularization)
     • BR (Bayesian ridge regression)
     • KR (kernel ridge regression with a linear kernel)
     • GP (Gaussian process regression with an RBF kernel)
     • NN (neural network)
     (Results, 29/34)
  27. The mean squared error averaged over 30 experiments with different training, validation, and target splits. The proposed method achieved the lowest error compared with the existing meta-learning and regression methods. (Results, 30/34)
  28. Left: the averaged mean squared errors when changing the number of instances in a support set at test time. Right: the averaged mean squared errors with different numbers of training tasks. (Results, 31/34)
  29. Training computational time in hours. The training time of the proposed method was shorter than that of MAML, since the proposed method does not require iterative gradient-descent steps to adapt to a support set. In the test phase, the proposed method efficiently predicted responses without optimization, by feeding the support and query sets into the trained neural networks. (Results, 32/34)
  30. We proposed a neural network-based meta-learning method that learns from multiple tasks with different attribute spaces and predicts responses given a few instances in unseen tasks. In experiments with synthetic datasets and 59 OpenML datasets, we demonstrated that the proposed method can predict responses given a few labeled instances in new tasks after being trained on tasks with heterogeneous attribute spaces. (Conclusion, 33/34)
  31. Future work:
     1. Improve the efficiency of the training procedure
     2. Investigate different types of neural networks with variable-length inputs, such as attention mechanisms, for inferring the latent attribute and response vectors
     3. Use prior knowledge about attributes, such as correspondence information across tasks and descriptions of attributes
     (Future Work, 34/34)