Preface
• Model-Agnostic Meta-learning (MAML)
• Probabilistic Active Meta-learning (PAML)
• Meta Reinforcement Learning, etc.
However, many methods assume that all training and target tasks share the same attribute (feature) space, so they are inapplicable when the number of attributes differs across tasks.
Tasks with different numbers of attributes:
• $y = wx$ and $y = w_1 x_1 + w_2 x_2$
... on tasks with various attribute spaces
The method can solve unseen tasks whose attribute spaces differ from those of the training tasks, given only a few labeled instances.

Proposed Method: Dataset
... with heterogeneous attribute spaces
• $D_d = \{(x_{dn}, y_{dn})\}_{n=1}^{N_d}$: the set of pairs of observed attribute vectors $x_{dn} \in \mathbb{R}^{I_d}$ and response vectors $y_{dn} \in \mathbb{R}^{J_d}$, where $n$ indexes the instances in task $d$
• $N_d$ is the number of instances, $I_d$ is the number of attributes, and $J_d$ is the number of responses
• The numbers of instances, attributes, and responses can differ across tasks: $N_d \neq N_{d'}$, $I_d \neq I_{d'}$, and $J_d \neq J_{d'}$
Dataset on a target task (support set)
• $N_{d^*}$: the number of instances, which is small
• The target task is not contained in the given training tasks: $d^* \notin \{1, \ldots, D\}$
We want to predict the response $y_{d^*n}$ for an observed attribute vector $x_{d^*n}$ (query) in the target task.

Proposed Method: Model
... latent representations of each attribute and each response from a few labeled instances
Prediction Network: responses of unlabeled instances are predicted with the inferred representations
$V = \{v_i\}_{i=1}^{I}$: latent attribute vectors
• $v_i$: the representation of the $i$th attribute
$C = \{c_j\}_{j=1}^{J}$: latent response vectors
• $c_j$: the representation of the $j$th response
$z$: latent instance vector
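
Inference Network
... response representation $\bar{c}_j$
• $f_{\bar{v}}, g_{\bar{v}}, f_{\bar{c}}, g_{\bar{c}}$: feed-forward neural networks
Initial attribute / response representation $\bar{v}_i$ / $\bar{c}_j$:
$$\bar{v}_i = g_{\bar{v}}\left(\frac{1}{N}\sum_{n=1}^{N} f_{\bar{v}}(x_{ni})\right), \qquad \bar{c}_j = g_{\bar{c}}\left(\frac{1}{N}\sum_{n=1}^{N} f_{\bar{c}}(y_{nj})\right) \tag{1}$$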
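A minimal NumPy sketch of Eq. (1), under stated assumptions: the helpers `make_mlp` and `mlp`, the representation size `K = 8`, and the random untrained weights are all hypothetical stand-ins, not the authors' implementation. It only illustrates that averaging over instances makes the result independent of instance order and of the number of attributes.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(sizes):
    """Random weights for a small feed-forward net (illustrative, untrained)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, h):
    """Apply the net along the last axis of h, with ReLU between layers."""
    for k, (W, b) in enumerate(params):
        h = h @ W + b
        if k < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

K = 8  # representation size (hypothetical)
f_vbar, g_vbar = make_mlp([1, 32, K]), make_mlp([K, 32, K])
f_cbar, g_cbar = make_mlp([1, 32, K]), make_mlp([K, 32, K])

def initial_representations(X, Y):
    """Eq. (1): X is (N, I), Y is (N, J); returns arrays of shape (I, K) and (J, K)."""
    # Each scalar x_ni is embedded on its own, then averaged over the N instances.
    v_bar = mlp(g_vbar, mlp(f_vbar, X[:, :, None]).mean(axis=0))
    c_bar = mlp(g_cbar, mlp(f_cbar, Y[:, :, None]).mean(axis=0))
    return v_bar, c_bar

X, Y = rng.normal(size=(5, 3)), rng.normal(size=(5, 1))
v_bar, c_bar = initial_representations(X, Y)

# Shuffling the support instances should leave the representations unchanged.
perm = rng.permutation(5)
v_perm, c_perm = initial_representations(X[perm], Y[perm])
```

Because the same networks are applied to every attribute, the function accepts any number of attributes `I` and responses `J` without changing the parameters.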
... do not contain information about the relationships with other attributes and responses
⇒ By concatenating the representations with their values, we incorporate information on every attribute and response using a permutation-invariant neural network
• $f_u, g_u$: feed-forward neural networks
The representation $u_n$ for the $n$th instance:
$$u_n = g_u\left(\frac{1}{I}\sum_{i=1}^{I} f_u([\bar{v}_i, x_{ni}]) + \frac{1}{J}\sum_{j=1}^{J} f_u([\bar{c}_j, y_{nj}])\right) \tag{2}$$
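The pooling in Eq. (2) can be sketched as follows. This is an illustration only: `make_mlp`, `mlp`, `pair_with_values`, the size `K = 8`, and the random inputs standing in for the initial representations are assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(sizes):
    """Random weights for a small feed-forward net (illustrative, untrained)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, h):
    """Apply the net along the last axis of h, with ReLU between layers."""
    for k, (W, b) in enumerate(params):
        h = h @ W + b
        if k < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

K = 8  # representation size (hypothetical)
f_u, g_u = make_mlp([K + 1, 32, K]), make_mlp([K, 32, K])

def pair_with_values(reps, values):
    """Concatenate each representation (A, K) with its observed value: (N, A, K + 1)."""
    tiled = np.broadcast_to(reps, (values.shape[0],) + reps.shape)
    return np.concatenate([tiled, values[:, :, None]], axis=-1)

def instance_representations(v_bar, c_bar, X, Y):
    """Eq. (2): returns U of shape (N, K), one row u_n per support instance."""
    pooled = (mlp(f_u, pair_with_values(v_bar, X)).mean(axis=1)    # average over attributes
              + mlp(f_u, pair_with_values(c_bar, Y)).mean(axis=1))  # average over responses
    return mlp(g_u, pooled)

N, I, J = 5, 3, 2
v_bar, c_bar = rng.normal(size=(I, K)), rng.normal(size=(J, K))  # stand-ins for Eq. (1) outputs
X, Y = rng.normal(size=(N, I)), rng.normal(size=(N, J))
U = instance_representations(v_bar, c_bar, X, Y)

# Reordering the attributes (representations and values together) should not change U.
order = rng.permutation(I)
U_shuffled = instance_representations(v_bar[order], c_bar, X[:, order], Y)
```

Averaging over attributes and responses is what lets a single pair of networks handle tasks with different `I` and `J`.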

Prediction Network
... $x = (x_i)_{i=1}^{I}$ and latent attribute vectors $V$
• $f_z, g_z$: feed-forward neural networks
Latent instance vector $z$:
$$z = g_z\left(\frac{1}{I}\sum_{i=1}^{I} f_z([v_i, x_i])\right) \tag{4}$$
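As a rough illustration of Eq. (4), the sketch below embeds a query attribute vector of any length into a fixed-size vector. The helper names, `K = 8`, and the random values used in place of inferred latent attribute vectors are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(sizes):
    """Random weights for a small feed-forward net (illustrative, untrained)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, h):
    """Apply the net along the last axis of h, with ReLU between layers."""
    for k, (W, b) in enumerate(params):
        h = h @ W + b
        if k < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

K = 8  # representation size (hypothetical)
f_z, g_z = make_mlp([K + 1, 32, K]), make_mlp([K, 32, K])

def latent_instance_vector(V, x):
    """Eq. (4): V is (I, K), x is (I,); returns z of shape (K,)."""
    pairs = np.concatenate([V, x[:, None]], axis=-1)  # (I, K + 1), rows [v_i, x_i]
    return mlp(g_z, mlp(f_z, pairs).mean(axis=0))

# The same networks handle a one-attribute and a two-attribute query.
z_one = latent_instance_vector(rng.normal(size=(1, K)), rng.normal(size=1))
z_two = latent_instance_vector(rng.normal(size=(2, K)), rng.normal(size=2))
```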
... instance vector $z$ and latent response vectors $C$
• $f_y$: feed-forward neural network
• $\Phi$: parameters
The predicted response $\hat{y}$:
$$\hat{y}_j(x, S; \Phi) = f_y([c_j, z]) \tag{5}$$
The prediction depends on the support set $S$ and on the parameters $\Phi$ of all the neural networks.
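Eq. (5) applies one shared network to each pair $[c_j, z]$, which might look like the following sketch. The helpers, `K = 8`, and the random `C` and `z` are placeholders assumed for illustration; in the method they would come from the inference network and Eq. (4).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(sizes):
    """Random weights for a small feed-forward net (illustrative, untrained)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, h):
    """Apply the net along the last axis of h, with ReLU between layers."""
    for k, (W, b) in enumerate(params):
        h = h @ W + b
        if k < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

K = 8  # representation size (hypothetical)
f_y = make_mlp([2 * K, 32, 1])  # one output unit per (response, instance) pair

def predict_responses(C, z):
    """Eq. (5): C is (J, K), z is (K,); returns y_hat of shape (J,)."""
    pairs = np.concatenate([C, np.broadcast_to(z, C.shape)], axis=-1)  # (J, 2K), rows [c_j, z]
    return mlp(f_y, pairs)[:, 0]

C, z = rng.normal(size=(3, K)), rng.normal(size=K)
y_hat = predict_responses(C, z)
```

Each predicted response depends only on its own latent response vector and the shared instance vector, so the number of responses `J` is free to vary across tasks.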
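
Training
• Support set $S$ and query set $Q$ are generated from the given training datasets $\{D_d\}_{d=1}^{D}$
• $N_Q$, $J_Q$: the number of instances / responses in query set $Q$
Updated parameters $\hat{\Phi}$:
$$\hat{\Phi} = \arg\min_{\Phi} \mathbb{E}_{D_d}\left[\mathbb{E}_{(S,Q)\sim D_d}\left[E(Q \mid S; \Phi)\right]\right] \tag{6}$$
where
$$E(Q \mid S; \Phi) = \frac{1}{N_Q J_Q}\sum_{(x,y)\in Q}\sum_{j=1}^{J_Q} \left\| y_j - \hat{y}_j(x, S; \Phi) \right\|^2 \tag{7}$$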
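The episodic objective can be approximated by sampling, as in this sketch. It assumes that $E$ is the mean squared error over query responses, and it uses a placeholder `mean_predictor` in place of the neural networks; `sample_episode`, `episode_loss`, the set sizes, and the random toy tasks are hypothetical names and values chosen for the example. No gradient step is shown.

```python
import numpy as np

def sample_episode(X, Y, n_support, n_query, rng):
    """Randomly split one task's data into a support set S and a query set Q."""
    idx = rng.permutation(X.shape[0])
    s, q = idx[:n_support], idx[n_support:n_support + n_query]
    return (X[s], Y[s]), (X[q], Y[q])

def episode_loss(predict, support, query):
    """Eq. (7) under the squared-error assumption: mean over N_Q instances and J_Q responses."""
    Xq, Yq = query
    Y_hat = np.stack([predict(x, support) for x in Xq])  # (N_Q, J_Q)
    return float(np.mean((Yq - Y_hat) ** 2))

def mean_predictor(x, support):
    """Placeholder model: ignores x and returns the mean support response."""
    return support[1].mean(axis=0)

rng = np.random.default_rng(0)
# Toy training tasks with different numbers of attributes (1, 2, 1).
tasks = [(rng.normal(size=(20, I)), rng.normal(size=(20, 1))) for I in (1, 2, 1)]

# Monte Carlo estimate of the double expectation in Eq. (6).
losses = []
for _ in range(30):
    X, Y = tasks[rng.integers(len(tasks))]   # sample a task
    S, Q = sample_episode(X, Y, 5, 10, rng)  # sample support and query sets from it
    losses.append(episode_loss(mean_predictor, S, Q))
objective_estimate = float(np.mean(losses))
```

In actual training, the placeholder predictor would be replaced by the full model and the estimate minimized with a stochastic gradient method.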

Experiments: Synthetic Data
... tasks with one- or two-dimensional attribute spaces and one-dimensional response spaces
• One third of the tasks were generated from a one-dimensional linear model $y = w_d x$
• One third of the tasks were generated from a one-dimensional sine curve $y = \sin(x + 3w_d)$
• The remaining tasks were generated from the following two-dimensional model: $y = w_{d1} x_1 + \sin(x_2 + 3w_{d2})$
[Figure: example tasks. Panels: $y = \sin(x + 3w_d)$; $y = w_{d1} x_1 + \sin(x_2 + 3w_{d2})$; $y = w_d x$; All Tasks]
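The three task families could be generated along these lines. The sampling ranges for the weights and inputs, the number of instances per task, and the function name `make_task` are assumptions, since the slide does not state them.

```python
import numpy as np

def make_task(kind, n_instances, rng):
    """Return (X, Y) for one synthetic task; X is (N, I) with I in {1, 2}, Y is (N, 1)."""
    w = rng.uniform(-1.0, 1.0, size=2)                       # assumed weight range
    n_attr = 2 if kind == "two_dim" else 1
    X = rng.uniform(-3.0, 3.0, size=(n_instances, n_attr))   # assumed input range
    if kind == "linear":
        y = w[0] * X[:, 0]                                   # y = w_d x
    elif kind == "sine":
        y = np.sin(X[:, 0] + 3.0 * w[0])                     # y = sin(x + 3 w_d)
    elif kind == "two_dim":
        y = w[0] * X[:, 0] + np.sin(X[:, 1] + 3.0 * w[1])    # y = w_d1 x_1 + sin(x_2 + 3 w_d2)
    else:
        raise ValueError(f"unknown task kind: {kind}")
    return X, y[:, None]

rng = np.random.default_rng(0)
kinds = ["linear", "sine", "two_dim"]
# Nine tasks, one third from each family.
tasks = [(kinds[d % 3], *make_task(kinds[d % 3], 30, rng)) for d in range(9)]
```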

Experiments: Proposed Method Settings
• ... $g_{\bar{c}}$), ($f_v$, $f_c$), ($g_v$, $g_c$): three-layered feed-forward neural networks with 32 hidden units (for all of these networks)
• $f_y$: three-layered feed-forward neural network with 1 unit in the output layer and 32 hidden units, as in the other networks
• Activation function: $\mathrm{ReLU}(x) = \max(0, x)$
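A sketch of these settings in NumPy. Here "three-layered" is read as three weight layers (two hidden layers of 32 units), which is an assumption, as are the input sizes, the representation size `K = 32`, the initialization, and the helper names.

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def init_three_layer(n_in, n_hidden, n_out, rng):
    """Weights for a net with two hidden layers and one output layer (assumed reading)."""
    sizes = [n_in, n_hidden, n_hidden, n_out]
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, h):
    """ReLU after each hidden layer, linear output layer."""
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return h @ W + b

rng = np.random.default_rng(0)
K = 32  # representation size (hypothetical; only the hidden width is given on the slide)
f_u = init_three_layer(K + 1, 32, K, rng)  # input: a representation plus one observed value
f_y = init_three_layer(2 * K, 32, 1, rng)  # input: [c_j, z]; one output unit

out = forward(f_y, rng.normal(size=(4, 2 * K)))  # four (response, instance) pairs
```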

Results
... tasks
• Two-dimensional linear relationships: (a), (b)
• Two-dimensional nonlinear relationships: (c), (d)
• Three-dimensional relationship with a single model
Red circles: five target support instances
Blue crosses: true target query instances
Green crosses: target query instances predicted by the proposed method
... sets in the synthetic datasets
Red: $x$ in $y = w_d x$
Green: $x$ in $y = \sin(x + 3w_d)$
Blue and magenta: $x_1$ and $x_2$ in $y = w_{d1} x_1 + \sin(x_2 + 3w_{d2})$
Latent attribute vectors of attributes with the same property were located close to each other.

Experiments: OpenML
• The number of query instances: $N_Q = 29$
Other
• Batch size: $B = 37$
• Other settings are the same as in the simple synthetic regression tasks

Results
... training, validation, and target splits
The proposed method achieved the lowest error among the compared meta-learning and regression methods.
... proposed method was shorter than MAML, since the proposed method does not require iterative gradient descent steps to adapt to a support set.
In the test phase, the proposed method predicted responses efficiently, without optimization, by feeding the support and query sets into the trained neural networks.

Conclusion
... multiple tasks with different attribute spaces, and predicts a response given a few instances in unseen tasks.
In experiments with synthetic datasets and 59 datasets in OpenML, we demonstrated that the proposed method can predict responses from a few labeled instances in new tasks after being trained on tasks with heterogeneous attribute spaces.

Future Work
... different types of neural networks with variable-length inputs, such as attention, for inferring latent attribute and response vectors
3. Use prior knowledge about attributes, such as correspondence information across tasks and descriptions of the attributes