
Meta-learning from Tasks with Heterogeneous Attribute Spaces


Update: Dec 16 2021

since1998

August 09, 2021


Transcript

  1. So far, many meta-learning methods have been proposed:
     • Model-Agnostic Meta-Learning (MAML)
     • Probabilistic Active Meta-Learning (PAML)
     • Meta reinforcement learning, etc.
     However, all of these methods assume that the training and target tasks share the same attribute (feature) space, so they are inapplicable when attribute sizes differ across tasks.
     Example of tasks with different attribute sizes: $y = wx$ vs. $y = w_1 x_1 + w_2 x_2$
     3 / 34 Preface
  2. We propose a heterogeneous meta-learning method that trains a model on tasks with various attribute spaces.
     It enables solving unseen tasks whose attribute spaces differ from those of the training tasks, given a few labeled instances.
     4 / 34 Preface
  3. Training phase
     $\{\mathcal{D}_d\}_{d=1}^{D}$ : Datasets in multiple tasks with heterogeneous attribute spaces
     • $\mathcal{D}_d = \{(\mathbf{x}_{dn}, \mathbf{y}_{dn})\}_{n=1}^{N_d}$ : The set of pairs of observed attribute vectors $\mathbf{x}_{dn} \in \mathbb{R}^{I_d}$ and response vectors $\mathbf{y}_{dn} \in \mathbb{R}^{J_d}$ of the $n$th instance in task $d$
     • $N_d$ is the number of instances, $I_d$ is the number of attributes, and $J_d$ is the number of responses
     • The numbers of instances, attributes, and responses can differ across tasks: $N_d \neq N_{d'}$, $I_d \neq I_{d'}$, and $J_d \neq J_{d'}$ are allowed
     5 / 34 Proposed Method: Dataset
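To make this data setup concrete, here is a minimal NumPy sketch (shapes and values are illustrative assumptions, not from the authors' code) of a meta-training collection whose tasks differ in $N_d$, $I_d$, and $J_d$:

```python
# A minimal sketch of heterogeneous tasks: each task has its own number of
# instances (N_d), attributes (I_d), and responses (J_d).
import numpy as np

rng = np.random.default_rng(0)

def make_task(n_instances, n_attributes, n_responses):
    """Return one task D_d = (X, Y) with task-specific sizes."""
    X = rng.normal(size=(n_instances, n_attributes))  # x_dn in R^{I_d}
    Y = rng.normal(size=(n_instances, n_responses))   # y_dn in R^{J_d}
    return X, Y

# Heterogeneous tasks: (N_d, I_d, J_d) differ across d.
datasets = [make_task(50, 1, 1), make_task(80, 2, 1), make_task(30, 5, 3)]
for d, (X, Y) in enumerate(datasets):
    print(f"task {d}: N={X.shape[0]}, I={X.shape[1]}, J={Y.shape[1]}")
```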
  4. Test phase
     $\mathcal{D}_{d^*} = \{(\mathbf{x}_{d^*n}, \mathbf{y}_{d^*n})\}_{n=1}^{N_{d^*}}$ : Dataset on a target task (support set)
     • $N_{d^*}$ : The number of instances, which is small
     • The target task is not contained in the given training tasks: $d^* \notin \{1, \ldots, D\}$
     We want to predict the response $\mathbf{y}_{d^*n}$ for an observed attribute vector $\mathbf{x}_{d^*n}$ (query) in the target task
     6 / 34 Proposed Method: Dataset
  5. Our method is composed of two networks:
     Inference network: infers latent representations of each attribute and each response from a few labeled instances
     Prediction network: predicts the responses of unlabeled instances using the inferred representations
     7 / 34 Proposed Method: Model
  6. $S = \{(\mathbf{x}_n, \mathbf{y}_n)\}_{n=1}^{N}$ : Support set in a task
     • $\mathbf{x}_n = (x_{ni})_{i=1}^{I}$ : $I$-dimensional observed attribute vector
     • $\mathbf{y}_n = (y_{nj})_{j=1}^{J}$ : $J$-dimensional observed response vector
     Categorical values are handled with one-hot encoding
     8 / 34 Proposed Method: Model
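As a small illustration of the one-hot treatment mentioned above, the helper below (a hypothetical utility, not from the paper) expands integer category codes into indicator columns before they are fed to the networks:

```python
# One-hot encoding sketch: each categorical column becomes n_categories
# indicator columns.
import numpy as np

def one_hot(codes, n_categories):
    """codes: (N,) integer category codes -> (N, n_categories) indicators."""
    out = np.zeros((len(codes), n_categories))
    out[np.arange(len(codes)), codes] = 1.0
    return out

print(one_hot(np.array([0, 2, 1]), 3))
```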
  7. $V = \{\mathbf{v}_i\}_{i=1}^{I}$ : Latent attribute vectors
     • $\mathbf{v}_i$ : The representation of the $i$th attribute
     $C = \{\mathbf{c}_j\}_{j=1}^{J}$ : Latent response vectors
     • $\mathbf{c}_j$ : The representation of the $j$th response
     $\mathbf{z}$ : Latent instance vector
     9 / 34 Proposed Method: Model
  8. Input: Support set $S$ in a task
     Output: Latent attribute vectors $V$ & latent response vectors $C$
     10 / 34 Inference Network
  9. First, we calculate the initial attribute representation $\bar{\mathbf{v}}_i$ and the initial response representation $\bar{\mathbf{c}}_j$
     • $f_{\bar{v}}, g_{\bar{v}}, f_{\bar{c}}, g_{\bar{c}}$ : Feed-forward neural networks
     Initial attribute / response representations $\bar{\mathbf{v}}_i$ / $\bar{\mathbf{c}}_j$:
     $$\bar{\mathbf{v}}_i = g_{\bar{v}}\Big(\frac{1}{N}\sum_{n=1}^{N} f_{\bar{v}}(x_{ni})\Big), \quad \bar{\mathbf{c}}_j = g_{\bar{c}}\Big(\frac{1}{N}\sum_{n=1}^{N} f_{\bar{c}}(y_{nj})\Big) \tag{1}$$
     11 / 34 Inference Network
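A minimal PyTorch sketch of Eq. (1): each attribute's scalar values are embedded by $f_{\bar{v}}$ and mean-pooled over the $N$ support instances, so the result is invariant to instance order and defined for any number of attributes. The 32-dimensional latent size and layer widths are assumptions; $\bar{\mathbf{c}}_j$ is computed analogously from the responses.

```python
# Eq. (1) sketch: deep-set style pooling over instances per attribute.
import torch
import torch.nn as nn

K = 32  # latent dimensionality (assumed)
f_v_bar = nn.Sequential(nn.Linear(1, K), nn.ReLU(), nn.Linear(K, K))
g_v_bar = nn.Sequential(nn.Linear(K, K), nn.ReLU(), nn.Linear(K, K))

def initial_attribute_repr(X):
    """X: (N, I) support attribute values -> (I, K) initial representations."""
    N, I = X.shape
    embedded = f_v_bar(X.reshape(N * I, 1)).reshape(N, I, K)
    return g_v_bar(embedded.mean(dim=0))  # mean over instances: (I, K)

v_bar = initial_attribute_repr(torch.randn(5, 3))  # N=5 instances, I=3 attributes
print(v_bar.shape)  # torch.Size([3, 32])
```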
  10. The initial attribute representations $\bar{\mathbf{v}}_i$ and response representations $\bar{\mathbf{c}}_j$ don't contain information about the relationships with other attributes and responses
      ⇒ By concatenating the representations and their values, we incorporate information on each attribute and response with a permutation-invariant neural network
      12 / 34 Inference Network
  11. Next, we calculate the representation $\mathbf{u}_n$ for the $n$th instance
      • $f_u, g_u$ : Feed-forward neural networks
      The representation for the $n$th instance $\mathbf{u}_n$:
      $$\mathbf{u}_n = g_u\Big(\frac{1}{I}\sum_{i=1}^{I} f_u([\bar{\mathbf{v}}_i, x_{ni}]) + \frac{1}{J}\sum_{j=1}^{J} f_u([\bar{\mathbf{c}}_j, y_{nj}])\Big) \tag{2}$$
      13 / 34 Inference Network
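A minimal PyTorch sketch of Eq. (2): the instance representation $\mathbf{u}_n$ pools a shared network $f_u$ over the concatenations $[\bar{\mathbf{v}}_i, x_{ni}]$ and $[\bar{\mathbf{c}}_j, y_{nj}]$, so it accepts any number of attributes $I$ and responses $J$. Layer sizes are assumptions.

```python
# Eq. (2) sketch: pool over attribute pairs and response pairs, then combine.
import torch
import torch.nn as nn

K = 32
f_u = nn.Sequential(nn.Linear(K + 1, K), nn.ReLU(), nn.Linear(K, K))
g_u = nn.Sequential(nn.Linear(K, K), nn.ReLU(), nn.Linear(K, K))

def instance_repr(x_n, y_n, v_bar, c_bar):
    """x_n: (I,), y_n: (J,), v_bar: (I, K), c_bar: (J, K) -> u_n: (K,)."""
    attr_term = f_u(torch.cat([v_bar, x_n.unsqueeze(1)], dim=1)).mean(dim=0)
    resp_term = f_u(torch.cat([c_bar, y_n.unsqueeze(1)], dim=1)).mean(dim=0)
    return g_u(attr_term + resp_term)

u_n = instance_repr(torch.randn(3), torch.randn(2), torch.randn(3, K), torch.randn(2, K))
print(u_n.shape)  # torch.Size([32])
```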
  12. Finally, we calculate the attribute representation $\mathbf{v}_i$ and the response representation $\mathbf{c}_j$
      • $f_v, g_v, f_c, g_c$ : Feed-forward neural networks
      Attribute / response representations $\mathbf{v}_i$ / $\mathbf{c}_j$:
      $$\mathbf{v}_i = g_v\Big(\frac{1}{N}\sum_{n=1}^{N} f_v([\mathbf{u}_n, x_{ni}])\Big), \quad \mathbf{c}_j = g_c\Big(\frac{1}{N}\sum_{n=1}^{N} f_c([\mathbf{u}_n, y_{nj}])\Big) \tag{3}$$
      14 / 34 Inference Network
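A minimal PyTorch sketch of Eq. (3): the final attribute vector $\mathbf{v}_i$ pools $f_v$ over $[\mathbf{u}_n, x_{ni}]$ across the $N$ support instances; $\mathbf{c}_j$ would be obtained the same way from $[\mathbf{u}_n, y_{nj}]$. Layer sizes are assumptions.

```python
# Eq. (3) sketch: pair each instance representation with its attribute value,
# then mean-pool over instances.
import torch
import torch.nn as nn

K = 32
f_v = nn.Sequential(nn.Linear(K + 1, K), nn.ReLU(), nn.Linear(K, K))
g_v = nn.Sequential(nn.Linear(K, K), nn.ReLU(), nn.Linear(K, K))

def attribute_repr(X, U):
    """X: (N, I) attribute values, U: (N, K) instance reprs -> (I, K) vectors v_i."""
    N, I = X.shape
    U_exp = U.unsqueeze(1).expand(N, I, K)                   # pair u_n with each attribute
    pooled = f_v(torch.cat([U_exp, X.unsqueeze(2)], dim=2))  # (N, I, K)
    return g_v(pooled.mean(dim=0))                           # mean over instances: (I, K)

V = attribute_repr(torch.randn(5, 3), torch.randn(5, K))
print(V.shape)  # torch.Size([3, 32])
```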
  13. Input: Query $\mathbf{x}$, latent attribute vectors $V$ & latent response vectors $C$
      Output: The predicted response $\hat{\mathbf{y}}$
      15 / 34 Prediction Network
  14. We obtain the latent instance vector $\mathbf{z}$ given an observed attribute vector $\mathbf{x} = (x_i)_{i=1}^{I}$ and the latent attribute vectors $V$
      • $f_z, g_z$ : Feed-forward neural networks
      Latent instance vector $\mathbf{z}$:
      $$\mathbf{z} = g_z\Big(\frac{1}{I}\sum_{i=1}^{I} f_z([\mathbf{v}_i, x_i])\Big) \tag{4}$$
      16 / 34 Prediction Network
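A minimal PyTorch sketch of Eq. (4): the query's latent instance vector $\mathbf{z}$ pools $f_z$ over the pairs $[\mathbf{v}_i, x_i]$, so a query with any number of attributes can be embedded into the same space. Sizes are assumptions.

```python
# Eq. (4) sketch: embed a query via pooling over its attribute pairs.
import torch
import torch.nn as nn

K = 32
f_z = nn.Sequential(nn.Linear(K + 1, K), nn.ReLU(), nn.Linear(K, K))
g_z = nn.Sequential(nn.Linear(K, K), nn.ReLU(), nn.Linear(K, K))

def latent_instance(x, V):
    """x: (I,) query attribute values, V: (I, K) latent attribute vectors -> z: (K,)."""
    return g_z(f_z(torch.cat([V, x.unsqueeze(1)], dim=1)).mean(dim=0))

z = latent_instance(torch.randn(3), torch.randn(3, K))
print(z.shape)  # torch.Size([32])
```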
  15. We predict the response $\hat{\mathbf{y}}$ for query $\mathbf{x}$ with the latent instance vector $\mathbf{z}$ and the latent response vectors $C$
      • $f_y$ : Feed-forward neural network
      • $\Phi$ : Parameters
      The predicted response $\hat{\mathbf{y}}$:
      $$\hat{y}_j(\mathbf{x}, S; \Phi) = f_y([\mathbf{c}_j, \mathbf{z}]) \tag{5}$$
      The prediction depends on the support set $S$ and the parameters $\Phi$ of all the neural networks above
      17 / 34 Prediction Network
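A minimal PyTorch sketch of Eq. (5): each response $\hat{y}_j$ is read out from the concatenation of its latent response vector $\mathbf{c}_j$ and the query's latent instance vector $\mathbf{z}$. The architecture of $f_y$ here is an assumption.

```python
# Eq. (5) sketch: one readout network shared across all responses.
import torch
import torch.nn as nn

K = 32
f_y = nn.Sequential(nn.Linear(2 * K, K), nn.ReLU(), nn.Linear(K, 1))

def predict(z, C):
    """z: (K,), C: (J, K) latent response vectors -> predicted responses (J,)."""
    z_exp = z.unsqueeze(0).expand(C.shape[0], K)
    return f_y(torch.cat([C, z_exp], dim=1)).squeeze(1)

y_hat = predict(torch.randn(K), torch.randn(2, K))
print(y_hat.shape)  # torch.Size([2])
```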
  16. We estimate the neural network parameters $\Phi$ by minimizing the loss
      • Support sets $S$ and query sets $Q$ are generated from the training datasets $\{\mathcal{D}_d\}_{d=1}^{D}$
      • $N_Q, J_Q$ : The numbers of instances / responses in query set $Q$
      Updated parameters $\hat{\Phi}$:
      $$\hat{\Phi} = \arg\min_{\Phi} \mathbb{E}_{\mathcal{D}_d}\big[\mathbb{E}_{(S,Q)\sim\mathcal{D}_d}[E(Q \mid S; \Phi)]\big] \tag{6}$$
      where
      $$E(Q \mid S; \Phi) = \frac{1}{N_Q J_Q} \sum_{(\mathbf{x},\mathbf{y}) \in Q} \sum_{j=1}^{J_Q} \big(y_j - \hat{y}_j(\mathbf{x}, S; \Phi)\big)^2 \tag{7}$$
      18 / 34 Training
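A minimal sketch of the episodic training behind Eqs. (6)-(7): sample a task, split it into support $S$ and query $Q$, predict, and minimize the squared error on $Q$. Here `model` stands in for the full inference + prediction pipeline above, and `MeanModel` is only a dummy so the sketch runs; the split sizes are assumptions.

```python
# Episodic training sketch for Eqs. (6)-(7).
import random
import torch

def training_step(model, optimizer, datasets, n_support=5):
    X, Y = random.choice(datasets)              # sample a task D_d
    perm = torch.randperm(X.shape[0])
    s, q = perm[:n_support], perm[n_support:]   # (S, Q) ~ D_d
    y_hat = model((X[s], Y[s]), X[q])           # ŷ_j(x, S; Φ) for each query
    loss = ((Y[q] - y_hat) ** 2).mean()         # E(Q | S; Φ), Eq. (7)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

class MeanModel(torch.nn.Module):
    """Dummy model: predicts the support-set mean response plus a learned bias."""
    def __init__(self):
        super().__init__()
        self.bias = torch.nn.Parameter(torch.zeros(1))
    def forward(self, support, X_query):
        _, Y_s = support
        return Y_s.mean(dim=0) + self.bias  # broadcasts over the query instances

model = MeanModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
tasks = [(torch.randn(32, 3), torch.randn(32, 1))]
print(training_step(model, opt, tasks))
```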
  17. We first evaluated the proposed method on simple synthetic regression tasks with one- or two-dimensional attribute spaces and one-dimensional response spaces
      • One third of the tasks were generated from a one-dimensional linear model $y = w_d x$
      • One third of the tasks were generated from a one-dimensional sine curve $y = \sin(x + 3 w_d)$
      • The remaining tasks were generated from the following two-dimensional model: $y = w_{d1} x_1 + \sin(x_2 + 3 w_{d2})$
      [Figure: sample tasks from each model ($y = w_d x$, $y = \sin(x + 3 w_d)$, $y = w_{d1} x_1 + \sin(x_2 + 3 w_{d2})$) and all tasks together]
      20 / 34 Experiments: Synthetic Data
  18. Attributes & Parameters
      Attributes $x, x_1, x_2$ : Generated uniformly at random from $[-3, 3]$
      Task-specific model parameters $w_d, w_{d1}, w_{d2}$ : Generated uniformly at random from $[-1, 1]$
      21 / 34 Experiments: Synthetic Data
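A minimal NumPy sketch of the three synthetic task generators, with $x \sim U[-3, 3]$ and $w \sim U[-1, 1]$ as stated on the slides. Using $n = 32$ per task matches the 5 support + 27 query instances given later; the split itself is not shown.

```python
# Synthetic task generators: linear, sine, and the two-dimensional mix.
import numpy as np

rng = np.random.default_rng(0)

def sample_task(kind, n=32):
    """Return (X, y) for one task drawn from the named generator."""
    d = 2 if kind == "mix" else 1
    x = rng.uniform(-3, 3, size=(n, d))
    w = rng.uniform(-1, 1, size=d)
    if kind == "linear":    # y = w_d x
        y = w[0] * x[:, 0]
    elif kind == "sine":    # y = sin(x + 3 w_d)
        y = np.sin(x[:, 0] + 3 * w[0])
    else:                   # y = w_d1 x_1 + sin(x_2 + 3 w_d2)
        y = w[0] * x[:, 0] + np.sin(x[:, 1] + 3 * w[1])
    return x, y.reshape(-1, 1)

tasks = [sample_task(k) for k in ("linear", "sine", "mix")]
```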
  19. Tasks
      Training tasks: 10,000 / Validation tasks: 30 / Target tasks: 300
      Instances
      The number of support instances $N_S$ : 5
      The number of query instances $N_Q$ : 27
      22 / 34 Experiments: Synthetic Data
  20. Networks
      $(f_{\bar{v}}, f_{\bar{c}}), (g_{\bar{v}}, g_{\bar{c}}), (f_v, f_c), (g_v, g_c)$ : Three-layered feed-forward neural networks with 32 hidden units
      $f_y$ : Three-layered feed-forward neural network with 1 unit in the output layer and 32 hidden units in the other layers
      Activation function: ReLU(x) = max(0, x)
      23 / 34 Experiments: Proposed Method Settings
  21. Other
      Adam with learning rate $10^{-3}$
      Dropout rate 0.1
      Batch size $B = 256$
      24 / 34 Experiments: Proposed Method Settings
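A minimal sketch matching these stated settings: ReLU MLPs with 32 hidden units, dropout 0.1, and Adam with learning rate $10^{-3}$. Reading "three-layered" as two hidden layers plus an output layer is my interpretation, not confirmed by the slides.

```python
# Network and optimizer settings sketch.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=32, p_drop=0.1):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
        nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
        nn.Linear(hidden, out_dim),
    )

f_y = mlp(64, 1)  # output layer with 1 unit, as on the slide (input size assumed)
optimizer = torch.optim.Adam(f_y.parameters(), lr=1e-3)
```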
  22. Results of the proposed method on the six target tasks:
      • (a), (b): Two-dimensional linear relationships
      • (c), (d): Two-dimensional nonlinear relationships
      • Three-dimensional relationship with a single model
      Red circles: five target support instances / Blue crosses: true target query instances / Green crosses: target query instances predicted by the proposed method
      25 / 34 Results
  23. t-SNE visualization of latent attribute vectors $\mathbf{v}_{di}$ for target support sets in the synthetic datasets
      Red: $x$ in $y = w_d x$ / Green: $x$ in $y = \sin(x + 3 w_d)$ / Blue and magenta: $x_1$ and $x_2$ in $y = w_{d1} x_1 + \sin(x_2 + 3 w_{d2})$
      Latent attribute vectors with the same attribute property were located close to each other
      26 / 34 Results
  24. Data
      OpenML: an open online platform for machine learning
      Instances: 10–300 / Attributes: 2–30
      Tasks
      Training tasks: 37 / Validation tasks: 5 / Target tasks: 17 / Total tasks: 59
      27 / 34 Experiments: OpenML
  25. Instances
      The number of support instances $N_S$ : 3
      The number of query instances $N_Q$ : 29
      Other
      Batch size $B = 37$
      Other settings are the same as in the simple synthetic regression tasks
      28 / 34 Experiments: OpenML
  26. Compared methods:
      • DS (deep sets)
      • FT (fine-tuning)
      • MAML (model-agnostic meta-learning)
      • NP (conditional neural process)
      • Ridge (linear regression with L2 regularization)
      • Lasso (linear regression with L1 regularization)
      • BR (Bayesian ridge regression)
      • KR (kernel ridge regression with a linear kernel)
      • GP (Gaussian process regression with an RBF kernel)
      • NN (neural network)
      29 / 34 Results
  27. The mean squared error averaged over 30 experiments with different training, validation, and target splits
      The proposed method achieved the lowest error compared with the existing meta-learning and regression methods
      30 / 34 Results
  28. Left: the averaged mean squared errors when changing the number of instances in a support set at test time
      Right: the averaged mean squared errors with different numbers of training tasks
      31 / 34 Results
  29. Training computational time in hours
      The training time of the proposed method was shorter than that of MAML, since the proposed method does not require iterative gradient-descent steps to adapt to a support set
      In the test phase, the proposed method efficiently predicted responses without optimization, by feeding the support and query sets into the trained neural networks
      32 / 34 Results
  30. We proposed a neural-network-based meta-learning method that learns from multiple tasks with different attribute spaces and predicts responses given a few instances in unseen tasks
      In experiments with synthetic datasets and 59 OpenML datasets, we demonstrated that the proposed method can predict responses given a few labeled instances in new tasks after being trained on tasks with heterogeneous attribute spaces
      33 / 34 Conclusion
  31. 1. Improve the efficiency of the training procedure
      2. Investigate different types of neural networks with variable-length inputs, such as attention mechanisms, for inferring latent attribute and response vectors
      3. Use prior knowledge about attributes, such as correspondence information across tasks and descriptions of attributes
      34 / 34 Future Work