Figure 1: Matching Networks architecture. (Diagram: support set examples $x_i$ with labels $y_i$ are embedded by $g'$ and a bidirectional LSTM to form $g(x_i, S)$, the sum of the LSTM outputs and $g'(x_i)$.)

We train the model by showing only a few examples per class, switching the task from minibatch to minibatch, much like how it will be tested when presented with a few examples of a new task.

Besides our contributions in defining a model and training criterion amenable to one-shot learning, we contribute the definition of tasks that can be used to benchmark other approaches on both ImageNet and small scale language modeling. We hope that our results will encourage others to work on this challenging problem.

We organized the paper by first defining and explaining our model whilst linking its several components to related work. Then in the following section we briefly elaborate on some of the related work.

A.1 The Fully Conditional Embedding f

$$a(h_{k-1}, g(x_i)) = \mathrm{softmax}\!\left(h_{k-1}^{\top} g(x_i)\right) \qquad (6)$$

noting that $\mathrm{LSTM}(x, h, c)$ follows the same LSTM implementation defined in [23], with $x$ the input, $h$ the output (i.e., the cell after the output gate), and $c$ the cell. $a$ is commonly referred to as "content"-based attention, and the softmax in eq. 6 normalizes w.r.t. $g(x_i)$. The read-out $r_{k-1}$ from memory is concatenated to $h_{k-1}$. Since we do $K$ steps of "reads", $\mathrm{attLSTM}(f'(\hat{x}), g(S), K) = h_K$, where $h_K$ is as described in eq. 3.
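To make the read-out concrete, the following is a minimal PyTorch sketch of the $K$-step attLSTM described above; the naming is ours, not the authors' code. One caveat: the paper conditions the LSTM on the concatenated state $[h_{k-1}, r_{k-1}]$, whereas a standard `nn.LSTMCell` has no such hook, so here we concatenate the read-out $r_{k-1}$ to the input $f'(\hat{x})$ instead — an assumption, not the authors' exact implementation. We also assume $f'(\hat{x})$ and the $g(x_i)$ are precomputed embeddings of equal dimension $d$.

```python
import torch
import torch.nn as nn

class AttLSTM(nn.Module):
    """Sketch of attLSTM(f'(x_hat), g(S), K): K attention "reads" over g(S)."""

    def __init__(self, d: int, K: int):
        super().__init__()
        self.K = K
        # The paper conditions on [h_{k-1}, r_{k-1}]; we approximate this by
        # appending r_{k-1} to the input (an assumption of this sketch).
        self.cell = nn.LSTMCell(input_size=2 * d, hidden_size=d)

    def forward(self, f_x: torch.Tensor, g_S: torch.Tensor) -> torch.Tensor:
        # f_x: (d,) embedding of the test example; g_S: (|S|, d) support embeddings.
        d = f_x.shape[-1]
        f_x = f_x.view(1, d)
        h = f_x.new_zeros(1, d)
        c = f_x.new_zeros(1, d)
        r = f_x.new_zeros(1, d)
        for _ in range(self.K):
            h_hat, c = self.cell(torch.cat([f_x, r], dim=1), (h, c))
            h = h_hat + f_x                        # skip connection (eq. 3)
            a = torch.softmax(h @ g_S.t(), dim=1)  # content-based attention (eq. 6)
            r = a @ g_S                            # read-out r_{k-1} from memory
        return h.squeeze(0)                        # attLSTM(f'(x_hat), g(S), K) = h_K

# Example: a 25-element support set of 64-d embeddings, K = 5 reads.
att = AttLSTM(d=64, K=5)
h_K = att(torch.randn(64), torch.randn(25, 64))
```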
A.2 The Fully Conditional Embedding g

In section 2.1.2 we described the encoding function for the elements in the support set $S$, $g(x_i, S)$, as a bidirectional LSTM. More precisely, let $g'(x_i)$ be a neural network (similar to $f'$ above, e.g., a VGG or Inception model). Then we define $g(x_i, S) = \overrightarrow{h}_i + \overleftarrow{h}_i + g'(x_i)$ with:

$$\overrightarrow{h}_i, \overrightarrow{c}_i = \mathrm{LSTM}(g'(x_i), \overrightarrow{h}_{i-1}, \overrightarrow{c}_{i-1})$$
$$\overleftarrow{h}_i, \overleftarrow{c}_i = \mathrm{LSTM}(g'(x_i), \overleftarrow{h}_{i+1}, \overleftarrow{c}_{i+1})$$

where, as above, $\mathrm{LSTM}(x, h, c)$ follows the same LSTM implementation defined in [23], with $x$ the input, $h$ the output (i.e., the cell after the output gate), and $c$ the cell. Note that the recursion for $\overleftarrow{h}$ starts from $i = |S|$. As in eq. 3, we add a skip connection between input and outputs. (A code sketch of this embedding follows the class splits below.)

B ImageNet Class Splits

Here we define the two class splits used in our full ImageNet experiments; these classes were excluded from training during our one-shot experiments described in section 4.1.2.

L_rand = {n01498041, n01537544, n01580077, n01592084, n01632777, n01644373, n01665541, n01675722, n01688243, n01729977, …, n01818515, n01843383, n01883070, n01950731, n02002724, n02013706, n02092339, n02093256, n02095314, n02097130, …}
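As promised in section A.2, here is a minimal PyTorch sketch of the bidirectional embedding $g$. The names and framework are our choice (the paper does not prescribe an implementation, and PyTorch's LSTM is a standard variant rather than exactly the one in [23]); it assumes the support set is given as a fixed-order sequence of precomputed $g'(x_i)$ vectors.

```python
import torch
import torch.nn as nn

class FullyConditionalG(nn.Module):
    """Sketch of g(x_i, S) = h_fwd_i + h_bwd_i + g'(x_i) from section A.2."""

    def __init__(self, d: int):
        super().__init__()
        # Bidirectional LSTM over the support-set sequence of g'(x_i) vectors;
        # the backward recursion starts from i = |S|, as in the text.
        self.bilstm = nn.LSTM(input_size=d, hidden_size=d,
                              bidirectional=True, batch_first=True)

    def forward(self, g_prime: torch.Tensor) -> torch.Tensor:
        # g_prime: (|S|, d), precomputed g'(x_i) for each support example.
        out, _ = self.bilstm(g_prime.unsqueeze(0))      # (1, |S|, 2d)
        h_fwd, h_bwd = out.squeeze(0).chunk(2, dim=-1)  # forward / backward halves
        return h_fwd + h_bwd + g_prime                  # skip connection, as in eq. 3

# Example: embed a 5-way, 1-shot support set of 64-d features.
g = FullyConditionalG(d=64)
support_embeddings = g(torch.randn(5, 64))              # (5, 64), conditioned on S
```

Summing the two directions rather than concatenating them keeps the output dimension equal to that of $g'(x_i)$, which is what allows the skip connection to be a plain addition.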