Slide 23
Copyright (C) DeNA Co.,Ltd. All Rights Reserved.
Matching Networks [Vinyals+, NIPS2016]
■ The Fully Conditional Embedding g
⁃ Embeds each support example x_i while taking the whole support set S into account
[Figure 1: Matching Networks architecture (screenshot from the paper; surrounding body text clipped)]
[Diagram: each example (x_i, y_i) in the Support Set (S) is embedded by g' and fed through a bidirectional LSTM; the two LSTM outputs and g'(x_i) are summed (+)]
g’: neural network (e.g., VGG or Inception)
a(h_{k-1}, g(x_i)) = \mathrm{softmax}(h_{k-1}^\top g(x_i))
noting that LSTM(x, h, c) follows the same LSTM implementation defined in [23] with x the input, h the output (i.e., cell after the output gate), and c the cell. a is commonly referred to as "content"-based attention, and the softmax in eq. 6 normalizes w.r.t. g(x_i). The read-out r_{k-1} from g(S) is concatenated to h_{k-1}. Since we do K steps of "reads", attLSTM(f'(\hat{x}), g(S), K) = h_K, where h_k is as described in eq. 3.
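The attention read-out described above can be sketched in a few lines. This is a minimal NumPy illustration (not the authors' code), assuming `g_S` holds the embedded support set g(S) as rows and `h_prev` is the previous LSTM output h_{k-1}:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_readout(h_prev, g_S):
    """One 'read' step of content-based attention over g(S), as in eq. 6.

    h_prev: (d,) previous LSTM output h_{k-1}
    g_S:    (|S|, d) embeddings g(x_i) of the support set
    Returns the read-out r_{k-1}, a convex combination of the g(x_i).
    """
    a = softmax(g_S @ h_prev)   # a(h_{k-1}, g(x_i)) for each i
    return a @ g_S              # r_{k-1} = sum_i a_i * g(x_i)

# toy usage
rng = np.random.default_rng(0)
g_S = rng.normal(size=(5, 8))   # 5 support examples, 8-dim embeddings
h = rng.normal(size=8)
r = attention_readout(h, g_S)
```

In the full attLSTM this read-out r_{k-1} would then be concatenated to h_{k-1} before the next of the K read steps.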
A.2 The Fully Conditional Embedding g

In section 2.1.2 we described the encoding function for the elements in the support set S, g(x_i, S), as a bidirectional LSTM. More precisely, let g'(x_i) be a neural network (similar to f' above, e.g. a VGG or Inception model). Then we define

g(x_i, S) = \overrightarrow{h}_i + \overleftarrow{h}_i + g'(x_i)

with:

\overrightarrow{h}_i, \overrightarrow{c}_i = \mathrm{LSTM}(g'(x_i), \overrightarrow{h}_{i-1}, \overrightarrow{c}_{i-1})
\overleftarrow{h}_i, \overleftarrow{c}_i = \mathrm{LSTM}(g'(x_i), \overleftarrow{h}_{i+1}, \overleftarrow{c}_{i+1})

where, as in above, LSTM(x, h, c) follows the same LSTM implementation defined in [23] with x the input, h the output (i.e., cell after the output gate), and c the cell. Note that the recursion for \overleftarrow{h}_i starts from i = |S|. As in eq. 3, we add a skip connection between input and outputs.
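As a rough sketch of this fully conditional embedding, the following NumPy code runs a forward and a backward recurrence over the support set and adds the skip connection. A simple tanh recurrence stands in for the full LSTM cell of [23], so this only illustrates the data flow (bidirectional pass plus skip connection), not the exact gating:

```python
import numpy as np

def fully_conditional_embedding(g_prime, W_in, W_h):
    """Sketch of g(x_i, S) = h_fwd_i + h_bwd_i + g'(x_i).

    g_prime: (|S|, d) outputs of the base network g' on the support set
    W_in, W_h: (d, d) weights of a toy tanh recurrence
               (standing in for the paper's LSTM)
    """
    n, d = g_prime.shape
    h_fwd = np.zeros((n, d))
    h_bwd = np.zeros((n, d))
    h = np.zeros(d)
    for i in range(n):                # forward recursion, i = 1 .. |S|
        h = np.tanh(g_prime[i] @ W_in + h @ W_h)
        h_fwd[i] = h
    h = np.zeros(d)
    for i in reversed(range(n)):      # backward recursion starts from i = |S|
        h = np.tanh(g_prime[i] @ W_in + h @ W_h)
        h_bwd[i] = h
    return h_fwd + h_bwd + g_prime    # skip connection to the input

# toy usage
rng = np.random.default_rng(1)
gp = rng.normal(size=(4, 6))          # 4 support examples, 6-dim g'(x_i)
W_in = rng.normal(size=(6, 6)) * 0.1
W_h = rng.normal(size=(6, 6)) * 0.1
emb = fully_conditional_embedding(gp, W_in, W_h)
```

Because every h_fwd_i and h_bwd_i depends on the other support examples, each g(x_i, S) is conditioned on the whole set S, which is the point of this embedding.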
B ImageNet Class Splits

Here we define the two class splits used in our full ImageNet experiments – these classes were excluded for training during our one-shot experiments described in section 4.1.2.

Lrand = {n01498041, n01537544, n01580077, n01592084, n01632777, n01644373, n01665541, n01675722, n01688243, n01729977, …
n01818515, n01843383, n01883070, n01950731, n02002724, n02013706, n02092339, n02093256, n02095314, n02097130, … (list clipped in this extract)
⁃ Let g(x_i, S) be the sum of g'(x_i) and the outputs of the Bi-LSTM