Slide 18
[Figure (b) Gumbel Subset Sampling. Block labels: Input, FC Layer, Transposed, Gumbel Noise (+), 1/τ (×), Softmax, Dot-Product. Tensor shapes shown: (N_i, c), (N_{i+1}, N_i), (N_{i+1}, N_i), (N_{i+1}, c). Annealing: from τ = 1 to τ → 0+.]
[…]ntion. The core representation […]
Instead, we use a hard and discrete selection with end-to-end trainable gumbel softmax (Eq. 3):
$$y_{\mathrm{gumbel}} = \mathrm{gumbel\_softmax}(w X_i^{T}) \cdot X_i, \quad w \in \dots$$
In the training phase, it provides smooth gradients […] discrete reparameterization trick. With annealing, […]ates to a hard selection in the test phase.
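The effect of the temperature on the single-point selection above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper `gumbel_softmax`, the sizes `N_i = 6` and `c = 3`, the random `X` and `w`, and the chosen temperatures are all assumptions made for illustration.

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Sample a relaxed one-hot vector along the last axis (illustrative helper)."""
    # Gumbel(0, 1) noise by inverse transform sampling: g = -log(-log(u)).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    z = (logits + g) / tau
    z = z - z.max(axis=-1, keepdims=True)  # stabilise the exponent
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
N_i, c = 6, 3                      # assumed sizes
X = rng.normal(size=(N_i, c))      # X_i: N_i points with c features
w = rng.normal(size=(c,))          # assumed shape for the selection vector w

for tau in (1.0, 0.1, 1e-6):
    p = gumbel_softmax(w @ X.T, tau, rng)  # weights over the N_i points
    y = p @ X                              # y_gumbel: weighted mix of points
    print(f"tau={tau:g}  largest weight={p.max():.3f}")
```

At τ = 1 the output is a soft mixture of several points. As τ shrinks, the weights concentrate on one point, so `y` approaches a single row of `X`.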
A Gumbel Subset Sampling (GSS) is simply […] point version of Eq. 13, which means a distribution […]sets,
$$\mathrm{GSS}(X_i) = \mathrm{gumbel\_softmax}(W X_i^{T}) \cdot X_i, \quad W \in \mathbb{R}^{N_{i+1} \times c}$$
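The shapes in the GSS formula can be traced with a short NumPy sketch that follows the pipeline in the figure (FC layer, transpose, Gumbel noise, 1/τ scaling, softmax, dot-product). The function name `gumbel_subset_sampling`, the sizes, and the random `X` and `W` are assumptions for illustration, not taken from the source.

```python
import numpy as np

def gumbel_subset_sampling(X, W, tau, rng):
    """Relaxed selection of N_{i+1} points from the N_i rows of X (illustrative)."""
    logits = W @ X.T                                   # (N_{i+1}, N_i), equals (X @ W.T).T
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    z = (logits - np.log(-np.log(u))) / tau            # add Gumbel noise, scale by 1/tau
    z = z - z.max(axis=1, keepdims=True)               # stabilise the exponent
    S = np.exp(z)
    S = S / S.sum(axis=1, keepdims=True)               # row-wise softmax
    return S @ X, S                                    # (N_{i+1}, c), (N_{i+1}, N_i)

rng = np.random.default_rng(0)
N_i, N_next, c = 8, 4, 3            # assumed sizes; N_next plays the role of N_{i+1}
X = rng.normal(size=(N_i, c))
W = rng.normal(size=(N_next, c))

Y_soft, S_soft = gumbel_subset_sampling(X, W, tau=1.0, rng=rng)
Y_hard, S_hard = gumbel_subset_sampling(X, W, tau=1e-6, rng=rng)
picked = S_hard.argmax(axis=1)      # index of the point each output row selects
print(Y_soft.shape, picked)
```

Each row of `S` is a distribution over the `N_i` input points. At very small τ each row is close to one-hot, so `Y_hard` is close to `X[picked]`. Nothing in this sketch prevents two rows from selecting the same point.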
The following proposition theoretically guarantees the permutation-invariance of GSS.
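The proposition and its proof are not reproduced on this slide. The sketch below is therefore only one plausible argument for such a statement, not the source's proof. It assumes the softmax is applied row-wise, that P is an N_i × N_i permutation matrix acting on the rows of X_i, and that G is a matrix of i.i.d. Gumbel noise; the symbols P, G and G' are introduced here for illustration.

```latex
\begin{aligned}
\mathrm{softmax}\!\left(\tfrac{1}{\tau}\big(W (P X_i)^{T} + G\big)\right) P X_i
  &= \mathrm{softmax}\!\left(\tfrac{1}{\tau}\big(W X_i^{T} + G P\big) P^{T}\right) P X_i \\
  &= \mathrm{softmax}\!\left(\tfrac{1}{\tau}\big(W X_i^{T} + G P\big)\right) P^{T} P X_i \\
  &= \mathrm{softmax}\!\left(\tfrac{1}{\tau}\big(W X_i^{T} + G'\big)\right) X_i,
  \qquad G' = G P .
\end{aligned}
```

The first step uses P P^T = I, the second uses the fact that a row-wise softmax commutes with a permutation of columns, and the third uses P^T P = I. Since G' is a column permutation of i.i.d. noise, it has the same distribution as G, so under these assumptions the output for P X_i has the same distribution as the output for X_i.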
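$$X_i \in \mathbb{R}^{N_i \times c}$$
$$\dots = \dots\, X_i \in \mathbb{R}^{N_i \times N_{i+1}}$$
$$\mathrm{GSS}(X_i) = \mathrm{gumbel\_softmax}(W X_i^{T})\, X_i$$
$$W \in \mathbb{R}^{N_{i+1} \times c}$$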